Sunday, July 27, 2008

Help Me Out - Take a survey

Hi. I've created a 25-question survey on databases and open source. About 10 of them are demographic-type questions (who you are, where you are, etc.) and the rest cover databases and open source.

The results of this survey, and by results I mean all of the raw data, will be released to the public at the completion of the survey.

I ask for an email address so that I can send you the results. This is totally optional and you will still be able to get the results should you choose not to enter one. At the end of the survey you will be taken to a web page where the data will be available. Bookmark that page and check it periodically; I will post ongoing updates of the data as responses come in.

Click here to take the survey now. The survey was created with eSurveysPro.com, a free online survey tool.

Thanks in advance!

LewisC


Wednesday, July 23, 2008

Update on EDB Open Source Database Survey

Yesterday I posted about the results of the EnterpriseDB open source survey. In that post, I said:

I'd like to see the survey again and compare the results to the survey itself.

I discovered that, as of right now, the survey is still online.

Still no information as to when, or if, the entire survey results will be released. That's what I am most interested in.

LewisC


Tuesday, July 22, 2008

Results of EnterpriseDB Open Source Database Survey

EnterpriseDB announced the results of the survey they did a few months ago at OSCON. Now, take the results with a grain of salt: the survey was done by EnterpriseDB, whose product is based on Postgres, so there is a vested interest in making Postgres sound good. Results can be skewed by how the survey is worded, what options are available as answers and who the respondents are.

The results summary is available for free.

Some key facts:

500 respondents. The download page says "500 corporate IT leaders". Or maybe, 500 open source developers. ;-)

Only 9% of respondents indicated that they preferred commercial solutions over open source solutions. I would guess that a majority of those responding were open source database people anyway. This is also one place where I think the wording of survey questions makes a difference. I'd like to see the survey again and compare the results to the survey itself.

The survey shows that respondents are using open source to migrate away from Oracle and SQL Server. It says that less than 1% are using open source to migrate away from DB2. Since IBM, the maker of DB2, is a major investor in EnterpriseDB, that doesn't surprise me. Again, the target users of the survey make a difference as well as the questions themselves.

Of course, Postgres was chosen more than any other open source database for transactional applications and high reliability. Again, not surprising based on who wrote the survey and what they sell.

Before I put very much value on this survey, I would want to see more than just a hand-crafted summary of the results. A spreadsheet of all the questions and the answers chosen would be, at least somewhat, valuable. Without that though, it's just marketing. I can't find anything on the site indicating the full results will be made available.


Saturday, July 19, 2008

Not working for EDB anymore

Well, I am no longer working for EnterpriseDB. It was fun while it lasted but it's over so I am moving on. I found a new job, locally. It's pure Oracle and I will get to use Real Application Clusters in a production environment. That's something I haven't done in the past so I am looking forward to it. It's also a Java, .NET and Oracle Forms shop and they are doing some interesting things with telecommunications and SMS.

From now on, I will only need to travel for conferences. No more trips to New Jersey. That's kind of a drag as I was almost at elite status on Continental. I have two more conferences this year and it just might be enough. I may even take a trip on my own just to get the miles. One of the trips is from Tampa to San Francisco, and back, so that will get me very close. The other is to Virginia and back.

I am a database geek and I will be keeping an eye on EDB just as I did in the past. I will probably post here at the EDB blog about as frequently as I have been (not often). Or, I may let this one die and just do any EDB posting on my Postgres blog. That actually makes the most sense. I think I will cross-post this one there and make this my last dedicated posting on this blog.

I also plan to keep up with Postgres, for personal knowledge, just as I do MySQL. I actually want to install GridSQL and see how it performs for a variety of different applications.

On the upside, I can now call EnterpriseDB Postgres Plus Advanced Server, PP AS, without marketing having fit. ;-) Heck, if I think of it as Advanced Server Software, I can call it the PP ASS. heh But I would never do that.

So, I guess I can close out this blog now. Later.

LewisC


Tuesday, June 3, 2008

EDB Tip #5: Tablespaces and Tables in Postgres

LewisC's An Expert's Guide To Oracle Technology

The datastore in Postgres (which is what Advanced Server is based on) is quite a bit different from the way Oracle stores data. Gone are the usual logical storage components: data blocks, extents and segments just don't exist in a Postgres database. Actually, segments do exist once a table grows past 1GB, but that's a story for a different entry.

Postgres stores data as files and directories. Tablespaces are directories, and tables (and indexes) are files inside those directories. Postgres ships with two tablespaces installed by default: pg_default and pg_global. These tablespaces cannot be changed or dropped. pg_global contains objects that are visible across databases (i.e. clusterwide). If you don't have any other tablespaces defined, pg_default is where everything else goes.
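You can see the tablespaces defined in a cluster by querying the pg_tablespace catalog. A minimal sketch (the spclocation column is empty for the two built-in tablespaces, since they live inside the data directory):

  -- List every tablespace in the cluster with its OID.
  -- A default install shows only pg_default and pg_global.
  SELECT oid, spcname, spclocation
  FROM pg_tablespace;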

If you look at a default installation (I'll be using Windows here but *nix is pretty much the same), Postgres creates a HOME pretty much like Oracle does.

In this directory structure, the "data" directory is the home for data. You don't have to put your data here. During install you can choose a completely different directory and you can create tablespaces to even further separate your data.

The data directory is composed of several other directories:

The base directory is the pg_default tablespace and the global directory is the pg_global tablespace. If we select the base directory, we see that there are more sub-directories beneath that one:

We're getting close to the data now. ;-)

Directories on your system may be named differently than on mine. The directory "1" contains at least part of the Postgres data dictionary, and directory "11510" also contains part of it. There are several system catalogs (defined as namespaces) in Postgres, and these two directories contain data from those. You can open the files in the directory with Notepad; they are binary files, but you can make out some of the information. DON'T SAVE THEM!

The pgsql_tmp directory is a working directory for the database. It's usually empty unless a query is running.

If we look at one of these directories, we see that the files are named like the directories that contain them:

All objects in Postgres have a unique object ID. The directory name or file name MAY be the same as the object's ID. That's not always the case though. I haven't quite figured out what conditions make this true but it frequently is the case.

I can run a query, using some of the IDs above, to see what they might be tied to:
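The query goes against pg_class. One detail worth knowing: the on-disk file name actually comes from pg_class.relfilenode, which starts out equal to the object's OID but can diverge after operations that rewrite the table. A sketch of the lookup (the numbers below are examples; substitute IDs from your own directory listing):

  -- Map file/directory names back to database objects.
  -- relkind: r = table, i = index, and so on.
  SELECT oid, relfilenode, relname, relkind
  FROM pg_class
  WHERE relfilenode IN (1247, 1249, 1259);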

And finally, if I create a new tablespace, I can put it where I want it. If I run this command:

  create tablespace sample_tblspce location 'c:/temp/extradata';

I will create a new tablespace in my C:\TEMP directory in a subdirectory called extradata. I can then create a table using that tablespace.

  create table abc1 (data date) tablespace sample_tblspce;

Now, I'll do a dir command in DOS:

So, when I created the tablespace, I associated a logical name in Postgres with a physical directory in the file system. Postgres did not create the extradata directory but it did create the underlying "12695" directory which is where it actually stores data.

When I created a table in the tablespace, Postgres created a file, 16833. I can re-run my query against pg_class and see the table in the data dictionary:
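That re-run is just a lookup by name; something along these lines (the relfilenode will vary by install, 16833 happened to be mine):

  -- Find the new table and the file that backs it.
  SELECT oid, relfilenode, relname, reltablespace
  FROM pg_class
  WHERE relname = 'abc1';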

Well, this has gotten quite a bit longer than I intended. I just wanted to explain that tables in Postgres are stored as files. Same for Indexes. Tablespaces are directories. There are some issues associated with this as well as some benefits. I'll write about those in the future.

LewisC


Wednesday, May 28, 2008

World's Largest Database Runs on Postgres?

LewisC's An Expert's Guide To Oracle Technology

According to an article at Computerworld, Yahoo is running a 2 PB (not GB, not TB, PB - Petabyte) database that processes 24 billion events a day. Let's put that in perspective. 24 billion events is 24,000 million events; 24,000,000,000 events. 1 petabyte is 1,000,000,000,000,000 bytes - a quadrillion. Yahoo has two of those. Actually, I should be basing this on 1,024 rather than 1,000, but when you're dealing with petabytes, I don't think we need to be picky. We're talking really, really big.
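As a back-of-the-envelope check, you can let the database do the arithmetic itself. Averaged over a day, 24 billion events works out to roughly 277,777 events every second:

  -- 24 billion events spread over 86,400 seconds in a day
  SELECT 24000000000 / 86400 AS events_per_second;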

Yahoo uses this database to analyze the browsing habits of its half a billion monthly visitors. How would you like to tune those queries? Do you think they allow ad-hoc access?

Get this:

And the data, all of it constantly accessed and all of it stored in a structured, ready-to-crunch form, is expected to grow into the multiple tens of petabytes by next year.

That means that it is not archived and is sitting in tables, ready to be queried.

By comparison, large enterprise databases typically grow no larger than the tens of terabytes. Large databases about which much is publicly known include the Internal Revenue Service's data warehouse, which weighs in at a svelte 150TB.

Even one TB is still a big database. Today's 10TB database is last decade's 10GB database. I remember trying to get acceptable performance out of a multi-gig database in the early 90s. That was painful. Today, I regularly have indexes bigger than that.

So the real questions are how did they do it and can just anyone do it? Don't rush out to create your own PB database with Postgres just yet.

According to the story, they used Postgres but modified it heavily. Yahoo purchased a company that wrote software to convert the Postgres data store to a columnar format (think Vertica or Sybase IQ). That means they also had extensive engineering support to pull this off. They left the interface mostly alone, though, so that Postgres tools still work. Of course, the whole point of using Postgres was that it was a free SQL database, which means they are accessing it via SQL.

The database is running on "less than 1000" PCs hosted at multiple data centers. Yahoo does not plan to sell or license the technology right now but I would be surprised if that doesn't come at some point. I wonder if they will release that code to the Postgres community? I wonder if the Postgres community would accept it if they did?

LewisC


Tuesday, May 27, 2008

Learn EDB: Basic Encryption

LewisC's An Expert's Guide To Oracle Technology

An ITtoolbox user recently asked a question on the EnterpriseDB discussion group: "Oracle equilant UTL_I18N.STRING_TO_RAW IN ENTERPRISEDB".

Basically, Sreenivas asked which functions in EnterpriseDB could be used to implement dbms_crypto, hex_to_raw, string_to_raw, etc. I believe he is using EnterpriseDB Postgres Plus Advanced Server, which is the product that provides Oracle compatibility. The short answer to his question is that right now, there are no compatibility functions for those. The long answer is that you can re-implement that functionality using native PG functions.

If you look at Sreenivas's message you can see how his existing code works. I posted a simple example in response which I am reposting below. The PG docs suggest that you use PGP based encryption rather than what they call raw encryption. I think it depends on exactly what you're doing, personally. Anyway, raw encryption was closer to what Sreenivas was doing so that was what I based my example on.

I've used DBMS_CRYPTO in Oracle quite a bit but this is my first use of encrypt/decrypt in PG. If you have any suggestions for improving it, I'd like to hear them.

Hi Sreenivas,

I saw your post on the EDB forum and planned to write a blog entry on this topic.

The thing is that there isn't a one to one translation in EDB yet. The easiest thing is to rewrite your procedure and use built-ins that are available.

It is recommended that you use the PGP function in postgres as opposed to the raw encryption functions for better security. However, raw encryption more closely matches what you are trying to do. Below is an example of using raw encryption with AES.
You don't need to convert to hex as you'll be using bytea, which is easily converted to and from a string. If you really need international support, check out the PG decode function (which is different from Oracle's decode): http://www.postgresql.org/docs/current/static/functions-string.html

Here is a very simple example that you can use to build your procedure. Note that encrypt and decrypt come from the pgcrypto contrib module, which must be installed in your database:

declare

  original_data text := 'I am going to be encrypted';
  data bytea;
  cipher_key bytea;
  crypto_type text;
  encrypted_data bytea;
  final_string text;

begin

  -- convert the source text to bytea (an explicit cast works too)
  data := original_data;

  -- set the cipher key
  cipher_key := 'not a strong key, use pgp instead';

  -- select the algorithm: aes or blowfish (bf)
  crypto_type := 'aes';

  -- encrypt the data using pgcrypto's raw encryption
  select encrypt(data, cipher_key, crypto_type) into encrypted_data;

  dbms_output.put_line('Encrypted: ' || encrypted_data);

  -- decrypt the data back into a string
  select decrypt(encrypted_data, cipher_key, crypto_type) into final_string;

  dbms_output.put_line('Decrypted: ' || final_string);

end;

Hope this helps,

LewisC
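If you don't need a whole procedure, the same pgcrypto calls work inline in plain SQL. A minimal sketch (raw encryption again; the key and plaintext are just placeholders):

  -- encrypt(data bytea, key bytea, type text) returns bytea
  SELECT encrypt('I am going to be encrypted'::bytea,
                 'not a strong key, use pgp instead'::bytea,
                 'aes') AS ciphertext;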
