Wednesday, May 28, 2008

World's Largest Database Runs on Postgres?

LewisC's An Expert's Guide To Oracle Technology

According to an article at Computerworld, Yahoo is running a 2 PB (not GB, not TB, PB - Petabyte) database that processes 24 billion events a day. Let's put that in perspective. 24 billion events is 24,000 million events; 24,000,000,000 events. 1 petabyte is 1,000,000,000,000 bytes. Yahoo has two of those. Actually, I should be basing this on 1k which is 1024 but when you're dealing with petabytes, I don't think we need to be picky. We're talking really, really big.

Yahoo uses this database to analyze the browsing habits of it half a billion monthly visitors. How would you like to tune those queries? Do you think they allow ad-hoc access?

Get this:

And the data, all of it constantly accessed and all of it stored in a structured, ready-to-crunch form, is expected to grow into the multiple tens of petabytes by next year.

That means that it is not archived and is sitting in tables, ready to be queried.

By comparison, large enterprise databases typically grow no larger than the tens of terabytes. Large databases about which much is publicly known include the Internal Revenue Service's data warehouse, which weighs in at a svelte 150TB.

Even one TB is still a bug database. Today's 10TB database is last decade's 10GB database. I remember trying to get acceptable performance on a multi-gig database in the early 90s. That was painful. Today, I regularly have indexes bigger than that.

So the real questions are how did they do it and can just anyone do it? Don't rush out to create your own PB database with Postgres just yet.

According to the story, they used Postgres but modified it heavily. Yahoo purchased a company that wrote software to convert the postgres data store to a columnar format (think Vertica or Sybase IQ). That means they also had extensive engineering support to pull this off. They left the interface mostly alone though so that Postgres tools still work. Of course, the whole purpose of using Postgres was that it was a free SQL database. That means that they are accessing it via SQL.

The database is running on "less than 1000" PCs hosted at multiple data centers. Yahoo does not plan to sell or license the technology right now but I would be surprised if that doesn't come at some point. I wonder if they will release that code to the Postgres community? I wonder if the Postgres community would accept it if they did?

LewisC

Del.icio.us : , , ,

Tuesday, May 27, 2008

Learn EDB: Basic Encryption

LewisC's An Expert's Guide To Oracle Technology

An ITToolbox user recently asked a question on the EnterpriseDB discussion group, Oracle equilant UTL_I18N.STRING_TO_RAW IN ENTERPRISEDB.

Basically, Sreenivas asked which functions in EnterpriseDB could be used to implement dbms_crypto, hex_to_raw, string_to_raw, etc. I believe he is using EnterpriseDB Postgres Plus Advanced Server which is the product that gives Oracle Compatibility. The short answer to his question is that right now, there are no compatibility functions for those. The long answer is that you can re-implement that functionality using native PG functionality.

If you look at Sreenivas's message you can see how his existing code works. I posted a simple example in response which I am reposting below. The PG docs suggest that you use PGP based encryption rather than what they call raw encryption. I think it depends on exactly what you're doing, personally. Anyway, raw encryption was closer to what Sreenivas was doing so that was what I based my example on.

I've used DBMS_CRYPTO in Oracle quite a bit but this is my first use of encrypt/decrypt in PG. If you have any suggestions for improving it, I'd like to hear them.

Hi Sreenivas,

I saw your post on the edb forum but and planned to write a blog entry on this topic.

The thing is that there isn't a one to one translation in EDB yet. The easiest thing is to rewrite your procedure and use built-ins that are available.

It is recommended that you use the PGP function in postgres as opposed to the raw encryption functions for better security. However, raw encryption more closely matches what you are trying to do. Below is an example of using raw encryption with AES.
You don't need to convert to hex as you'll be using bytea which is easily converted from and to a string. If you really need international support, check out the pg decode function (which is different from Oracle's decode). http://www.postgresql.org/docs/current/static/funct ions-string.html

Here is a very simple example that you can use to build your procedure:

declare

  original_data text := 'I am going to be encrypted';
  data bytea;
  cipher bytea;
  crypto_type text;
  encrypted_data bytea;
  final_string text;

begin

  -- conversion to bytea, could use cast too
  data := original_data;

  --set cipher key
  cipher := 'not a strong key, use pgp instead';

  -- select type: aes or blowfish (bf)
  crypto_type := 'aes';

  -- encrypt the data
  select encrypt(data, cipher, crypto_type) into encrypted_data;

  dbms_output.put_line('Encrypted: ' || encrypted_data );

  -- decrypt the data 
  select decrypt(encrypted_data, cipher, crypto_type) into final_string;
  
  dbms_output.put_line('Decrypted: ' || final_string );

end;

Hope this helps,

LewisC

Technorati : , , ,

Thursday, May 15, 2008

Take an Open Source Database Survey

LewisC's An Expert's Guide To Oracle Technology

Do you know which open source feature is the most important? Do you know which open source database rocks and which one sucks? Is MySQL better than Postgres? Is Ingres worth considering? How does Firebird compare? Have you used, or have you considered using, an open source database?

Take a survey. It's only 15 questions so it takes just a few minutes.

I'll post a link where you can get the results once they have been compiled and prepared.

BTW, this isn't my survey. I'm just passing on the link.

LewisC