Showing posts with label performance. Show all posts

Wednesday, May 28, 2008

World's Largest Database Runs on Postgres?

LewisC's An Expert's Guide To Oracle Technology

According to an article at Computerworld, Yahoo is running a 2 PB (not GB, not TB, PB - Petabyte) database that processes 24 billion events a day. Let's put that in perspective. 24 billion events is 24,000 million events; 24,000,000,000 events. 1 petabyte is 1,000,000,000,000,000 bytes. Yahoo has two of those. Actually, I should be basing this on 1k, which is 1024, but when you're dealing with petabytes, I don't think we need to be picky. We're talking really, really big.
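For anyone who wants to check the scale, here is a quick back-of-the-envelope sketch in Python. It assumes a decimal petabyte is 10^15 bytes and a binary one is 2^50 bytes. The events-per-second figure is just the daily total averaged evenly over 24 hours (my assumption, not something the article states).

```python
# Decimal (SI) and binary interpretations of one petabyte, in bytes.
PB_DECIMAL = 10**15
PB_BINARY = 2**50

# Figure quoted in the article.
EVENTS_PER_DAY = 24_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_second(events_per_day: int) -> float:
    """Average event rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"2 PB (decimal): {2 * PB_DECIMAL:,} bytes")
    print(f"2 PB (binary):  {2 * PB_BINARY:,} bytes")
    print(f"Average events per second: {events_per_second(EVENTS_PER_DAY):,.0f}")
```

Either way you count it, the difference between the decimal and binary figures is about 12.6%, which does not change the "really, really big" conclusion.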

Yahoo uses this database to analyze the browsing habits of its half a billion monthly visitors. How would you like to tune those queries? Do you think they allow ad-hoc access?

Get this:

And the data, all of it constantly accessed and all of it stored in a structured, ready-to-crunch form, is expected to grow into the multiple tens of petabytes by next year.

That means that it is not archived and is sitting in tables, ready to be queried.

By comparison, large enterprise databases typically grow no larger than the tens of terabytes. Large databases about which much is publicly known include the Internal Revenue Service's data warehouse, which weighs in at a svelte 150TB.

Even one TB is still a big database. Today's 10TB database is last decade's 10GB database. I remember trying to get acceptable performance on a multi-gig database in the early 90s. That was painful. Today, I regularly have indexes bigger than that.

So the real questions are how did they do it and can just anyone do it? Don't rush out to create your own PB database with Postgres just yet.

According to the story, they used Postgres but modified it heavily. Yahoo purchased a company that wrote software to convert the Postgres data store to a columnar format (think Vertica or Sybase IQ). That means they also had extensive engineering support to pull this off. They left the interface mostly alone, though, so that Postgres tools still work. Of course, the whole purpose of using Postgres was that it was a free SQL database. That means that they are accessing it via SQL.
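To give a rough idea of why a columnar format matters for this kind of analysis, here is a toy sketch in Python. It is not Yahoo's implementation, and the field names (`user_id`, `page`, `ms_on_page`) are made up for illustration. It only contrasts keeping whole records together with keeping each column together.

```python
# Hypothetical event records; the field names are invented for illustration.
rows = [
    {"user_id": 1, "page": "/home", "ms_on_page": 1200},
    {"user_id": 2, "page": "/mail", "ms_on_page": 300},
    {"user_id": 1, "page": "/news", "ms_on_page": 4500},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited just to read one field.
total_row_store = sum(record["ms_on_page"] for record in rows)

# Column layout: only the single column the query needs is scanned.
total_column_store = sum(columns["ms_on_page"])

print(total_row_store, total_column_store)
```

Both totals are the same. The point is that an aggregate over one field only has to touch that field's values in the column layout, which is the usual argument for column stores on analytic workloads.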

The database is running on "less than 1000" PCs hosted at multiple data centers. Yahoo does not plan to sell or license the technology right now but I would be surprised if that doesn't come at some point. I wonder if they will release that code to the Postgres community? I wonder if the Postgres community would accept it if they did?

LewisC


Tuesday, February 5, 2008

Postgres 8.3 is out

Postgres 8.3 is out and it contains plenty of new and improved features. Some of my favorites are: SQL/XML support, text search, autovacuum improvements, significant performance improvements and some additional SQL changes. You can read the press release. You can also check out the feature list or review the simpler feature matrix, which compares all of the versions since 7.4. You may also want to read the release notes. You can read about the release in the press by following along with the discussion at postgresql.org.