The Amazon AWS blog has a good story about Soocial.com. Soocial.com is an address book on steroids. It works with (or will soon work with) Outlook, BlackBerry, Gmail, the Mac Address Book and other phones. They just need to add Yahoo and Windows Contacts.
The AWS blog post has a good overview of the overall architecture. They are using Postgres and pgpool II on EC2 and EBS. They use RabbitMQ for messaging. For high availability they are using HAProxy and memcache.
The results are not complete yet but I wanted to post some summary data as I get it. Here are the answers to a few YES/NO questions on open source usage.
Does Your Company Have an Open Source Policy?

Answer        Percentage   Respondents
Yes           26.09        84
No            57.45        185
Don't Know    16.46        53
Does your company ALLOW the use of open source software?

Answer        Percentage   Respondents
Yes           94.08        302
No            2.80         9
Don't Know    3.12         10
Does your company ACTIVELY use open source software?
It hasn't been 2 weeks yet but the database survey I posted has hit 250 responses. I'm shooting for 400-500 responses total. While peeking at the data, I noticed that MySQL has had a good response and plenty of respondents. I didn't see any DB2 responses (for primary database) but there were plenty of Oracle responses (not as many as MySQL though). It's too bad that there hasn't been much in the way of a Postgres response.
I plan to leave it up for another 2 1/2 weeks (for a total of 4 weeks). If you haven't taken it yet, please do so if you get a few minutes. It only takes 5-10 minutes as there are just 25 questions.
Also, if you can blog about it, post the link on forums (without spamming), or spread the word any other way, I would appreciate it.
Hi. I've created a 25 question survey on databases and open source. There are about 10 demographic type questions (who are you, where are you, etc) and then a bunch of questions on databases and open source.
The results of this survey, and by results I mean all of the raw data, will be released to the public at the completion of the survey.
I ask for an email ID so that I can send you the results. This is totally optional and you will still be able to get the results should you choose not to enter one. At the end of the survey you will be taken to a web page; the data will be available from that page. Bookmark that page and you can check back periodically for the data. I will post intermediate updates of the data as the survey runs.
EnterpriseDB announced the results of the survey they did a few months ago at OSCON. Now, take the results with a grain of salt as it was done by EnterpriseDB. EnterpriseDB is based on Postgres so there is a vested interest in making Postgres sound good. Results can be skewed depending on how the survey is worded, what options are available as answers and who the respondents are.
500 respondents. The download page says "500 corporate IT leaders". Or maybe, 500 open source developers. ;-)
Only 9% of respondents indicated that they preferred commercial solutions over open source solutions. I would guess that a majority of those responding were open source database people anyway. This is also one place where I think the wording of survey questions makes a difference. I'd like to see the survey again and compare the results to the survey itself.
The survey shows that respondents are using open source to migrate away from Oracle and SQL Server. It says that less than 1% are using open source to migrate away from DB2. Since IBM, the maker of DB2, is a major investor in EnterpriseDB, that doesn't surprise me. Again, the target users of the survey make a difference, as do the questions themselves.
Of course, Postgres was chosen more than any other open source database for transactional applications and high reliability. Again, not surprising based on who wrote the survey and what they sell.
Before I put very much value on this survey, I would want to see more than just a hand-crafted summary of the results. A spreadsheet of all the questions and the answers chosen would be, at least somewhat, valuable. Without that though, it's just marketing. I can't find anything on the site indicating the full results will be made available.
Well, I am no longer working for EnterpriseDB. It was fun while it lasted but it's over, so I am moving on. I found a new job, locally. It's pure Oracle and I will get to use Real Application Clusters in a production environment. That's something I haven't done in the past so I am looking forward to it. It's also a Java, .NET and Oracle Forms shop, and they are doing some interesting things with telecommunications and SMS.
From now on, I will only need to travel for conferences. No more trips to New Jersey. That's kind of a drag as I was almost at elite status on Continental. I have two more conferences this year and it just might be enough. I may even take a trip on my own just to get the miles. One of the trips is from Tampa to San Francisco, and back, so that will get me very close. The other is to Virginia and back.
I am a database geek and I will be keeping an eye on EDB just as I did in the past. I will probably post here at the EDB blog about as frequently as I have been (not often). Or, I may let this one die and just do any EDB posting on my postgres blog. That actually makes the most sense. I think I will cross post this one there and make this my last dedicated posting on this blog.
I also plan to keep up with Postgres, for personal knowledge, just as I do MySQL. I actually want to install GridSQL and see how it performs for a variety of different applications.
On the upside, I can now call EnterpriseDB Postgres Plus Advanced Server "PP AS" without marketing having a fit. ;-) Heck, if I think of it as Advanced Server Software, I can call it the PP ASS. Heh. But I would never do that.
So, I guess I can close out this blog now. Later.
The datastore in Postgres (which is what Advanced Server is based on) is quite a bit different than the way Oracle stores data. Gone are the usual logical storage components. Data blocks, extents and segments just don't exist in a Postgres database. Actually, segments do exist when a table gets bigger than 1GB but that's a story for a different entry.
Postgres stores data as files and directories. Tablespaces are directories, and tables (and indexes) are files inside those directories. Postgres ships with two tablespaces installed by default: pg_default and pg_global. These tablespaces cannot be changed or dropped. pg_global contains objects that are visible across databases (i.e., cluster-wide). If you don't have any other tablespaces defined, pg_default is where everything else goes.
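If you want to see tablespaces from inside the database rather than on disk, a quick query against the pg_tablespace catalog does the trick, and the data_directory setting tells you where the files live. This is just a minimal sketch; the OIDs will differ on your system:

show data_directory;

select oid, spcname
from pg_tablespace;

On a fresh install you should see pg_default and pg_global, plus any tablespaces you have created yourself.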
If you look at a default installation (I'll be using Windows here but *nix is pretty much the same), Postgres creates a HOME pretty much like Oracle does.
In this directory structure, the "data" directory is the home for data. You don't have to put your data here. During install you can choose a completely different directory and you can create tablespaces to even further separate your data.
The data directory is composed of several other directories:
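On a stock 8.3-era install it typically includes something like the following (exact contents vary a bit by version and platform):

base/
global/
pg_clog/
pg_multixact/
pg_subtrans/
pg_tblspc/
pg_twophase/
pg_xlog/
postgresql.conf, pg_hba.conf, pg_ident.conf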
The base directory is the pg_default tablespace and the global directory is the pg_global tablespace. If we select the base directory, we see that there are more sub-directories beneath that one:
We're getting close to the data now. ;-)
Directories on your system may be named differently than on mine. The directory "1" contains at least part of the Postgres data dictionary. Directory "11510" also contains part of the data dictionary. There are several system catalogs (which are defined as namespaces) in Postgres, and these two directories contain data from those. You can open the files in these directories with Notepad. They are binary files but you can see some of the information. Just DON'T SAVE THEM!
The pgsql_tmp directory is a working directory for the database. It's usually empty unless a query is running.
If we look at one of these directories, we see that the files are named like the directories that contain them:
All objects in Postgres have a unique object ID. The directory name or file name MAY be the same as the object's ID. That's not always the case though. I haven't quite figured out what conditions make this true but it frequently is the case.
I can run a query, using some of the IDs above, to see what they might be tied to:
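Something along these lines works; the IDs in the where clause are just placeholders, so substitute the file names you actually see on disk:

select oid, relname, relkind, relfilenode
from pg_class
where relfilenode in (1247, 1249, 2604);

relkind tells you what each entry is (r for a table, i for an index, S for a sequence, and so on) and relfilenode is the value that ties a row back to a file name.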
And finally, if I create a new tablespace, I can put it where I want it. If I run this command:
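It's a command along these lines (the path is only an example; the directory has to exist already and be writable by the postgres service account):

create tablespace extradata location 'C:/extradata';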
So, when I created the tablespace, I associated a logical name in Postgres with a physical directory in the file system. Postgres did not create the extradata directory but it did create the underlying "12695" directory which is where it actually stores data.
When I created a table in the tablespace, Postgres created a file, 16833. I can re-run my query against pg_class and see the table in the data dictionary:
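As a sketch (the table definition here is just an illustration; the actual table I created isn't important):

create table extra_stuff (
  id   int,
  name text
) tablespace extradata;

select oid, relname, relfilenode, reltablespace
from pg_class
where relname = 'extra_stuff';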
Well, this has gotten quite a bit longer than I intended. I just wanted to explain that tables in Postgres are stored as files. Same for Indexes. Tablespaces are directories. There are some issues associated with this as well as some benefits. I'll write about those in the future.
According to an article at Computerworld, Yahoo is running a 2 PB (not GB, not TB, PB - petabyte) database that processes 24 billion events a day. Let's put that in perspective. 24 billion events is 24,000 million events; 24,000,000,000 events. 1 petabyte is 1,000,000,000,000,000 bytes - a thousand terabytes, or a million gigabytes. Yahoo has two of those. Technically I should be basing this on 1,024 rather than 1,000, but when you're dealing with petabytes, I don't think we need to be picky. We're talking really, really big.
Yahoo uses this database to analyze the browsing habits of its half a billion monthly visitors. How would you like to tune those queries? Do you think they allow ad-hoc access?
Get this:
And the data, all of it constantly accessed and all of it stored in a structured, ready-to-crunch form, is expected to grow into the multiple tens of petabytes by next year.
That means that it is not archived and is sitting in tables, ready to be queried.
By comparison, large enterprise databases typically grow no larger than the tens of terabytes. Large databases about which much is publicly known include the Internal Revenue Service's data warehouse, which weighs in at a svelte 150TB.
Even one TB is still a big database. Today's 10TB database is last decade's 10GB database. I remember trying to get acceptable performance on a multi-gig database in the early 90s. That was painful. Today, I regularly have indexes bigger than that.
So the real questions are how did they do it and can just anyone do it? Don't rush out to create your own PB database with Postgres just yet.
According to the story, they used Postgres but modified it heavily. Yahoo purchased a company that wrote software to convert the postgres data store to a columnar format (think Vertica or Sybase IQ). That means they also had extensive engineering support to pull this off. They left the interface mostly alone though so that Postgres tools still work. Of course, the whole purpose of using Postgres was that it was a free SQL database. That means that they are accessing it via SQL.
The database is running on "less than 1000" PCs hosted at multiple data centers. Yahoo does not plan to sell or license the technology right now but I would be surprised if that doesn't come at some point. I wonder if they will release that code to the Postgres community? I wonder if the Postgres community would accept it if they did?
Basically, Sreenivas asked which functions in EnterpriseDB could be used to implement dbms_crypto, hex_to_raw, string_to_raw, etc. I believe he is using EnterpriseDB Postgres Plus Advanced Server which is the product that gives Oracle Compatibility. The short answer to his question is that right now, there are no compatibility functions for those. The long answer is that you can re-implement that functionality using native PG functionality.
If you look at Sreenivas's message you can see how his existing code works. I posted a simple example in response which I am reposting below. The PG docs suggest that you use PGP based encryption rather than what they call raw encryption. I think it depends on exactly what you're doing, personally. Anyway, raw encryption was closer to what Sreenivas was doing so that was what I based my example on.
I've used DBMS_CRYPTO in Oracle quite a bit but this is my first use of encrypt/decrypt in PG. If you have any suggestions for improving it, I'd like to hear them.
Hi Sreenivas,
I saw your post on the EDB forum and had planned to write a blog entry on this topic.
The thing is that there isn't a one to one translation in EDB yet. The easiest thing is to rewrite your procedure and use built-ins that are available.
It is recommended that you use the PGP functions in Postgres rather than the raw encryption functions, for better security. However, raw encryption more closely matches what you are trying to do. Below is an example of using raw encryption with AES.
You don't need to convert to hex as you'll be using bytea, which is easily converted to and from a string. If you really need international support, check out the pg decode function (which is different from Oracle's decode): http://www.postgresql.org/docs/current/static/functions-string.html
Here is a very simple example that you can use to build your procedure:
declare
   -- encrypt/decrypt come from the pgcrypto contrib module, which must be installed
   original_data  text := 'I am going to be encrypted';
   data           bytea;
   cipher         bytea;
   crypto_type    text;
   encrypted_data bytea;
   final_string   text;
begin
   -- convert the text to bytea (an explicit cast would work too)
   data := convert_to(original_data, 'UTF8');
   -- set the cipher key
   cipher := 'not a strong key, use pgp instead';
   -- select the cipher type: aes or blowfish (bf)
   crypto_type := 'aes';
   -- encrypt the data
   select encrypt(data, cipher, crypto_type) into encrypted_data;
   dbms_output.put_line('Encrypted: ' || encrypted_data);
   -- decrypt the data and convert the bytea result back to text
   select convert_from(decrypt(encrypted_data, cipher, crypto_type), 'UTF8')
     into final_string;
   dbms_output.put_line('Decrypted: ' || final_string);
end;
Do you know which open source feature is the most important? Do you know which open source database rocks and which one sucks? Is MySQL better than Postgres? Is Ingres worth considering? How does Firebird compare? Have you used, or have you considered using, an open source database?
Take a survey. It's only 15 questions so it takes just a few minutes.
I'll post a link where you can get the results once they have been compiled and prepared.
BTW, this isn't my survey. I'm just passing on the link.
I was asked a question today by someone who emailed me. I like getting the occasional question as sometimes it either makes me learn new things or sharpens my focus on something I already know. I often don't have time to answer questions but I try as often as I can.
Today's question didn't teach me anything new but it was a fun 15 minute distraction. I like SQL and coming up with solutions.
Here's the question (names removed to protect the innocent):
Subject: Postgres query
Hello,
My name is xxxxxxx and I saw your link on the web and wonder if you have any example that can do the following view.
I have a table called agent_sales; in it I have an agent_link code and invoicesales records. I want to create a view that will sum(invoicesales) grouped by agent_link, then create a percentage column of the summed invoice sales to the overall grand total of the invoiced sales.
e.g.
agent link   invoice sales
1            100
2            300
1            500
2            400
When I sum by agent link I get 1 = 600 and 2 = 700.
Then I want to calculate a percentage of 1 and 2 to the total, so it would work out to be 1 = 600/1300 and 2 = 700/1300.
Can you assist me?
Regards
I have had to do these kinds of queries often so it didn't take long to put it together. The first time I had to do one of these, though, it took me quite a while to get it right.
Oracle provides analytic functions that make these queries extremely easy to write and very performant. Since this was Postgres, I took a different approach.
create table agent_sales (
  agent_link   int,
  invoice_sale int );

CREATE VIEW ag_sales AS
select link_sales.agent_link,
       link_sales.inv_sales,
       link_sales.inv_sales / total_sales.tot as overall_percentage
  from (select agent_link, sum(invoice_sale)::float as inv_sales
          from agent_sales
         group by agent_link) as link_sales,
       (select sum(invoice_sale)::float as tot
          from agent_sales) as total_sales;
I come from an Oracle background so I am used to a few rules when it comes to DDL.
Don't perform DDL in a production database until you've done it in a test instance.
DDL is permanent unless you have flashback or want to restore.
That might sound like a limitation but I don't see it that way. You *SHOULD* test everything before running it in production. And flashback is much more powerful than simply having commit and rollback for DDL.
Today I was trying out some changes to my code (that runs outside of the database) and needed to do a quick test that would delete some rows.
I did a quick
create table xyz as select * from abc;
I ran a quick test on some prep data:
select func();
The func() hit an error and rolled back.
I fixed the error and reran the function. It worked so I ran my external program. It deleted the data and committed the changes.
I then tried to put the records back in the first table from the second table.
Oops. Second table no longer exists. The rollback from the stupid func() also rolled back my create table.
In Oracle, I would have several options to recover my data. For one, the create would automatically have committed. But in Oracle, I wouldn't have needed the table, I would just flashback and restore my data.
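If you want to see the difference for yourself, here is a minimal sketch in psql using the same table names as above:

begin;
create table xyz as select * from abc;
-- something fails and the transaction is rolled back
rollback;

select count(*) from xyz;
-- ERROR:  relation "xyz" does not exist

In Oracle, the create table would have issued an implicit commit, so xyz would have survived the rollback.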
Fortunately, this was just a test bed database and the data isn't really critical. Had it been critical, I would have taken steps to be sure it was backed up. So, no big deal really.
And, yeah, I know Ingres and SQL Server (and probably Sybase) work like Postgres. Doesn't make it a good thing.
My next whine will be transaction control in stored procedures. But at least that will be coming from EnterpriseDB at some point.
I'm in Baltimore this weekend at the Postgres East 2008 conference. I'm giving a presentation tomorrow on SQL/XML for developers. Below is the presentation and below that is the code used in the presentation.
Code to follow along:
If you aren't using JavaScript, you can download the presentation or the script from Scribd.
Or, to make it easy, just grab this code:
This post isn't really PostgreSQL specific. It also happens with MySQL and Ingres (and any other 32 bit driver).
I am running Vista 64 on one of my machines. I installed Postgres 8.3 and the ODBC drivers. When I tried to setup an ODBC connection, the windows ODBC administrator program ODBCAD32.exe did not list Postgres. I noticed that it also did not list MySQL or Ingres even though I knew I had installed drivers for those databases also.
After a bit of research, I tracked down the issue. My ODBC drivers did in fact install. Microsoft, in all their wisdom, has 2 versions of odbcad32. A 32 bit and a 64 bit. They are both named odbcad32.exe.
It gets better. They put the 32 bit odbcad32.exe in the syswow64 directory. They put the 64 bit odbcad32.exe in the system32 directory. 32 bit apps will pick up the 32 bit registry setting and 64 bit will pick up the 64 bit registry setting. system32 comes before syswow64 in the system path so the 64bit software runs before the 32 bit software.
So, when I manually ran odbcad32.exe in the syswow64 directory, I was able to configure my connections and everything worked hunky dory. I now have a shortcut to syswow64\odbcad32.exe on my desktop for working with 32 bit databases.
I wonder how stable the system would be if syswow64 was in the path before system32? I don't plan to find out.
I just read this in InfoWeek: "Database Survey Gives Oracle The Lead In All 13 Categories". The survey compared Oracle with DB2, MySQL, Informix Dynamic Server, PostgreSQL, Microsoft SQL Server, and Sybase Adaptive Server Enterprise.
The significant parts of that are that Postgres was included (we don't see enough of that) and that Postgres performed exceptionally well.
Some important quotes:
In a few categories, the open source system, MySQL, trailed most of the five commercial systems with which it was compared. But high user satisfaction in several categories indicates that Sun Microsystems (NSDQ: JAVA) may have gotten its money's worth when it paid $1 billion for the database's parent company, MySQL AB, last month. MySQL was second only to Oracle in multiplatform support, an important factor in hosting Web applications. When it came to the all-important "performance" category, it ranked higher than Microsoft SQL Server, Informix, and Sybase.
Is MySQL really supported on more platforms than Postgres?
In scalability, Oracle led, followed by DB2; then PostgreSQL took over the commanding position of the open source systems, followed by SQL Server, Informix, Sybase, and MySQL.
Oracle led the security category, with its ability to automatically store data in encrypted form and to maintain an Audit Vault of information drawn from the operating system, database, and any other auditing source. DB2 was number two, followed in the third position by PostgreSQL, with its "robust security layer," said Andrews. Number four was SQL Server, followed by Sybase, MySQL and Informix.
Oracle was tops in atomicity, followed by DB2, PostgreSQL, SQL Server, Sybase, MySQL and Informix.
An increasingly important category for databases that are intended to serve Web applications is XML data handling. Again, Oracle and SQL Server led the category, followed by DB2. But the open source systems, MySQL and PostgreSQL, made up the sixth and seventh ranks, respectively.
When it comes to management tools that come with the database, Oracle led, followed by SQL Server, DB2, MySQL, Sybase, and Informix. Bringing up the rear was PostgreSQL. "PostgreSQL, which has a large community of open source developers to create tools for it, does not provide them with the database," the report noted.
All in all, I think those are good numbers for Postgres. The really important thing is getting Postgres out there and making it visible. Articles like this go a long way towards that goal.
LewisC
What an amazing place! It is incredibly huge. I asked "How big?" when I was checking in and it has 2004 rooms, 265,000 square feet of meeting space, multiple pools (including some with waterfalls), at least one hot tub and a golf course. Oh yeah, and an alarm clock that doesn't work. That's my story and I'm sticking to it.
I'm here manning the EnterpriseDB booth with Derek Rodner and, my boss, Jim Mlodgenski. Jim is giving a presentation on Friday. We're giving away copies of my book. I'm doing the signing thing. It's amazingly fun.
I have gotten to talk to so many smart people. JBoss World is not one of the largest conferences but it is a very open source friendly group. I have been amazed at how many people tell me they are already running Postgres and many have been for years. These are production systems too, not just tryouts.
A big draw (besides a free book) has been EnterpriseDB's Oracle compatibility features. Many of these companies are using Oracle and wouldn't mind saving money as long as they don't have to rewrite their apps.
Orlando is just about 65 miles or so from my house so I just drove over. Jim and Derek flew down. We'll all be leaving tomorrow.
Set aside March 11 at 6pm for the first meeting of the Suncoast Postgres User Group. The SPUG serves Tampa, St Pete, Lakeland, Bradenton and the rest of central and western Florida.
You can get additional information at http://pugs.postgresql.org/spug
There is a new wiki comparing MySQL to PostgreSQL. Because it's a wiki, hopefully it can be kept updated so that it's current AND accurate.
The wiki is MySQL vs PostgreSQL.
Personally, I'd like to see this grow into a universal comparison site that the community could keep updated.
LewisC
Amazon's cloud computing service is called the Amazon Elastic Compute Cloud, or Amazon EC2. EC2, when tied with Amazon's cloud storage (Amazon S3), gives you cheap and easy scalability for your applications. Your computing power moves from your own data center and hardware to the "cloud". You control the servers but they are virtual servers running somewhere else. You control access, you control what applications are running and who can use them. If you only need a trickle, you pay for a trickle. When you need to scale to huge proportions, you pay for what you need.
Amazon recently announced Amazon SimpleDB which utilizes Amazon EC2 and Amazon S3. With SimpleDB, you can store and query data (i.e. build data web services). I really wasn't all that impressed with SimpleDB when I first read about it as it is a very simple database. It's named well. ;-)
SimpleDB gives you named value sets of data. It is the most basic of data structures. You can assign anything to the "database" by giving it a name and a value. A name can have multiple values. And so on.
You can run EnterpriseDB on a tiny server to build and test your apps and then, in minutes, scale up to huge proportions when you go live. No worries about data centers or buying hardware.
No mention if the vanilla EnterpriseDB PostgreSQL package would be offered. I would think not as EnterpriseDB Advanced Server is more geared towards this kind of solution.
I'm not sure what kind of SLAs Amazon offers. It will be interesting to see what kind of guarantees can be offered on a service like this. OLTP processing requires very reliable computing, much more so than OLAP and reporting.
Hi all. I'm going to be using Postgres and EnterpriseDB a lot more in the near future. I was wondering if there is a Tampa area, or central Florida, Postgres user group or if there might be interest in starting one. If not, maybe a general database user group. I know the local Oracle user group is very popular (and it's a group I attend and enjoy) and I was hoping to have the same kind of thing for Postgres and EnterpriseDB.
I think a general database user group might be a good idea too. It could cover all of the databases and give people who don't normally work with a particular database some exposure to it. For example, I have never worked with SQL Server but wouldn't mind an occasional presentation on SQL Server topics.
Any interest in something like this? I don't know how many people in Central Florida read my blog and would be interested in meeting and discussing various databases.
Let me know if you are already aware of something like this or would be interested in attending if something was available.
Thanks,
LewisC
I've been thinking long and hard about how I want to give away a few copies of my book. I will be at the SOUG meeting Thursday, Jan 24th here in Tampa. If you're in the area, stop by. I'm going to have a drawing for a couple of copies that night.
That doesn't really help those people who aren't near by. So, I am going to ask 3 questions and the first two people to answer all three correctly will get a copy.
They won't be hard to answer but the answers might require a little work. I will ask the questions and explain how you will need to respond.
To get the questions, you need to follow me on Twitter. You can find my Twitter account in various places on the net, including in my blog at ITToolbox.com. You will need to be watching at 9am Eastern time, Jan 12, 2008. Follow and be watching for my questions. I will ask the questions and then explain what to do next.
This contest is open to anyone. If you are outside the US and you win, we can discuss the best way to get you your book.
I hope this is a fun process. That's what I'm shooting for anyway.
Let me know what you think.
LewisC
Well, it's taken over a year but it has finally arrived. I started writing the book back in Aug 2006. I finished in late Jan 2007 and the technical editor finished his work in March. Now, in Jan 2008, EnterpriseDB: The Definitive Reference is available. I'm glad I didn't wait for the movie. ;-)
It's kind of ironic. Just this morning, I posted that I am working on my second book. My wife called me at work and told me two boxes of books had arrived. 2008 is turning out to be a good year. I'll have some additional news in the near future.
Anyway, the book looks good. I've already found a few typos and grammar errors just skimming around. I like the font and the pages look really crisp. All in all, I have to say I am very happy that the book is out.
Check out some photos:
I think I want to give a couple away. Maybe some kind of contest for the blog readers. Any ideas of something fair? I am willing to pay shipping for a couple but that would only be here in the US. Overseas shipping gets expensive.
LewisC