Saturday, July 19, 2008

Not working for EDB anymore

Well, I am no longer working for EnterpriseDB. It was fun while it lasted but it's over so I am moving on. I found a new job, locally. It's pure Oracle and I will get to use Real Application Clusters in a production environment. That's something I haven't done in the past so I am looking forward to it. It's also a java, .net and Oracle Forms shop and they are doing some interesting things with telecommunications and SMS.

From now on, I will only need to travel for conferences. No more trips to New Jersey. That's kind of a drag as I was almost at elite status on Continental. I have two more conferences this year and it just might be enough. I may even take a trip on my own just to get the miles. One of the trips is from Tampa to San Francisco, and back, so that will get me very close. The other is to Virginia and back.

I am a database geek and I will be keeping an eye on EDB just as I did in the past. I will probably post here at the EDB blog about as frequently as I have been (not often). Or, I may let this one die and just do any EDB posting on my postgres blog. That actually makes the most sense. I think I will cross post this one there and make this my last dedicated posting on this blog.

I also plan to keep up with Postgres, for personal knowledge, just as I do MySQL. I actually want to install GridSQL and see how it performs for a variety of different applications.

On the upside, I can now call EnterpriseDB Postgres Plus Advanced Server, PP AS, without marketing having fit. ;-) Heck, if I think of it as Advanced Server Software, I can call it the PP ASS. heh But I would never do that.

So, I can guess I can close out this blog now. Later.

LewisC

Technorati : , ,

Tuesday, June 3, 2008

EDB Tip #5: Tablespaces and Tables in Postgres

LewisC's An Expert's Guide To Oracle Technology

The datastore in Postgres (which is what Advanced Server is based on) is quite a bit different than the way Oracle stores data. Gone are the usual logical storage components. Data blocks, extents and segments just don't exist in a Postgres database. Actually, segments do exist when a table gets bigger than 1GB but that's a story for a different entry.

Postgres stores data as files and directories. Tablespaces are directories and tables (and indexes) are files inside that directory. Postgres ships with two tablespaces installed by default: pg_default and pg_global. The tablespaces cannot be changed or dropped. pg_global contains objects that are visible across databases (i.e. clusterwide). If you don't have any other tablespaces defined, pg_default is where everything else goes.

If you look at a default installation (I'll be using Windows here but *nix is pretty much the same), Postgres creates a HOME pretty much like Oracle does.

In this directory structure, the "data" directory is the home for data. You don't have to put your data here. During install you can choose a completely different directory and you can create tablespaces to even further separate your data.

The data directory is composed of several other directories:

The base directory is the pg_default tablespace and the global directory is the pg_global tablespace. If we select the base directory, we see that there are more sub-directories beneath that one:

We're getting close to the data now. ;-)

Directories on your system may be named differently than on mine. The directory "1" contains, at least part of, the postgres data dictionary. Directory "11510" also contains part of the data dictionary. There are several system catalogs (which are defined as namespaces) in Postgres and these two directories contain data from those. You can edit the files in the directory with notepad. They are binary files but you can see some of the information. DON'T SAVE THEM!

The pgsql_tmp directory is a working directory for the database. It's usually empty unless a query is running.

If we look at one of these directories, we see that the files are named like the directories that contain them:

All objects in Postgres have a unique object ID. The directory name or file name MAY be the same as the object's ID. That's not always the case though. I haven't quite figured out what conditions make this true but it frequently is the case.

I can run a query, using some of the IDs above, to see what they might be tied to:

And finally, if I create a new tablespace, I can put it where I want it. If I run this command:

  create tablespace sample_tblspce location 'c:/temp/extradata';

I will create a new tablespace in my C:\TEMP directory in a subdirectory called extradata. I can then create a table using that tablespace.

  create table abc1 (data date) tablespace sample_tblspce;

Now, I'll do a dir command in DOS:

So, when I created the tablespace, I associated a logical name in Postgres with a physical directory in the file system. Postgres did not create the extradata directory but it did create the underlying "12695" directory which is where it actually stores data.

When I created a table in the tablespace, Postgres created a file, 16833. I can re-run my query against pg_class and see the table in the data dictionary:

Well, this has gotten quite a bit longer than I intended. I just wanted to explain that tables in Postgres are stored as files. Same for Indexes. Tablespaces are directories. There are some issues associated with this as well as some benefits. I'll write about those in the future.

LewisC

Del.icio.us : , , , , , ,

Wednesday, May 28, 2008

World's Largest Database Runs on Postgres?

LewisC's An Expert's Guide To Oracle Technology

According to an article at Computerworld, Yahoo is running a 2 PB (not GB, not TB, PB - Petabyte) database that processes 24 billion events a day. Let's put that in perspective. 24 billion events is 24,000 million events; 24,000,000,000 events. 1 petabyte is 1,000,000,000,000 bytes. Yahoo has two of those. Actually, I should be basing this on 1k which is 1024 but when you're dealing with petabytes, I don't think we need to be picky. We're talking really, really big.

Yahoo uses this database to analyze the browsing habits of it half a billion monthly visitors. How would you like to tune those queries? Do you think they allow ad-hoc access?

Get this:

And the data, all of it constantly accessed and all of it stored in a structured, ready-to-crunch form, is expected to grow into the multiple tens of petabytes by next year.

That means that it is not archived and is sitting in tables, ready to be queried.

By comparison, large enterprise databases typically grow no larger than the tens of terabytes. Large databases about which much is publicly known include the Internal Revenue Service's data warehouse, which weighs in at a svelte 150TB.

Even one TB is still a bug database. Today's 10TB database is last decade's 10GB database. I remember trying to get acceptable performance on a multi-gig database in the early 90s. That was painful. Today, I regularly have indexes bigger than that.

So the real questions are how did they do it and can just anyone do it? Don't rush out to create your own PB database with Postgres just yet.

According to the story, they used Postgres but modified it heavily. Yahoo purchased a company that wrote software to convert the postgres data store to a columnar format (think Vertica or Sybase IQ). That means they also had extensive engineering support to pull this off. They left the interface mostly alone though so that Postgres tools still work. Of course, the whole purpose of using Postgres was that it was a free SQL database. That means that they are accessing it via SQL.

The database is running on "less than 1000" PCs hosted at multiple data centers. Yahoo does not plan to sell or license the technology right now but I would be surprised if that doesn't come at some point. I wonder if they will release that code to the Postgres community? I wonder if the Postgres community would accept it if they did?

LewisC

Del.icio.us : , , ,

Tuesday, May 27, 2008

Learn EDB: Basic Encryption

LewisC's An Expert's Guide To Oracle Technology

An ITToolbox user recently asked a question on the EnterpriseDB discussion group, Oracle equilant UTL_I18N.STRING_TO_RAW IN ENTERPRISEDB.

Basically, Sreenivas asked which functions in EnterpriseDB could be used to implement dbms_crypto, hex_to_raw, string_to_raw, etc. I believe he is using EnterpriseDB Postgres Plus Advanced Server which is the product that gives Oracle Compatibility. The short answer to his question is that right now, there are no compatibility functions for those. The long answer is that you can re-implement that functionality using native PG functionality.

If you look at Sreenivas's message you can see how his existing code works. I posted a simple example in response which I am reposting below. The PG docs suggest that you use PGP based encryption rather than what they call raw encryption. I think it depends on exactly what you're doing, personally. Anyway, raw encryption was closer to what Sreenivas was doing so that was what I based my example on.

I've used DBMS_CRYPTO in Oracle quite a bit but this is my first use of encrypt/decrypt in PG. If you have any suggestions for improving it, I'd like to hear them.

Hi Sreenivas,

I saw your post on the edb forum but and planned to write a blog entry on this topic.

The thing is that there isn't a one to one translation in EDB yet. The easiest thing is to rewrite your procedure and use built-ins that are available.

It is recommended that you use the PGP function in postgres as opposed to the raw encryption functions for better security. However, raw encryption more closely matches what you are trying to do. Below is an example of using raw encryption with AES.
You don't need to convert to hex as you'll be using bytea which is easily converted from and to a string. If you really need international support, check out the pg decode function (which is different from Oracle's decode). http://www.postgresql.org/docs/current/static/funct ions-string.html

Here is a very simple example that you can use to build your procedure:

declare

  original_data text := 'I am going to be encrypted';
  data bytea;
  cipher bytea;
  crypto_type text;
  encrypted_data bytea;
  final_string text;

begin

  -- conversion to bytea, could use cast too
  data := original_data;

  --set cipher key
  cipher := 'not a strong key, use pgp instead';

  -- select type: aes or blowfish (bf)
  crypto_type := 'aes';

  -- encrypt the data
  select encrypt(data, cipher, crypto_type) into encrypted_data;

  dbms_output.put_line('Encrypted: ' || encrypted_data );

  -- decrypt the data 
  select decrypt(encrypted_data, cipher, crypto_type) into final_string;
  
  dbms_output.put_line('Decrypted: ' || final_string );

end;

Hope this helps,

LewisC

Technorati : , , ,

Thursday, May 15, 2008

Take an Open Source Database Survey

LewisC's An Expert's Guide To Oracle Technology

Do you know which open source feature is the most important? Do you know which open source database rocks and which one sucks? Is MySQL better than Postgres? Is Ingres worth considering? How does Firebird compare? Have you used, or have you considered using, an open source database?

Take a survey. It's only 15 questions so it takes just a few minutes.

I'll post a link where you can get the results once they have been compiled and prepared.

BTW, this isn't my survey. I'm just passing on the link.

LewisC

Thursday, April 24, 2008

Overall Total in a Query

I was asked a question today by someone who emailed me. I like getting the occasional question as sometimes it either makes me learn new things or sharpens my focus on something I already know. I often don't have time to answer questions but I try as often as I can.

Today's question didn't teach me anything new but it was a fun 15 minute distraction. I like SQL and coming up with solutions.

Here's the question (names removed to protect the innocent):

Subject: Postgres query

Hello,

My name is xxxxxxx and I saw your link on the web and wonder if you have any example that can do the following view.

I have a table called agent_sales, in it I have an agent_link code and a Invoicesales records. I want to create a view that will sum(invoicesales) group by agent_link then create a percentage column of the summed(invoice sales to the overall grand total of the invoiced sales)

e.g.

agent link     invoice sales
1 100
2 300
1 500
2 400


When I sum by agent link I get 1= 600 and 2 = 700

Then I want to calculate a percentage or 1 and 2 to the total so it would work out to be 1 = 600/1300 & 2 = 700/1600

Can you assist me?

Regards

I have had to do these kinds of queries often so it didn't take long to put it together. The first time I had to do one if these, though, it took me quite a while to get it right.

Oracle provides some analytical queries that make these queries extremely easy to write and very performant. Since this was postgres, I took a different approach.

create table agent_sales (
agent_link int,
invoice_sale int );
insert into agent_sales
(agent_link, invoice_sale)
values
(1, 100), (2,300), (1, 500), (2, 400);
CREATE VIEW ag_sales AS
select agent_link, inv_sales, inv_sales / tot as overall_precentage
from
(select agent_link, sum(invoice_sale)::float as inv_sales
from agent_sales
group by agent_link) as link_sales,
(select sum(invoice_sale)::float as tot
from agent_sales) as total_sales


Output:

SQL> select * from ag_sales;

AGENT_LINK INV_SALES OVERALL_PRECENTAGE
----------- ------------------------ ------------------------
2 700.0 0.5384615384615384
1 600.0 0.46153846153846156

SQL>


Obviously, there are plenty of ways to answer most any question but this is the way I answered this one. How would you have done it?

Wednesday, April 23, 2008

Postgres DDL Transaction Control Sucks

I come from an Oracle background so I am used to a few rules when it comes to DDL.

  • Don't perform DDL in a production database until you've done it in a test instance.
  • DDL is permanent unless you have flashback or want to restore.
That might sound like a limitation but I don't see it that way. You *SHOULD* test everything before running it in production. And flashback is much more powerful than simply having commit and rollback for DDL. Today I was trying out some changes to my code (that runs outside of the database) and needed to do a quick test that would delete some rows. I did a quick create table xyz as select * from abc; I ran a quick test on some prep data: select func(); The func() hit an error and rolled back. I fixed the error and reran the function. It worked so I ran my external program. It deleted the data and committed the changes. I then tried to put the records back in the first table from the second table. Oops. Second table no longer exists. The rollback from the stupid func() also rolled back my create table.

In Oracle, I would have several options to recover my data. For one, the create would automatically have committed. But in Oracle, I wouldn't have needed the table, I would just flashback and restore my data.

Fortunately, this was just a test bed database and the data isn't really critical. Had it been critical, I would have taken steps to be sure it was backed up. So, no big deal really.

And, yeah, I know Ingres and SQL Server (and probably Sybase) work like Postgres. Doesn't make it a good thing.

My next whine will be transaction control is stored procedures. But at least that will be coming from EnterpriseDB at some point.