Tuesday, June 3, 2008

EDB Tip #5: Tablespaces and Tables in Postgres

LewisC's An Expert's Guide To Oracle Technology

The datastore in Postgres (which is what Advanced Server is based on) is quite a bit different than the way Oracle stores data. Gone are the usual logical storage components. Data blocks, extents and segments just don't exist in a Postgres database. Actually, segments do exist when a table gets bigger than 1GB but that's a story for a different entry.

Postgres stores data as files and directories. Tablespaces are directories and tables (and indexes) are files inside that directory. Postgres ships with two tablespaces installed by default: pg_default and pg_global. The tablespaces cannot be changed or dropped. pg_global contains objects that are visible across databases (i.e. clusterwide). If you don't have any other tablespaces defined, pg_default is where everything else goes.

If you look at a default installation (I'll be using Windows here but *nix is pretty much the same), Postgres creates a HOME pretty much like Oracle does.

In this directory structure, the "data" directory is the home for data. You don't have to put your data here. During install you can choose a completely different directory and you can create tablespaces to even further separate your data.

The data directory is composed of several other directories:

The base directory is the pg_default tablespace and the global directory is the pg_global tablespace. If we select the base directory, we see that there are more sub-directories beneath that one:

We're getting close to the data now. ;-)

Directories on your system may be named differently than on mine. The directory "1" contains, at least part of, the postgres data dictionary. Directory "11510" also contains part of the data dictionary. There are several system catalogs (which are defined as namespaces) in Postgres and these two directories contain data from those. You can edit the files in the directory with notepad. They are binary files but you can see some of the information. DON'T SAVE THEM!

The pgsql_tmp directory is a working directory for the database. It's usually empty unless a query is running.

If we look at one of these directories, we see that the files are named like the directories that contain them:

All objects in Postgres have a unique object ID. The directory name or file name MAY be the same as the object's ID. That's not always the case though. I haven't quite figured out what conditions make this true but it frequently is the case.

I can run a query, using some of the IDs above, to see what they might be tied to:

And finally, if I create a new tablespace, I can put it where I want it. If I run this command:

  create tablespace sample_tblspce location 'c:/temp/extradata';

I will create a new tablespace in my C:\TEMP directory in a subdirectory called extradata. I can then create a table using that tablespace.

  create table abc1 (data date) tablespace sample_tblspce;

Now, I'll do a dir command in DOS:

So, when I created the tablespace, I associated a logical name in Postgres with a physical directory in the file system. Postgres did not create the extradata directory but it did create the underlying "12695" directory which is where it actually stores data.

When I created a table in the tablespace, Postgres created a file, 16833. I can re-run my query against pg_class and see the table in the data dictionary:

Well, this has gotten quite a bit longer than I intended. I just wanted to explain that tables in Postgres are stored as files. Same for Indexes. Tablespaces are directories. There are some issues associated with this as well as some benefits. I'll write about those in the future.


Del.icio.us : , , , , , ,