Friday, August 24, 2007

Create a Partitioned Table In PostgreSQL

First, we create the table structure that will act as the master table.


CREATE TABLE sales_range 
(salesman_id  integer, 
salesman_name text, 
sales_amount  integer, 
sales_date    timestamp);
Then we create each of the partitioned tables using OO style syntax. Each partitioned table "inherits" the structure of the parent table. There is a string relation between parent and child. You cannot drop the parent without dropping the child.

CREATE TABLE sales_jan2000
  ( CHECK (sales_date < TO_DATE('02/01/2000','MM/DD/YYYY')) )
  INHERITS (sales_range);

CREATE TABLE sales_feb2000
  ( CHECK (sales_date < TO_DATE('03/01/2000','MM/DD/YYYY')) )
  INHERITS (sales_range);

CREATE TABLE sales_mar2000
  ( CHECK (sales_date < TO_DATE('04/01/2000','MM/DD/YYYY')) )
  INHERITS (sales_range);

CREATE TABLE sales_apr2000
  ( CHECK (sales_date < TO_DATE('05/01/2000','MM/DD/YYYY')) )
  INHERITS (sales_range);
At this point, any inserts against the parent table (sales_range) would be inserted into sales_range. To actually use the new partitions, we need to write either a PostgreSQL Rule or a trigger. I like triggers personally. I'll create the trigger function. I'll have a separate IF for each type of operation (tg_op) and an IF for each partition within that operation. You can simplify this with dynamic SQL but I will save that for a future posting.

CREATE FUNCTION sales_range_handler() 
  RETURNS TRIGGER AS $$
DECLARE
BEGIN
  IF tg_op = 'INSERT' THEN
    IF new.sales_date < cast('2000-02-01' as timestamp)
    THEN
      INSERT INTO sales_jan2000
         (salesman_id, salesman_name, sales_amount, sales_date)
        VALUES (new.salesman_id, new.salesman_name, 
                new.sales_amount, new.sales_date);
    ELSIF new.sales_date < cast('2000-03-01' as timestamp)
    THEN
      INSERT INTO sales_feb2000
         (salesman_id, salesman_name, sales_amount, sales_date)
        VALUES (new.salesman_id, new.salesman_name, 
                new.sales_amount, new.sales_date);
    ELSIF new.sales_date < cast('2000-04-01' as timestamp)
    THEN
      INSERT INTO sales_mar2000
         (salesman_id, salesman_name, sales_amount, sales_date)
        VALUES (new.salesman_id, new.salesman_name, 
                new.sales_amount, new.sales_date);
    ELSIF new.sales_date < cast('2000-05-01' as timestamp)
    THEN
      INSERT INTO sales_apr2000
         (salesman_id, salesman_name, sales_amount, sales_date)
        VALUES (new.salesman_id, new.salesman_name, 
                new.sales_amount, new.sales_date);
    END IF;
  ELSIF tg_op = 'UPDATE' THEN
    -- Do the same for update
    NULL;
  ELSIF tg_op = 'DELETE' THEN
     -- Do the same for delete
    NULL;
  END IF;
  RETURN NULL;
END;
$$   LANGUAGE plpgsql;
Then I create the actual trigger:
CREATE TRIGGER sales_range_handler_trg 
  BEFORE INSERT OR UPDATE OR DELETE 
  ON sales_range
  FOR EACH ROW 
  EXECUTE PROCEDURE sales_range_handler();
Finally, I do an INSERT:
INSERT INTO sales_range (salesman_id, salesman_name, sales_amount, sales_date)
  VALUES (1, 'Lewis', 10.25, cast('2000-03-15' as timestamp) );
If you select from the main table, you will see one record. That record is actually being selected from the MAR2000 table. You can select directly from that table and see that the record is there. To optimize your partitioned queries, check out Constraint Exclusion. LewisC