Creating Partitioned Tables in Greenplum

A table can only be partitioned at creation time using the CREATE TABLE command.

The first step in partitioning a table is to decide on the partition design (date range, numeric range, or list of values) and choose the column(s) on which to partition the table. Decide how many levels of partitions you want. For example, you may want to date range partition a table by month and then further subpartition the monthly partitions by sales region. This section shows examples of SQL syntax for creating a table with various partition designs.

1. Defining Date Range Table Partitions
2. Defining Numeric Range Table Partitions
3. Defining List Table Partitions
4. Defining Multi-level Partitions
5. Partitioning an Existing Table


Defining Date Range Table Partitions

A date range partitioned table uses a single date or timestamp column as the partition key column. You can use the same partition key column to further subpartition a table if necessary (for example, to partition by month and then subpartition by day). When date partitioning a table, consider partitioning by the most granular level you are interested in. For example, partition by day and have 365 daily partitions, rather than partition by year then subpartition by month then subpartition by day. A multi-level design can reduce query planning time, but a flat partition design will execute faster at query run time.

You can have Greenplum Database automatically generate partitions by giving a START value, an END value, and an EVERY clause that defines the partition increment value. By default, START values are always inclusive and END values are always exclusive. For example:

CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( START (date '2008-01-01') INCLUSIVE
END (date '2009-01-01') EXCLUSIVE
EVERY (INTERVAL '1 day') );

You can also declare and name each partition individually. For example:

CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( PARTITION Jan08 START (date '2008-01-01') INCLUSIVE ,
PARTITION Feb08 START (date '2008-02-01') INCLUSIVE ,
PARTITION Mar08 START (date '2008-03-01') INCLUSIVE ,
PARTITION Apr08 START (date '2008-04-01') INCLUSIVE ,
PARTITION May08 START (date '2008-05-01') INCLUSIVE ,
PARTITION Jun08 START (date '2008-06-01') INCLUSIVE ,
PARTITION Jul08 START (date '2008-07-01') INCLUSIVE ,
PARTITION Aug08 START (date '2008-08-01') INCLUSIVE ,
PARTITION Sep08 START (date '2008-09-01') INCLUSIVE ,
PARTITION Oct08 START (date '2008-10-01') INCLUSIVE ,
PARTITION Nov08 START (date '2008-11-01') INCLUSIVE ,
PARTITION Dec08 START (date '2008-12-01') INCLUSIVE
END (date '2009-01-01') EXCLUSIVE );

Note that you do not need to declare an END value for each partition, only the last one. In this example, Jan08 would end where Feb08 starts.


Defining Numeric Range Table Partitions

A numeric range partitioned table uses a single numeric data type column as the partition key column. For example:

CREATE TABLE rank (id int, rank int, year int, gender char(1), count int)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
( START (2001) END (2008) EVERY (1),
DEFAULT PARTITION extra );

Defining List Table Partitions

A list partitioned table can use any data type column that allows equality comparisons as its partition key column. A list partition can also have a multi-column (composite) partition key, whereas a range partition only allows a single column as the partition key. For list partitions, you must declare a partition specification for every partition (list value) you want to create. For example:

CREATE TABLE rank (id int, rank int, year int, gender char(1), count int )
DISTRIBUTED BY (id)
PARTITION BY LIST (gender)
( PARTITION girls VALUES ('F'),
PARTITION boys VALUES ('M'),
DEFAULT PARTITION other );

Defining Multi-level Partitions

It is possible to create a multi-level partition design where you have subpartitions of partitions. Using a subpartition template ensures that every partition has the same subpartition design, even partitions that are added later.

CREATE TABLE sales (trans_id int, date date, amount decimal(9,2), region text)
DISTRIBUTED BY (trans_id)
PARTITION BY RANGE (date)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'),
SUBPARTITION asia VALUES ('asia'),
SUBPARTITION europe VALUES ('europe'),
DEFAULT SUBPARTITION other_regions)
( START (date '2008-01-01') INCLUSIVE
END (date '2009-01-01') EXCLUSIVE
EVERY (INTERVAL '1 month') ),
DEFAULT PARTITION outlying_dates );

Below is a similar example illustrating a three-level partition design where the sales table is partitioned by year, then month, then region. The SUBPARTITION TEMPLATE clauses ensure that each yearly partition has the same subpartition structure. Also note that a DEFAULT partition is declared at each level of the hierarchy:

CREATE TABLE sales (id int, year int, month int, day int, region text)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
SUBPARTITION BY RANGE (month)
SUBPARTITION TEMPLATE (
START (1) END (13) EVERY (1),
DEFAULT SUBPARTITION other_months )
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE (
SUBPARTITION usa VALUES ('usa'),
SUBPARTITION europe VALUES ('europe'),
SUBPARTITION asia VALUES ('asia'),
DEFAULT SUBPARTITION other_regions )
( START (2002) END (2010) EVERY (1),
DEFAULT PARTITION outlying_years );

Partitioning an Existing Table

It is not possible to partition a table that has already been created. Tables can only be partitioned at CREATE TABLE time. If you have an existing table that you want to partition, you must recreate the table as a partitioned table, reload the data into the newly partitioned table, drop the original table and rename the partitioned table to the original name. You must also regrant any table permissions. For example:

CREATE TABLE sales2 (LIKE sales)
PARTITION BY RANGE (date)
( START (date '2008-01-01') INCLUSIVE
END (date '2009-01-01') EXCLUSIVE
EVERY (INTERVAL '1 month') );

INSERT INTO sales2 SELECT * FROM sales;

DROP TABLE sales;

ALTER TABLE sales2 RENAME TO sales;

GRANT ALL PRIVILEGES ON sales TO admin;

GRANT SELECT ON sales TO guest;

Limitations of Partitioned Tables

To be able to enforce a primary key or unique constraint, the primary or unique key columns must start with the partitioning key column.

Loading Partitioned Tables

Once you have created your partitioned table structure, top-level parent tables are always empty. Data is routed to the bottom-level child table partitions only. In a multi-level partition design, only the subpartitions at the bottom of the hierarchy can contain data.

If a row cannot be mapped to a child table partition, it will be rejected and the load will fail. If you do not want unmapped rows to be rejected at load time, you can define your partition hierarchy with a DEFAULT partition. Any rows that do not match to an existing partition’s CHECK constraints will then load into the DEFAULT partition.

At runtime, the query planner scans the entire table inheritance hierarchy and uses the CHECK table constraints to determine which of the child table partitions to scan in order to satisfy the query’s conditions. The DEFAULT partition (if your hierarchy has one) is always scanned. If the DEFAULT partition contains data, this will slow down the overall scan time.

When you use COPY or INSERT to load data into a parent table, it automatically gets rerouted to the correct partition by default. Therefore, you can load a partitioned table as you would a regular table.

You can also load data into the child table partitions directly if needed. You can also create an intermediate staging table, load it, and then exchange it into your partition design.


Comments