What are different types of storage model for Greenplum database tables?

posted Apr 28, 2017, 3:52 PM by Sachchida Ojha
Greenplum supports following storage models for a table.

1. Heap Storage: Heap table storage works best with OLTP-type workloads where the data is often modified after it is initially loaded. UPDATE and DELETE operations require storing row-level versioning information to ensure reliable database transaction processing. Heap tables are best suited for smaller tables, such as dimension tables, that are often updated after they are initially loaded.By default, Greenplum Database uses the same heap storage model as PostgreSQL. Row-oriented heap tables are the default storage type. 

Example: CREATE TABLE sachi (id int, name text) DISTRIBUTED BY (id);

2. Append-only storage: Append-only storage model is good for large fact tables where after data is loaded is not updated. These types of tables are loaded using batch process. Once loaded these tables are used for read only queries. Moving large fact tables to an append-only storage model eliminates the storage overhead of the per-row update visibility information, saving about 20 bytes per row. This allows for a leaner and easier-to-optimize page structure. Append-only tables do not allow UPDATE and DELETE operations. Single row INSERT statements are not recommended.

Example: CREATE TABLE sachi (id int, name text)  WITH (appendonly=true) DISTRIBUTED BY (id);

Append only tables with compression: Two types of in-database compression available in the Greenplum Database for append-only tables. Note that compression option is available only for append-only tables. in-database compression can be table level or column level. Table level compression is applied to entire table while column level compression is applied to a specific column. You can use different column-level compression algorithms for different columns. The available supported algorithms for row oriented append only tables are  ZLIB and QUICKLZ. The available supported algorithms for column oriented append only column and table are RLE_TYPE, ZLIB, and QUICKLZ.

When choosing a compression type and level for append-only tables, consider these factors:
1. CPU usage. Your segment systems must have the available CPU power to compress and uncompress the data.

2. Compression ratio/disk size. Minimizing disk size is one factor, but also consider the time and CPU capacity required to compress and scan data. Find the optimal
settings for efficiently compressing data without causing excessively long compression times or slow scan rates.

3. Speed of compression. QuickLZ compression generally uses less CPU capacity and compresses data faster at a lower compression ratio than zlibzlib provides higher compression ratios at lower speeds. For example, at compression level 1 (compresslevel=1), QuickLZ and zlib have comparable compression ratios, though at different speeds. Using zlib with compresslevel=6 can significantly increase the compression ratio compared to QuickLZ, though with lower compression speed.

4. Speed of decompression/scan rate. Performance with compressed append-only tables depends on hardware, query tuning settings, and other factors. Perform
comparison testing to determine the actual performance in your environment.