Note: The pg_compression system catalog table describes the compression methods available.

Table "pg_catalog.pg_compression"
      Column      |  Type   | Modifiers
------------------+---------+-----------
 compname         | name    | not null
 compconstructor  | regproc | not null
 compdestructor   | regproc | not null
 compcompressor   | regproc | not null
 compdecompressor | regproc | not null
 compvalidator    | regproc | not null
 compowner        | oid     | not null
Indexes:
    "pg_compression_compname_index" UNIQUE, btree (compname)
    "pg_compression_oid_index" UNIQUE, btree (oid)
Compression Considerations for Append-Only Tables
Do not use GPDB compression on file systems that use compression. For additional information, refer to the GPDB Database Administrator Guide.

gp_workfile_compress_algorithm
- Default value: none
- When a hash aggregation or hash join operation spills to disk, this parameter specifies the compression algorithm to use on the spill files.
- If using zlib, it must be in your ${PATH} on all segments for the gpadmin user.
- Compressing spill files is recommended if the system is I/O bound.

Greenplum Database supports several storage models and a mix of storage models. When you create a table, you choose how to store its data. This presentation explains the options for table storage and how to choose the best storage model for your workload.

Using Compression (Append-Only Tables Only)
Checking Compression and Distribution of an Append-Only Table

When choosing a compression type and level for append-only tables, consider these factors:
- CPU usage. Your segment systems must have the available CPU power to compress and decompress the data.
- Compression ratio/disk size. Minimizing disk size is one factor, but also consider the time and CPU capacity required to compress and scan data. Find the optimal settings for efficiently compressing data without causing excessively long compression times or slow scan rates.
- Speed of compression. QuickLZ compression generally uses less CPU capacity and compresses data faster, at a lower compression ratio, than zlib. zlib provides higher compression ratios at lower speeds. For example, at compression level 1 (compresslevel=1), QuickLZ and zlib have comparable compression ratios, though at different speeds. Using zlib with compresslevel=6 can significantly increase the compression ratio compared to QuickLZ, though with lower compression speed.
- Speed of decompression/scan rate. Performance with compressed append-only tables depends on hardware, query tuning settings, and other factors. Perform comparison testing to determine the actual performance in your environment.
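The gp_workfile_compress_algorithm parameter described above can also be changed for a single session. A minimal sketch (session-level SET is standard syntax; on a real cluster you would typically apply it cluster-wide with gpconfig instead):

```sql
-- Enable zlib compression for hash-aggregation/hash-join spill files
-- in the current session (the default is none).
SET gp_workfile_compress_algorithm = zlib;

-- Verify the current setting.
SHOW gp_workfile_compress_algorithm;
```

Compressing spill files trades CPU time for reduced disk I/O, which is why it is recommended only when the system is I/O bound.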
Note: Do not use compressed append-only tables on file systems that use compression. If the file system on which your segment data directory resides is a compressed file system, your append-only table must not use compression.

Performance with compressed append-only tables depends on hardware, query tuning settings, and other factors. Greenplum recommends performing comparison testing to determine the actual performance in your environment.

Note: The QuickLZ compression level can only be set to level 1; no other options are available. The zlib compression level can be set to any value between 1 and 9.

Note: When an ENCODING clause conflicts with a WITH clause, the ENCODING clause has higher precedence than the WITH clause.

The WITH clause of the CREATE TABLE command declares the table storage options. Tables that use compression must be append-only tables. For example, to create an append-only table with zlib compression at a compression level of 5:

CREATE TABLE mytable (a int, b char(20))
WITH (appendonly=true, compresstype=zlib, compresslevel=5)
DISTRIBUTED BY (a);

Greenplum provides built-in functions to check the compression ratio and the distribution of an append-only table. The functions take either the object ID or the table name. You can qualify the table name with a schema name. The compression ratio is returned as a common ratio. For example, a returned value of 3.19, or 3.19:1, means that the uncompressed table is slightly larger than three times the size of the compressed table.
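To illustrate the ENCODING-over-WITH precedence noted above, here is a sketch of a column-oriented append-only table where one column's ENCODING clause overrides the table-wide WITH settings (the table and column names are illustrative; column-level ENCODING applies to column-oriented tables):

```sql
-- The table-wide default is quicklz level 1, but column b's ENCODING
-- clause takes precedence, so b is stored with zlib level 5 instead.
CREATE TABLE mytable_enc (
    a int,
    b char(20) ENCODING (compresstype=zlib, compresslevel=5)
)
WITH (appendonly=true, orientation=column,
      compresstype=quicklz, compresslevel=1)
DISTRIBUTED BY (a);
```

This pattern lets you keep a cheap, fast default codec for most columns while applying a higher-ratio codec to individual columns that compress well.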
sachi=# select * from pg_catalog.pg_compression;
 compname |         compconstructor          |         compdestructor          |        compcompressor         |        compdecompressor         |         compvalidator          | compowner
----------+----------------------------------+---------------------------------+-------------------------------+---------------------------------+--------------------------------+-----------
 zlib     | gp_zlib_constructor              | gp_zlib_destructor              | gp_zlib_compress              | gp_zlib_decompress              | gp_zlib_validator              |        10
 quicklz  | gp_quicklz_constructor           | gp_quicklz_destructor           | gp_quicklz_compress           | gp_quicklz_decompress           | gp_quicklz_validator           |        10
 rle_type | gp_rle_type_constructor          | gp_rle_type_destructor          | gp_rle_type_compress          | gp_rle_type_decompress          | gp_rle_type_validator          |        10
 none     | gp_dummy_compression_constructor | gp_dummy_compression_destructor | gp_dummy_compression_compress | gp_dummy_compression_decompress | gp_dummy_compression_validator |        10
(4 rows)

Q. What is the compression ratio like for a Greenplum column-oriented table?

Ans: Greenplum row compression is only for append-only tables and gives about a 35%-40% reduction in volume. Row compression is done at the partition level, and once compressed, the data is not updatable. However, Pivotal Greenplum is working on a future feature release to make it updatable. Compression using compresstype=quicklz at the default level of 1 gives about a 6x compression ratio. For example, a 2 TB table compressed with compresstype=quicklz at the default level of 1 will be around 360 GB. These are approximations, so the conclusion is roughly a 6x compression ratio.

The following built-in function is provided to check the compression ratio of an append-only compressed table; it takes a table name or object ID.
The function calculates the compression ratio:

get_ao_compression_ratio(name)
get_ao_compression_ratio(oid)

3|uncompressed|public.foo_tab|Fri Nov 14 14:30:00 EST 2014|1415993400|1415993536|136|100000000|2602065248|2482 MB
3|zlib_5|public.foo_tab|Fri Nov 14 14:32:23 EST 2014|1415993543|1415993635|92|100000000|621586400|593 MB
3|zlib_7|public.foo_tab|Fri Nov 14 14:34:05 EST 2014|1415993645|1415993753|108|100000000|619200880|591 MB
3|quicklz|public.foo_tab|Fri Nov 14 14:36:03 EST 2014|1415993763|1415993810|47|100000000|835938248|797 MB

My colleague RAM created a table foo_tab with 100 million rows. You can see from the results above that the uncompressed table is about 3 times larger than the QuickLZ-compressed table and about 4 times larger than the zlib level 5 compressed table. The function returns the corresponding compression ratios for the quicklz and zlib level 5 compressed tables:

gpdb=# select get_ao_compression_ratio('public.foo_tab_quicklz');
 get_ao_compression_ratio
--------------------------
                     3.11
(1 row)

gpdb=# select get_ao_compression_ratio('public.foo_tab_zlib5');
 get_ao_compression_ratio
--------------------------
                     4.19
(1 row)

There is little difference in compressed table size and compression ratio between zlib level 5 and zlib level 7 append-only tables.
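Besides the compression ratio, the sections above also mention checking the distribution of an append-only table. A common way to inspect how a table's rows are spread across segments is to count rows per gp_segment_id (using the example table from above; any table name works):

```sql
-- Rows per segment: a badly skewed distribution shows up as one
-- segment holding far more rows than the others.
SELECT gp_segment_id, count(*)
FROM public.foo_tab
GROUP BY gp_segment_id
ORDER BY gp_segment_id;
```

For a well-chosen DISTRIBUTED BY key, the per-segment counts should be roughly equal; large imbalances indicate data skew that will make some segments both slower and larger on disk than the rest.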