GPDB Parallel Data Loading

posted Sep 12, 2012, 11:05 AM by Sachchida Ojha
One challenge of large-scale, multi-terabyte data warehouses is loading large amounts of data within a given maintenance window. Greenplum supports fast, parallel data loading through its external tables feature. External tables can also be read in ‘single row error isolation’ mode, in which badly formatted rows are diverted to a separate error table during a load operation while properly formatted rows are still loaded. Administrators set the acceptable error threshold for each load operation, which lets them manage the quality and flow of data into the database.
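As a rough illustration of single row error isolation, the Python sketch below assembles a CREATE EXTERNAL TABLE statement that carries an error table and a reject limit. The table, column, host, and file names are hypothetical, and the LOG ERRORS INTO ... SEGMENT REJECT LIMIT form is the syntax as I understand it for Greenplum releases of this era; later releases changed how errors are logged, so check the reference for your version before using it.

```python
def external_table_ddl(name, columns, location, error_table, reject_limit, unit="ROWS"):
    """Build a CREATE EXTERNAL TABLE statement with single row error isolation.

    All names passed in are illustrative; nothing here talks to a database.
    """
    if unit not in ("ROWS", "PERCENT"):
        raise ValueError("unit must be ROWS or PERCENT")
    col_sql = ", ".join(f"{col} {typ}" for col, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE {name} ({col_sql})\n"
        f"LOCATION ('{location}')\n"
        "FORMAT 'TEXT' (DELIMITER '|')\n"
        # Rows that fail to parse go to the error table instead of aborting the load.
        f"LOG ERRORS INTO {error_table}\n"
        # The load is abandoned once a segment rejects this many rows (or percent).
        f"SEGMENT REJECT LIMIT {reject_limit} {unit};"
    )


# Hypothetical example: a pipe-delimited sales feed served from host etl1.
ddl = external_table_ddl(
    name="ext_sales",
    columns=[("id", "int"), ("amount", "numeric"), ("sold_on", "date")],
    location="gpfdist://etl1:8081/sales_*.txt",
    error_table="err_sales",
    reject_limit=100,
)
print(ddl)
```

The reject limit is the knob for the error threshold: a small value stops a load early when the input is mostly bad, while a larger value tolerates more rejected rows before giving up.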

By using external tables together with gpfdist, Greenplum Database’s parallel file server, administrators can achieve maximum parallelism and load bandwidth from their Greenplum Database system. Greenplum has demonstrated load rates in excess of 2 TB per hour.
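To make the gpfdist arrangement a little more concrete, here is a small Python sketch that builds the LOCATION clause for an external table fed by several gpfdist instances. The host names, port, directory, and file pattern are assumptions chosen for illustration; the idea is simply that each listed URL points at one running gpfdist process, so adding file servers adds sources the segments can read from in parallel.

```python
def gpfdist_locations(hosts, port, pattern):
    """Return one gpfdist:// URL per file server host."""
    return [f"gpfdist://{host}:{port}/{pattern}" for host in hosts]


def location_clause(urls):
    """Format the URLs as the LOCATION clause of CREATE EXTERNAL TABLE."""
    quoted = ",\n    ".join(f"'{url}'" for url in urls)
    return f"LOCATION (\n    {quoted}\n)"


# Hypothetical setup: gpfdist started on two ETL hosts, for example with
#   gpfdist -d /var/load_files -p 8081 -l /home/gpadmin/gpfdist.log &
# (-d is the directory to serve, -p the port, -l the log file).
urls = gpfdist_locations(["etl1", "etl2"], 8081, "sales_*.txt")
clause = location_clause(urls)
print(clause)
```

Once an external table is defined over these locations, the load itself is ordinary SQL, for example an INSERT INTO a regular table that selects from the external table.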