Backup Greenplum Database

Greenplum recommends that you take regular backups of your databases. These backups can be used to restore your data or to rebuild a Greenplum Database system in the event of a system failure or data corruption. You can also use the backups to migrate data from one Greenplum Database system to another.

Restore Greenplum Database

Parallel Backups using Greenplum backup utility

Greenplum provides a parallel dump utility called gp_dump. This utility backs up the Greenplum master instance and each active segment instance at the same time.

Because the segments are dumped in parallel, the time it takes to do a backup should scale regardless of the number of segments in your system. The dump files on the master host consist of DDL statements and the Greenplum-specific system catalog tables (such as gp_segment_configuration). On the segment hosts, there is one dump file created for each segment instance. The segment dump files contain the data for an individual segment instance. All of the dump files that comprise a total backup set are identified by a unique 14-digit timestamp key.

In order to automate routine backups, Greenplum also provides the gpcrondump utility, which is a wrapper for gp_dump that can be called directly or from a scheduled CRON job. gpcrondump also allows you to backup additional objects besides your databases and data, such as database roles and server configuration files.

a) Backing GPDB using gp_dump utility

b) Automating greenplum database backup using gpcrondump

Non-Parallel Backups using PostgreSQL backup utility
Greenplum also supports the regular PostgreSQL dump utilities: pg_dump and pg_dumpall. The PostgreSQL dump utilities (when used on Greenplum Database) will create one big dump file on the master host containing the data from all active segments. In most cases, this is probably not practical, as there is most likely not enough disk space on the master host to create a single backup file of an entire distributed database. These utilities are mostly supported for users who are migrating from regular PostgreSQL to Greenplum Database.

a) Backing Greenplum database using pg_dump utility

b)  Backing Greenplum database using pg_dumpall.

Another useful command for getting data out of a database is the COPY TO SQL command. This allows you to copy all or a portion of a table out of the database to a text-delimited file on the master host.

c) Backing Greenplum database using COPY command.

If you are migrating your data to another Greenplum Database system with a different segment configuration (for example, if the system you are migrating to has greater or fewer segment instances), Greenplum recommends using your parallel dump files created by gp_dump or gpcrondump.