Restore Greenplum Database

Greenplum offers several ways to restore a database from backup. Which one you use depends on the type of backup files available: you can perform either a parallel restore or a non-parallel restore. If you are restoring your Greenplum database from a backup taken on a system with a different configuration, you must use a special method to restore it.

Parallel Restores
To do a parallel restore, you must have a complete backup set created by gp_dump or gpcrondump. Greenplum provides a parallel restore utility called gp_restore. This utility takes the timestamp key generated by gp_dump, validates the backup set, and restores the database objects and data into a distributed database. As with a parallel dump, each segment’s data is restored in parallel.
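gp_restore validates the backup set before restoring it. One way to picture that validation is as a completeness check: the master and every segment should each hold a backup file for the requested timestamp key.

The Python sketch below models only that idea and is not gp_restore's actual implementation. Several names are hypothetical:

- the inventory dictionary,
- the function name missing_backups,
- the assumed 14-digit YYYYMMDDHHMMSS key format.

```python
from __future__ import annotations

import re

# Assumption: timestamp keys are 14-digit YYYYMMDDHHMMSS strings.
TIMESTAMP_KEY = re.compile(r"\d{14}")


def missing_backups(timestamp_key: str, backups_by_instance: dict[str, set[str]]) -> list[str]:
    """Return the instances (master and segments) that lack a backup for timestamp_key.

    backups_by_instance is a hypothetical inventory mapping an instance name to
    the set of timestamp keys for which that instance holds a backup file.
    An empty result means the backup set looks complete.
    """
    if not TIMESTAMP_KEY.fullmatch(timestamp_key):
        raise ValueError(f"not a 14-digit timestamp key: {timestamp_key!r}")
    return sorted(
        instance
        for instance, keys in backups_by_instance.items()
        if timestamp_key not in keys
    )


if __name__ == "__main__":
    inventory = {
        "master": {"20240101120000"},
        "seg0": {"20240101120000"},
        "seg1": {"20231231120000"},
    }
    print(missing_backups("20240101120000", inventory))  # ['seg1']
```

A non-empty result signals an incomplete backup set, which a parallel restore cannot use.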

Greenplum also provides the gpdbrestore utility, a wrapper for gp_restore. gpdbrestore adds flexibility and verification options. These are useful in two cases: when you restore automated backup files produced by gpcrondump, or when you have moved your backup files off the Greenplum array to an alternate location.

a) Restoring a Greenplum database using gp_restore
b) Restoring a Greenplum database using gpdbrestore

The procedure for restoring a database from parallel backup files depends on a few factors. To choose the right procedure, answer the following questions:

1. Where are your backup files located? If your backup files are still in their original location on the segment hosts where gp_dump created them, you can do a simple restore using gp_restore. If you have moved your backup files off your Greenplum array, for example to an archive server, use gpdbrestore.

2. Do you need to restore your entire system, or just your data? If your Greenplum Database is up and running and you only need to restore your data, use gp_restore or gpdbrestore. If you have lost your entire array and need to rebuild the whole system from backup, first re-create the system with gpinitsystem and then restore the data.

3. Are you restoring to a system with the same number of segment instances as your backup set? If you are restoring to an array with the same number of segment hosts and segment instances per host, use gp_restore or gpdbrestore. If you are migrating to a different array configuration, you must do a non-parallel restore.
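The three questions above can be condensed into a small decision helper. This is a hypothetical Python sketch that only encodes the rules stated in this section. RestoreSituation and choose_restore_steps are illustrative names, and the returned strings are plain descriptions, not commands or flags.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RestoreSituation:
    files_in_original_location: bool  # question 1: backups still on the segment hosts?
    system_running: bool              # question 2: is the Greenplum system up?
    same_segment_layout: bool         # question 3: same segment hosts/instances as the backup?


def choose_restore_steps(s: RestoreSituation) -> list[str]:
    """Return an ordered list of restore steps, as descriptions, for the situation."""
    steps: list[str] = []
    if not s.system_running:
        # The whole array is lost: rebuild the system before restoring any data.
        steps.append("gpinitsystem")
    if not s.same_segment_layout:
        # A different segment layout rules out a parallel restore.
        steps.append("non-parallel restore through the master")
    elif s.files_in_original_location:
        steps.append("gp_restore")
    else:
        # Backup files were moved off the array, e.g. to an archive server.
        steps.append("gpdbrestore")
    return steps


if __name__ == "__main__":
    print(choose_restore_steps(RestoreSituation(False, True, True)))  # ['gpdbrestore']
```

The segment-layout check comes first because it overrides the choice between gp_restore and gpdbrestore. Only a matching layout permits a parallel restore at all.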

Non-Parallel Restores
Greenplum also supports the regular PostgreSQL restore utility, pg_restore. It is supported mainly for users who are migrating to Greenplum Database from regular PostgreSQL and have compressed archive files created by pg_dump. (pg_dumpall writes only plain-text SQL scripts, which are loaded with psql rather than pg_restore.) Before restoring PostgreSQL dump files into Greenplum Database, modify the CREATE TABLE statements in the dump files to include the Greenplum DISTRIBUTED clause.
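Adding the DISTRIBUTED clause can be partly automated when the dump is available as a plain-text SQL script, such as pg_dump plain-format output. The Python sketch below assumes that each CREATE TABLE statement closes with ");" on its own line. It is a regex-based illustration, not a SQL parser, and the helper name add_distribution is hypothetical.

Tables listed in the keys mapping get DISTRIBUTED BY (...); all others get DISTRIBUTED RANDOMLY. Always review the resulting script, because the right distribution key is a design decision.

```python
from __future__ import annotations

import re

# Assumption: each CREATE TABLE column list closes with ");" on its own line,
# as in typical pg_dump plain-text output. [^;] stops a match from running into
# the next statement, so statements already ending in ") DISTRIBUTED ...;" are skipped.
CREATE_TABLE = re.compile(r"CREATE TABLE\s+(?P<name>[^\s(]+)\s*\((?:[^;]*?)\n\);")


def add_distribution(sql: str, keys: dict[str, list[str]]) -> str:
    """Append a Greenplum DISTRIBUTED clause to each CREATE TABLE statement in sql.

    keys maps a table name, exactly as written in the script (e.g. "public.orders"),
    to its distribution key columns. Tables without an entry get DISTRIBUTED RANDOMLY.
    """

    def repl(m: re.Match) -> str:
        cols = keys.get(m.group("name"))
        clause = f"DISTRIBUTED BY ({', '.join(cols)})" if cols else "DISTRIBUTED RANDOMLY"
        # The match ends with ");": drop the ";" and re-terminate after the clause.
        return m.group(0)[:-1] + f" {clause};"

    return CREATE_TABLE.sub(repl, sql)


if __name__ == "__main__":
    script = "CREATE TABLE public.orders (\n    id integer NOT NULL,\n    note text\n);\n"
    print(add_distribution(script, {"public.orders": ["id"]}))
```

For the sample script, the statement ends with ") DISTRIBUTED BY (id);". Forms such as CREATE TABLE IF NOT EXISTS, or semicolons inside column defaults, are outside what this sketch handles.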

a) Restoring a Greenplum database using pg_restore

Sometimes you also need to do a non-parallel restore from a parallel backup set. For example, suppose you are migrating from a Greenplum system with four segments to one with five segments. A parallel restore is not possible here: the backup set contains only four segment backup files, and their data would not be evenly distributed across the expanded system. A non-parallel restore from parallel backup files works in three steps:

- collect each backup file from the segment hosts,
- copy the files to the master host,
- load them through the master.
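A simplified model can show why four segment backup files cannot simply be loaded one-per-segment into five segments. The Python sketch below assumes a row's segment is its distribution key modulo the segment count. That is a stand-in for Greenplum's real hash function, and only the dependence on the segment count matters here.

segment_for, rows_that_move and new_targets are hypothetical helpers. With 20 keys, most rows land on a different segment, and a single old segment's backup file holds rows destined for every new segment. That is why the data has to be redistributed through the master.

```python
from __future__ import annotations


def segment_for(key: int, num_segments: int) -> int:
    # Simplifying assumption: key modulo segment count stands in for Greenplum's hash.
    return key % num_segments


def rows_that_move(keys: range, old_count: int, new_count: int) -> int:
    """Count rows whose segment changes when the segment count changes."""
    return sum(1 for k in keys if segment_for(k, old_count) != segment_for(k, new_count))


def new_targets(old_segment: int, keys: range, old_count: int, new_count: int) -> set[int]:
    """New segments that need rows from one old segment's backup file."""
    return {segment_for(k, new_count) for k in keys if segment_for(k, old_count) == old_segment}


if __name__ == "__main__":
    print(rows_that_move(range(20), 4, 5))                # 16 (of 20 rows)
    print(sorted(new_targets(0, range(20), 4, 5)))        # [0, 1, 2, 3, 4]
```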

Restoring Greenplum Database to a Different Greenplum System Configuration