Backing Up a Database Using gpcrondump

gpcrondump is a wrapper utility for gp_dump that can be called directly or from a crontab entry.

gpcrondump -x database_name 
[-s schema | -t schema.table | -T schema.table] 
[-w dbid[,...]] [-u backup_directory] 
[-R post_dump_script] [-c | -o] [-z] [-r] [-f free_space_percent]
[-b] [-i] [-j | -k] [-g] [-G] [-C] [-d master_data_directory] 
[-B parallel_processes] [-a] [-q] 
[-y reportfile] [-l logfile_directory] [-D] 
{ [-E encoding] [--inserts | --column-inserts] [--oids]
[--no-owner | --use-set-session-authorization]
[--no-privileges] }

gpcrondump -?

gpcrondump -v

gpcrondump is a wrapper utility for gp_dump. By default, dump files are created in their respective master and segment data directories in a directory named db_dumps/YYYYMMDD. The data dump files are compressed by default using gzip.
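The date-stamped directory name can be reproduced with the date command; the sketch below only illustrates the naming convention and does not create or inspect any dump files:

```shell
# gpcrondump names its dump directory after the current date,
# in YYYYMMDD form, under db_dumps/ in each data directory.
DUMP_DATE=$(date +%Y%m%d)
DUMP_DIR="db_dumps/${DUMP_DATE}"
echo "Today's dump directory: ${DUMP_DIR}"
```

This is handy when a post-dump script needs to locate the files gpcrondump just wrote.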

gpcrondump allows you to schedule routine backups of a Greenplum database using cron (a scheduling utility for UNIX operating systems). Cron jobs that call gpcrondump should be scheduled on the master host.

Email Notifications
To have gpcrondump send out status email notifications, you must place a file named mail_contacts in the home directory of the Greenplum superuser (gpadmin) or in the same directory as the gpcrondump utility ($GPHOME/bin). This file should contain one email address per line. gpcrondump will issue a warning if it cannot locate a mail_contacts file in either location. If both locations have a mail_contacts file, then the one in $HOME takes precedence.
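A minimal sketch of creating such a file follows. The addresses are placeholders, and the file is written to the current directory to keep the example self-contained; the real file belongs in gpadmin's home directory or in $GPHOME/bin:

```shell
# mail_contacts: one email address per line, nothing else.
cat > mail_contacts <<'EOF'
dba-team@example.com
oncall@example.com
EOF

# Show what gpcrondump would read.
cat mail_contacts
```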

-a (do not prompt)
Do not prompt the user for confirmation.

-b (bypass disk space check)
Bypass disk space check. The default is to check for available disk space.

-B parallel_processes
The number of segments to check in parallel for pre/post-dump validation. If not specified, the utility will start up to 60 parallel processes depending on how many
segment instances it needs to dump.

-c (clear old dump files first)
Clear out old dump files before doing the dump. The default is not to clear out old dump files. This will remove all old dump directories in the db_dumps directory,
except for the dump directory of the current date.

-C (clean old catalog dumps)
Clean out old catalog schema dump files before creating new ones.

--column-inserts
Dump data as INSERT commands with column names.

-d master_data_directory
The master host data directory. If not specified, the value set for $MASTER_DATA_DIRECTORY will be used.

-D (debug)
Sets logging level to debug.

-E encoding
Character set encoding of dumped data. Defaults to the encoding of the database being dumped. See “Character Set Support” on page 958 for the list of supported
character sets.

-f free_space_percent
When doing the check to ensure that there is enough free disk space to create the dump files, specifies a percentage of free disk space that should remain after the
dump completes. The default is 10 percent.

-g (copy config files)
Secure a copy of the master and segment configuration files postgresql.conf, pg_ident.conf, and pg_hba.conf. These configuration files are dumped in the
master or segment data directory to db_dumps/YYYYMMDD/config_files_<timestamp>.tar

-G (dump global objects)
Use pg_dumpall to dump global objects such as roles and tablespaces. Global objects are dumped in the master data directory to db_dumps/YYYYMMDD/gp_global_1_1_<timestamp>.

-i (ignore parameter check)
Ignore the initial parameter check phase.

--inserts
Dump data as INSERT commands, rather than COPY commands.

-j (vacuum before dump)
Run VACUUM before the dump starts.

-k (vacuum after dump)
Run VACUUM after the dump has completed successfully.

-l logfile_directory
The directory to write the log file. Defaults to ~/gpAdminLogs.

--no-owner
Do not output commands to set object ownership.

--no-privileges
Do not output commands to set object privileges (GRANT/REVOKE commands).

-o (clear old dump files only)
Clear out old dump files only, but do not run a dump. This will remove the oldest dump directory except the current date’s dump directory. All dump sets within that
directory will be removed.

--oids
Include object identifiers (oid) in dump data.

-p (primary segments only)
Dump all primary segments, which is the default behavior. Note: this option is deprecated.

-q (no screen output)
Run in quiet mode. Command output is not displayed on the screen, but is still written to the log file.

-r (no rollback on failure)
Do not roll back the dump (that is, do not delete partial dump files) if a failure is detected. The default is to roll back by deleting partial dump files.

-R post_dump_script
The absolute path of a script to run after a successful dump operation. For example, you might want a script that moves completed dump files to a backup host. This script must reside in the same location on the master and all segment hosts.
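As a sketch of such a post-dump script (the backup host name and archive path are hypothetical), one might copy the day's dump directory to an archive host. The script below is only written to disk and made executable, not run:

```shell
# Write a hypothetical post-dump script suitable for -R.
# gpcrondump runs it after a successful dump; it must exist at the
# same path on the master and every segment host.
cat > move_dumps.sh <<'EOF'
#!/bin/sh
# Copy today's dump directory to the archive host (illustrative only).
DUMP_DIR="db_dumps/$(date +%Y%m%d)"
scp -r "$DUMP_DIR" backuphost:/gpdb/backups/archive/
EOF
chmod +x move_dumps.sh
```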

-s schema_name
Dump only the named schema in the named database.

-t schema.table_name
Dump only the named table in this database. The -t option can be specified multiple times.

-T schema.table_name
A table name to exclude from the database dump. The -T option can be specified multiple times.
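Because both -t and -T may be repeated, an invocation that dumps only two tables can be assembled as below. The schema and table names are hypothetical, and the command is echoed rather than executed, since running it requires a live Greenplum system:

```shell
# Dump only two named tables from the sales database (illustrative).
CMD="gpcrondump -x sales -t public.orders -t public.customers -a -q"
echo "$CMD"
```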

-u backup_directory
Specifies the absolute path where the backup files will be placed on each host. If the path does not exist, it will be created, if possible. If not specified, defaults to the data directory of each instance to be backed up. Using this option may be desirable if each segment host has multiple segment instances as it will create the dump files in a centralized location rather than the segment data directories.

--use-set-session-authorization
Use SET SESSION AUTHORIZATION commands instead of ALTER OWNER commands to set object ownership.

-v (show utility version)
Displays the version, status, last updated date, and checksum of this utility.

-w dbid[,...] (backup certain segments only)
Specifies a set of active segment instances to back up as a comma-separated list of the segments' dbid. The master is added to the list automatically. The default is to back up all active segment instances.

-x database_name
Required. The name of the Greenplum database to dump.

-y reportfile
Specifies the full path name where the backup job log file will be placed on the master host. If not specified, defaults to the master data directory or if running
remotely, the current working directory.

-z (no compression)
Do not use compression. Default is to compress the dump files using gzip.

-? (help)
Displays the online help.

Call gpcrondump directly and dump mydatabase (and global objects):

gpcrondump -x mydatabase -c -g -G

A Linux crontab entry that runs a backup of the sales database (and global objects) nightly at one past midnight:
01 0 * * * gpadmin gpcrondump -x sales -c -g -G -a -q >> gp_salesdump.log

A Solaris crontab entry that runs a backup of the sales database (and global objects) nightly at one past midnight (no line breaks):
01 0 * * * SHELL=/bin/bash ; GPHOME=/usr/local/greenplum-db-4.0.x.x ; PATH=$PATH:$GPHOME/bin ; HOME=/export/home/gpadmin ; MASTER_DATA_DIRECTORY=/data/gpdb_p1/gp-1 ; /usr/local/greenplum-db/bin/gpcrondump -x sales -c -g -G -a -q >> gp_salesdump.log

Restoring a Database Using gpdbrestore

The gpdbrestore utility is a wrapper around gp_restore, which provides some convenience and flexibility in restoring from a set of backup files created by gpcrondump. To do a restore using gpdbrestore, you must have:

1. Backup files created by a gpcrondump operation.

2. The Greenplum Database system up and running.

3. A Greenplum Database system with the exact same number of primary segment instances as the system that was backed up.

4. The database you are restoring to already created in the system.

To restore from an archive host using gpdbrestore
(This procedure assumes your backup set has been moved off your Greenplum array to another host on the network.)

1. First, make sure that the archive host is reachable from the Greenplum master host:
$ ping archive_host

2. Make sure the database you are restoring to has been created in the system. For example:
$ createdb database_name

3. From the master, run the gpdbrestore utility. For example (where -R specifies the host name and path to a complete backup set):
$ gpdbrestore -R archive_host:/gpdb/backups/archive/20080714