gp_dump

posted Sep 12, 2012, 1:49 PM by Sachchida Ojha   [ updated Sep 17, 2012, 8:30 AM ]
gp_dump => Writes out a database to SQL script files, which can then be used to restore the database using gp_restore.

Syntax
gp_dump [-a | -s] [-c] [-d] [-D] [-n schema] [-o] [-O] [-t
table_name] [-x] [-h hostname] [-p port] [-U username] [-W] [-i]
[-v] [--gp-c] [--gp-d=backup_directory] [--gp-r=reportfile]
[--gp-s=dbid [, ...]] database_name

gp_dump -? |--help

gp_dump --version

Description
The gp_dump utility dumps the contents of a database into SQL script files, which can then be used to restore the database schema and user data at a later time using gp_restore. During a dump operation, users will still have full access to the database.

The functionality of gp_dump is analogous to PostgreSQL’s pg_dump utility, which writes out (or dumps) the content of a database into a script file. The script file contains SQL commands that can be used to restore the databases, data, and global objects such as users, groups, and access permissions.

The functionality of gp_dump is modified to accommodate the distributed nature of a Greenplum database. Keep in mind that a database in Greenplum Database is actually comprised of several PostgreSQL instances (the master and all segments), each of which must be dumped individually. The gp_dump utility takes care of dumping all of the individual instances across the system.

The gp_dump utility performs the following actions and produces the following dump files by default:

On the master host
1. Dumps the Greenplum Database system configuration tables into a SQL file in the master data directory. The default naming convention of this file is gp_catalog_1_<dbid>_<timestamp>.

2. Dumps CREATE DATABASE SQL statements into a file in the master data directory. The default naming convention of this file is gp_cdatabase_1_<dbid>_<timestamp>. This statement can be run on the master instance to recreate the user database(s).

3. Dumps the user database schema(s) into a SQL file in the master data directory. The default naming convention of this file is gp_dump_1_<dbid>_<timestamp>. This file is used by gp_restore to recreate the database schema(s).

4. Creates a log file in the master data directory named gp_dump_status_1_<dbid>_<timestamp>.

5. gp_dump launches a gp_dump_agent for each segment instance to be backed up. gp_dump_agent processes run on the segment hosts and report status back to the gp_dump process running on the master host.

On the segment hosts
1. Dumps the user data for each segment instance into a SQL file in the segment instance’s data directory. By default, only primary (or active) segment instances are backed up. The default naming convention of this file is gp_dump_0_<dbid>_<timestamp>. This file is used by gp_restore to recreate that particular segment of user data.

2. Creates a log file in each segment instance’s data directory named gp_dump_status_0_<dbid>_<timestamp>.

Note that the 14 digit timestamp is the number that uniquely identifies the backup job, and is part of the filename for each dump file created by a gp_dump operation. This timestamp must be passed to the gp_restore utility when restoring a Greenplum database.

Options
-a | --data-only
Dump only the data, not the schema (data definitions).

-s | --schema-only
Dump only the object definitions (schema), not data.

-c | --clean
Output commands to clean (drop) database objects prior to (the commands for) creating them.

-d | --inserts
Dump data as INSERT commands (rather than COPY). This will make restoration very slow; it is mainly useful for making dumps that can be loaded into
non-PostgreSQL based databases. Note that the restore may fail altogether if you have rearranged column order. The -D option is safer, though even slower.

-D | --column-inserts
Dump data as INSERT commands with explicit column names (INSERT INTO table (column, ...) VALUES ...). This will make restoration very slow; it is
mainly useful for making dumps that can be loaded into non-PostgreSQL based databases.

-n schema | --schema=schema
Dumps the contents of the named schema only. If this option is not specified, all non-system schemas in the target database will be dumped. You cannot backup
system catalog schemas (such as pg_catalog) with gp_dump.

Caution: In this mode, gp_dump makes no attempt to dump any other database objects that objects in the selected schema may depend upon. Therefore, there is no guarantee that the results of a single-schema dump can be successfully restored by themselves into a clean database.

-o | --oids
Dump object identifiers (OIDs) as part of the data for every table. Use of OIDs is not recommended in Greenplum Database, so this option should not be used if restoring data to another Greenplum Database installation.

-O | --no-owner
Do not output commands to set ownership of objects to match the original database. By default, gp_dump issues ALTER OWNER or SET SESSION AUTHORIZATION statements to set ownership of created database objects. These statements will fail when the script is run unless it is started by a superuser (or the same user that owns all of the objects in the script). To make a script that can be restored by any user, but will give that user ownership of all the objects, specify -O.

-t table | --table=table
Dump only tables (or views or sequences) matching the table pattern. Multiple tables can be selected by writing multiple -t switches. Also, the table parameter is
interpreted as a pattern according to the same rules used by psql’s \d commands, so multiple tables can also be selected by writing wildcard characters in the pattern. 

When using wildcards, be careful to quote the pattern if needed to prevent the shell from expanding the wildcards. The -n and -N switches have no effect when -t is
used, because tables selected by -t will be dumped regardless of those switches, and non-table objects will not be dumped.

Note: When -t is specified, pg_dump makes no attempt to dump any other database objects that the selected table(s) may depend upon. Therefore, there is no guarantee that the results of a specific-table dump can be successfully restored by themselves into a clean database.

Note: -t cannot be used to specify a child table partition. To dump a partitioned table, you must specify the parent table name.

-T table | --exclude-table=table
Do not dump any tables matching the table pattern. The pattern is interpreted according to the same rules as for -t. -T can be given more than once to exclude
tables matching any of several patterns. When both -t and -T are given, the behavior is to dump just the tables that match at least one -t switch but no -T
switches. If -T appears without -t, then tables matching -T are excluded from what is otherwise a normal dump.

-x | --no-privileges | --no-acl
Prevents the dumping of access privileges (GRANT/REVOKE commands).

-h hostname | --host=hostname
The host name of the Greenplum Database master host. If not provided, the value of $PGHOST or the local host is used.

-p port | --port=port
The Greenplum Database master port. If not provided, the value of $PGPORT or the port number provided at compile time is used.

-U username | --username=user
The database superuser account name, for example gpadmin. If not provided, the value of $PGUSER or the current OS user name is used.

-W (force password prompt)
Forces a password prompt. This will happen automatically if the server requires password authentication.

-i | --ignore-version
Ignores a version mismatch between gp_dump and the database server.

-v | --verbose
Specifies verbose mode. This will cause gp_dump to output detailed object comments and start/stop times to the dump file, and progress messages to standard
error.

--gp-c (use gzip)
Use gzip for inline compression.

--gp-d=directoryname
Specifies the relative or absolute path where the backup files will be placed on each host. If this is a relative path, it is considered to be relative to the data directory. If the path does not exist, it will be created, if possible. If not specified, defaults to the data directory of each instance to be backed up. Using this option may be desirable if each segment host has multiple segment instances — it will create the dump files in a centralized location.

--gp-r=reportfile
Specifies the full path name where the backup job log file will be placed on the master host. If not specified, defaults to the master data directory or if running
remotely, the current working directory. 

--gp-s=dbid (backup certain segments)
Specifies the set of active segment instances to back up with a comma-separated list of the segments dbid. The default is to backup all active segment instances.
database_name Required. The name of the database you want to dump. If not specified, the value of $PGDATABASE will be used. The database name must be stated last after all other options have been specified.

-? | --help (help)
Displays the online help.

--version (show utility version)
Displays the version of this utility.


Examples

Back up a database:
gp_dump gpdb

Back up a database, and create dump files in a centralized location on all hosts:
gp_dump --gp-d=/home/gpadmin/backups gpdb

Back up a particular schema only:
gp_dump -n myschema mydatabase

Back up a single segment instance only (by noting the dbid of the segment instance):
gp_dump --gp-s=5 gpdb


Backing Up a Database with gp_dump

To backup a Greenplum database using gp_dump

1.From the master, run the gp_dump utility. For example (where mydatabase is the name of the database you are backing up):

$ gp_dump mydatabase

The gp_dump utility dumps the contents of a Greenplum Database system into a series of SQL utility files, which can then be used to restore a Greenplum Database system configuration, database, and data. During a dump operation users will still have access to the database.

The gp_dump utility performs the following actions and produces the following dump files:

On the master host

1. Dumps the Greenplum configuration system catalog tables into a SQL file in the master data directory. The default naming convention of this file is gp_catalog_1_<dbid>_<timestamp>.

2. Dumps a CREATE DATABASE SQL statement into a file in the master data directory. The default naming convention of this file is gp_cdatabase_1_<dbid>_<timestamp>. This statement can be run on the master instance to recreate the database.

3. Dumps the database schema(s) into a SQL file in the master data directory. The default naming convention of this file is gp_dump_1_<dbid>_<timestamp>. This file is used to recreate the database objects.

4. Creates a log file in the master data directory named gp_dump_status_1_<dbid>_<timestamp>.

5. gp_dump launches a gp_dump_agent for each segment instance to be backed up. gp_dump_agent processes run on the segment hosts and report status back to the gp_dump process running on the master host.

On the segment hosts

1. Dumps the data for each segment instance into a SQL file in the segment instance’s data directory. By default, only primary (or active) segment instances are backed up. The default naming convention of this file is gp_dump_0_<dbid>_<timestamp>. This file is used to recreate that particular segment of data.

2. Creates a log file in each segment instance’s data directory named gp_dump_status_0_<dbid>_<timestamp>.

Note that the 14 digit timestamp is the number that uniquely identifies the backup job, and is part of the filename for each dump file created by a gp_dump operation. This timestamp must be passed to the gp_restore utility when restoring a Greenplum database.





Comments