
Greenplum Sorting Functions like Oracle

posted Apr 29, 2017, 1:23 AM by Sachchida Ojha

-- Returns the smaller of two strings under Oracle 'C' NLS sort order
create or replace function pgoramin
(
  is_val_1                varchar,
  is_val_2                varchar
)
returns varchar
as
$$
begin
  if (oracompat.nlssort(is_val_1,'C') >= oracompat.nlssort(is_val_2,'C')) then
    return is_val_2;
  else
    return is_val_1;
  end if;
end
$$
language plpgsql immutable strict;

-- Returns the larger of two strings under Oracle 'C' NLS sort order
create or replace function pgoramax
(
  is_val_1                varchar,
  is_val_2                varchar
)
returns varchar
as
$$
begin
  if (oracompat.nlssort(is_val_1,'C') >= oracompat.nlssort(is_val_2,'C')) then
    return is_val_1;
  else
    return is_val_2;
  end if;
end
$$
language plpgsql immutable strict;

drop aggregate if exists oracharmax(varchar);

create aggregate oracharmax (varchar)
(
  sfunc = pgoramax,
  stype = varchar,
  prefunc = pgoramax
);

drop aggregate if exists oracharmin(varchar);

create aggregate oracharmin (varchar)
(
  sfunc = pgoramin,
  stype = varchar,
  prefunc = pgoramin
);
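With the aggregates defined, they can be used like Oracle-style MIN/MAX over character data. A quick sketch (the employees table and last_name column are hypothetical, and the oracompat module must be installed for nlssort to resolve):

```sql
-- The scalar helpers can be called directly:
SELECT pgoramax('abc', 'ABD');   -- larger value under 'C' sort order
SELECT pgoramin('abc', 'ABD');

-- The aggregates work like MIN/MAX in a query
-- (employees/last_name are hypothetical):
SELECT oracharmax(last_name) AS max_name,
       oracharmin(last_name) AS min_name
  FROM employees;
```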

How to use psql in Greenplum

posted Apr 29, 2017, 1:22 AM by Sachchida Ojha

References
=======================

PSQL
sachi=> \?
General
  \copyright             show PostgreSQL usage and distribution terms
  \g [FILE] or ;         execute query (and send results to file or |pipe)
  \h [NAME]              help on syntax of SQL commands, * for all commands
  \q                     quit psql

Query Buffer
  \e [FILE]              edit the query buffer (or file) with external editor
  \ef [FUNCNAME]         edit function definition with external editor
  \p                     show the contents of the query buffer
  \r                     reset (clear) the query buffer
  \s [FILE]              display history or save it to file
  \w FILE                write query buffer to file

Input/Output
  \copy ...              perform SQL COPY with data stream to the client host
  \echo [STRING]         write string to standard output
  \i FILE                execute commands from file
  \o [FILE]              send all query results to file or |pipe
  \qecho [STRING]        write string to query output stream (see \o)

Informational
  (options: S = show system objects, + = additional detail)
  \d[S+]                 list tables, views, and sequences
  \d[S+]  NAME           describe table, view, sequence, or index
  \da[+]  [PATTERN]      list aggregates
  \db[+]  [PATTERN]      list tablespaces
  \dc[S]  [PATTERN]      list conversions
  \dC     [PATTERN]      list casts
  \dd[S]  [PATTERN]      show comments on objects
  \dD[S]  [PATTERN]      list domains
  \des[+] [PATTERN]      list foreign servers
  \deu[+] [PATTERN]      list user mappings
  \dew[+] [PATTERN]      list foreign-data wrappers
  \df[antw][S+] [PATRN]  list [only agg/normal/trigger/window] functions
  \dF[+]  [PATTERN]      list text search configurations
  \dFd[+] [PATTERN]      list text search dictionaries
  \dFp[+] [PATTERN]      list text search parsers
  \dFt[+] [PATTERN]      list text search templates
  \dg[+]  [PATTERN]      list roles (groups)
  \di[S+] [PATTERN]      list indexes
  \dl                    list large objects, same as \lo_list
  \dn[+]  [PATTERN]      list schemas
  \do[S]  [PATTERN]      list operators
  \dp     [PATTERN]      list table, view, and sequence access privileges
  \ds[S+] [PATTERN]      list sequences
  \dt[S+] [PATTERN]      list tables
  \dT[S+] [PATTERN]      list data types
  \du[+]  [PATTERN]      list roles (users)
  \dv[S+] [PATTERN]      list views
  \l[+]                  list all databases
  \z      [PATTERN]      same as \dp

Formatting
  \a                     toggle between unaligned and aligned output mode
  \C [STRING]            set table title, or unset if none
  \f [STRING]            show or set field separator for unaligned query output
  \H                     toggle HTML output mode (currently off)
  \pset NAME [VALUE]     set table output option
                         (NAME := {format|border|expanded|fieldsep|footer|null|
                         numericlocale|recordsep|tuples_only|title|tableattr|pager})
  \t [on|off]            show only rows (currently off)
  \T [STRING]            set HTML <table> tag attributes, or unset if none
  \x [on|off]            toggle expanded output (currently off)

Connection
  \c[onnect] [DBNAME|- USER|- HOST|- PORT|-]
                         connect to new database (currently "sachi")
  \encoding [ENCODING]   show or set client encoding
  \password [USERNAME]   securely change the password for a user

Operating System
  \cd [DIR]              change the current working directory
  \timing [on|off]       toggle timing of commands (currently off)
  \! [COMMAND]           execute command in shell or start interactive shell

Variables
  \prompt [TEXT] NAME    prompt user to set internal variable
  \set [NAME [VALUE]]    set internal variable, or list all if no parameters
  \unset NAME            unset (delete) internal variable

Large Objects
  \lo_export LOBOID FILE
  \lo_import FILE [COMMENT]
  \lo_list
  \lo_unlink LOBOID      large object operations
sachi=> 
psql is a terminal-based front-end to Greenplum Database. It enables you to type in queries interactively, issue them to Greenplum Database, and see the query results. Alternatively, input can be from a file. In addition, it provides a number of meta-commands and various shell-like features to facilitate writing scripts and automating a wide variety of tasks.

Options
-a | --echo-all
Print all input lines to standard output as they are read. This is more useful for script processing than for interactive mode.

-A | --no-align
Switches to unaligned output mode. (The default output mode is aligned.)

-c 'command' | --command 'command'
Specifies that psql is to execute the specified command string, and then exit. This is useful in shell scripts. command must be either a command string that is completely parseable by the server, or a single backslash command. Thus you cannot mix SQL and psql meta-commands with this option. To achieve that, you could pipe the string into psql, like this: echo '\x \\ SELECT * FROM foo;' | psql. (\\ is the separator meta-command.)

If the command string contains multiple SQL commands, they are processed in a single transaction, unless there are explicit BEGIN/COMMIT commands included in the string to divide it into multiple transactions. This is different from the behavior when the same string is fed to psql’s standard input.
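A sketch of both forms (the database name mydb is a placeholder):

```
$ psql -d mydb -c 'SELECT count(*) FROM pg_tables;'

$ echo '\x \\ SELECT * FROM pg_database;' | psql -d mydb
```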

-d dbname | --dbname dbname
Specifies the name of the database to connect to. This is equivalent to specifying dbname as the first non-option argument on the command line.
If this parameter contains an equals sign, it is treated as a conninfo string; for example you can pass 'dbname=postgres user=username password=mypass' as dbname.

-e | --echo-queries
Copy all SQL commands sent to the server to standard output as well.

-E | --echo-hidden
Echo the actual queries generated by \d and other backslash commands. You can use this to study psql’s internal operations.

-f filename | --file filename
Use a file as the source of commands instead of reading commands interactively. After the file is processed, psql terminates. If filename is - (hyphen), then standard input is read. Using this option is subtly different from writing psql < filename. In general, both will do what you expect, but using -f enables some nice features such as error messages with line numbers.

-F separator | --field-separator separator
Use the specified separator as the field separator for unaligned output.

-H | --html
Turn on HTML tabular output.

-l | --list
List all available databases, then exit. Other non-connection options are ignored.

-L filename | --log-file filename
Write all query output into the specified log file, in addition to the normal output destination.

-o filename | --output filename
Put all query output into the specified file.

-P assignment | --pset assignment
Allows you to specify printing options in the style of \pset on the command line. Note that here you have to separate name and value with an equal sign instead of a space. Thus to set the output format to LaTeX, you could write -P format=latex.

-q | --quiet
Specifies that psql should do its work quietly. By default, it prints welcome messages and various informational output. If this option is used, none of this happens. This is useful with the -c option.

-R separator | --record-separator separator
Use separator as the record separator for unaligned output.

-s | --single-step
Run in single-step mode. That means the user is prompted before each command is sent to the server, with the option to cancel execution as well. Use this to debug scripts.

-S | --single-line
Runs in single-line mode where a new line terminates an SQL command, as a semicolon does.

-t | --tuples-only
Turn off printing of column names and result row count footers, etc. This command is equivalent to \pset tuples_only and is provided for convenience.

-T table_options | --table-attr table_options
Allows you to specify options to be placed within the HTML table tag. See \pset for details.

-v assignment | --set assignment | --variable assignment
Perform a variable assignment, like the \set internal command. Note that you must separate name and value, if any, by an equal sign on the command line. To unset a variable, leave off the equal sign. To just set a variable without a value, use the equal sign but leave off the value. These assignments are done during a very early stage of start-up, so variables reserved for internal purposes might get overwritten later.
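For example, a variable set with -v can be referenced inside SQL with a colon prefix (mydb and the variable name tbl are placeholders):

```
$ psql mydb -v tbl=pg_tables
mydb=> SELECT count(*) FROM :tbl;
```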

-V | --version
Print the psql version and exit.

-x | --expanded
Turn on the expanded table formatting mode.

-X | --no-psqlrc
Do not read the start-up file (neither the system-wide psqlrc file nor the user’s ~/.psqlrc file).

-1 | --single-transaction
When psql executes a script with the -f option, adding this option wraps BEGIN/COMMIT around the script to execute it as a single transaction. This ensures that either all the commands complete successfully, or no changes are applied.

If the script itself uses BEGIN, COMMIT, or ROLLBACK, this option will not have the desired effects. Also, if the script contains any command that cannot be executed inside a transaction block, specifying this option will cause that command (and hence the whole transaction) to fail.
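A sketch of a single-transaction script run (load_data.sql is a hypothetical script file):

```
$ psql -d mydb -1 -f load_data.sql
```

If any statement in load_data.sql fails, the implicit transaction rolls back and no changes are applied.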

-? | --help
Show help about psql command line arguments, and exit.
Connection Options
-h host | --host host
The host name of the machine on which the Greenplum master database server is running. If not specified, reads from the environment variable PGHOST or defaults to localhost.

-p port | --port port
The TCP port on which the Greenplum master database server is listening for connections. If not specified, reads from the environment variable PGPORT or defaults to 5432.

-U username | --username username
The database role name to connect as. If not specified, reads from the environment variable PGUSER or defaults to the current system role name.

-W | --password
Force a password prompt. psql should automatically prompt for a password whenever the server requests password authentication. However, currently password request detection is not totally reliable, hence this option to force a prompt. If no password prompt is issued and the server requires password authentication, the connection attempt will fail.

-w | --no-password
Never issue a password prompt. If the server requires password authentication and a password is not available by other means such as a .pgpass file, the connection attempt will fail. This option can be useful in batch jobs and scripts where no user is present to enter a password.
Note: This option remains set for the entire session, and so it affects uses of the meta-command \connect as well as the initial connection attempt.

Exit Status
psql returns 0 to the shell if it finished normally, 1 if a fatal error of its own (out of memory, file not found) occurs, 2 if the connection to the server went bad and the session was not interactive, and 3 if an error occurred in a script and the variable ON_ERROR_STOP was set.

Usage
Connecting To A Database
psql is a client application for Greenplum Database. In order to connect to a database you need to know the name of your target database, the host name and port number of the Greenplum master server, and the database user name you want to connect as. psql can be told about those parameters via the command line options -d, -h, -p, and -U respectively. If an argument is found that does not belong to any option, it is interpreted as the database name (or the user name, if the database name is already given). Not all of these options are required; there are useful defaults.

If you omit the host name, psql connects via a UNIX-domain socket to a master server on the local host, or via TCP/IP to localhost on machines that do not have UNIX-domain sockets. The default master port number is 5432. If you use a different port for the master, you must specify the port. The default database user name is your UNIX user name, as is the default database name. Note that you cannot just connect to any database under any user name. Your database administrator should have informed you about your access rights.

When the defaults are not right, you can save yourself some typing by setting any or all of the environment variables PGAPPNAME, PGDATABASE, PGHOST, PGPORT, and PGUSER to appropriate values.

It is also convenient to have a ~/.pgpass file to avoid regularly having to type in passwords. This file should reside in your home directory and contain lines of the following format:
hostname:port:database:username:password

The permissions on .pgpass must disallow any access to world or group (for example: chmod 0600 ~/.pgpass). If the permissions are less strict than this, the file will be ignored. (The file permissions are not currently checked on Microsoft Windows clients, however.)
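A sketch of a .pgpass entry for a Greenplum master (the host name, role, and password below are placeholders; * matches any value in that field):

```
$ cat ~/.pgpass
mdw:5432:*:gpadmin:secret123
$ chmod 0600 ~/.pgpass
```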

If the connection could not be made for any reason (insufficient privileges, server is not running, etc.), psql will return an error and terminate.

Entering SQL Commands
In normal operation, psql provides a prompt with the name of the database to which psql is currently connected, followed by the string => for a regular user or =# for a superuser. For example:
testdb=>
testdb=#

At the prompt, the user may type in SQL commands. Ordinarily, input lines are sent to the server when a command-terminating semicolon is reached. An end of line does not terminate a command. Thus commands can be spread over several lines for clarity. If the command was sent and executed without error, the results of the command are displayed on the screen.

Meta-Commands
Anything you enter in psql that begins with an unquoted backslash is a psql meta-command that is processed by psql itself. These commands help make psql more useful for administration or scripting. Meta-commands are more commonly called slash or backslash commands.

The format of a psql command is the backslash, followed immediately by a command verb, then any arguments. The arguments are separated from the command verb and each other by any number of whitespace characters.

To include whitespace into an argument you may quote it with a single quote. To include a single quote into such an argument, use two single quotes. Anything contained in single quotes is furthermore subject to C-like substitutions for \n (new line), \t (tab), \digits (octal), and \xdigits (hexadecimal).

If an unquoted argument begins with a colon (:), it is taken as a psql variable and the value of the variable is used as the argument instead.

Arguments that are enclosed in backquotes (`) are taken as a command line that is passed to the shell. The output of the command (with any trailing newline removed) is taken as the argument value. The above escape sequences also apply in backquotes.

Some commands take an SQL identifier (such as a table name) as argument. These arguments follow the syntax rules of SQL: Unquoted letters are forced to lowercase, while double quotes (") protect letters from case conversion and allow incorporation of whitespace into the identifier. Within double quotes, paired double quotes reduce to a single double quote in the resulting name. For example, FOO"BAR"BAZ is interpreted as fooBARbaz, and "A weird"" name" becomes A weird" name.

Parsing for arguments stops when another unquoted backslash occurs. This is taken as the beginning of a new meta-command. The special sequence \\ (two backslashes) marks the end of arguments and continues parsing SQL commands, if any. That way SQL and psql commands can be freely mixed on a line. But in any case, the arguments of a meta-command cannot continue beyond the end of the line.
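For example, the following single line toggles expanded output with \x, ends the meta-command's arguments with \\, and then runs a query (the mydb prompt is shown only for illustration):

```
mydb=> \x \\ SELECT * FROM pg_database;
```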

The following meta-commands are defined:
\a
If the current table output format is unaligned, it is switched to aligned. If it is not unaligned, it is set to unaligned. This command is kept for backwards compatibility. See \pset for a more general solution.

\cd [directory]
Changes the current working directory. Without argument, changes to the current user’s home directory. 
To print your current working directory, use \! pwd.

\C [title]
Sets the title of any tables being printed as the result of a query or unset any such title. This command is equivalent to \pset title.

\c | \connect [dbname [username] [host] [port]]
Establishes a new connection. If the new connection is successfully made, the previous connection is closed. If any of dbname, username, host or port are omitted, the value of that parameter from the previous connection is used. If the connection attempt fails, the previous connection is kept only if psql is in interactive mode; when executing a non-interactive script, processing stops immediately with an error. This distinction was chosen as a user convenience against typos, and as a safety mechanism so that scripts do not accidentally act on the wrong database.

\conninfo
Displays information about the current connection including the database name, the user name, the type of connection (UNIX domain socket, TCP/IP, etc.), the host, and the port.

\copy {table [(column_list)] | (query)}
{from | to} {filename | stdin | stdout | pstdin | pstdout}
[with] [binary] [oids] [delimiter [as] 'character']
[null [as] 'string'] [csv [header]
[quote [as] 'character'] [escape [as] 'character']
[force quote column_list] [force not null column_list]]

Performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.

The syntax of the command is similar to that of the SQL COPY command. Note that, because of this, special parsing rules apply to the \copy command. In particular, the variable substitution rules and backslash escapes do not apply.

\copy ... from stdin | to stdout reads/writes based on the command input and output respectively. All rows are read from the same source that issued the command, continuing until \. is read or the stream reaches EOF. Output is sent to the same place as command output. To read/write from psql’s standard input or output, use pstdin or pstdout. This option is useful for populating tables in-line within a SQL script file.

This operation is not as efficient as the SQL COPY command because all data must pass through the client/server connection.
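A sketch of in-line loading with \copy ... from stdin inside a script file (the ratings table is hypothetical; columns are tab-delimited by default, and \. on a line by itself ends the data):

```
CREATE TABLE ratings (id int, score int);
\copy ratings from stdin
1	5
2	3
\.
```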

\copyright
Shows the copyright and distribution terms of PostgreSQL on which Greenplum Database is based.

\d [relation_pattern] |
\d+ [relation_pattern] |
\dS [relation_pattern]
For each relation (table, external table, view, index, or sequence) matching the relation pattern, show all columns, their types, the tablespace (if not the default) and any special attributes such as NOT NULL or defaults, if any. Associated indexes, constraints, rules, and triggers are also shown, as is the view definition if the relation is a view.

•The command form \d+ is identical, except that more information is displayed: any comments associated with the columns of the table are shown, as is the presence of OIDs in the table.

•The command form \dS is identical, except that system information is displayed as well as user information.

For example, \dt displays user tables, but not system tables; \dtS displays both user and system tables. Both commands can take the + parameter to display additional information, as in \dt+ and \dtS+.

If \d is used without a pattern argument, it is equivalent to \dtvs which will show a list of all tables, views, and sequences.

\da [aggregate_pattern]
Lists all available aggregate functions, together with the data types they operate on. If a pattern is specified, only aggregates whose names match the pattern are shown.

\db [tablespace_pattern] | \db+ [tablespace_pattern]
Lists all available tablespaces and their corresponding filespace locations. If pattern is specified, only tablespaces whose names match the pattern are shown. If + is appended to the command name, each object is listed with its associated permissions.

\dc [conversion_pattern]
Lists all available conversions between character-set encodings. If pattern is specified, only conversions whose names match the pattern are listed.

\dC
Lists all available type casts. 

\dd [object_pattern]
Lists all available objects. If pattern is specified, only matching objects are shown.

\dD [domain_pattern]
Lists all available domains. If pattern is specified, only matching domains are shown.

\df [function_pattern] | \df+ [function_pattern ]
Lists available functions, together with their argument and return types. If pattern is specified, only functions whose names match the pattern are shown. If the form \df+ is used, additional information about each function, including language and description, is shown. To reduce clutter, \df does not show data type I/O functions. This is implemented by ignoring functions that accept or return type cstring.

\dg [role_pattern]
Lists all database roles. If pattern is specified, only those roles whose names match the pattern are listed.

\distPvxS [index | sequence | table | parent table | view
| external_table | system_object]
This is not the actual command name: the letters i, s, t, P, v, x, S stand for index, sequence, table, parent table, view, external table, and system table, respectively. You can specify any or all of these letters, in any order, to obtain a listing of all the matching objects. The letter S restricts the listing to system objects; without S, only non-system objects are shown. If + is appended to the command name, each object is listed with its associated description, if any. If a pattern is specified, only objects whose names match the pattern are listed.

\dl
This is an alias for \lo_list, which shows a list of large objects.

\dn [schema_pattern] | \dn+ [schema_pattern]
Lists all available schemas (namespaces). If pattern is specified, only schemas whose names match the pattern are listed. Non-local temporary schemas are suppressed. If + is appended to the command name, each object is listed with its associated permissions and description, if any.

\do [operator_pattern]
Lists available operators with their operand and return types. If pattern is specified, only operators whose names match the pattern are listed.

\dp [relation_pattern_to_show_privileges]
Produces a list of all available tables, views and sequences with their associated access privileges. If pattern is specified, only tables, views and sequences whose names match the pattern are listed. The GRANT and REVOKE commands are used to set access privileges.

\dT [datatype_pattern] | \dT+ [datatype_pattern]
Lists all data types or only those that match pattern. The command form \dT+ shows extra information.

\du [role_pattern]
Lists all database roles, or only those that match pattern.

\e | \edit [filename]
If a file name is specified, the file is edited; after the editor exits, its content is copied back to the query buffer. If no argument is given, the current query buffer is copied to a temporary file which is then edited in the same fashion. The new query buffer is then re-parsed according to the normal rules of psql, where the whole buffer is treated as a single line. (Thus you cannot make scripts this way. Use \i for that.) This means also that if the query ends with (or rather contains) a semicolon, it is immediately executed. In other cases it will merely wait in the query buffer.

psql searches the environment variables PSQL_EDITOR, EDITOR, and VISUAL (in that order) for an editor to use. If all of them are unset, vi is used on UNIX systems, notepad.exe on Windows systems.

\echo text [ ... ]
Prints the arguments to the standard output, separated by one space and followed by a newline. This can be useful to intersperse information in the output of scripts.
If you use the \o command to redirect your query output you may wish to use \qecho instead of this command.

\encoding [encoding]
Sets the client character set encoding. Without an argument, this command shows the current encoding.

\f [field_separator_string]
Sets the field separator for unaligned query output. The default is the vertical bar (|). See also \pset for a generic way of setting output options.

\g [{filename | |command }]
Sends the current query input buffer to the server and optionally stores the query’s output in a file or pipes the output into a separate UNIX shell executing command. A bare \g is virtually equivalent to a semicolon. A \g with argument is a one-shot alternative to the \o command.

\h | \help [sql_command]
Gives syntax help on the specified SQL command. If a command is not specified, then psql will list all the commands for which syntax help is available. Use an asterisk (*) to show syntax help on all SQL commands. To simplify typing, commands that consists of several words do not have to be quoted.

\H
Turns on HTML query output format. If the HTML format is already on, it is switched back to the default aligned text format. This command is for compatibility and convenience, but see \pset about setting other output options.

\i input_filename
Reads input from a file and executes it as though it had been typed on the keyboard. If you want to see the lines on the screen as they are read you must set the variable ECHO to all.

\l | \list | \l+ | \list+
List the names, owners, and character set encodings of all the databases in the server. If + is appended to the command name, database descriptions are also displayed.

\lo_export loid filename
Reads the large object with OID loid from the database and writes it to filename. Note that this is subtly different from the server function lo_export, which acts with the permissions of the user that the database server runs as and on the server’s file system. Use \lo_list to find out the large object’s OID.

\lo_import large_object_filename [comment]
Stores the file into a large object. Optionally, it associates the given comment with the object. Example:
mydb=> \lo_import '/home/gpadmin/pictures/photo.xcf' 'a picture of me'

lo_import 152801
The response indicates that the large object received object ID 152801, which you will need in order to access the object again. For that reason it is recommended to always associate a human-readable comment with every object.

Those can then be seen with the \lo_list command. Note that this command is subtly different from the server-side lo_import because it acts as the local user on the local file system, rather than the server’s user and file system.

\lo_list
Shows a list of all large objects currently stored in the database, along with any comments provided for them.

\lo_unlink largeobject_oid
Deletes the large object of the specified OID from the database. Use \lo_list to find out the large object’s OID.

\o [ {query_result_filename | |command} ]
Saves future query results to a file or pipes them into a UNIX shell command. If no arguments are specified, the query output will be reset to the standard output. Query results include all tables, command responses, and notices obtained from the database server, as well as output of various backslash commands that query the database (such as \d), but not error messages. To intersperse text output in between query results, use \qecho.
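For example, to capture a report to a file, interleave a heading with \qecho, and then restore normal output (the file path is a placeholder):

```
mydb=> \o /tmp/report.txt
mydb=> \qecho -- database list --
mydb=> SELECT datname FROM pg_database;
mydb=> \o
```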

\p
Print the current query buffer to the standard output.

\password [username]
Changes the password of the specified user (by default, the current user). This command prompts for the new password, encrypts it, and sends it to the server as an ALTER ROLE command. This makes sure that the new password does not appear in cleartext in the command history, the server log, or elsewhere.

\prompt [ text ] name
Prompts the user to set a variable name. Optionally, you can specify a prompt string; enclose prompts longer than one word in single quotes.
By default, \prompt uses the terminal for input and output. However, if the -f command line switch was used, \prompt uses standard input and standard output.

\pset print_option [value]
This command sets options affecting the output of query result tables. print_option describes which option is to be set. Adjustable printing options are:
•format – Sets the output format to one of unaligned, aligned, html, latex, troff-ms, or wrapped. First-letter abbreviations are allowed. Unaligned writes all columns of a row on one line, separated by the currently active field separator. This is intended to create output that can be read in by other programs. Aligned mode is the standard, human-readable, nicely formatted text output, and is the default. The HTML and LaTeX modes put out tables that are intended to be included in documents using the respective mark-up language. They are not complete documents! (This might not be so dramatic in HTML, but in LaTeX you must have a complete document wrapper.)

The wrapped option sets the output format like the aligned parameter, but wraps wide data values across lines to make the output fit in the target column width. The target width is set with the columns option. To specify the column width and select the wrapped format, use two \pset commands; for example, to set the width to 72 columns and specify wrapped format, use the commands \pset columns 72 and then \pset format wrapped.

Note: Since psql does not attempt to wrap column header titles, the wrapped format behaves the same as aligned if the total width needed for column headers exceeds the target.

•border – The second argument must be a number. In general, the higher the number the more borders and lines the tables will have, but this depends on the particular format. In HTML mode, this will translate directly into the border=... attribute, in the others only values 0 (no border), 1 (internal dividing lines), and 2 (table frame) make sense.

•columns – Sets the target width for the wrapped format, and also the width limit for determining whether output is wide enough to require the pager. The default is zero. Zero causes the target width to be controlled by the environment variable COLUMNS, or the detected screen width if COLUMNS is not set. In addition, if columns is zero then the wrapped format affects screen output only. If columns is nonzero then file and pipe output is wrapped to that width as well.

After setting the target width, use the command \pset format wrapped to enable the wrapped format.
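For example, a session that enables wrapped output at a 72-column target width might look like this (the table and column names are illustrative):

```sql
-- Set the target width first, then select the wrapped format.
\pset columns 72
\pset format wrapped
-- Wide values in the result now wrap across lines instead of
-- extending past column 72.
SELECT id, long_description FROM example_docs;
```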

•expanded | x – Toggles between regular and expanded format. When expanded format is enabled, query results are displayed in two columns, with the column name on the left and the data on the right. This mode is useful if the data would not fit on the screen in the normal horizontal mode. Expanded mode is supported by all four output formats.

•linestyle [unicode | ascii | old-ascii] – Sets the border line drawing style to one of unicode, ascii, or old-ascii. Unique abbreviations, including one letter, are allowed for the three styles. The default setting is ascii. This option only affects the aligned and wrapped output formats.
ascii – uses plain ASCII characters. Newlines in data are shown using a + symbol in the right-hand margin. When the wrapped format wraps data from one line to the next without a newline character, a dot (.) is shown in the right-hand margin of the first line, and again in the left-hand margin of the following line.
old-ascii – uses plain ASCII characters, with the formatting style used in PostgreSQL 8.4 and earlier. Newlines in data are shown using a : symbol in place of the left-hand column separator. When the data is wrapped from one line to the next without a newline character, a ; symbol is used in place of the left-hand column separator.
unicode – uses Unicode box-drawing characters. Newlines in data are shown using a carriage return symbol in the right-hand margin. When the data is wrapped from one line to the next without a newline character, an ellipsis symbol is shown in the right-hand margin of the first line, and again in the left-hand margin of the following line.

When the border setting is greater than zero, this option also determines the characters with which the border lines are drawn. Plain ASCII characters work everywhere, but Unicode characters look nicer on displays that recognize them.

•null 'string' – The second argument is a string to print whenever a column is null. The default is not to print anything, which can easily be mistaken for an empty string. For example, the command \pset null '(empty)' displays (empty) in null columns.

•fieldsep – Specifies the field separator to be used in unaligned output mode. That way one can create, for example, tab- or comma-separated output, which other programs might prefer. To set a tab as field separator, type \pset fieldsep '\t'. The default field separator is '|' (a vertical bar).
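Combining unaligned format with a custom field separator is a common way to produce delimiter-separated output; a sketch (the table name is illustrative):

```sql
-- Produce comma-separated output with no aligned padding.
\pset format unaligned
\pset fieldsep ','
-- Optionally suppress headers and the row-count footer as well:
\pset tuples_only on
SELECT * FROM example_table;
```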

•footer – Toggles the display of the default footer (x rows).

•numericlocale – Toggles the display of a locale-aware character to separate groups of digits to the left of the decimal marker. It also enables a locale-aware decimal marker.

•recordsep – Specifies the record (line) separator to use in unaligned output mode. The default is a newline character.

•title [text] – Sets the table title for any subsequently printed tables. This can be used to give your output descriptive tags. If no argument is given, the title is unset.

•tableattr | T [text] – Allows you to specify any attributes to be placed inside the HTML table tag. This could for example be cellpadding or bgcolor. Note that you probably don’t want to specify border here, as that is already taken care of by \pset border.

•tuples_only | t [no value | on | off] – The \pset tuples_only command by itself toggles between tuples-only and full display. The values on and off set the tuples display, regardless of the current setting. Full display may show extra information such as column headers, titles, and various footers. In tuples-only mode, only actual table data is shown. The \t command is equivalent to \pset tuples_only and is provided for convenience.

•pager – Controls the use of a pager for query and psql help output. When off, the pager is not used. When on, the pager is used only when appropriate (that is, when the output does not fit on the screen): if the environment variable PAGER is set, the output is piped to the specified program; otherwise a platform-dependent default (such as more) is used. Pager can also be set to always, which causes the pager to be used regardless of whether the output fits on the screen.

\q
Quits the psql program.

\qecho text [ ... ]
This command is identical to \echo except that the output will be written to the query output channel, as set by \o.

\r
Resets (clears) the query buffer.

\s [history_filename]
Prints or saves the command line history to history_filename. If history_filename is omitted, the history is written to the standard output.

\set [name [value [ ... ]]]
Sets the internal variable name to value or, if more than one value is given, to the concatenation of all of them. If no second argument is given, the variable is just set with no value. To unset a variable, use the \unset command.

Valid variable names can contain letters, digits, and underscores. See “Variables” in the Advanced Features section below. Variable names are case-sensitive.

Although you are welcome to set any variable to anything you want, psql treats several variables as special. They are documented in the section about variables.
This command is totally separate from the SQL command SET.

\t [no value | on | off]
The \t command by itself toggles the display of output column name headings and the row count footer. The values on and off set the tuples display, regardless of the current setting. This command is equivalent to \pset tuples_only and is provided for convenience.

\T table_options
Allows you to specify attributes to be placed within the table tag in HTML tabular output mode.

\timing [no value | on | off]
The \timing command by itself toggles a display of how long each SQL statement takes, in milliseconds. The values on and off set the time display, regardless of the current setting.

\w {filename | |command}
Outputs the current query buffer to a file or pipes it to a UNIX command.

\x
Toggles expanded table formatting mode.

\z [relation_to_show_privileges]
Produces a list of all available tables, views and sequences with their associated access privileges. If a pattern is specified, only tables, views and sequences whose names match the pattern are listed. This is an alias for \dp.

\! [command]
Escapes to a separate UNIX shell or executes the UNIX command. The arguments are not further interpreted, the shell will see them as is.

\?
Shows help information about the psql backslash commands.

Patterns
The various \d commands accept a pattern parameter to specify the object name(s) to be displayed. In the simplest case, a pattern is just the exact name of the object. The characters within a pattern are normally folded to lower case, just as in SQL names; for example, \dt FOO will display the table named foo. As in SQL names, placing double quotes around a pattern stops folding to lower case. Should you need to include an actual double quote character in a pattern, write it as a pair of double quotes within a double-quote sequence; again this is in accord with the rules for SQL quoted identifiers. For example, \dt "FOO""BAR" will display the table named FOO"BAR (not foo"bar). Unlike the normal rules for SQL names, you can put double quotes around just part of a pattern, for instance \dt FOO"FOO"BAR will display the table named fooFOObar.

Within a pattern, * matches any sequence of characters (including no characters) and ? matches any single character. (This notation is comparable to UNIX shell file name patterns.) For example, \dt int* displays all tables whose names begin with int. But within double quotes, * and ? lose these special meanings and are just matched literally.

A pattern that contains a dot (.) is interpreted as a schema name pattern followed by an object name pattern. For example, \dt foo*.bar* displays all tables whose table name starts with bar that are in schemas whose schema name starts with foo. When no dot appears, then the pattern matches only objects that are visible in the current schema search path. Again, a dot within double quotes loses its special meaning and is matched literally.

Advanced users can use regular-expression notations. All regular expression special characters work as specified in the PostgreSQL documentation on regular expressions, except for . which is taken as a separator as mentioned above, * which is translated to the regular-expression notation .*, and ? which is translated to .. You can emulate these pattern characters at need by writing ? for ., (R+|) for R*, or (R|) for R?. Remember that the pattern must match the whole name, unlike the usual interpretation of regular expressions; write * at the beginning and/or end if you don’t wish the pattern to be anchored. Note that within double quotes, all regular expression special characters lose their special meanings and are matched literally. Also, the regular expression special characters are matched literally in operator name patterns (such as the argument of \do).

Whenever the pattern parameter is omitted completely, the \d commands display all objects that are visible in the current schema search path – this is equivalent to using the pattern *. To see all objects in the database, use the pattern *.*.
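The pattern rules above can be combined; for instance (the object names are illustrative):

```sql
\dt foo            -- exactly the table foo in the search path
\dt int*           -- tables whose names begin with int
\dt "FOO"          -- the table FOO, case preserved by quoting
\dt public.*       -- all tables in the schema public
\dt *.*            -- all tables in the database
```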

Advanced Features
Variables
psql provides variable substitution features similar to common UNIX command shells. Variables are simply name/value pairs, where the value can be any string of any length. To set variables, use the psql meta-command \set:
testdb=> \set foo bar
sets the variable foo to the value bar. To retrieve the content of the variable, precede the name with a colon and use it as the argument of any slash command:

testdb=> \echo :foo
bar

Note: The arguments of \set are subject to the same substitution rules as with other commands. Thus you can construct interesting references such as \set :foo 'something' and get ‘soft links’ or ‘variable variables’ of Perl or PHP fame, respectively. Unfortunately, there is no way to do anything useful with these constructs. On the other hand, \set bar :foo is a perfectly valid way to copy a variable.

If you call \set without a second argument, the variable is set, with an empty string as value. To unset (or delete) a variable, use the command \unset.

psql’s internal variable names can consist of letters, numbers, and underscores in any order and any number of them. A number of these variables are treated specially by psql: they indicate certain option settings that can be changed at run time by altering the value of the variable, or represent some state of the application. Although you can use these variables for any other purpose, this is not recommended, as the program might behave unexpectedly. By convention, all specially treated variables consist of all upper-case letters (and possibly numbers and underscores). To ensure maximum compatibility in the future, avoid using such variable names for your own purposes. The specially treated variables are as follows:

AUTOCOMMIT
When on (the default), each SQL command is automatically committed upon successful completion. To postpone commit in this mode, you must enter a BEGIN or START TRANSACTION SQL command. When off or unset, SQL commands are not committed until you explicitly issue COMMIT or END. The autocommit-on mode works by issuing an implicit BEGIN for you, just before any command that is not already in a transaction block and is not itself a BEGIN or other transaction-control command, nor a command that cannot be executed inside a transaction block (such as VACUUM).

In autocommit-off mode, you must explicitly abandon any failed transaction by entering ABORT or ROLLBACK. Also keep in mind that if you exit the session without committing, your work will be lost.

The autocommit-on mode is PostgreSQL’s traditional behavior, but autocommit-off is closer to the SQL spec. If you prefer autocommit-off, you may wish to set it in your ~/.psqlrc file.
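A sketch of an autocommit-off session, in which work must be explicitly committed (the table name is illustrative):

```sql
\set AUTOCOMMIT off
-- psql now opens a transaction implicitly; nothing is durable yet.
INSERT INTO example_table VALUES (1);
COMMIT;            -- required, or the insert is lost on session exit
-- After an error, abandon the failed transaction explicitly:
-- ROLLBACK;
```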

DBNAME
The name of the database you are currently connected to. This is set every time you connect to a database (including program start-up), but can be unset.

ECHO
If set to all, all lines entered from the keyboard or from a script are written to the standard output before they are parsed or executed. To select this behavior on program start-up, use the switch -a. If set to queries, psql merely prints all queries as they are sent to the server. The switch for this is -e.

ECHO_HIDDEN
When this variable is set and a backslash command queries the database, the query is first shown. This way you can study the Greenplum Database internals and provide similar functionality in your own programs. (To select this behavior on program start-up, use the switch -E.) If you set the variable to the value noexec, the queries are just shown but are not actually sent to the server and executed.

ENCODING
The current client character set encoding.

FETCH_COUNT
If this variable is set to an integer value > 0, the results of SELECT queries are fetched and displayed in groups of that many rows, rather than the default behavior of collecting the entire result set before display. Therefore only a limited amount of memory is used, regardless of the size of the result set. Settings of 100 to 1000 are commonly used when enabling this feature. Keep in mind that when using this feature, a query may fail after having already displayed some rows.

Although you can use any output format with this feature, the default aligned format tends to look bad because each group of FETCH_COUNT rows will be formatted separately, leading to varying column widths across the row groups. The other output formats work better.

HISTCONTROL
If this variable is set to ignorespace, lines which begin with a space are not entered into the history list. If set to a value of ignoredups, lines matching the previous history line are not entered. A value of ignoreboth combines the two options. If unset, or if set to any other value than those above, all lines read in interactive mode are saved on the history list.

HISTFILE
The file name that will be used to store the history list. The default value is ~/.psql_history. For example, putting
\set HISTFILE ~/.psql_history- :DBNAME
in ~/.psqlrc will cause psql to maintain a separate history for each database.

HISTSIZE
The number of commands to store in the command history. The default value is 500.

HOST
The database server host you are currently connected to. This is set every time you connect to a database (including program start-up), but can be unset.

IGNOREEOF
If unset, sending an EOF character (usually CTRL+D) to an interactive session of psql will terminate the application. If set to a numeric value, that many EOF characters are ignored before the application terminates. If the variable is set but has no numeric value, the default is 10.

LASTOID
The value of the last affected OID, as returned from an INSERT or lo_insert command. This variable is only guaranteed to be valid until after the result of the next SQL command has been displayed.

ON_ERROR_ROLLBACK
When on, if a statement in a transaction block generates an error, the error is ignored and the transaction continues. When interactive, such errors are only ignored in interactive sessions, and not when reading script files. When off (the default), a statement in a transaction block that generates an error aborts the entire transaction. The on_error_rollback-on mode works by issuing an implicit SAVEPOINT for you, just before each command that is in a transaction block, and rolls back to the savepoint on error.

ON_ERROR_STOP
By default, if non-interactive scripts encounter an error, such as a malformed SQL command or internal meta-command, processing continues. This has been the traditional behavior of psql but it is sometimes not desirable. If this variable is set, script processing will immediately terminate. If the script was called from another script it will terminate in the same fashion. If the outermost script was not called from an interactive psql session but rather using the -f option, psql will return error code 3, to distinguish this case from fatal error conditions (error code 1).
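One common use is to make a non-interactive script abort at the first error and then check psql's exit code from the shell (the file and database names are illustrative):

```shell
# Stop at the first error; exit code 3 signals a script error,
# 1 a fatal (e.g. connection) error, 0 success.
psql -v ON_ERROR_STOP=1 -f myscript.sql mydatabase
echo "psql exited with status $?"
```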

PORT
The database server port to which you are currently connected. This is set every time you connect to a database (including program start-up), but can be unset.

PROMPT1
PROMPT2
PROMPT3
These specify what the prompts psql issues should look like. 

QUIET
This variable is equivalent to the command line option -q. It is not very useful in interactive mode.

SINGLELINE
This variable is equivalent to the command line option -S.

SINGLESTEP
This variable is equivalent to the command line option -s.

USER
The database user you are currently connected as. This is set every time you connect to a database (including program start-up), but can be unset.

VERBOSITY
This variable can be set to the values default, verbose, or terse to control the verbosity of error reports.

SQL Interpolation
An additional useful feature of psql variables is that you can substitute (interpolate) them into regular SQL statements. The syntax for this is again to prepend the variable name with a colon (:).
testdb=> \set foo 'my_table'
testdb=> SELECT * FROM :foo;

would then query the table my_table. The value of the variable is copied literally, so it can even contain unbalanced quotes or backslash commands. You must make sure that it makes sense where you put it. Variable interpolation will not be performed into quoted SQL entities.

A popular application of this facility is to refer to the last inserted OID in subsequent statements to build a foreign key scenario. Another possible use of this mechanism is to copy the contents of a file into a table column. First load the file into a variable and then proceed as above.

testdb=> \set content '''' `cat my_file.txt` ''''
testdb=> INSERT INTO my_table VALUES (:content);

One problem with this approach is that my_file.txt might contain single quotes. These need to be escaped so that they don’t cause a syntax error when the second line is processed. This could be done with the program sed:

testdb=> \set content '''' `sed -e "s/'/''/g" < my_file.txt` ''''

If you are using non-standard-conforming strings then you’ll also need to double backslashes. This is a bit tricky:

testdb=> \set content '''' `sed -e "s/'/''/g" -e 's/\\/\\\\/g' < my_file.txt` ''''

Note the use of different shell quoting conventions so that neither the single quote marks nor the backslashes are special to the shell. Backslashes are still special to sed, however, so we need to double them.

Since colons may legally appear in SQL commands, the following rule applies: the character sequence ":name" is not changed unless "name" is the name of a variable that is currently set. In any case you can escape a colon with a backslash to protect it from substitution. (The colon syntax for variables is standard SQL for embedded query languages, such as ECPG. The colon syntax for array slices and type casts are Greenplum Database extensions, hence the conflict.)

Prompting
The prompts psql issues can be customized to your preference. The three variables PROMPT1, PROMPT2, and PROMPT3 contain strings and special escape sequences that describe the appearance of the prompt. Prompt 1 is the normal prompt that is issued when psql requests a new command. Prompt 2 is issued when more input is expected during command input because the command was not terminated with a semicolon or a quote was not closed. Prompt 3 is issued when you run an SQL COPY command and you are expected to type in the row values on the terminal.

The value of the selected prompt variable is printed literally, except where a percent sign (%) is encountered. Depending on the next character, certain other text is substituted instead. Defined substitutions are:

%M
The full host name (with domain name) of the database server, or [local] if the connection is over a UNIX domain socket, or [local:/dir/name], if the UNIX domain socket is not at the compiled in default location.

%m
The host name of the database server, truncated at the first dot, or [local] if the connection is over a UNIX domain socket.

%>
The port number at which the database server is listening.

%n
The database session user name. (The expansion of this value might change during a database session as the result of the command SET SESSION AUTHORIZATION.)

%/
The name of the current database.

%~
Like %/, but the output is ~ (tilde) if the database is your default database.

%#
If the session user is a database superuser, then a #, otherwise a >. (The expansion of this value might change during a database session as the result of the command SET SESSION AUTHORIZATION.)

%R
In prompt 1 normally =, but ^ if in single-line mode, and ! if the session is disconnected from the database (which can happen if \connect fails). In prompt 2 the sequence is replaced by -, *, a single quote, a double quote, or a dollar sign, depending on whether psql expects more input because the command wasn’t terminated yet, because you are inside a /* ... */ comment, or because you are inside a quoted or dollar-escaped string. In prompt 3 the sequence doesn’t produce anything.

%x
Transaction status: an empty string when not in a transaction block, or * when in a transaction block, or ! when in a failed transaction block, or ? when the transaction state is indeterminate (for example, because there is no connection).

%digits
The character with the indicated octal code is substituted.

%:name:
The value of the psql variable name. See “Variables” in the Advanced Features section for details.

%`command`
The output of command, similar to ordinary back-tick substitution.

%[ ... %]
Prompts may contain terminal control characters which, for example, change the color, background, or style of the prompt text, or change the title of the terminal window. In order for line editing to work properly, these non-printing control characters must be designated as invisible by surrounding them with %[ and %]. Multiple pairs of these may occur within the prompt. For example,
testdb=> \set PROMPT1 '%[%033[1;33;40m%]%n@%/%R%[%033[0m%]%# '
results in a boldfaced (1;) yellow-on-black (33;40) prompt on VT100-compatible, color-capable terminals. To insert a percent sign into your prompt, write %%. The default prompts are '%/%R%# ' for prompts 1 and 2, and '>> ' for prompt 3.

Command-Line Editing
psql supports the NetBSD libedit library for convenient line editing and retrieval. The command history is automatically saved when psql exits and is reloaded when psql starts up. Tab-completion is also supported, although the completion logic makes no claim to be an SQL parser. If for some reason you do not like the tab completion, you can turn it off by putting this in a file named .inputrc in your home directory:

$if psql
set disable-completion on

$endif

Environment
PAGER
If the query results do not fit on the screen, they are piped through this command. Typical values are more or less. The default is platform-dependent. The use of the pager can be disabled by using the \pset command. 
PGDATABASE
PGHOST
PGPORT
PGUSER

Default connection parameters.

PSQL_EDITOR
EDITOR
VISUAL
Editor used by the \e command. The variables are examined in the order listed; the first that is set is used.

SHELL
Command executed by the \! command.

TMPDIR
Directory for storing temporary files. The default is /tmp.

Files
Before starting up, psql attempts to read and execute commands from the user’s ~/.psqlrc file.
The command-line history is stored in the file ~/.psql_history.

Notes
psql only works smoothly with servers of the same version. That does not mean other combinations will fail outright, but subtle and not-so-subtle problems might come up. Backslash commands are particularly likely to fail if the server is of a different version.

Notes for Windows users
psql is built as a console application. Since the Windows console windows use a different encoding than the rest of the system, you must take special care when using 8-bit characters within psql. If psql detects a problematic console code page, it will warn you at startup. To change the console code page, two things are necessary:

Set the code page by entering cmd.exe /c chcp 1252. (1252 is a character encoding of the Latin alphabet, used by Microsoft Windows for English and some other Western languages.) If you are using Cygwin, you can put this command in /etc/profile.

Set the console font to Lucida Console, because the raster font does not work with the ANSI code page.

Examples
Start psql in interactive mode:
psql -p 54321 -U sally mydatabase

In psql interactive mode, spread a command over several lines of input. Notice the changing prompt:

testdb=> CREATE TABLE my_table (
testdb(> first integer not null default 0,
testdb(> second text)
testdb-> ;

CREATE TABLE
Look at the table definition:
testdb=> \d my_table
       Table "my_table"
 Attribute |  Type   |      Modifier
-----------+---------+--------------------
 first     | integer | not null default 0
 second    | text    |

Run psql in non-interactive mode by passing in a file containing SQL commands:
psql -f /home/gpadmin/test/myscript.sql

Common psql meta-commands

\l List all databases in the system.
\c <database_name> Connect to the specified database.
\dn List all schemas in the current database.
\dt List all user-created tables in the current database.
\dtS List all system catalog tables.
\d+ <object_name> Show the definition of the specified database object (table, index, etc.).
\du List all users (roles) in the system.

Creating a Table in Greenplum

posted Apr 29, 2017, 1:20 AM by Sachchida Ojha

Note: Greenplum temporary tables (created with create temporary table …) cannot be pre-created and cannot be assigned to a specific schema. They are created dynamically within a session and are dropped when the session terminates. Temporary tables are useful when multiple sessions need to work in parallel with their own versions (data, definition, or both) of the same table: Greenplum internally ensures that these are stored, accessed, and dropped separately, so every session/connection can have its own implementation of a temp table named T1. You can instead create permanent tables in specific schemas and use them like temp tables, but their data will be shared by all sessions. Choose the implementation that suits your application’s needs.
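A minimal sketch of the session-local behavior described above (the table and column names are illustrative):

```sql
-- Each session gets its own private copy of this table;
-- it is dropped automatically when the session ends.
CREATE TEMPORARY TABLE t1 (id int, note text)
DISTRIBUTED BY (id);

INSERT INTO t1 VALUES (1, 'visible only in this session');
SELECT * FROM t1;
```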
CREATE TABLE
Defines a new table.
CREATE [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE table_name (
[ { column_name data_type [ DEFAULT default_expr ] [column_constraint [ ... ]
[ ENCODING ( storage_directive [,...] ) ]
]
 | table_constraint
 | LIKE other_table [{INCLUDING | EXCLUDING}
 {DEFAULTS | CONSTRAINTS}] ...}
 [, ... ] ]
 [column_reference_storage_directive [, …] ]
 )
 [ INHERITS ( parent_table [, ... ] ) ]
 [ WITH ( storage_parameter=value [, ... ] )
 [ ON COMMIT {PRESERVE ROWS | DELETE ROWS | DROP} ]
 [ TABLESPACE tablespace ]
 [ DISTRIBUTED BY (column, [ ... ] ) | DISTRIBUTED RANDOMLY ]
 [ PARTITION BY partition_type (column)
 [ SUBPARTITION BY partition_type (column) ]
 [ SUBPARTITION TEMPLATE ( template_spec ) ]
 [...]
 ( partition_spec )
 | [ SUBPARTITION BY partition_type (column) ]
 [...]
 ( partition_spec
 [ ( subpartition_spec
 [(...)]
 ) ]
 )
where storage_parameter is:
 APPENDONLY={TRUE|FALSE}
 BLOCKSIZE={8192-2097152}
 ORIENTATION={COLUMN|ROW}
 COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}
 COMPRESSLEVEL={0-9}
 FILLFACTOR={10-100}
 OIDS[=TRUE|FALSE]
where column_constraint is:
 [CONSTRAINT constraint_name]
 NOT NULL | NULL
 | UNIQUE [USING INDEX TABLESPACE tablespace]
 [WITH ( FILLFACTOR = value )]
 | PRIMARY KEY [USING INDEX TABLESPACE tablespace]
 [WITH ( FILLFACTOR = value )]
 | CHECK ( expression )
and table_constraint is:
 [CONSTRAINT constraint_name]
 UNIQUE ( column_name [, ... ] )
 [USING INDEX TABLESPACE tablespace]
 [WITH ( FILLFACTOR=value )]
 | PRIMARY KEY ( column_name [, ... ] )
 [USING INDEX TABLESPACE tablespace]
 [WITH ( FILLFACTOR=value )]
 | CHECK ( expression )
where partition_type is:
LIST
 | RANGE
where partition_specification is:
partition_element [, ...]
and partition_element is:
 DEFAULT PARTITION name
 | [PARTITION name] VALUES (list_value [,...] )
 | [PARTITION name]
 START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
 [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
 [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
 | [PARTITION name]
 END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
 [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
[ WITH ( partition_storage_parameter=value [, ... ] ) ]
[column_reference_storage_directive [, …] ]
[ TABLESPACE tablespace ]
where subpartition_spec or template_spec is:
subpartition_element [, ...]
and subpartition_element is:
 DEFAULT SUBPARTITION name
 | [SUBPARTITION name] VALUES (list_value [,...] )
 | [SUBPARTITION name]
 START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
 [ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
 [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
 | [SUBPARTITION name]
 END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
 [ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
[ WITH ( partition_storage_parameter=value [, ... ] ) ]
[column_reference_storage_directive [, …] ]
[ TABLESPACE tablespace ]
where storage_parameter is:
 APPENDONLY={TRUE|FALSE}
 BLOCKSIZE={8192-2097152}
 ORIENTATION={COLUMN|ROW}
 COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}
 COMPRESSLEVEL={0-9}
 FILLFACTOR={10-100}
 OIDS[=TRUE|FALSE]
where storage_directive is:
 COMPRESSTYPE={ZLIB | QUICKLZ | RLE_TYPE | NONE}
 | COMPRESSLEVEL={0-9}
 | BLOCKSIZE={8192-2097152}
Where column_reference_storage_directive is:
COLUMN column_name ENCODING (storage_directive [, ... ] ), ...
 |
DEFAULT COLUMN ENCODING (storage_directive [, ... ] )
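Putting several of these clauses together, a column-oriented, compressed, range-partitioned table might be defined as follows (the names and values are illustrative):

```sql
CREATE TABLE sales (
    sale_id   bigint,
    sale_date date,
    amount    numeric(10,2)
)
WITH (APPENDONLY=TRUE, ORIENTATION=COLUMN,
      COMPRESSTYPE=ZLIB, COMPRESSLEVEL=5)
DISTRIBUTED BY (sale_id)
PARTITION BY RANGE (sale_date)
(
    START (date '2017-01-01') INCLUSIVE
    END   (date '2018-01-01') EXCLUSIVE
    EVERY (INTERVAL '1 month'),
    DEFAULT PARTITION other
);
```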
CREATE TABLE AS
Defines a new table from the results of a query.
CREATE [ [GLOBAL | LOCAL] {TEMPORARY | TEMP} ] TABLE table_name
[(column_name [, ...] )]
[ WITH ( storage_parameter=value [, ... ] ) ]
[ON COMMIT {PRESERVE ROWS | DELETE ROWS | DROP}]
[TABLESPACE tablespace]
AS query
[DISTRIBUTED BY (column, [ ... ] ) | DISTRIBUTED RANDOMLY]
where storage_parameter is:
APPENDONLY={TRUE|FALSE}
BLOCKSIZE={8192-2097152}
ORIENTATION={COLUMN|ROW}
COMPRESSTYPE={ZLIB|QUICKLZ}
COMPRESSLEVEL={1-9 | 1}
FILLFACTOR={10-100}
OIDS[=TRUE|FALSE]
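For example, to materialize the result of a query into a new append-only table with an explicit distribution key (the names are illustrative):

```sql
CREATE TABLE sales_2017
WITH (APPENDONLY=TRUE)
AS SELECT * FROM sales WHERE sale_date >= '2017-01-01'
DISTRIBUTED BY (sale_id);
```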

ALTER TABLE
Changes the definition of a table.
Synopsis
ALTER TABLE [ONLY] name RENAME [COLUMN] column TO new_column
ALTER TABLE name RENAME TO new_name
ALTER TABLE name SET SCHEMA new_schema
ALTER TABLE [ONLY] name SET 
DISTRIBUTED BY (column, [ ... ] ) 
| DISTRIBUTED RANDOMLY 
| WITH (REORGANIZE=true|false) 
ALTER TABLE [ONLY] name action [, ... ]
ALTER TABLE name
[ ALTER PARTITION { partition_name | FOR (RANK(number)) 
| FOR (value) } partition_action [...] ] 
partition_action
where action is one of:
ADD [COLUMN] column_name type
[column_constraint [ ... ]]
DROP [COLUMN] column [RESTRICT | CASCADE]
ALTER [COLUMN] column TYPE type [USING expression]
ALTER [COLUMN] column SET DEFAULT expression
ALTER [COLUMN] column DROP DEFAULT
ALTER [COLUMN] column { SET | DROP } NOT NULL
ALTER [COLUMN] column SET STATISTICS integer
ADD table_constraint
DROP CONSTRAINT constraint_name [RESTRICT | CASCADE]
DISABLE TRIGGER [trigger_name | ALL | USER]
ENABLE TRIGGER [trigger_name | ALL | USER]
CLUSTER ON index_name
SET WITHOUT CLUSTER
SET WITHOUT OIDS
SET (FILLFACTOR = value)
RESET (FILLFACTOR)
INHERIT parent_table
NO INHERIT parent_table
OWNER TO new_owner
SET TABLESPACE new_tablespace
where partition_action is one of:
ALTER DEFAULT PARTITION
DROP DEFAULT PARTITION [IF EXISTS]
DROP PARTITION [IF EXISTS] { partition_name | 
FOR (RANK(number)) | FOR (value) } [CASCADE]
TRUNCATE DEFAULT PARTITION
TRUNCATE PARTITION { partition_name | FOR (RANK(number)) | 
FOR (value) }
RENAME DEFAULT PARTITION TO new_partition_name
RENAME PARTITION { partition_name | FOR (RANK(number)) | 
FOR (value) } TO new_partition_name
ADD DEFAULT PARTITION name [ ( subpartition_spec ) ]
ADD PARTITION [name] partition_element
[ ( subpartition_spec ) ]
EXCHANGE PARTITION { partition_name | FOR (RANK(number)) | 
FOR (value) } WITH TABLE table_name
[ WITH | WITHOUT VALIDATION ]
EXCHANGE DEFAULT PARTITION WITH TABLE table_name
[ WITH | WITHOUT VALIDATION ]
SET SUBPARTITION TEMPLATE (subpartition_spec)
SPLIT DEFAULT PARTITION
{ AT (list_value)
| START([datatype] range_value) [INCLUSIVE | EXCLUSIVE] 
END([datatype] range_value) [INCLUSIVE | EXCLUSIVE] }
[ INTO ( PARTITION new_partition_name, 
PARTITION default_partition_name ) ]
SPLIT PARTITION { partition_name | FOR (RANK(number)) | 
FOR (value) } AT (value) 
[ INTO (PARTITION partition_name, PARTITION 
partition_name)]
where partition_element is:
VALUES (list_value [,...] )
| START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
[ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
| END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
[ WITH ( partition_storage_parameter=value [, ... ] ) ]
[ TABLESPACE tablespace ]
where subpartition_spec is:
subpartition_element [, ...]
and subpartition_element is:
DEFAULT SUBPARTITION subpartition_name
| [SUBPARTITION subpartition_name] VALUES (list_value [,...] )
| [SUBPARTITION subpartition_name] 
START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
[ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
[ EVERY ( [number | datatype] 'interval_value') ]
| [SUBPARTITION subpartition_name] 
END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
[ EVERY ( [number | datatype] 'interval_value') ]
[ WITH ( partition_storage_parameter=value [, ... ] ) ]
[ TABLESPACE tablespace ]
where storage_parameter is:
APPENDONLY={TRUE|FALSE}
BLOCKSIZE={8192-2097152}
ORIENTATION={COLUMN|ROW}
COMPRESSTYPE={ZLIB|QUICKLZ|NONE}
COMPRESSLEVEL={0-9}
FILLFACTOR={10-100}
OIDS[=TRUE|FALSE]


DROP TABLE
Removes a table.
Synopsis
DROP TABLE [IF EXISTS] name [, ...] [CASCADE | RESTRICT]

gpdbrestore

posted Apr 29, 2017, 1:18 AM by Sachchida Ojha

[gpadmin@sachi ~]$ gpdbrestore --help
COMMAND NAME: gpdbrestore
A wrapper utility around gp_restore. Restores a database from a set of dump files generated by gpcrondump.
*****************************************************
SYNOPSIS
*****************************************************
gpdbrestore { -t <timestamp_key> [-L] 
| -b YYYYMMDD 
| -R <hostname>:<path_to_dumpset> 
| -s <database_name> } 
[-T <schema>.<table> [,...]] [-e] [-G] [-B <parallel_processes>] 
[-d <master_data_directory>] [-a] [-q] [-l <logfile_directory>] 
[-v] [-ddboost]

gpdbrestore -? 
gpdbrestore --version

*****************************************************
DESCRIPTION
*****************************************************
gpdbrestore is a wrapper around gp_restore, which provides some convenience and flexibility in restoring from a set of backup files created by gpcrondump. This utility provides the following additional functionality on top of gp_restore:

* Automatically reconfigures for compression. 
* Validates that the number of dump files is correct (for primary only, mirror only, primary and mirror, or a subset consisting of some mirror and primary segment dump files).
* If a failed segment is detected, restores to active segment instances.
* You do not need to know the complete timestamp key (-t) of the backup set to restore. Additional options let you instead give just a date (-b), a backup set directory location (-R), or a database name (-s).
* The -R option lets you restore from a backup set located on a host outside the Greenplum Database array (an archive host), and ensures that the correct dump file goes to the correct segment instance.
* Identifies the database name automatically from the backup set.
* Allows you to restore particular tables only (-T option) instead of the entire database. Note that single tables are not automatically dropped or truncated prior to restore.
* Can restore global objects such as roles and tablespaces (-G option).
* Detects if the backup set is primary segments only or primary and mirror segments and passes the appropriate options to gp_restore.
* Allows you to drop the target database before a restore in a single operation. 

Error Reporting

gpdbrestore does not report errors automatically. After the restore is completed, check the report status files to verify that there are no errors. The restore status files are stored in the db_dumps/<date>/ directory by default. 

*****************************************************
OPTIONS
*****************************************************

-a (do not prompt)
Do not prompt the user for confirmation.
-b YYYYMMDD
Looks for dump files in the segment data directories on the Greenplum Database array of hosts in db_dumps/YYYYMMDD.
If --ddboost is specified, the system looks for dump files on the DD Boost host. 

-B <parallel_processes>
The number of segments to check in parallel for pre/post-restore validation. If not specified, the utility will start up to 60 parallel processes depending on how many segment instances it needs to restore.

-d <master_data_directory>
Optional. The master host data directory. If not specified, the value set for $MASTER_DATA_DIRECTORY will be used.

--ddboost
Use Data Domain DD Boost for this restore if the --ddboost option was passed when the data was dumped. Make sure the one-time DD Boost credential setup is complete before using this option.

-e (drop target database before restore)
Drops the target database before doing the restore and then recreates it.

-G (restore global objects)

Restores global objects such as roles and tablespaces if the global object dump file db_dumps/<date>/gp_global_1_1_<timestamp> is found in the master data directory.

-l <logfile_directory>
The directory to write the log file. Defaults to ~/gpAdminLogs.

-L (list tablenames in backup set)
When used with the -t option, lists the table names that exist in the named backup set and exits. Does not do a restore.

-q (no screen output)
Run in quiet mode. Command output is not displayed on the screen, but is still written to the log file.

-R <hostname>:<path_to_dumpset>
Allows you to provide a hostname and full path to a set of dump files. The host does not have to be in the Greenplum Database array of hosts, but must be accessible from the Greenplum master.

-s <database_name>
Looks for latest set of dump files for the given database name in the segment data directories db_dumps directory on the Greenplum Database array of hosts.

-t <timestamp_key>
The 14-digit timestamp key that uniquely identifies a backup set of data to restore. It is of the form YYYYMMDDHHMMSS. Looks for dump files matching this timestamp key in the db_dumps directory of each segment data directory on the Greenplum Database array of hosts.
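A key of this form can be generated or sanity-checked in the shell. This is a small sketch; the key you pass to gpdbrestore -t must of course come from the gpcrondump run you want to restore, not from date:

```shell
# Generate a 14-digit key of the form YYYYMMDDHHMMSS
# (the same shape gpcrondump embeds in its dump file names).
ts_key=$(date +%Y%m%d%H%M%S)

# Check that a candidate key is exactly 14 digits before
# passing it to gpdbrestore -t.
is_valid_ts_key() {
  case "$1" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

is_valid_ts_key "$ts_key" && echo "valid"
```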


-T <schema>.<table_name>
A comma-separated list of specific table names to restore. The named table(s) must exist in the backup set of the database being restored. Existing tables are not automatically truncated before data is restored from backup. If your intention is to replace existing data in the table from backup, truncate the table prior 
to running gpdbrestore -T.

-v | --verbose
Specifies verbose mode.

--version (show utility version)
Displays the version of this utility.

-? (help)
Displays the online help.

gpcrondump, gp_restore
[gpadmin@sachi ~]$

Restoring to a Different Greenplum System Configuration

posted Apr 29, 2017, 1:17 AM by Sachchida Ojha

In order to do a parallel restore operation using gp_restore or gpdbrestore, the system you are restoring to must be the same configuration as the system that was backed up. If you want to restore your database objects and data into a different system configuration (for example, if you are expanding to a system with more segments), you can still use your parallel backup files and restore them by loading them through the Greenplum master. To do a non-parallel restore, you must have:

1.A complete backup set created by a gp_dump or gpcrondump operation. The backup file of the master contains the DDL to recreate your database objects. The backup files of the segments contain the data.

2.A Greenplum Database system up and running.

3.The database you are restoring to is created in the system.

If you look at the contents of a segment dump file, it simply contains a COPY command for each table followed by the data in delimited text format. If you collect all of the dump files for all of the segment instances and run them through the master, you will have restored all of your data and redistributed it across the new system configuration.

To restore a database to a different system configuration

1.First make sure you have a complete backup set. This includes a dump file of the master (gp_dump_1_1_<timestamp>) and one for each segment instance (gp_dump_0_2_<timestamp>, gp_dump_0_3_<timestamp>,gp_dump_0_4_<timestamp>, and so on). 

The individual dump files should all have the same timestamp key. By default, gp_dump creates the dump files in each segment instance’s data directory, so you will need to collect all of the dump files and move them to a place on the master host. If you do not have a lot of disk space on the master, you can copy each segment dump file to the master, load it, and then delete it once it has loaded successfully.

2.Make sure the database you are restoring to has been created in the system. For example:

$ createdb database_name

3.Load the master dump file to restore the database objects. For example:

$ psql database_name -f /gpdb/backups/gp_dump_1_1_20080714

4.Load each segment dump file to restore the data. For example:

$ psql database_name -f /gpdb/backups/gp_dump_0_2_20080714
$ psql database_name -f /gpdb/backups/gp_dump_0_3_20080714
$ psql database_name -f /gpdb/backups/gp_dump_0_4_20080714
$ psql database_name -f /gpdb/backups/gp_dump_0_5_20080714
...
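The per-file psql commands in steps 3 and 4 can be driven by one loop once all dump files are collected in a single directory. A hedged sketch: /gpdb/backups and the 20080714 key are the example values from the steps above, and echo is left in as a dry run so you can inspect the order before loading for real:

```shell
# Dry-run sketch: print the psql commands that would restore a
# collected backup set. Remove the leading "echo" to execute them.
BACKUP_DIR=/gpdb/backups   # example path from the steps above
TS_KEY=20080714            # example timestamp key from the steps above
DB=database_name

# Master dump first (it holds the DDL), then each segment dump (data).
echo psql "$DB" -f "$BACKUP_DIR/gp_dump_1_1_$TS_KEY"
for f in "$BACKUP_DIR"/gp_dump_0_*_"$TS_KEY"; do
  [ -e "$f" ] || continue   # skip if no segment dumps match
  echo psql "$DB" -f "$f"
done
```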

Frequently used unix commands by Greenplum DBA's

posted Apr 29, 2017, 1:16 AM by Sachchida Ojha

Script arguments: $1 is the first argument, $2 the second, and so on. $0 is the script's name, $# holds the total number of arguments, and $@ and $* expand to all the arguments.
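A minimal demonstration of these variables, using a shell function in place of a separate script file:

```shell
# Prints the positional-parameter variables described above.
show_args() {
  echo "first=$1 second=$2 count=$#"
  echo "all: $*"
}

show_args alpha beta
# first=alpha second=beta count=2
# all: alpha beta
```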
1. route - show / manipulate the IP routing table
see also:  ip route
2. nslookup
3. ifconfig -a
4. hostname
5. ping
6. ethtool eth0
7. netstat -rn
8. top
9. vmstat
10. w

--Find out total space used by primary segment databases (excluding log files and local backup files)
[gpadmin@sachi ~]$gpssh -f $GPHOME/hosts.seg "du -h --exclude=*pg_log* --exclude=*db_dump* -s /data[12]/primary/gpseg*"

--Change owner of all tables in Public schema
for tbl in `psql -qAt -c "select tablename from pg_tables where schemaname = 'public';" sachi` ; do psql -c "alter table $tbl owner to gpadmin" sachi ; done

--Move all tables from Public Schema to a specified schema.

for tbl in `psql -qAt -c "select tablename from pg_tables where schemaname='public';" sachi`; do `psql -c "ALTER TABLE $tbl SET SCHEMA sachi;" sachi`; done

--Build a list of database names (excluding template0)
DATABASES=`psql -q -c "\l" | sed -n 4,/\eof/p | grep -v rows | grep -v template0 | awk '{print $1}' | sed 's/^://g' | sed -e '/^$/d' | grep -v '|'`

datediff() {
 d1=$(date -d "$1" +%s)
 d2=$(date -d "$2" +%s)
 echo $(( (d1 - d2) / 86400 )) days
 }

timespent() {
 d1=$(date -d "$1" +%s)
 d2=$(date -d "$2" +%s)
 echo $(( (d1 - d2) )) seconds
 }
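The two helpers above can be exercised like this (repeated here so the example is self-contained; date -d is GNU date syntax):

```shell
# Self-contained copies of the two date helpers defined above.
datediff() {
  d1=$(date -d "$1" +%s)
  d2=$(date -d "$2" +%s)
  echo $(( (d1 - d2) / 86400 )) days
}

timespent() {
  d1=$(date -d "$1" +%s)
  d2=$(date -d "$2" +%s)
  echo $(( d1 - d2 )) seconds
}

datediff "2017-04-30" "2017-04-29"
# 1 days
timespent "2017-04-29 00:01:00" "2017-04-29 00:00:00"
# 60 seconds
```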

11. uptime
12. ps
13. free
14. iostat
15. sar
16. mpstat
17. pmap
18. /proc file system - various kernel stats
# cat /proc/cpuinfo
# cat /proc/meminfo
# cat /proc/zoneinfo
# cat /proc/mounts
# cat /proc/version
19. lsof

Find out total space used by log files of primary segment databases

[gpadmin@sachi ~]$gpssh -f $GPHOME/hosts.seg "du -h -s /data[12]/primary/gpseg*/pg_log*"

$(date +%s)

start_time=$(date +%s)
end_time=$(date +%s)

duration=`expr $end_time - $start_time`

echo `expr $duration / 3600`:`expr "(" $duration / 60 ")" % 60`:`expr $duration % 60`
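The same elapsed-time formatting, written as a small function with shell arithmetic instead of expr (a sketch, with zero-padding added):

```shell
# Format a duration given in seconds as HH:MM:SS.
hms() {
  printf '%02d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 / 60 % 60 )) $(( $1 % 60 ))
}

hms 3725
# 01:02:05
```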
List files in the current directory with their sizes

for i in `ls -lh | awk '{print $9,$5}'`; do echo $i; done
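The loop above word-splits names and sizes onto separate lines; awk alone keeps each name/size pair together (a sketch; the field positions assume standard ls -l output, and filenames containing spaces will still break it):

```shell
# Print "name size" for each entry in the current directory,
# skipping the leading "total" line of ls -l output.
ls -lh | awk 'NR > 1 {print $9, $5}'
```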

20. last
21. df
22. du
23. kill
24. traceroute
25. rsync
26. rpm
27. tar
28. pwd
29. lsb_release -a
30. uname -a [prints the name, version, and other details about the current machine and the operating system running on it]

Find out total space used by backup files of primary segment databases

[gpadmin@sachi ~]$gpssh -f $GPHOME/hosts.seg "du -h -s /data[12]/primary/gpseg*/db_dumps*"

Change all uppercase to lowercase in vi:
:%s/.*/\L&/

Conversely, :%s/.*/\U&/ will change all the characters to uppercase.
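Outside vi, the same whole-file conversion can be done in a pipeline with tr (an equivalent command-line technique, not the vi command itself):

```shell
# Lowercase and uppercase a stream, character class by character class.
printf 'Hello World\n' | tr '[:upper:]' '[:lower:]'
# hello world
printf 'Hello World\n' | tr '[:lower:]' '[:upper:]'
# HELLO WORLD
```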


List all directories and subdirectories

ls -lR | grep ':$' | head | sed -e 's/:$//'
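find does the same job without parsing ls output, which is generally more robust than relying on the trailing colon in the recursive listing headers:

```shell
# List the current directory and every subdirectory beneath it.
find . -type d
```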

Using grep and awk to filter out idle connections
posted Jul 8, 2015, 10:05 AM by Sachchida Ojha
[sachi@localhost ~]$ ps -ef | awk '/sachi/ && /idle/'
gpadmin  13356  1436  0 09:13 ?        00:00:00 postgres: port  5432, sachi sachi [local] con120988 [local] idle
sachi    17449 13370  0 09:38 pts/1    00:00:00 awk /sachi/ && /idle/

[sachi@localhost ~]$ ps -ef | awk '/sachi/ && /idle/'|grep -v awk
gpadmin  13356  1436  0 09:13 ?        00:00:00 postgres: port  5432, sachi sachi [local] con120988 [local] idle

[sachi@localhost ~]$ ps -ef |grep sachi
sachi     2380  2353  0 Jun26 ?        00:00:06 gnome-session --session gnome-classic
sachi     2388     1  0 Jun26 ?        00:00:00 dbus-launch --sh-syntax --exit-with-session
sachi     2396     1  0 Jun26 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
sachi     2484     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfsd
sachi     2551     1  0 Jun26 ?        00:00:00 /usr/libexec//gvfsd-fuse /run/user/1000/gvfs -f -o big_writes
sachi     2554  2380  0 Jun26 ?        00:00:02 /usr/bin/ssh-agent /bin/sh -c exec -l /bin/bash -c "env GNOME_SHELL_SESSION_MODE=classic gnome-session --session gnome-classic"
sachi     2595     1  0 Jun26 ?        00:00:00 /usr/libexec/at-spi-bus-launcher
sachi     2599  2595  0 Jun26 ?        00:00:00 /bin/dbus-daemon --config-file=/etc/at-spi2/accessibility.conf --nofork --print-address 3
sachi     2603     1  0 Jun26 ?        00:00:00 /usr/libexec/at-spi2-registryd --use-gnome-session
sachi     2611  2380  0 Jun26 ?        00:00:22 /usr/libexec/gnome-settings-daemon
sachi     2627     1  0 Jun26 ?        00:00:02 /usr/bin/pulseaudio --start
sachi     2632     1  0 Jun26 ?        00:00:00 /usr/bin/gnome-keyring-daemon --start --components=ssh
sachi     2726     1  0 Jun26 ?        00:00:04 /usr/libexec/gvfs-udisks2-volume-monitor
sachi     2741     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfs-afc-volume-monitor
sachi     2746     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfs-mtp-volume-monitor
sachi     2750     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfs-gphoto2-volume-monitor
sachi     2754     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfs-goa-volume-monitor
sachi     2757     1  0 Jun26 ?        00:00:00 /usr/libexec/goa-daemon
sachi     2764     1  0 Jun26 ?        00:00:43 /usr/libexec/goa-identity-service
sachi     2766  2380  0 Jun26 ?        00:01:31 /usr/bin/gnome-shell
sachi     2773     1  0 Jun26 ?        00:00:00 /usr/libexec/dconf-service
sachi     2781     1  0 Jun26 ?        00:00:00 /usr/libexec/gsd-printer
sachi     2798     1  0 Jun26 ?        00:00:00 /usr/bin/ibus-daemon --replace --xim --panel disable
sachi     2806  2798  0 Jun26 ?        00:00:00 /usr/libexec/ibus-dconf
sachi     2808     1  0 Jun26 ?        00:00:00 /usr/libexec/ibus-x11 --kill-daemon
sachi     2816     1  0 Jun26 ?        00:00:00 /usr/libexec/gnome-shell-calendar-server
sachi     2820     1  0 Jun26 ?        00:00:06 /usr/libexec/mission-control-5
sachi     2832     1  0 Jun26 ?        00:00:00 /usr/libexec/evolution-source-registry
sachi     2840  2798  0 Jun26 ?        00:00:00 /usr/libexec/ibus-engine-simple
sachi     2942     1  0 Jun26 ?        00:00:00 /usr/libexec/evolution-addressbook-factory
sachi     2947     1  0 Jun26 ?        00:00:01 /usr/libexec/gconfd-2
sachi     2954     1  0 Jun26 ?        00:00:00 /usr/libexec/tracker-store
sachi     2970     1  0 Jun26 ?        00:00:00 /usr/libexec/evolution-calendar-factory
sachi     3010     1  0 Jun26 ?        00:00:00 /usr/libexec/gnome-session-failed --allow-logout
sachi     3014  2380  0 Jun26 ?        00:00:00 /usr/bin/seapplet
sachi     3029  2380  0 Jun26 ?        00:00:00 abrt-applet
root     13024   948  0 09:11 ?        00:00:00 sshd: sachi [priv]
sachi    13033 13024  0 09:11 ?        00:00:00 sshd: sachi@pts/0
sachi    13039 13033  0 09:11 pts/0    00:00:00 -bash
sachi    13355 13039  0 09:13 pts/0    00:00:00 psql
gpadmin  13356  1436  0 09:13 ?        00:00:00 postgres: port  5432, sachi sachi [local] con120988 [local] idle
root     13358   948  0 09:13 ?        00:00:00 sshd: sachi [priv]
sachi    13363 13358  0 09:13 ?        00:00:00 sshd: sachi@pts/1
sachi    13370 13363  0 09:13 pts/1    00:00:00 -bash
sachi    17897 13370  0 09:40 pts/1    00:00:00 ps -ef
sachi    17898 13370  0 09:40 pts/1    00:00:00 grep --color=auto sachi

[sachi@localhost ~]$  ps -ef | awk '/sachi/ && /idle/'|grep -v awk
gpadmin  13356  1436  0 09:13 ?        00:00:00 postgres: port  5432, sachi sachi [local] con120988 [local] idle

[sachi@localhost ~]$  ps -ax | awk '/sachi/ && /idle/'|grep -v awk
13356 ?        Ssl    0:00 postgres: port  5432, sachi sachi [local] con120988 [local] idle

[sachi@localhost ~]$ ps -ef |grep sachi| grep -v idle
sachi     2380  2353  0 Jun26 ?        00:00:06 gnome-session --session gnome-classic
sachi     2388     1  0 Jun26 ?        00:00:00 dbus-launch --sh-syntax --exit-with-session
sachi     2396     1  0 Jun26 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
sachi     2484     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfsd
sachi     2551     1  0 Jun26 ?        00:00:00 /usr/libexec//gvfsd-fuse /run/user/1000/gvfs -f -o big_writes
sachi     2554  2380  0 Jun26 ?        00:00:02 /usr/bin/ssh-agent /bin/sh -c exec -l /bin/bash -c "env GNOME_SHELL_SESSION_MODE=classic gnome-session --session gnome-classic"
sachi     2595     1  0 Jun26 ?        00:00:00 /usr/libexec/at-spi-bus-launcher
sachi     2599  2595  0 Jun26 ?        00:00:00 /bin/dbus-daemon --config-file=/etc/at-spi2/accessibility.conf --nofork --print-address 3
sachi     2603     1  0 Jun26 ?        00:00:00 /usr/libexec/at-spi2-registryd --use-gnome-session
sachi     2611  2380  0 Jun26 ?        00:00:22 /usr/libexec/gnome-settings-daemon
sachi     2627     1  0 Jun26 ?        00:00:02 /usr/bin/pulseaudio --start
sachi     2632     1  0 Jun26 ?        00:00:00 /usr/bin/gnome-keyring-daemon --start --components=ssh
sachi     2726     1  0 Jun26 ?        00:00:04 /usr/libexec/gvfs-udisks2-volume-monitor
sachi     2741     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfs-afc-volume-monitor
sachi     2746     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfs-mtp-volume-monitor
sachi     2750     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfs-gphoto2-volume-monitor
sachi     2754     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfs-goa-volume-monitor
sachi     2757     1  0 Jun26 ?        00:00:00 /usr/libexec/goa-daemon
sachi     2764     1  0 Jun26 ?        00:00:43 /usr/libexec/goa-identity-service
sachi     2766  2380  0 Jun26 ?        00:01:31 /usr/bin/gnome-shell
sachi     2773     1  0 Jun26 ?        00:00:00 /usr/libexec/dconf-service
sachi     2781     1  0 Jun26 ?        00:00:00 /usr/libexec/gsd-printer
sachi     2798     1  0 Jun26 ?        00:00:00 /usr/bin/ibus-daemon --replace --xim --panel disable
sachi     2806  2798  0 Jun26 ?        00:00:00 /usr/libexec/ibus-dconf
sachi     2808     1  0 Jun26 ?        00:00:00 /usr/libexec/ibus-x11 --kill-daemon
sachi     2816     1  0 Jun26 ?        00:00:00 /usr/libexec/gnome-shell-calendar-server
sachi     2820     1  0 Jun26 ?        00:00:06 /usr/libexec/mission-control-5
sachi     2832     1  0 Jun26 ?        00:00:00 /usr/libexec/evolution-source-registry
sachi     2840  2798  0 Jun26 ?        00:00:00 /usr/libexec/ibus-engine-simple
sachi     2942     1  0 Jun26 ?        00:00:00 /usr/libexec/evolution-addressbook-factory
sachi     2947     1  0 Jun26 ?        00:00:01 /usr/libexec/gconfd-2
sachi     2954     1  0 Jun26 ?        00:00:00 /usr/libexec/tracker-store
sachi     2970     1  0 Jun26 ?        00:00:00 /usr/libexec/evolution-calendar-factory
sachi     3010     1  0 Jun26 ?        00:00:00 /usr/libexec/gnome-session-failed --allow-logout
sachi     3014  2380  0 Jun26 ?        00:00:00 /usr/bin/seapplet
sachi     3029  2380  0 Jun26 ?        00:00:00 abrt-applet
root     13024   948  0 09:11 ?        00:00:00 sshd: sachi [priv]
sachi    13033 13024  0 09:11 ?        00:00:00 sshd: sachi@pts/0
sachi    13039 13033  0 09:11 pts/0    00:00:00 -bash
sachi    13355 13039  0 09:13 pts/0    00:00:00 psql
root     13358   948  0 09:13 ?        00:00:00 sshd: sachi [priv]
sachi    13363 13358  0 09:13 ?        00:00:00 sshd: sachi@pts/1
sachi    13370 13363  0 09:13 pts/1    00:00:00 -bash
sachi    21534 13370  0 10:02 pts/1    00:00:00 ps -ef
sachi    21535 13370  0 10:02 pts/1    00:00:00 grep --color=auto sachi
[sachi@localhost ~]$ ps -ef |grep sachi| grep -v idle|grep -v grep
sachi     2380  2353  0 Jun26 ?        00:00:06 gnome-session --session gnome-classic
sachi     2388     1  0 Jun26 ?        00:00:00 dbus-launch --sh-syntax --exit-with-session
sachi     2396     1  0 Jun26 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
sachi     2484     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfsd
sachi     2551     1  0 Jun26 ?        00:00:00 /usr/libexec//gvfsd-fuse /run/user/1000/gvfs -f -o big_writes
sachi     2554  2380  0 Jun26 ?        00:00:02 /usr/bin/ssh-agent /bin/sh -c exec -l /bin/bash -c "env GNOME_SHELL_SESSION_MODE=classic gnome-session --session gnome-classic"
sachi     2595     1  0 Jun26 ?        00:00:00 /usr/libexec/at-spi-bus-launcher
sachi     2599  2595  0 Jun26 ?        00:00:00 /bin/dbus-daemon --config-file=/etc/at-spi2/accessibility.conf --nofork --print-address 3
sachi     2603     1  0 Jun26 ?        00:00:00 /usr/libexec/at-spi2-registryd --use-gnome-session
sachi     2611  2380  0 Jun26 ?        00:00:22 /usr/libexec/gnome-settings-daemon
sachi     2627     1  0 Jun26 ?        00:00:02 /usr/bin/pulseaudio --start
sachi     2632     1  0 Jun26 ?        00:00:00 /usr/bin/gnome-keyring-daemon --start --components=ssh
sachi     2726     1  0 Jun26 ?        00:00:04 /usr/libexec/gvfs-udisks2-volume-monitor
sachi     2741     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfs-afc-volume-monitor
sachi     2746     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfs-mtp-volume-monitor
sachi     2750     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfs-gphoto2-volume-monitor
sachi     2754     1  0 Jun26 ?        00:00:00 /usr/libexec/gvfs-goa-volume-monitor
sachi     2757     1  0 Jun26 ?        00:00:00 /usr/libexec/goa-daemon
sachi     2764     1  0 Jun26 ?        00:00:43 /usr/libexec/goa-identity-service
sachi     2766  2380  0 Jun26 ?        00:01:31 /usr/bin/gnome-shell
sachi     2773     1  0 Jun26 ?        00:00:00 /usr/libexec/dconf-service
sachi     2781     1  0 Jun26 ?        00:00:00 /usr/libexec/gsd-printer
sachi     2798     1  0 Jun26 ?        00:00:00 /usr/bin/ibus-daemon --replace --xim --panel disable
sachi     2806  2798  0 Jun26 ?        00:00:00 /usr/libexec/ibus-dconf
sachi     2808     1  0 Jun26 ?        00:00:00 /usr/libexec/ibus-x11 --kill-daemon
sachi     2816     1  0 Jun26 ?        00:00:00 /usr/libexec/gnome-shell-calendar-server
sachi     2820     1  0 Jun26 ?        00:00:06 /usr/libexec/mission-control-5
sachi     2832     1  0 Jun26 ?        00:00:00 /usr/libexec/evolution-source-registry
sachi     2840  2798  0 Jun26 ?        00:00:00 /usr/libexec/ibus-engine-simple
sachi     2942     1  0 Jun26 ?        00:00:00 /usr/libexec/evolution-addressbook-factory
sachi     2947     1  0 Jun26 ?        00:00:01 /usr/libexec/gconfd-2
sachi     2954     1  0 Jun26 ?        00:00:00 /usr/libexec/tracker-store
sachi     2970     1  0 Jun26 ?        00:00:00 /usr/libexec/evolution-calendar-factory
sachi     3010     1  0 Jun26 ?        00:00:00 /usr/libexec/gnome-session-failed --allow-logout
sachi     3014  2380  0 Jun26 ?        00:00:00 /usr/bin/seapplet
sachi     3029  2380  0 Jun26 ?        00:00:00 abrt-applet
root     13024   948  0 09:11 ?        00:00:00 sshd: sachi [priv]
sachi    13033 13024  0 09:11 ?        00:00:00 sshd: sachi@pts/0
sachi    13039 13033  0 09:11 pts/0    00:00:00 -bash
sachi    13355 13039  0 09:13 pts/0    00:00:00 psql
root     13358   948  0 09:13 ?        00:00:00 sshd: sachi [priv]
sachi    13363 13358  0 09:13 ?        00:00:00 sshd: sachi@pts/1
sachi    13370 13363  0 09:13 pts/1    00:00:00 -bash
sachi    21571 13370  0 10:02 pts/1    00:00:00 ps -ef
[sachi@localhost ~]$ 

Command to add an entry in the pg_hba.conf file of all the segments including master

1. Log into the master.
2. gpssh -f ~/gpconfigs/hostfile
3. ps -ef | grep postgres | grep silent | grep -v grep | awk '{print $10}' | while read line ; do cp $line"/pg_hba.conf" $line"/pg_hba.conf.bk."$(date +"%m-%d-%Y-%H:%M:%S") ; done
4. ps -ef | grep postgres | grep silent | grep -v grep | awk '{print $10}' | while read line ; do echo "host all gpadmin 172.28.12.250/32 trust" >> $line"/pg_hba.conf" ; done
5. ps -ef | grep postgres | grep silent | grep -v grep | awk '{print $10}' | while read line ; do tail -1 $line/pg_hba.conf ; done
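The backup-then-append pattern in steps 3 and 4 can be tried safely on a scratch copy first. In this sketch PGDATA is a hypothetical stand-in for one of the data directories the ps pipeline in step 3 discovers, and the host line is the example entry from step 4:

```shell
# Sketch: back up a pg_hba.conf, append an entry, and show the result.
# PGDATA stands in for each data directory found in step 3.
PGDATA=${PGDATA:-/tmp/pg_hba_demo}
mkdir -p "$PGDATA"
touch "$PGDATA/pg_hba.conf"

# Step 3 equivalent: timestamped backup of the file.
cp "$PGDATA/pg_hba.conf" "$PGDATA/pg_hba.conf.bk.$(date +%m-%d-%Y-%H:%M:%S)"

# Step 4 equivalent: append the new entry.
echo "host all gpadmin 172.28.12.250/32 trust" >> "$PGDATA/pg_hba.conf"

# Step 5 equivalent: verify the last line.
tail -1 "$PGDATA/pg_hba.conf"
```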

gpcrondump

posted Apr 29, 2017, 1:10 AM by Sachchida Ojha

[gpadmin@sachi gpAdminLogs]$ gpcrondump --help
COMMAND NAME: gpcrondump
A wrapper utility for gp_dump, which can be called directly or from a crontab entry.

*****************************************************
SYNOPSIS
*****************************************************
gpcrondump -x <database_name> 
     [-s <schema> | -t <schema>.<table> | -T <schema>.<table>] 
     [--table-file="<filename>" | --exclude-table-file="<filename>"] 
     [-u <backup_directory>] [-R <post_dump_script>] 
     [-c] [-z] [-r] [-f <free_space_percent>] [-b] [-h] [-j | -k] 
     [-g] [-G] [-C] [-d <master_data_directory>] [-B <parallel_processes>] 
     [-a] [-q] [-y <reportfile>] [-l <logfile_directory>] [-v]
     { [-E <encoding>] [--inserts | --column-inserts] [--oids] 
       [--no-owner | --use-set-session-authorization] 
       [--no-privileges] [--rsyncable] [--ddboost] }
     
gpcrondump --ddboost-host <ddboost_hostname> --ddboost-user <ddboost_user>

gpcrondump --ddboost-config-remove

gpcrondump -o

gpcrondump -? 

gpcrondump --version
*****************************************************
DESCRIPTION
*****************************************************
gpcrondump is a wrapper utility for gp_dump. By default, dump files are created in their respective master and segment data directories in a directory named db_dumps/YYYYMMDD. The data dump files are compressed by default using gzip.

gpcrondump allows you to schedule routine backups of a Greenplum database using cron (a scheduling utility for UNIX operating systems). Cron jobs that call gpcrondump should be scheduled on the master host.

gpcrondump is used to schedule Data Domain Boost backup and restore operations. gpcrondump is also used to set or remove one-time credentials for Data Domain Boost.

**********************
Return Codes
**********************

The following is a list of the codes that gpcrondump returns.
   0 - Dump completed with no problems
   1 - Dump completed, but one or more warnings were generated
   2 - Dump failed with a fatal error
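A cron wrapper can branch on these codes. A hedged sketch: check_dump_status is a hypothetical helper, and the commented-out line shows where the real gpcrondump invocation would go:

```shell
# Map the gpcrondump return codes listed above to a log message.
# "$1" is the exit status of the dump command.
check_dump_status() {
  case "$1" in
    0) echo "dump OK" ;;
    1) echo "dump completed with warnings" ;;
    2) echo "dump FAILED" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# In a real cron wrapper this would be:
#   gpcrondump -x mydb -a; check_dump_status $?
check_dump_status 0
# dump OK
```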

**********************
EMAIL NOTIFICATIONS
**********************
To have gpcrondump send out status email notifications, you must place a file named mail_contacts in the home directory of the Greenplum superuser (gpadmin) or in the same directory as the gpcrondump utility ($GPHOME/bin). This file should contain one email address per line. gpcrondump will issue a warning if it cannot locate a mail_contacts file in either location. If both locations have a mail_contacts file, then the one in $HOME takes precedence.
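Creating the mail_contacts file is just one address per line. A sketch with example (hypothetical) addresses; mktemp is used so it can run anywhere, whereas in production the file lives at $HOME/mail_contacts or $GPHOME/bin/mail_contacts as described above:

```shell
# Write a mail_contacts file with one email address per line.
MAIL_CONTACTS=$(mktemp)   # production path: $HOME/mail_contacts
cat > "$MAIL_CONTACTS" <<'EOF'
dba1@example.com
dba2@example.com
EOF
```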
*****************************************************
OPTIONS
*****************************************************
-a (do not prompt)
 Do not prompt the user for confirmation.

-b (bypass disk space check)
 Bypass disk space check. The default is to check for available disk space.
Note: Bypassing the disk space check generates a warning message.  With a warning message, the return code for gpcrondump is 1 if the  dump is successful. (If the dump fails, the return code is 2, in all cases.)

-B <parallel_processes>
 The number of segments to check in parallel for pre/post-dump validation.  If not specified, the utility will start up to 60 parallel processes  depending on how many segment instances it needs to dump.

-c (clear old dump files first)
 Clear out old dump files before doing the dump. The default is not to  clear out old dump files. This will remove all old dump directories in  the db_dumps directory, except for the dump directory of the current date.

-C (clean old catalog dumps)
 Clean out old catalog schema dump files prior to create.

--column-inserts
 Dump data as INSERT commands with column names.

-d <master_data_directory>
 The master host data directory. If not specified, the value  set for $MASTER_DATA_DIRECTORY will be used.

--ddboost
 Use Data Domain Boost for this backup. Before using  Data Domain Boost, set up the Data Domain Boost credential,  as described in the next option below. 

 The following option is recommended if --ddboost is specified.
* -z option (uncompressed)
Backup compression (turned on by default) should be turned off  with the -z option. Data Domain Boost will deduplicate and compress the backup data before sending it to the Data Domain System. When running a mixed backup that backs up to both a local disk and to Data Domain, use the -u option to specify that the backup to the local disk does not use the default directory. 
  
The -f, -G, -g, -R, and -u options change if  --ddboost is specified. See the options for details.

Important: Never use the Greenplum Database default backup  options with Data Domain Boost.  

To maximize Data Domain deduplication benefits, retain at least 30 days of backups.

--ddboost-host <ddboost_hostname> --ddboost-user <ddboost_user>

Sets the Data Domain Boost credentials. Do not combine these options with any other gpcrondump options, and do not enter just part of this option.

<ddboost_hostname> is the IP address of the host. There is a 30-character limit.

<ddboost_user> is the Data Domain Boost user name. There is a 30-character limit.
Example:
gpcrondump --ddboost-host 172.28.8.230 --ddboost-user ddboostusername

After running gpcrondump with these options, the system verifies the limits on the host and user names and prompts for the Data Domain Boost password. 
Enter the password when prompted; the password is not echoed on the screen. There is a 40-character limit on the password that can include lowercase 
letters (a-z), uppercase letters (A-Z), numbers (0-9), and special characters ($, %, #, +, etc.).

The system verifies the password. After the password is verified, the system creates a file .ddconfig and copies it to all segments.

Note: If there is more than one operating system user using Data Domain Boost for backup and restore operations, repeat this configuration process for each of those users.

Important: Set up the Data Domain Boost credential before running any Data Domain Boost backups with the --ddboost option, described above.

--ddboost-config-remove

Removes all Data Domain Boost credentials from the master and all segments on the system. Do not enter this option with any other gpcrondump option.

-E encoding

 Character set encoding of dumped data. Defaults to the encoding of 
 the database being dumped.

-f <free_space_percent>

 When doing the check to ensure that there is enough free disk space to  create the dump files, specifies a percentage of free disk space that  should remain after the dump completes. The default is 10 percent.

 -f is not supported if --ddboost is specified.

-g (copy config files)

 Secure a copy of the master and segment configuration files  postgresql.conf, pg_ident.conf, and pg_hba.conf. These  configuration files are dumped in the master or segment data  directory to db_dumps/YYYYMMDD/config_files_<timestamp>.tar

 If --ddboost is specified, the files are located in the  db_dumps directory on the default storage unit. 

-G (dump global objects)

Use pg_dumpall to dump global objects such as roles and tablespaces. Global objects are dumped in the master data directory to 
 db_dumps/YYYYMMDD/gp_global_1_1_<timestamp>.

 If --ddboost is specified, the files are located in the db_dumps directory on the default storage unit. 

-h (record dump details)

Records details of the database dump in the table public.gpcrondump_history in the database supplied via the -x option. The utility creates the table if it does not already exist.

--inserts
 Dump data as INSERT commands rather than COPY commands.

-j (vacuum before dump)
 Run VACUUM before the dump starts.

-k (vacuum after dump)
 Run VACUUM after the dump has completed successfully.

-l <logfile_directory>
 The directory to write the log file. Defaults to ~/gpAdminLogs.

--no-owner
Do not output commands to set object ownership.

--no-privileges
Do not output commands to set object privileges (GRANT/REVOKE commands).

-o (clear old dump files only)
Clear out old dump files only, but do not run a dump. This will remove  the oldest dump directory except the current date's dump directory.  All dump sets within that directory will be removed.  If --ddboost is specified, only the old files on DD Boost are deleted.

--oids
Include object identifiers (oid) in dump data.

-q (no screen output)
 Run in quiet mode. Command output is not displayed on the screen, but is still written to the log file.

-r (rollback on failure)
Rollback the dump files (delete a partial dump) if a failure is detected. The default is to not rollback.
 -r is not supported if --ddboost is specified.

-R <post_dump_script>
 The absolute path of a script to run after a successful dump operation.  For example, you might want a script that moves completed dump files  to a backup host. This script must reside in the same location on  the master and all segment hosts.

--rsyncable
Passes the --rsyncable flag to the gzip utility to synchronize the output occasionally, based on the input, during compression. This synchronization increases the file size by less than 1% in most cases. When this flag is passed, the rsync(1) program can synchronize compressed files much more efficiently. The gunzip utility cannot differentiate between a compressed file created with this option and one created without it.

-s <schema_name>
 Dump only the named schema in the named database.

-t <schema>.<table_name>
 Dump only the named table in this database.  The -t option can be specified multiple times.

-T <schema>.<table_name>
 A table name to exclude from the database dump. The -T option can be specified multiple times.

--exclude-table-file="<filename>"

Exclude all tables listed in <filename> from the database dump. The file <filename> contains any number of tables, listed one per line.

--table-file="<filename>"
Dump only the tables listed in <filename>. The file <filename> contains any number of tables, listed one per line.

-u <backup_directory>
 Specifies the absolute path where the backup files will be placed on each host. If the path does not exist, it will be created, if possible. If not specified, defaults to the data directory of each instance to be backed up. Using this option may be desirable if each segment host has multiple segment instances as it will create the dump files in a centralized location rather than the segment data directories.

 -u is not supported if --ddboost is specified.

--use-set-session-authorization

 Use SET SESSION AUTHORIZATION commands instead of ALTER OWNER commands to set object ownership.

-v | --verbose
 Specifies verbose mode.

--version (show utility version)
 Displays the version of this utility.

-x <database_name>
Required. The name of the Greenplum database to dump. Multiple databases can be specified in a comma-separated list.

-y <reportfile>
 Specifies the full path name where the backup job log file will be placed on the master host. If not specified, defaults to the master data directory or if running remotely, the current working directory.


-z (no compression)
 Do not use compression. Default is to compress the dump files using gzip.

 We recommend using this option for NFS and Data Domain Boost backups.

-? (help)
 Displays the online help.

*****************************************************
EXAMPLES
*****************************************************

Call gpcrondump directly and dump mydatabase (and global objects):

 gpcrondump -x mydatabase -c -g -G

A crontab entry that runs a backup of the sales database (and global objects) nightly at one past midnight:

  01 0 * * * /home/gpadmin/gpdump.sh >> gpdump.log

The content of dump script gpdump.sh is:

#!/bin/bash
export GPHOME=/usr/local/greenplum-db
export MASTER_DATA_DIRECTORY=/data/gpdb_p1/gp-1
. $GPHOME/greenplum_path.sh  
gpcrondump -x sales -c -g -G -a -q 
*****************************************************
SEE ALSO
*****************************************************
gp_dump, gpdbrestore

Reading EXPLAIN ANALYZE Output in Greenplum

posted Apr 29, 2017, 1:10 AM by Sachchida Ojha

EXPLAIN ANALYZE causes the statement to be actually executed, not only planned. The EXPLAIN ANALYZE plan shows the actual results along with the planner’s estimates. This is useful for seeing whether the planner’s estimates are close to reality. 
In addition to the information shown in the EXPLAIN plan, EXPLAIN ANALYZE will show the following additional information:

1. The total elapsed time (in milliseconds) that it took to run the query.
2. The number of workers (segments) involved in a plan node operation. Only segments that return rows are counted.
3. The maximum number of rows returned by the segment that produced the most rows for an operation. If multiple segments produce an equal number of rows, the one with the longest time to end is the one chosen.
4. The segment id number of the segment that produced the most rows for an operation. 
5. The time (in milliseconds) it took to retrieve the first row from the segment that produced the most rows, and the total time taken to retrieve all rows from that segment. The <time> to first row may be omitted if it is the same as the <time> to end.
EXPLAIN ANALYZE Example
To illustrate how to read an EXPLAIN ANALYZE query plan, we will use the same simple query used in the “EXPLAIN Example” in my previous blog. Notice that there is some additional information in this plan that is not in a regular EXPLAIN plan. The Rows out lines show the actual timing and rows returned for each plan node:
sachi=> EXPLAIN ANALYZE select * from employees where employee_id=198;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..3.34 rows=1 width=85)
   Rows out:  1 rows at destination with 1.492 ms to first row, 1.493 ms to end, start offset by 9.610 ms.
   ->  Seq Scan on employees  (cost=0.00..3.34 rows=1 width=85)
         Filter: employee_id = 198::numeric
         Rows out:  1 rows with 0.227 ms to first row, 0.242 ms to end, start offset by 11 ms.
 Slice statistics:
   (slice0)    Executor memory: 183K bytes.
   (slice1)    Executor memory: 201K bytes (seg1).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 11.308 ms
(11 rows)

sachi=> 
Reading the plan from the bottom up, you will see some additional information for each plan node operation. The total elapsed time to run this query was 11.308 milliseconds. 
The sequential scan operation had only one segment that returned rows, and it returned just 1 row. It took 0.227 milliseconds to find the first row and 0.242 to scan all rows. 
Notice that this is pretty close to the planner’s estimate: the query planner estimated that it would return one row for this query, which it did. The gather motion operation (segments sending rows up to the master) then received 1 row. The total elapsed time for this operation was 1.493 milliseconds.

Interpretation of Explain Plan (basic) – Explained
NOTE: This is a basic document on understanding EXPLAIN PLAN; it does not cover any advanced features.

By definition EXPLAIN PLAN states " The command that displays the execution plan that the database planner generates for the supplied statement. The execution plan shows how the table(s) referenced by the statement will be scanned — by plain sequential scan, index scan, etc. — and if multiple tables are referenced, what join algorithms will be used to bring together the required rows from each input table. "

In short, it is the best path the planner can find to run the query, based on the latest statistics and on parameter settings (such as enable_seqscan, enable_hashjoin, etc.).
Let's start with a simple example and break down what the explain plan is telling you.

Explain 

Example 1
Let's look at a simple example.

sachi=# explain select * from pg_class;

QUERY PLAN
---------------------------------------------------------------
Seq Scan on pg_class (cost=0.00..40.37 rows=2359 width=204)
(1 row)

In the above example.

The query will return approximately 2359 rows using a sequential scan (full table scan) on pg_class. This is an estimate and may not be the exact row count, since ANALYZE gathers statistics from row samples. Each returned row will be about 204 bytes wide on average.

The cost to return the first row is 0.00 (not exactly zero, but close to it).

To return all 2359 rows, the estimated cost is 40.37.
The cost unit is roughly the work required to read a single database page from disk sequentially. Since this is not easily measurable in absolute terms, you can read it as 40.37 units of work to complete the scan.

So what did the planner/optimizer do to arrive at the cost of 40.37?

Let's break it down.

When you ran the query, the cost estimator for the chosen plan node ("seq scan" in the above example) looked up the available statistics for the table (pg_class here) in the catalog, namely its page and tuple counts.

Based on those statistics, it calculated the amount of work required to complete the job.

To see how the page and tuple counts come into play, let's look at their current values:

sachi=# select relname,relpages,reltuples from pg_class where relname='pg_class';

relname | relpages | reltuples
----------+----------+-----------
pg_class | 17 | 2339
(1 row)
Time: 1.272 ms

So the estimated cost here is:
(disk pages read * seq_page_cost) + (tuples scanned * cpu_tuple_cost)
where, by default:
seq_page_cost = 1.0
cpu_tuple_cost = 0.01
so the estimated cost = (17 * 1.0) + (2339 * 0.01) = 40.39
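The same arithmetic can be sketched in a few lines of Python (a hypothetical illustration of the formula above, not Greenplum code; the constants are the default planner settings quoted above):

```python
# Sketch of the planner's sequential-scan cost formula.
# Assumes the default planner constants quoted above.
SEQ_PAGE_COST = 1.0    # cost of one sequential page read
CPU_TUPLE_COST = 0.01  # cost of processing one tuple

def seq_scan_cost(relpages, reltuples):
    """Estimated total cost of a plain sequential scan."""
    return relpages * SEQ_PAGE_COST + reltuples * CPU_TUPLE_COST

# relpages and reltuples for pg_class, from the query above:
print(round(seq_scan_cost(17, 2339), 2))  # -> 40.39
```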
This helps you understand why statistics are really important for the planner to make the correct decision.


Example 2 :
Now let's take it a little further by introducing a WHERE condition. 

sachi=# explain select * from pg_class where relname='pg_type'; 

QUERY PLAN
------------------------------------------------------------
Seq Scan on pg_class (cost=0.00..46.21 rows=24 width=238)
Filter: relname = 'pg_type'::name
(2 rows)
The WHERE condition reduced the estimated number of rows, but the cost increased from about 40 to 46. With the WHERE condition, the sequential scan still has to scan all 2339 rows and then apply the filter to each of them, so there is an additional cost in the form of cpu_operator_cost (default value 0.0025).
With this extra cost, the equation becomes:

(disk pages read * seq_page_cost) + (tuples scanned * cpu_tuple_cost) + (tuples scanned * cpu_operator_cost) = 40.39 + (2339 * 0.0025) = 40.39 + 5.85 = 46.24
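As a sketch (hypothetical Python helper, default planner constants assumed), the filtered-scan cost extends the earlier formula with the per-tuple operator cost; the small difference from the 46.21 shown in the plan comes from the statistics current when that plan was generated:

```python
# Sketch: sequential scan cost with a WHERE filter.
# Each scanned tuple also pays cpu_operator_cost for the filter.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025

def filtered_seq_scan_cost(relpages, reltuples):
    base = relpages * SEQ_PAGE_COST + reltuples * CPU_TUPLE_COST
    return base + reltuples * CPU_OPERATOR_COST

print(round(filtered_seq_scan_cost(17, 2339), 2))  # -> 46.24
```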

Similarly, the planner has to take other costs into consideration, such as random_page_cost, cpu_index_tuple_cost, and effective_cache_size, depending on its execution path.
Example 3

Now let's add a sort key into the picture.

sachi=# explain select * from pg_class where relname='pg_class' order by relname;

QUERY PLAN
------------------------------------------------------------------
Sort (cost=46.79..46.85 rows=24 width=238)
Sort Key: relname
-> Seq Scan on pg_class (cost=0.00..46.24 rows=24 width=238)
Filter: relname = 'pg_class'::name
(4 rows)
Time: 3.445 ms

When an explain plan has branches, always read from the last line in a branch up to the top; in the above example, the seq scan was the first step, and its data was passed on to the sort. The total cost of each step includes the branches below it, as explained in detail below.

So here you can see:

The sort gets its data from the seq scan.
The sort is done on the sort key column "relname".
The sort's startup cost is high, and its total cost is almost the same as its startup cost.

Reason: The sort only returns values once its work is actually done; it has to read all the rows received from the seq scan, sort them, and only then return the first value. So the startup cost of the sort operation itself is 46.79 (sort) - 46.24 (seq scan) = 0.55. 

The total cost of the sort operation's own work was 46.85 - 46.24 = 0.61. 

The reason for subtracting the seq scan's total cost from the sort's cost is that the sort had to wait for the seq scan to finish and hand over its rows before it could start its own work. In other words, based on cost, it was the seq scan, not the sort, that did most of the work here.
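The per-node subtraction can be written out as a small sketch (figures taken from the Example 3 plan above):

```python
# Sketch: a node's own cost is its cumulative cost minus the
# cumulative cost of the child feeding it (Example 3 figures).
sort_startup, sort_total = 46.79, 46.85
seq_scan_total = 46.24

# The sort cannot emit its first row until the child scan has
# finished, so both of its own-costs are measured against the
# child's total cost.
print(round(sort_startup - seq_scan_total, 2))  # -> 0.55  (to first row)
print(round(sort_total - seq_scan_total, 2))    # -> 0.61  (total own work)
```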
Example 4

Let's introduce some joins:
sachi=# explain select * from test_emc join test_emc2 using (a); 
QUERY PLAN
-------------------------------------------------------------------------------------
Gather Motion 24:1 (slice1; segments: 24) (cost=690.04..2310.07 rows=834 width=4)
-> Hash Join (cost=690.04..2310.07 rows=834 width=4)
Hash Cond: test_emc.a = test_emc2.a
-> Seq Scan on test_emc (cost=0.00..1120.00 rows=4167 width=4)
-> Hash (cost=440.02..440.02 rows=834 width=4)
-> Seq Scan on test_emc2 (cost=0.00..440.02 rows=834 width=4)


Here the Hash Join is the main branch, and it waits on two inputs: the seq scan on test_emc and the hash operation. 

The hash operation in turn waits on a seq scan of table test_emc2. So the innermost step is the seq scan on test_emc2, whose output is passed to the hash operation. 
The hash operation's startup and total costs are the same (equal to the total cost of the seq scan on test_emc2), because the hash operation only sends out information once it has received everything from the branches below it. 

Once the hash operation is done, its output is passed to the hash join. 

The seq scan on test_emc also passes its rows to the hash join. 

So the work done by the hash join itself is (690.04 - 0 - 440.02) for the first row, since the hash join starts sending rows upstream as soon as it gets rows from the branches below it, and (2310.07 - 1120 - 440.02) for the total work.

EXPLAIN ANALYZE
The only difference between EXPLAIN and EXPLAIN ANALYZE is that the former does not execute the query and reports estimated costs, while the latter runs the query and reports actual results.

Let's analyze a few examples of EXPLAIN ANALYZE.

Example 1
sachi=# explain analyze select * from test_emc;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 24:1 (slice1; segments: 24) (cost=0.00..1120.00 rows=4167 width=4)
Rows out: 100000 rows at destination with 0.891 ms to first row, 195 ms to end, start offset by 0.361 ms.
-> Seq Scan on test_emc (cost=0.00..1120.00 rows=4167 width=4)
Rows out: Avg 4166.7 rows x 24 workers. Max 4187 rows (seg7) with 0.220 ms to first row, 1.738 ms to end, start offset by 1.470 ms.
Slice statistics:
(slice0) Executor memory: 266K bytes.
(slice1) Executor memory: 156K bytes avg x 24 workers, 156K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Total runtime: 208.977 ms
(10 rows)


Here:

The EXPLAIN ANALYZE output tells you that running the query against the present data took 208.977 ms to complete the entire work, and that it used approximately 128M of RAM. 
There were two slices: slice 1 (the senders on the segments) and slice 0 (the receiver on the master). 

A slice is a piece of the original plan: the plan made by the planner is split into smaller plans that are dispatched to the segments (workers). 
Slice 1 produced 24 worker processes on 24 segments, each worker using about 156K of memory on average; the maximum, also 156K, was used by seg0. 

Slice 0 used about 266K of memory. 

The query ran a seq scan, with 24 workers returning 4166.7 rows on average; the maximum, 4187 rows, was returned by seg7. 

The seq scan took 0.220 ms to produce its first row and 1.738 ms to complete the task, with a start offset of 1.470 ms. 

The seq scan returned 100000 rows (4166.7 * 24) to the destination (the Gather Motion, i.e. the master), with 0.891 ms to the first row; all the rows (from all segments) were received by 195 ms, with a start offset of 0.361 ms.

Now, a Gather Motion is always the last step in the explain plan.

A motion refers to data (tuples) moving across the interconnect; that is, the master has to interact with the segments to get the data. Catalog data that resides on the master does not need a motion.

There are basically three type of motions.

Gather Motion (N:1) - Every segment worker sends the target data to a single node (usually the master), which then passes it on to the end user. 
Redistribute Motion (N:N) - Every segment worker rehashes the target data (by join column) and redistributes each row to the appropriate segment. 
Broadcast Motion (N:N) - Every segment sends the target data to all other segments. 

So here, Gather Motion (24:1) means there are 24 worker processes sending data and one process receiving it.

Example 2

Let's take an example with a join:
sachi=# explain analyze select * from test_emc join test_emc2 using (a);

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Gather Motion 24:1 (slice1; segments: 24) (cost=690.04..2310.07 rows=834 width=4)
Rows out: 20002 rows at destination with 45 ms to first row, 558 ms to end, start offset by 0.444 ms.
-> Hash Join (cost=690.04..2310.07 rows=834 width=4)
Hash Cond: test_emc.a = test_emc2.a
Rows out: Avg 833.4 rows x 24 workers. Max 862 rows (seg7) with 430 ms to first row, 431 ms to end, start offset by 1.580 ms.
Executor memory: 20K bytes avg, 21K bytes max (seg7).
Work_mem used: 20K bytes avg, 21K bytes max (seg7).
(seg7) Hash chain length 2.0 avg, 2 max, using 431 of 524341 buckets.
-> Seq Scan on test_emc (cost=0.00..1120.00 rows=4167 width=4)
Rows out: Avg 4166.7 rows x 24 workers. Max 4187 rows (seg7) with 0.090 ms to first row, 1.287 ms to end, start offset by 1.583 ms.
-> Hash (cost=440.02..440.02 rows=834 width=4)
Rows in: Avg 833.4 rows x 24 workers. Max 862 rows (seg7) with 4.252 ms to end, start offset by 425 ms.
-> Seq Scan on test_emc2 (cost=0.00..440.02 rows=834 width=4)
Rows out: Avg 833.4 rows x 24 workers. Max 862 rows (seg7) with 2.314 ms to first row, 3.895 ms to end, start offset by 425 ms.
Slice statistics:
(slice0) Executor memory: 311K bytes.
(slice1) Executor memory: 8457K bytes avg x 24 workers, 8457K bytes max (seg0). Work_mem: 21K bytes max.
Statement statistics:
Memory used: 128000K bytes
Total runtime: 561.475 ms
(20 rows)
Time: 562.537 ms

Most of this output should be self-explanatory if you have read the example above; the only new line worth discussing is: (seg7) Hash chain length 2.0 avg, 2 max, using 431 of 524341 buckets. 

Without going into much detail: a hash table distributes its data into buckets, and here it used 431 of the 524341 available buckets, with an average hash chain length of 2.

Granting DDL priv to another user in Greenplum

posted Apr 29, 2017, 1:09 AM by Sachchida Ojha

Env : 

Schema1 --> Project schema
fullrole -> grants all access on schema1
rorole  -> grants usage on schema1
User1 ---> is a project DBA user and the owner of the schema objects in schema1
User2 ---> is a project developer who wants to create and alter tables owned by User1

As we know, there are no DDL-level grants in Greenplum. There are many developers in the project, and the project DBA wants to give DDL-level access on the project schema to some senior developers like user2.


Answer: When I look at the GRANT command, I do not see any grant that allows user2 to create or alter schema objects owned by user1. See the GRANT syntax below.

Command:     GRANT
Description: define access privileges
Syntax:
GRANT { { SELECT | INSERT | UPDATE | DELETE | REFERENCES | TRIGGER }
    [,...] | ALL [ PRIVILEGES ] }
    ON [ TABLE ] tablename [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT { { USAGE | SELECT | UPDATE }
    [,...] | ALL [ PRIVILEGES ] }
    ON SEQUENCE sequencename [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT { { CREATE | CONNECT | TEMPORARY | TEMP } [,...] | ALL [ PRIVILEGES ] }
    ON DATABASE dbname [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT { EXECUTE | ALL [ PRIVILEGES ] }
    ON FUNCTION funcname ( [ [ argmode ] [ argname ] argtype [, ...] ] ) [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT { USAGE | ALL [ PRIVILEGES ] }
    ON LANGUAGE langname [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT { { CREATE | USAGE } [,...] | ALL [ PRIVILEGES ] }
    ON SCHEMA schemaname [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT { CREATE | ALL [ PRIVILEGES ] }
    ON TABLESPACE tablespacename [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT role [, ...] TO username [, ...] [ WITH ADMIN OPTION ]

GRANT { SELECT | INSERT | ALL [PRIVILEGES] } 
    ON PROTOCOL protocolname 
    TO username


The simple trick to solve this problem is:

grant user1 to user2;

This way, user2 can create and alter schema objects in user1's schema.

What is Disk Spill in Greenplum

posted Apr 29, 2017, 1:07 AM by Sachchida Ojha

What is Disk Spill
Running a SQL statement usually requires the database to allocate some working memory to the process(es) that execute it. This memory is especially important for steps that sort data or build transient in-memory hash tables (for joins or aggregations). When there are many active SQL statements (or a very large query) requiring working memory, each gets a smaller piece of the memory (or else the system starts swapping). So a SQL statement that could typically join or sort in a single memory pass might start spilling temporary results to disk under high concurrency, dramatically affecting its run time and resource usage (extra I/O to write and later read temporary data). In other words, another effect of "too much" concurrency is potentially making each SQL statement work a lot harder, reducing overall throughput.
Greenplum Database creates work files on disk if it does not have sufficient memory to execute the query in memory. This information can be used for troubleshooting and tuning queries.

If you have very large queries that need more memory, you can change the memory policy to use more memory rather than spilling to disk. You can see a query wanting more memory by looking at the explain plan of a query by using “explain analyze”. 

For example:

EXPLAIN ANALYZE SELECT * FROM .....; 
Work_mem used: 23430K bytes avg, 23430K bytes max (seg0).
Work_mem wanted: 33649K bytes avg, 33649K bytes max (seg0) to lessen workfile I/O affecting 2 workers.
==================================================================================================================
EXPLAIN displays the query plan that the Greenplum planner generates for the supplied statement. A query plan is a tree of nodes. Each node in the plan represents a single operation, such as a table scan, join, aggregation, or sort. Plans should be read from the bottom up, as each node feeds rows into the node directly above it. The bottom nodes of a plan are usually table scan operations (sequential, index, or bitmap index scans). If the query requires joins, aggregations, sorts, or other operations on the raw rows, there will be additional nodes above the scan nodes to perform these operations. The topmost plan nodes are usually the Greenplum Database motion nodes (redistribute, explicit redistribute, broadcast, or gather motions). These are the operations responsible for moving rows between the segment instances during query processing.

The output of EXPLAIN has one line for each node in the plan tree, showing the basic node type plus the following cost estimates that the planner made for the execution of that plan node:

1. cost - measured in units of disk page fetches; that is, 1.0 equals one sequential disk page read. The first estimate is the start-up cost (cost of getting to the first row) and the second is the total cost (cost of getting all rows). Note that the total cost assumes that all rows will be retrieved, which may not always be the case (if using LIMIT for example).

2. rows - the total number of rows output by this plan node. This is usually less than the actual number of rows processed or scanned by the plan node, reflecting the estimated selectivity of any WHERE clause conditions. Ideally the top-level nodes estimate will approximate the number of rows actually returned, updated, or deleted by the query.

3. width - total bytes of all the rows output by this plan node. 
It is important to note that the cost of an upper-level node includes the cost of all its child nodes. The topmost node of the plan has the estimated total execution cost for the plan. This is the number that the planner seeks to minimize. It is also important to realize that the cost only reflects things that the query planner cares about. In particular, the cost does not consider the time spent transmitting result rows to the client.
EXPLAIN ANALYZE causes the statement to be actually executed, not only planned. The EXPLAIN ANALYZE plan shows the actual results along with the planner’s estimates. This is useful for seeing whether the planner’s estimates are close to reality. In addition to the information shown in the EXPLAIN plan, EXPLAIN ANALYZE will show the following additional information:

The total elapsed time (in milliseconds) that it took to run the query.
• The number of workers (segments) involved in a plan node operation. Only segments that return rows are counted.
• The maximum number of rows returned by the segment that produced the most rows for an operation. If multiple segments produce an equal number of rows, the 
one with the longest time to end is the one chosen.
• The segment id number of the segment that produced the most rows for an operation.
• For relevant operations, the work_mem used by the operation. If work_mem was not sufficient to perform the operation in memory, the plan will show how much 
data was spilled to disk and how many passes over the data were required for the lowest performing segment. For example:
Work_mem used: 64K bytes avg, 64K bytes max (seg0).
Work_mem wanted: 90K bytes avg, 90K bytes max (seg0) to abate workfile I/O affecting 2 workers.
[seg0] pass 0: 488 groups made from 488 rows; 263 rows written to workfile
[seg0] pass 1: 263 groups made from 263 rows
• The time (in milliseconds) it took to retrieve the first row from the segment that produced the most rows, and the total time taken to retrieve all rows from that 
segment. The <time> to first row may be omitted if it is the same as the <time> to end.
Very Important: Keep in mind that the statement is actually executed when EXPLAIN ANALYZE is used. Although EXPLAIN ANALYZE will discard any output that a SELECT would return, other side effects of the statement will happen as usual. If you wish to use EXPLAIN ANALYZE on a DML statement without letting the command affect your data, use this approach:
BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;
====================================================================================================================
Note that the bytes wanted message from EXPLAIN ANALYZE is only a hint, based on the amount of data written to work files and is not exact. The minimum work_mem needed could be more or less than the suggested value.

The output will show the plan used but a key item to look for is “Work_mem wanted”. When you see this, it means that Greenplum had to spill to disk because there wasn’t enough memory available. The best approach is likely to rewrite the query. Alternatively, you can increase the amount of memory available.

The “auto” setting allows you to increase or decrease the amount of memory a query will use by changing the “statement_mem” value. The maximum value you can set for statement_mem is determined by “max_statement_mem”. The default max_statement_mem is 2000MB.

On the Master, execute the following to increase the maximum statement memory:

gpconfig -c max_statement_mem -v 8GB
gpstop -u 

Now you can change the memory setting in your session (you can also do this with gpconfig to make the setting apply to all sessions):

set gp_resqueue_memory_policy = auto;
set statement_mem = '4GB'; 

Re-run your query and see if it executes faster and if it still has “bytes wanted” in the query plan.
Compressed Work Files
If you know you are spilling to disk when executing queries because EXPLAIN ANALYZE showed that more bytes were wanted than available, you can trade CPU for IO by compressing the work files. This is done with “gp_workfile_compress_algorithm”. The default value is “none”, but you can change it to “zlib”. This can be done at the session level, or system-wide with gpconfig.

Temporary Tables
Another way to deal with very large queries that spill to disk is to use compressed temporary tables. This is ideal when you use a subquery that is then joined to other tables. If you know the query is spilling to disk (again, from EXPLAIN ANALYZE showing more bytes wanted than available), you can populate a compressed temporary table instead. For example:

CREATE TEMPORARY TABLE foo (myid int, bar text) WITH (APPENDONLY=true, COMPRESSTYPE=quicklz) ON COMMIT DROP DISTRIBUTED BY (myid); 

The gp_workfile_* views show information about all the queries that are currently using disk spill space. The information in the views can also be used to specify the values for the Greenplum Database configuration parameters gp_workfile_limit_per_query and gp_workfile_limit_per_segment. 

Greenplum Database configuration parameters 

gp_workfile_limit_per_query
  Value range: kilobytes
  Default: 0
  Sets the maximum disk size an individual query is allowed to use for creating temporary spill files at each segment. The default value is 0, which means a limit is not enforced.
  Classification: master, session, reload

gp_workfile_limit_per_segment
  Value range: kilobytes
  Default: 0
  Sets the maximum total disk size that all running queries are allowed to use for creating temporary spill files at each segment. The default value is 0, which means a limit is not enforced.
  Classification: local, system, restart

gp_workfile_checksumming
  Value range: Boolean
  Default: on
  Adds a checksum value to each block of a work file (or spill file) used by HashAgg and HashJoin query operators. This adds an additional safeguard against faulty OS disk drivers writing corrupted blocks to disk. When a checksum operation fails, the query is canceled and rolled back rather than potentially writing bad data to disk.
  Classification: master, session, reload

gp_workfile_compress_algorithm
  Value range: none, zlib
  Default: none
  When a hash aggregation or hash join operation spills to disk during query processing, specifies the compression algorithm to use on the spill files. If using zlib, it must be in your $PATH on all segments.
  Classification: master, session, reload
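For example, the spill limits can be set as follows. The values are illustrative; both parameters take kilobytes, and gp_workfile_limit_per_segment requires a restart because it is classified local/system/restart:

-- per query, in the current session (10 GB = 10485760 kB)
set gp_workfile_limit_per_query = 10485760;

-- per segment, system wide (run on the Master; 30 GB = 31457280 kB):
--   gpconfig -c gp_workfile_limit_per_segment -v 31457280
--   gpstop -r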

Workfile Disk Spill Space Information

In Greenplum Database 4.3, the gp_workfile_* views in the gp_toolkit administrative schema show information about all the queries that are currently using disk spill space. In previous 4.2.x.x releases, you had to create the views by running SQL scripts.
Let's look at these views in detail.

1. gp_workfile_entries
2. gp_workfile_usage_per_query
3. gp_workfile_usage_per_segment
1. gp_workfile_entries
This view contains one row for each operator using disk space for workfiles on a segment at the current time. The view is accessible to all users; however, non-superusers can only see information for the databases that they have permission to access.

Column description of this view

command_cnt -> Command ID of the query.
content -> The content identifier for a segment instance.
current_query -> Current query that the process is running.
datname -> Greenplum database name.
directory -> Path to the work file.
optype -> The query operator type that created the work file.
procpid -> Process ID of the server process.
sess_id -> Session ID.
size -> The size of the work file in bytes.
numfiles -> The number of files created.
slice -> The query plan slice. The portion of the query plan that is being executed.
state -> The state of the query that created the work file.
usename -> Role name.
workmem -> The amount of memory allocated to the operator in KB.
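A query such as the following (illustrative) lists the operators currently using the most spill space:

SELECT datname, sess_id, optype, slice,
       size/1024/1024 AS size_mb, numfiles
FROM gp_toolkit.gp_workfile_entries
ORDER BY size DESC;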


2. gp_workfile_usage_per_query
This view contains one row for each query using disk space for workfiles on a segment at the current time. The view is accessible to all users; however, non-superusers can only see information for the databases that they have permission to access.

Column description of this view

command_cnt -> Command ID of the query.
content -> The content identifier for a segment instance.
current_query -> Current query that the process is running.
datname -> Greenplum database name.
procpid -> Process ID of the server process.
sess_id -> Session ID.
size -> The size of the work file in bytes.
numfiles -> The number of files created.
state -> The state of the query that created the work file.
usename -> Role name.
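The per-query totals can be inspected with a query like this (illustrative):

SELECT datname, sess_id, command_cnt,
       size/1024/1024 AS size_mb, numfiles
FROM gp_toolkit.gp_workfile_usage_per_query
ORDER BY size DESC;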


3. gp_workfile_usage_per_segment
This view contains one row for each segment. Each row displays the total amount of disk space used for workfiles on the segment at the current time. The view is accessible to all users; however, non-superusers can only see information for the databases that they have permission to access.

Column description of this view
content -> The content identifier for a segment instance.
size -> The total size of the work files on a segment.
numfiles -> The number of files created.
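To spot segments running low on spill space, a query such as the following (illustrative) works:

SELECT content, size/1024/1024 AS size_mb, numfiles
FROM gp_toolkit.gp_workfile_usage_per_segment
ORDER BY size DESC;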

Oracle vs Greenplum 

Oracle introduced statement queuing in version 11g Release 2 (and later enhanced it in 11.2.0.2). However, in their case, it is bundled with a bunch of other new parallelism features (automatic DOP and in-memory parallel execution), so it is unfortunately more complex than necessary. In Oracle, the system-wide number of parallel process slaves is fixed and the engine tries to automatically find the optimal per-SQL parallelism based on the current system load before each execution. The DBA controls various parameters (globally and per resource group) to try to tame the beast.
Greenplum Database uses a different model: the degree of per-SQL parallelism is fixed. The administrator simply chooses how many active SQLs are allowed per resource queue; if more SQLs are submitted to a queue, they wait until a slot is available. The administrator can also specify a minimal cost threshold per resource queue, to allow quick queries to bypass the queuing mechanism (and, of course, prioritize between queues). So, to sum it up, "too much" concurrency does hurt database performance.

Luckily, it can be handled by proper setup in many modern databases – using statement prioritization and statement queuing.
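A minimal sketch of this setup in Greenplum (the queue name, role name, and numbers are illustrative): the queue admits at most five concurrent statements, and statements cheaper than the MIN_COST threshold bypass queuing.

CREATE RESOURCE QUEUE adhoc WITH
  (ACTIVE_STATEMENTS=5, MIN_COST=1000, PRIORITY=MEDIUM);

ALTER ROLE analyst RESOURCE QUEUE adhoc;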

Memory is fairly easy to manage in Greenplum because the database was designed to leverage OS caching. The default Eager Free memory policy works very well for most queries. However, if you see queries that still need more memory than is available, you can set the memory policy to auto and increase statement_mem. If a statement still spills to disk because it needs more memory, you can have Greenplum compress the work files automatically or use compressed temporary tables.
