greenplum-database-dba-references
Greenplum Sorting Functions like Oracle
create or replace function pgoramin (is_val_1 varchar, is_val_2 varchar) returns varchar as $$
begin
    if (oracompat.nlssort(is_val_1,'C') >= oracompat.nlssort(is_val_2,'C')) then
        return is_val_2;
    else
        return is_val_1;
    end if;
end;
$$ language plpgsql immutable strict;

create or replace function pgoramax (is_val_1 varchar, is_val_2 varchar) returns varchar as $$
begin
    if (oracompat.nlssort(is_val_1,'C') >= oracompat.nlssort(is_val_2,'C')) then
        return is_val_1;
    else
        return is_val_2;
    end if;
end;
$$ language plpgsql immutable strict;

drop aggregate if exists oracharmax(varchar);
create aggregate oracharmax (varchar) (
    sfunc   = pgoramax,
    stype   = varchar,
    prefunc = pgoramax
);

drop aggregate if exists oracharmin(varchar);
create aggregate oracharmin (varchar) (
    sfunc   = pgoramin,
    stype   = varchar,
    prefunc = pgoramin
);
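A hedged usage sketch of the aggregates defined above; the table and column names are illustrative, not from the original post. oracharmin/oracharmax return the minimum and maximum varchar values according to the Oracle-style NLSSORT(..., 'C') ordering used in the transition functions:

-- hypothetical table and column, shown only to illustrate the call syntax
SELECT oracharmin(last_name) AS min_last_name,
       oracharmax(last_name) AS max_last_name
FROM employees;

Because prefunc is defined, Greenplum can combine per-segment partial results, so the aggregates run in parallel across the segments.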
How to use psql in Greenplum
Creating a Table in Greenplum
Removes a table:

DROP TABLE [IF EXISTS] name [, ...] [CASCADE | RESTRICT]
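Since the heading above is about creating a table, here is a minimal, hedged sketch of a Greenplum CREATE TABLE with an explicit distribution key (the table and columns are illustrative, not from the original post):

-- hypothetical example table
CREATE TABLE employees (
    employee_id integer,
    last_name   text,
    hire_date   date
) DISTRIBUTED BY (employee_id);

-- cleanup, using the DROP TABLE syntax shown above
DROP TABLE IF EXISTS employees;

In Greenplum, DISTRIBUTED BY determines how rows are spread across the segments; if you omit it, the system picks a default distribution key, or you can specify DISTRIBUTED RANDOMLY.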
gpdbrestore
[gpadmin@sachi ~]$ gpdbrestore --help

COMMAND NAME: gpdbrestore

A wrapper utility around gp_restore. Restores a database from a set of dump files generated by gpcrondump.

*****************************************************
SYNOPSIS
*****************************************************

gpdbrestore { -t <timestamp_key> [-L] | -b YYYYMMDD |
              -R <hostname>:<path_to_dumpset> | -s <database_name> }
            [-T <schema>.<table> [,...]] [-e] [-G] [-B <parallel_processes>]
            [-d <master_data_directory>] [-a] [-q] [-l <logfile_directory>]
            [-v] [--ddboost]

gpdbrestore -?

gpdbrestore --version

*****************************************************
DESCRIPTION
*****************************************************

gpdbrestore is a wrapper around gp_restore, which provides some convenience and flexibility in restoring from a set of backup files created by gpcrondump. This utility provides the following additional functionality on top of gp_restore:

* Automatically reconfigures for compression.
* Validates that the number of dump files is correct (for primary only, mirror only, primary and mirror, or a subset consisting of some mirror and primary segment dump files).
* If a failed segment is detected, restores to active segment instances.
* You do not need to know the complete timestamp key (-t) of the backup set to restore. Additional options are provided to instead give just a date (-b), backup set directory location (-R), or database name (-s) to restore.
* The -R option allows the ability to restore from a backup set located on a host outside of the Greenplum Database array (archive host). Ensures that the correct dump file goes to the correct segment instance.
* Identifies the database name automatically from the backup set.
* Allows you to restore particular tables only (-T option) instead of the entire database. Note that single tables are not automatically dropped or truncated prior to restore.
* Can restore global objects such as roles and tablespaces (-G option).
* Detects if the backup set is primary segments only or primary and mirror segments and passes the appropriate options to gp_restore.
* Allows you to drop the target database before a restore in a single operation.

Error Reporting

gpdbrestore does not report errors automatically. After the restore is completed, check the report status files to verify that there are no errors. The restore status files are stored in the db_dumps/<date>/ directory by default.

*****************************************************
OPTIONS
*****************************************************

-a (do not prompt)
Do not prompt the user for confirmation.

-b YYYYMMDD
Looks for dump files in the segment data directories on the Greenplum Database array of hosts in db_dumps/YYYYMMDD. If --ddboost is specified, the system looks for dump files on the DD Boost host.

-B <parallel_processes>
The number of segments to check in parallel for pre/post-restore validation. If not specified, the utility will start up to 60 parallel processes depending on how many segment instances it needs to restore.

-d <master_data_directory>
Optional. The master host data directory. If not specified, the value set for $MASTER_DATA_DIRECTORY will be used.

--ddboost
Use Data Domain DD Boost for this restore, if the --ddboost option was passed when the data was dumped. Make sure the one-time DD Boost credential setup is completed beforehand.

-e (drop target database before restore)
Drops the target database before doing the restore and then recreates it.
-G (restore global objects)
Restores global objects such as roles and tablespaces if the global object dump file db_dumps/<date>/gp_global_1_1_<timestamp> is found in the master data directory.

-l <logfile_directory>
The directory to write the log file. Defaults to ~/gpAdminLogs.

-L (list tablenames in backup set)
When used with the -t option, lists the table names that exist in the named backup set and exits. Does not do a restore.

-q (no screen output)
Run in quiet mode. Command output is not displayed on the screen, but is still written to the log file.

-R <hostname>:<path_to_dumpset>
Allows you to provide a hostname and full path to a set of dump files. The host does not have to be in the Greenplum Database array of hosts, but must be accessible from the Greenplum master.

-s <database_name>
Looks for latest set of dump files for the given database name in the segment data directories db_dumps directory on the Greenplum Database array of hosts.

-t <timestamp_key>
The 14 digit timestamp key that uniquely identifies a backup set of data to restore. It is of the form YYYYMMDDHHMMSS. Looks for dump files matching this timestamp key in the segment data directories db_dumps directory on the Greenplum Database array of hosts.

-T <schema>.<table_name>
A comma-separated list of specific table names to restore. The named table(s) must exist in the backup set of the database being restored. Existing tables are not automatically truncated before data is restored from backup. If your intention is to replace existing data in the table from backup, truncate the table prior to running gpdbrestore -T.

-v | --verbose
Specifies verbose mode.

--version (show utility version)
Displays the version of this utility.

-? (help)
Displays the online help.

gpcrondump, gp_restore

[gpadmin@sachi ~]$
Restoring to a Different Greenplum System Configuration
In order to do a parallel restore operation using gp_restore or gpdbrestore, the system you are restoring to must be the same configuration as the system that was backed up. If you want to restore your database objects and data into a different system configuration (for example, if you are expanding to a system with more segments), you can still use your parallel backup files and restore them by loading them through the Greenplum master.

To do a non-parallel restore, you must have:

1. A complete backup set created by a gp_dump or gpcrondump operation. The backup file of the master contains the DDL to recreate your database objects. The backup files of the segments contain the data.
2. A Greenplum Database system up and running.
3. The database you are restoring to created in the system.

If you look at the contents of a segment dump file, it simply contains a COPY command for each table followed by the data in delimited text format. If you collect all of the dump files for all of the segment instances and run them through the master, you will have restored all of your data and redistributed it across the new system configuration.

To restore a database to a different system configuration:

1. First make sure you have a complete backup set. This includes a dump file of the master (gp_dump_1_1_<timestamp>) and one for each segment instance (gp_dump_0_2_<timestamp>, gp_dump_0_3_<timestamp>, gp_dump_0_4_<timestamp>, and so on). The individual dump files should all have the same timestamp key. By default, gp_dump creates the dump files in each segment instance’s data directory, so you will need to collect all of the dump files and move them to a place on the master host. If you do not have a lot of disk space on the master, you can copy each segment dump file to the master, load it, and then delete it once it has loaded successfully.
2. Make sure the database you are restoring to has been created in the system. For example:
   $ createdb database_name
3. Load the master dump file to restore the database objects. For example:
   $ psql database_name -f /gpdb/backups/gp_dump_1_1_20080714
4. Load each segment dump file to restore the data. For example:
   $ psql database_name -f /gpdb/backups/gp_dump_0_2_20080714
   $ psql database_name -f /gpdb/backups/gp_dump_0_3_20080714
   $ psql database_name -f /gpdb/backups/gp_dump_0_4_20080714
   $ psql database_name -f /gpdb/backups/gp_dump_0_5_20080714
   ...
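Once the segment dump files have been loaded through the master, each table's rows are redistributed according to its distribution key on the new segment configuration. A hedged way to spot-check that the data landed and is spread across the new segments (the table name below is illustrative):

-- gp_segment_id is a Greenplum system column; the table name is hypothetical
SELECT gp_segment_id, count(*)
FROM some_restored_table
GROUP BY gp_segment_id
ORDER BY gp_segment_id;

Each segment in the new configuration should report a share of the rows (roughly even for a well-chosen distribution key).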
Frequently used Unix commands by Greenplum DBAs
Script arguments: $1 is the first argument, $2 is the second argument, and so on. The variable $0 is the script's name. The total number of arguments is stored in $#. The variables $@ and $* return all the arguments.

1. route - show / manipulate the IP routing table (see also: ip route)
2. nslookup
3. ifconfig -a
4. hostname
5. ping
6. ethtool eth0
7. netstat -rn
8. top
9. vmstat
10. w

-- Find out total space used by primary segment databases (excluding log files and local backup files)
[gpadmin@sachi ~]$ gpssh -f $GPHOME/hosts.seg "du -h --exclude=*pg_log* --exclude=*db_dump* -s /data[12]/primary/gpseg*"

-- Change owner of all tables in the public schema
for tbl in `psql -qAt -c "select tablename from pg_tables where schemaname = 'public';" sachi` ; do psql -c "alter table $tbl owner to gpadmin" sachi ; done

-- Move all tables from the public schema to a specified schema
for tbl in `psql -qAt -c "select tablename from pg_tables where schemaname='public';" sachi`; do `psql -c "ALTER TABLE $tbl SET SCHEMA sachi;" sachi`; done

-- Build a list of user databases (excluding template0)
DATABASES=`psql -q -c "\l" | sed -n 4,/\eof/p | grep -v rows | grep -v template0 | awk {'print $1}' | sed 's/^://g' | sed -e '/^$/d' | grep -v '|'`

-- Date/time helper functions
datediff() {
    d1=$(date -d "$1" +%s)
    d2=$(date -d "$2" +%s)
    echo $(( (d1 - d2) / 86400 )) days
}

timespent() {
    d1=$(date -d "$1" +%s)
    d2=$(date -d "$2" +%s)
    echo $(( d1 - d2 )) seconds
}

11. uptime
12. ps
13. free
14. iostat
15. sar
16. mpstat
17. pmap
18. /proc file system - various kernel stats
    # cat /proc/cpuinfo
    # cat /proc/meminfo
    # cat /proc/zoneinfo
    # cat /proc/mounts
    # cat /proc/version
19. lsof

-- Find out total space used by log files of primary segment databases
[gpadmin@sachi ~]$ gpssh -f $GPHOME/hosts.seg "du -h -s /data[12]/primary/gpseg*/pg_log*"

-- Compute elapsed time between two points in a script (hh:mm:ss)
$(date +%s)
start_time=$(date +%s)
end_time=$(date +%s)
duration=`expr $end_time - $start_time`
echo `expr $duration / 3600`:`expr "(" $duration / 60 ")" % 60`:`expr $duration % 60`

-- List files in the current directory and their sizes
for i in `ls -lh | awk {'print $9,$5}'`; do echo $i; done

20. last
21. df
22. du
23. kill
24. traceroute
25. rsync
26. rpm
27. tar
28. pwd
29. lsb_release -a
30. uname -a (prints the name, version and other details about the current machine and the operating system running on it)

-- Find out total space used by backup files of primary segment databases
[gpadmin@sachi ~]$ gpssh -f $GPHOME/hosts.seg "du -h -s /data[12]/primary/gpseg*/db_dumps*"

-- Change all uppercase to lowercase with vi
:%s/.*/\L&/
-- Conversely, :%s/.*/\U&/ will change all the characters to uppercase.

-- List all directories and subdirectories
ls -lDR | grep ':$' | head | sed -e 's/:$//'

Using grep and awk to filter out idle connections
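The body of the "filter out idle connections" post is not included above; it evidently used grep and awk against process or activity listings. As a hedged alternative (not the original grep/awk method), idle sessions can also be listed straight from the catalog; the sketch below assumes the PostgreSQL 8.2-style pg_stat_activity columns that Greenplum 4.x exposes:

-- assumes Greenplum 4.x column names (procpid, current_query)
SELECT procpid, usename, datname, client_addr
FROM pg_stat_activity
WHERE current_query = '<IDLE>';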
gpcrondump
Reading EXPLAIN ANALYZE Output in Greenplum
EXPLAIN ANALYZE causes the statement to be actually executed, not only planned. The EXPLAIN ANALYZE plan shows the actual results along with the planner’s estimates. This is useful for seeing whether the planner’s estimates are close to reality. In addition to the information shown in the EXPLAIN plan, EXPLAIN ANALYZE will show the following additional information:

1. The total elapsed time (in milliseconds) that it took to run the query.
2. The number of workers (segments) involved in a plan node operation. Only segments that return rows are counted.
3. The maximum number of rows returned by the segment that produced the most rows for an operation. If multiple segments produce an equal number of rows, the one with the longest time to end is the one chosen.
4. The segment id number of the segment that produced the most rows for an operation.
5. The time (in milliseconds) it took to retrieve the first row from the segment that produced the most rows, and the total time taken to retrieve all rows from that segment. The <time> to first row may be omitted if it is the same as the <time> to end.

EXPLAIN ANALYZE Example

To illustrate how to read an EXPLAIN ANALYZE query plan, we will use the same simple query we used in the “EXPLAIN Example” in my previous blog. Notice that there is some additional information in this plan that is not in a regular EXPLAIN plan; the extra lines show the actual timing and rows returned for each plan node:

sachi=> EXPLAIN ANALYZE select * from employees where employee_id=198;
                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..3.34 rows=1 width=85)
   Rows out:  1 rows at destination with 1.492 ms to first row, 1.493 ms to end, start offset by 9.610 ms.
   ->  Seq Scan on employees  (cost=0.00..3.34 rows=1 width=85)
         Filter: employee_id = 198::numeric
         Rows out:  1 rows with 0.227 ms to first row, 0.242 ms to end, start offset by 11 ms.
 Slice statistics:
   (slice0)    Executor memory: 183K bytes.
   (slice1)    Executor memory: 201K bytes (seg1).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 11.308 ms
(11 rows)

sachi=>

Reading the plan from the bottom up, we see some additional information for each plan node operation. The total elapsed time it took to run this query was 11.308 milliseconds. The sequential scan operation had only one segment (seg0) that returned rows, and it returned just 1 row. It took 0.227 milliseconds to find the first row and 0.242 to scan all rows. Notice that this is pretty close to the planner’s estimate — the query planner estimated that it would return one row for this query, which it did. The gather motion operation then received 1 row (segments sending up to the master). The total elapsed time for this operation was 1.493 milliseconds.

Interpretation of Explain Plan (basic) – Explained

NOTE: This is just a basic/simple document on understanding EXPLAIN PLAN; it will not cover any advanced features.

By definition, EXPLAIN PLAN is "the command that displays the execution plan that the database planner generates for the supplied statement. The execution plan shows how the table(s) referenced by the statement will be scanned — by plain sequential scan, index scan, etc. — and if multiple tables are referenced, what join algorithms will be used to bring together the required rows from each input table."

In short, it is the best path determined by the planner to run the query, based on the latest statistics and the planner parameter values (like enable_seqscan, enable_hashjoin, etc.). So let's start with a simple example and break down what the explain plan helps or wants you to understand.

Explain Example 1

Let's have a look at a simple example.

sachi=# explain select * from pg_class;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on pg_class  (cost=0.00..40.37 rows=2359 width=204)
(1 row)

In the above example, the query will return 2359 rows with a sequential scan (full table scan) on pg_class. This is just an approximate value and might not be the exact row count of the table, since the estimate is based on sampled statistics from analyze. Each row returned will be 204 bytes wide on average. The cost to return the first row is 0 (not exactly zero, but close to zero), and to get all of the rows (i.e. 2359) it will cost 40.37.

Now, the cost basically tells you "how much work it will require" measured in units of a single sequential database page read from disk; since that is not easily measurable in absolute terms, you can assume it will take 40.37 such units to complete the work.

So what did the planner/optimizer do here to get the cost of 40.37? Let's break it down again. When you ran the query, the planner invoked the cost-estimation function for the "seq scan" node (in the above example). That function then checked the statistics available for the table (pg_class in the above example) in the catalog, namely the pages and tuples counts. Based on those statistics it calculated the amount of work it has to do to complete the job. To see what effect the tuples/pages values have here, let's take in the current values:

sachi=# select relname,relpages,reltuples from pg_class where relname='pg_class';
 relname  | relpages | reltuples
----------+----------+-----------
 pg_class |       17 |      2339
(1 row)

Time: 1.272 ms

So the estimated cost here is:

(disk pages read * seq_page_cost) + (tuples scanned * cpu_tuple_cost)

where, by default, seq_page_cost = 1.0 and cpu_tuple_cost = 0.01, so the estimated cost = (17 * 1.0) + (2339 * 0.01) = 40.39.

This helps you understand why statistics are really important for the planner to make a correct decision.

Example 2

Now let's take it a little further by introducing a where condition.

sachi=# explain select * from pg_class where relname='pg_type';
                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on pg_class  (cost=0.00..46.21 rows=24 width=238)
   Filter: relname = 'pg_type'::name
(2 rows)

The where condition here was able to reduce the number of rows returned, but the cost has increased from 40 to 46. When you introduce the where condition, the sequential scan still has to scan all 2339 rows and then apply the where condition on top of them, so there is now an additional cost in the form of cpu_operator_cost (default value 0.0025). With this extra cost the equation becomes:

(disk pages read * seq_page_cost) + (tuples scanned * cpu_tuple_cost) + (tuples scanned * cpu_operator_cost)
= 40.39 + (2339 * 0.0025)
= 40.39 + 5.8
= 46.2

Similarly, the planner has to take into consideration other costs such as random_page_cost, cpu_index_tuple_cost, effective_cache_size, etc., based on its execution path.

Example 3

Now let's add a sort key into the picture.
sachi=# explain select * from pg_class where relname='pg_class' order by relname;
                            QUERY PLAN
------------------------------------------------------------------
 Sort  (cost=46.79..46.85 rows=24 width=238)
   Sort Key: relname
   ->  Seq Scan on pg_class  (cost=0.00..46.24 rows=24 width=238)
         Filter: relname = 'pg_class'::name
(4 rows)

Time: 3.445 ms

When an explain plan has branches, it is always read from the last line of a branch up to the top; in the above example the "seq scan" was the first step, and its data was then passed on to the "sort". The total cost of each step includes the cost of the branches below it, as explained in detail below.

So here you can see that the sort gets its data from the seq scan, and the sort is done on the sort key column "relname". The sort's startup cost (cost to the first row) is high, and its total cost is almost the same as the startup cost. Reason: the sort only returns values when the work is actually done, i.e. it has to scan all of the rows received from the "seq scan", sort them, and only then return the first value. So the cost of the sort operation to get the first row is 46.79 (sort) - 46.24 (seq scan) = 0.55, and the total cost of the sort operation to do its work is 46.85 - 46.24 = 0.61. The reason the total work done by the "seq scan" is subtracted from the sort is that the sort had to wait for the seq scan to finish its work and pass on the rows before it could start; so in general, based on cost, it was the "seq scan" and not the sort that actually did more of the work here.

Example 4

Let's introduce some joins:

sachi=# explain select * from test_emc join test_emc2 using (a);
                                      QUERY PLAN
-------------------------------------------------------------------------------------
 Gather Motion 24:1  (slice1; segments: 24)  (cost=690.04..2310.07 rows=834 width=4)
   ->  Hash Join  (cost=690.04..2310.07 rows=834 width=4)
         Hash Cond: test_emc.a = test_emc2.a
         ->  Seq Scan on test_emc  (cost=0.00..1120.00 rows=4167 width=4)
         ->  Hash  (cost=440.02..440.02 rows=834 width=4)
               ->  Seq Scan on test_emc2  (cost=0.00..440.02 rows=834 width=4)

Here the Hash Join is the main branch, which is waiting for the "seq scan on test_emc" and for the hash operation. The hash operation is in turn waiting for another "seq scan", on table test_emc2. So the innermost step is the "seq scan on test_emc2", whose output is passed on to the hash operation. The hash operation's startup and total cost are the same (equal to the total cost of the seq scan of test_emc2), because the hash operation only sends out information once it has received all of the rows from the branches below it. Once the hash operation is done, its output is passed on to the Hash Join; the seq scan on test_emc also passes its rows to the Hash Join. So the work done by the Hash Join here is (690.04 - 0 - 440.02) for the first row, since the hash join starts sending rows to its parent as soon as it gets rows from the branches below it, and (2310.07 - 1120 - 440.02) for the total work.

EXPLAIN ANALYZE

The only difference between EXPLAIN and EXPLAIN ANALYZE is that the former does not execute the query and gives costs based on estimates, while the latter runs the query and gives the actual results. So let's analyze a few examples of EXPLAIN ANALYZE.
Example 1

sachi=# explain analyze select * from test_emc;
                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 24:1  (slice1; segments: 24)  (cost=0.00..1120.00 rows=4167 width=4)
   Rows out:  100000 rows at destination with 0.891 ms to first row, 195 ms to end, start offset by 0.361 ms.
   ->  Seq Scan on test_emc  (cost=0.00..1120.00 rows=4167 width=4)
         Rows out:  Avg 4166.7 rows x 24 workers.  Max 4187 rows (seg7) with 0.220 ms to first row, 1.738 ms to end, start offset by 1.470 ms.
 Slice statistics:
   (slice0)    Executor memory: 266K bytes.
   (slice1)    Executor memory: 156K bytes avg x 24 workers, 156K bytes max (seg0).
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 208.977 ms
(10 rows)

Here, the EXPLAIN ANALYZE output tells you that, with the present data, the query took 208.977 ms to complete all the work, and it used approximately 128M of memory. Two slices were used: slice 1 (the 24 segment worker processes that send the data) and slice 0 (the receiving process on the master). A slice is a piece of the original plan made by the planner: the plan is split into smaller plans that are dispatched to the segments (workers); for more information on slices refer to the link. Slice 1 ran as 24 worker processes on the 24 segments, each worker using about 156K of memory on average, with the maximum (156K) on seg0; slice 0 used about 266K of memory.

The query ran a "seq scan" in which the 24 workers returned 4166.7 rows on average; the maximum, 4187 rows, was returned by seg7. The "seq scan" took 0.220 ms to return its first row and 1.738 ms to complete, and it started 1.470 ms into query execution (the start offset). The "seq scan" returned 100000 rows in total (4166.7 x 24) to the destination (the Gather Motion on the master), with 0.891 ms to the first row; all rows from all the segments had arrived by 195 ms, and the motion started 0.361 ms into execution.

The Gather Motion is always the last step in the explain plan. A motion refers to data (tuples) moving across the interconnect, that is, the master has to interact with the segments to get the data. Catalog data that resides on the master does not need a motion. There are basically three types of motion:

Gather Motion (N:1) - Every segment worker sends the target data to a single node (usually the master), which is then passed on to the end user.
Redistribute Motion (N:N) - Every segment worker rehashes the target data (by join column) and redistributes each row to the appropriate segment.
Broadcast Motion (N:N) - Every segment sends the target data to all other segments.

So here Gather Motion (24:1) means there are 24 worker processes that send data and one process that receives them.

Example 2

Let's take an example with a join:

sachi=# explain analyze select * from test_emc join test_emc2 using (a);
                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 24:1  (slice1; segments: 24)  (cost=690.04..2310.07 rows=834 width=4)
   Rows out:  20002 rows at destination with 45 ms to first row, 558 ms to end, start offset by 0.444 ms.
   ->  Hash Join  (cost=690.04..2310.07 rows=834 width=4)
         Hash Cond: test_emc.a = test_emc2.a
         Rows out:  Avg 833.4 rows x 24 workers.  Max 862 rows (seg7) with 430 ms to first row, 431 ms to end, start offset by 1.580 ms.
         Executor memory:  20K bytes avg, 21K bytes max (seg7).
         Work_mem used:  20K bytes avg, 21K bytes max (seg7).
         (seg7)  Hash chain length 2.0 avg, 2 max, using 431 of 524341 buckets.
         ->  Seq Scan on test_emc  (cost=0.00..1120.00 rows=4167 width=4)
               Rows out:  Avg 4166.7 rows x 24 workers.  Max 4187 rows (seg7) with 0.090 ms to first row, 1.287 ms to end, start offset by 1.583 ms.
         ->  Hash  (cost=440.02..440.02 rows=834 width=4)
               Rows in:  Avg 833.4 rows x 24 workers.  Max 862 rows (seg7) with 4.252 ms to end, start offset by 425 ms.
               ->  Seq Scan on test_emc2  (cost=0.00..440.02 rows=834 width=4)
                     Rows out:  Avg 833.4 rows x 24 workers.  Max 862 rows (seg7) with 2.314 ms to first row, 3.895 ms to end, start offset by 425 ms.
 Slice statistics:
   (slice0)    Executor memory: 311K bytes.
   (slice1)    Executor memory: 8457K bytes avg x 24 workers, 8457K bytes max (seg0).  Work_mem: 21K bytes max.
 Statement statistics:
   Memory used: 128000K bytes
 Total runtime: 561.475 ms
(20 rows)

Time: 562.537 ms

Most of this output should be self-explanatory if you have read the example above; the only new item worth discussing is the line:

(seg7) Hash chain length 2.0 avg, 2 max, using 431 of 524341 buckets.

Without going into too much detail: if you know how a hash table works, this means the hash table distributed the data into buckets, and here it used 431 of the 524341 available buckets (read more on hash tables from the link), with an average chain length of 2 (read about hash chain length at the link).
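To reproduce the cost arithmetic from Examples 1 and 2 above on your own system, you can check the planner cost settings and the statistics the planner reads from the catalog. This is only a hedged sketch; the values quoted earlier (seq_page_cost = 1.0, cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025) are defaults and may differ on your cluster:

-- planner cost parameters used in the estimates above
SHOW seq_page_cost;
SHOW cpu_tuple_cost;
SHOW cpu_operator_cost;

-- page and tuple statistics the planner uses for the scanned table
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'pg_class';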
Granting DDL privileges to another user in Greenplum
Env:
Schema1  --> project schema
fullrole --> grants all access on schema1
rorole   --> grants usage on schema1
User1    --> the project DBA user and owner of the schema objects in schema1
User2    --> a project developer who wants to create tables and alter tables owned by User1

As we know, there are no DDL-level grants in Greenplum. There are many developers in the project, and the project DBA wants to give DDL-level access on the project schema to some senior developers like user2.

Answer: when I look at the GRANT command, I do not see any grant that allows user2 to create/alter schema objects owned by user1. See the GRANT syntax below.

Command:     GRANT
Description: define access privileges
Syntax:

GRANT { { SELECT | INSERT | UPDATE | DELETE | REFERENCES | TRIGGER } [,...] | ALL [ PRIVILEGES ] }
    ON [ TABLE ] tablename [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT { { USAGE | SELECT | UPDATE } [,...] | ALL [ PRIVILEGES ] }
    ON SEQUENCE sequencename [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT { { CREATE | CONNECT | TEMPORARY | TEMP } [,...] | ALL [ PRIVILEGES ] }
    ON DATABASE dbname [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT { EXECUTE | ALL [ PRIVILEGES ] }
    ON FUNCTION funcname ( [ [ argmode ] [ argname ] argtype [, ...] ] ) [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT { USAGE | ALL [ PRIVILEGES ] }
    ON LANGUAGE langname [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT { { CREATE | USAGE } [,...] | ALL [ PRIVILEGES ] }
    ON SCHEMA schemaname [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT { CREATE | ALL [ PRIVILEGES ] }
    ON TABLESPACE tablespacename [, ...]
    TO { username | GROUP groupname | PUBLIC } [, ...] [ WITH GRANT OPTION ]

GRANT role [, ...] TO username [, ...] [ WITH ADMIN OPTION ]

GRANT { SELECT | INSERT | ALL [PRIVILEGES] }
    ON PROTOCOL protocolname
    TO username

The simple trick to solve this problem is:

grant user1 to user2;

In this way user2 can create and alter schema objects in user1's schema.
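A hedged sketch of the trick described above, using the role and schema names from the scenario (the table names are illustrative). Granting the owning role to the developer lets the developer act with the owner's privileges on those objects:

-- run as a superuser (or a role with the ADMIN OPTION on user1)
GRANT user1 TO user2;

-- now, connected as user2, DDL on objects owned by user1 works
ALTER TABLE schema1.some_table ADD COLUMN audit_note text;   -- hypothetical table
CREATE TABLE schema1.new_table (id int) DISTRIBUTED BY (id); -- also needs CREATE on schema1 (e.g. via fullrole, or because user1 owns schema1)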
What is Disk Spill in Greenplum
What is Disk Spill

Running a SQL statement usually requires the database to allocate some working memory to the process(es) that execute it. This memory is especially important for steps that sort data or build a transient in-memory hash table (for joins or aggregations). When there are many active SQL statements (or a very large query) requiring working memory, each will get a smaller piece of the memory (or else the system will start swapping). So, a SQL statement that could typically join or sort in a single memory pass might start spilling temporary results to disk during high concurrency, dramatically affecting its run time and its resource usage (extra I/O to write and later read temporary data). So, another effect of "too much" concurrency is potentially making each SQL statement work a lot harder, reducing overall throughput.

Greenplum Database creates work files on disk if it does not have sufficient memory to execute the query in memory. This information can be used for troubleshooting and tuning queries. For example:

EXPLAIN ANALYZE SELECT * FROM .....;
Work_mem used: 23430K bytes avg, 23430K bytes max (seg0).
Work_mem wanted: 33649K bytes avg, 33649K bytes max (seg0) to lessen workfile I/O affecting 2 workers.

==================================================================================================================

EXPLAIN displays the query plan that the Greenplum planner generates for the supplied statement. Query plans are a tree of plan nodes. Each node in the plan represents a single operation, such as a table scan, join, aggregation or sort. Plans should be read from the bottom up, as each node feeds rows into the node directly above it. The bottom nodes of a plan are usually table scan operations (sequential, index or bitmap index scans). If the query requires joins, aggregations, or sorts (or other operations on the raw rows), then there will be additional nodes above the scan nodes to perform these operations. The topmost plan nodes are usually the Greenplum Database motion nodes (redistribute, explicit redistribute, broadcast, or gather motions). These are the operations responsible for moving rows between the segment instances during query processing.

The output of EXPLAIN has one line for each node in the plan tree, showing the basic node type plus the following cost estimates that the planner made for the execution of that plan node:

1. cost - measured in units of disk page fetches; that is, 1.0 equals one sequential disk page read. The first estimate is the start-up cost (cost of getting to the first row) and the second is the total cost (cost of getting all rows). Note that the total cost assumes that all rows will be retrieved, which may not always be the case (if using LIMIT, for example).
2. rows - the total number of rows output by this plan node. This is usually less than the actual number of rows processed or scanned by the plan node, reflecting the estimated selectivity of any WHERE clause conditions. Ideally the top-level node's estimate will approximate the number of rows actually returned, updated, or deleted by the query.
3. width - total bytes of all the rows output by this plan node.

It is important to note that the cost of an upper-level node includes the cost of all its child nodes. The topmost node of the plan has the estimated total execution cost for the plan. This is the number that the planner seeks to minimize. It is also important to realize that the cost only reflects things that the query planner cares about.
In particular, the cost does not consider the time spent transmitting result rows to the client.

EXPLAIN ANALYZE causes the statement to be actually executed, not only planned. The EXPLAIN ANALYZE plan shows the actual results along with the planner's estimates. This is useful for seeing whether the planner's estimates are close to reality. In addition to the information shown in the EXPLAIN plan, EXPLAIN ANALYZE will show the following additional information:

• The total elapsed time (in milliseconds) that it took to run the query.
• The number of workers (segments) involved in a plan node operation. Only segments that return rows are counted.
• The maximum number of rows returned by the segment that produced the most rows for an operation. If multiple segments produce an equal number of rows, the one with the longest time to end is the one chosen.
• The segment id number of the segment that produced the most rows for an operation.
• For relevant operations, the work_mem used by the operation. If work_mem was not sufficient to perform the operation in memory, the plan will show how much data was spilled to disk and how many passes over the data were required for the lowest performing segment. For example:

Work_mem used: 64K bytes avg, 64K bytes max (seg0).
Work_mem wanted: 90K bytes avg, 90K bytes max (seg0) to abate workfile I/O affecting 2 workers.
[seg0] pass 0: 488 groups made from 488 rows; 263 rows written to workfile
[seg0] pass 1: 263 groups made from 263 rows

• The time (in milliseconds) it took to retrieve the first row from the segment that produced the most rows, and the total time taken to retrieve all rows from that segment. The <time> to first row may be omitted if it is the same as the <time> to end.

Very Important: Keep in mind that the statement is actually executed when EXPLAIN ANALYZE is used. Although EXPLAIN ANALYZE will discard any output that a SELECT would return, other side effects of the statement will happen as usual. If you wish to use EXPLAIN ANALYZE on a DML statement without letting the command affect your data, use this approach:

BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;

====================================================================================================================

Note that the bytes wanted message from EXPLAIN ANALYZE is only a hint, based on the amount of data written to work files, and is not exact. The minimum work_mem needed could be more or less than the suggested value.

The output will show the plan used, but a key item to look for is "Work_mem wanted". When you see this, it means that Greenplum had to spill to disk because there wasn't enough memory available. The best approach is likely to rewrite the query. Alternatively, you can increase the amount of memory available. The "auto" memory policy allows you to increase or decrease the amount of memory a query will use by changing the "statement_mem" value. The maximum value you can set for statement_mem is determined by "max_statement_mem"; the default max_statement_mem is 2000MB. On the master, execute the following to increase max_statement_mem:

gpconfig -c max_statement_mem -v 8GB
gpstop -u

Now you can change the memory setting in your session (you can also do this with gpconfig to make the setting apply to all sessions):

set gp_resqueue_memory_policy = auto;
set statement_mem = '4GB';

Re-run your query and see if it executes faster and whether it still has "bytes wanted" in the query plan.
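Before raising statement_mem, it can help to check the current limits in your session; a minimal hedged sketch (the 250MB value is only illustrative and must stay below max_statement_mem on your cluster):

-- check the current limits
SHOW max_statement_mem;
SHOW statement_mem;

-- raise the per-statement memory for this session only (illustrative value)
SET statement_mem = '250MB';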
Compressed Work Files

If you know you are spilling to disk when executing queries because EXPLAIN ANALYZE showed that more bytes were wanted than available, you can trade CPU for I/O by compressing the work files. This is done with "gp_workfile_compress_algorithm". The default value is "none", but you can change this to "zlib". It can be done at the session level or with gpconfig to make it system wide.

Temporary Tables

Another way to deal with very large queries that spill to disk is to use temporary tables that are compressed. This is ideal when you use a subquery that is then joined to other tables. If you know it is spilling to disk (again, from EXPLAIN ANALYZE showing more bytes wanted than available), you can populate a compressed temporary table instead. For example:

CREATE TEMPORARY TABLE foo (myid int, bar text)
WITH (APPENDONLY=true, COMPRESSTYPE=quicklz)
ON COMMIT DROP
DISTRIBUTED BY (myid);

The gp_workfile_* views show information about all the queries that are currently using disk spill space. The information in the views can also be used to choose values for the Greenplum Database configuration parameters gp_workfile_limit_per_query and gp_workfile_limit_per_segment.
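A quick, hedged way to act on the two settings just mentioned and to check current spill usage from psql (the views are described in detail in the next section; the gp_toolkit schema applies to Greenplum 4.3):

-- compress work files for this session (trades CPU for I/O, as described above)
SET gp_workfile_compress_algorithm = zlib;

-- see how much spill space each segment is using right now
SELECT * FROM gp_toolkit.gp_workfile_usage_per_segment;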
Workfile Disk Spill Space Information

In Greenplum Database 4.3, the gp_workfile_* views in the gp_toolkit administrative schema contain information about all the queries that are currently using disk spill space. Previously, in 4.2.x.x releases, you created the views by running SQL scripts. Let's look at these views in detail:

1. gp_workfile_entries
2. gp_workfile_usage_per_query
3. gp_workfile_usage_per_segment

1. gp_workfile_entries

This view contains one row for each operator using disk space for workfiles on a segment at the current time. The view is accessible to all users; however, non-superusers can only see information for the databases that they have permission to access.

Columns in this view:
command_cnt -> Command ID of the query.
content -> The content identifier for a segment instance.
current_query -> Current query that the process is running.
datname -> Greenplum database name.
directory -> Path to the work file.
optype -> The query operator type that created the work file.
procpid -> Process ID of the server process.
sess_id -> Session ID.
size -> The size of the work file in bytes.
numfiles -> The number of files created.
slice -> The query plan slice. The portion of the query plan that is being executed.
state -> The state of the query that created the work file.
usename -> Role name.
workmem -> The amount of memory allocated to the operator in KB.

2. gp_workfile_usage_per_query

This view contains one row for each query using disk space for workfiles on a segment at the current time. The view is accessible to all users; however, non-superusers can only see information for the databases that they have permission to access.

Columns in this view:
command_cnt -> Command ID of the query.
content -> The content identifier for a segment instance.
current_query -> Current query that the process is running.
datname -> Greenplum database name.
procpid -> Process ID of the server process.
sess_id -> Session ID.
size -> The size of the work file in bytes.
numfiles -> The number of files created.
state -> The state of the query that created the work file.
usename -> Role name.

3. gp_workfile_usage_per_segment

This view contains one row for each segment. Each row displays the total amount of disk space used for workfiles on the segment at the current time. The view is accessible to all users; however, non-superusers can only see information for the databases that they have permission to access.

Columns in this view:
content -> The content identifier for a segment instance.
size -> The total size of the work files on a segment.
numfiles -> The number of files created.

Oracle vs Greenplum

Oracle introduced statement queuing in version 11g Release 2 (and later enhanced it in 11.2.0.2). However, in their case it is bundled with a bunch of other new parallelism features (automatic DOP and in-memory parallel execution), so it is unfortunately more complex than necessary. In Oracle, the system-wide number of parallel process slaves is fixed and the engine tries to automatically find the optimal per-SQL parallelism based on the current system load before each execution. The DBA controls various parameters (globally and per resource group) to try to tame the beast.

Greenplum Database uses a different model. The degree of per-SQL parallelism is fixed. The administrator simply chooses how many active SQL statements are allowed per resource queue (group); if more SQL statements are submitted to a queue, they will wait until a slot is available.
The administrator can also specify a minimal cost threshold (per resource queue) to allow quick queries to bypass the queuing mechanism (and, of course, prioritize between queues). So, to sum it up, "too much" concurrency does hurt database performance. Luckily, it can be handled by proper setup in many modern databases, using statement prioritization and statement queuing.

Memory in Greenplum is fairly easy to manage because the database has been designed to leverage the OS cache. The default Eager Free memory policy works very well for most queries. However, if you do see queries that still need more memory than is available, you can set the memory policy to auto and increase statement_mem. If you are still spilling to disk because your statement needs more memory, you can have Greenplum automatically compress work files or use compressed temporary tables.
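To tie the queuing discussion above to concrete syntax, here is a hedged sketch of a resource queue that limits active statements and lets cheap queries bypass the queue; the queue name, role name, and values are illustrative, not from the original post:

-- at most 5 concurrent statements; statements whose estimated cost is below
-- MIN_COST skip the queue entirely
CREATE RESOURCE QUEUE adhoc_queue WITH (ACTIVE_STATEMENTS=5, MIN_COST=1000.0, PRIORITY=MEDIUM);

-- assign a role to the queue
ALTER ROLE report_user RESOURCE QUEUE adhoc_queue;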