Greenplum DBA Quick Reference website will no longer be available on public domain. Email us at for further details.  Become a registered member of and you will get full access to all of our blog posts, white papers, Greenplum Database eBooks and much more.

Fixing gp_bloat_diag view - showing bloated tables even there are no bloats on those tables

posted Aug 30, 2018, 11:12 AM by Sachchida Ojha

Recently we found that gp_toolkit.gp_bloat_diag view shows some bloated tables which were recently reorg (vacuum full) and analyzed .

Problem was due to incorrect data reported by gp_toolkit.gp_bloat_expected_pages called by gp_toolkit.gp_bloat_diag.  Fix is to recreate gp_toolkit.gp_bloat_expected_pages using following SQL. This fix came from Pivotal support and will be part of next release.

create or replace view  gp_toolkit.gp_bloat_expected_pages as  SELECT subq.btdrelid, subq.btdrelpages,
            WHEN subq.btdexppages < subq.numsegments::numeric THEN subq.numsegments::numeric
            ELSE subq.btdexppages
        END AS btdexppages
   FROM ( SELECT pgc.oid AS btdrelid, pgc.relpages AS btdrelpages, ceil((pgc.reltuples * (25::double precision + btwcols.width))::numeric / current_setting('block_size'::text)::numeric) AS btdexppages, ( SELECT __gp_number_of_segments.numsegments
                   FROM gp_toolkit.__gp_number_of_segments) AS numsegments
           FROM ( SELECT pgc.oid, pgc.reltuples, pgc.relpages
                   FROM pg_class pgc
                  WHERE NOT EXISTS ( SELECT __gp_is_append_only.iaooid
                           FROM gp_toolkit.__gp_is_append_only
                          WHERE __gp_is_append_only.iaooid = pgc.oid AND __gp_is_append_only.iaotype = true)
                 AND NOT EXISTS
                    SELECT parrelid
                    FROM pg_partition
                    WHERE parrelid = pgc.oid
                          ) pgc
      LEFT JOIN ( SELECT pgs.starelid, sum(pgs.stawidth::double precision * (1.0::double precision - pgs.stanullfrac)) AS width
                   FROM pg_statistic pgs
                  GROUP BY pgs.starelid) btwcols ON pgc.oid = btwcols.starelid
     WHERE btwcols.starelid IS NOT NULL) subq;

Dynamic UNIX script to change the owner of tables in Greenplum

posted Oct 25, 2014, 4:57 PM by Sachchida Ojha   [ updated Oct 26, 2014, 6:34 AM ]

-- Change owner of all tables of a given schema

for tbl in `psql -qAt -c "select tablename from pg_tables where schemaname = 'public';" sachi` ; do  psql -c "alter table $tbl owner to new_user"  sachi; done

-- list all tables of a schema

for tbl in `psql -qAt -c "select tablename from pg_tables where schemaname = 'prod';" sachi` ; do  echo $tbl; done

PGOPTIONS='--client-min-messages=warning' psql -X -q -a -1 -v ON_ERROR_STOP=1 --pset pager=off -d mydb -f script.sql

Checking bloats with customized scripts

posted Oct 23, 2014, 6:08 PM by Sachchida Ojha   [ updated Oct 23, 2014, 6:08 PM ]

gp_bloat_diag view helps to identify tables that need routine table maintenance (VACUUM and/or ANALYZE).

See also : Creating a test case for bloat and understanding how it works in Greenplum

gpadmin-# \d gp_toolkit.gp_bloat_diag
View "gp_toolkit.gp_bloat_diag"
Column | Type | Modifiers
bdirelid | oid |
bdinspname | name |
bdirelname | name |
bdirelpages | integer |
bdiexppages | numeric |
bdidiag | text |
View definition:
SELECT bloatsummary.btdrelid AS bdirelid, bloatsummary.fnnspname AS bdinspname, bloatsummary.fnrelname AS bdirelname, bloatsummary.btdrelpages AS bdirelpages, bloatsummary.btdexppages AS bdiexppages, ( AS bdidiag
FROM ( SELECT fn.fnoid, fn.fnnspname, fn.fnrelname, beg.btdrelid, beg.btdrelpages, beg.btdexppages, gp_toolkit.gp_bloat_diag(beg.btdrelpages, beg.btdexppages::integer, iao.iaotype) AS bd
FROM gp_toolkit.gp_bloat_expected_pages beg, pg_class pgc, gp_toolkit.__gp_fullname fn, gp_toolkit.__gp_is_append_only iao
WHERE beg.btdrelid = pgc.oid AND pgc.oid = fn.fnoid AND iao.iaooid = pgc.oid) bloatsummary
WHERE ( > 0;

bdirelid=>Table object id.
bdinspname=>Schema name.
bdirelname=>Table name.
bdirelpages=>Actual number of pages on disk.
bdiexppages=>Expected number of pages given the table data.
bdidiag=>Bloat diagnostic message.

gpadmin=# select * from gp_toolkit.gp_bloat_diag;
bdirelid | bdinspname | bdirelname | bdirelpages | bdiexppages | bdidiag
(0 rows)

This view shows tables that have bloat (the actual number of pages on disk exceeds the expected number of pages given the table statistics). Tables that are bloated require a VACUUM or a VACUUM FULL in order to reclaim disk space occupied by deleted or obsolete rows. This view is accessible to all users, however non-superusers will only be able to see the tables that they have permission to access.

You can also use following SQL  to check the bloat.


current_database() as dbname
, relid
, schemaname
, tablename
, round(bloat,1) as bloat_ratio
, pg_size_pretty(expbytes) as expected_size
, pg_size_pretty(relbytes) as relation_size
, pg_size_pretty(wastedbytes) as wasted_space
, round(1.0 * relbytes / nullif((sum(relbytes) over ())::bigint,0),4) as relative_size
, round((bloat-1) * (1.0 * relbytes / nullif((sum(relbytes) over ())::bigint, 0)), 8) as concern
, schemaname
, tablename
, CASE WHEN otta=0 THEN 
END AS bloat
, reltuples::bigint
, relpages::bigint
, otta
, (bs*otta)::bigint as expbytes
, CASE WHEN relpages < otta THEN 0 ELSE 
END AS wastedbytes
, CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages
, (bs*relpages)::bigint as relbytes
, tablename
, cc.oid as relid
, cc.reltuples
, cc.relpages
, bs
, CEIL((cc.reltuples*((datahdr+ma-(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta 
, bs
, schemaname
, tablename
, (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr
, (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
, tablename
, hdr
, ma
, bs
, SUM((1-s.null_frac)*avg_width) AS datawidth
, MAX(s.null_frac) AS maxfracsum
, hdr+(1+(count(case when s.null_frac<>0 then 1 end))/8) as nullhdr
pg_stats s
cross join 
current_setting('block_size')::numeric AS bs
, CASE WHEN substring(version(),12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr
, CASE WHEN version() ~ 'mingw32' THEN 8 ELSE 4 END AS ma
) AS constants
GROUP BY schemaname, tablename, hdr, ma, bs
) AS foo
) AS rs
JOIN pg_class cc 
cc.relname = rs.tablename 
and cc.relkind = 'r'
JOIN pg_namespace nn 
cc.relnamespace = nn.oid 
AND nn.nspname = rs.schemaname 
) AS sml
) wrapper
where wastedbytes > 2*1024*1024 and bloat >= 1.4order by (bloat-1) * (1.0 * relbytes / nullif((sum(relbytes) over ())::bigint, 0)) desc;

Greenplum gpconfig command to get and set GUC parameters

posted Oct 11, 2014, 8:42 AM by Sachchida Ojha

[gpadmin@sachi ~]$ gpconfig -?
COMMAND NAME: gpconfig

Sets server configuration parameters on all segments within 
a Greenplum Database system.


gpconfig -c <param_name> -v <value> [-m <master_value> | --masteronly]
       | -r <param_name> [--masteronly]
       | -l 
   [--skipvalidation] [--verbose] [--debug]

gpconfig -s <param_name> [--verbose] [--debug]

gpconfig --help


The gpconfig utility allows you to set, unset, or view configuration parameters from the postgresql.conf files of all instances (master, segments, and mirrors) in your Greenplum Database system. When setting a parameter, you can also specify a different value for the master if necessary. For example, parameters such as max_connections require 
a different setting on the master than what is used for the segments. 

If you want to set or unset a global or master only parameter, use the --masteronly option.

gpconfig can only be used to manage certain parameters. For example, you cannot use it to set parameters such as port, which is required to be distinct for every segment instance. Use the -l (list) option to see a complete list of configuration parameters supported by gpconfig.

When gpconfig sets a configuration parameter in a segment postgresql.conf file, the new parameter setting always displays at the bottom of the file. When you use gpconfig to remove a configuration parameter setting, gpconfig comments out the parameter in all segment postgresql.conf files, thereby restoring the system default setting. 
For example, if you use gpconfig to remove (comment out) a parameter and later add it back (set a new value), there will be two instances of the parameter; one that is commented out, and one that is enabled and inserted at the bottom of the postgresql.conf file.

After setting a parameter, you must restart your Greenplum Database system or reload the postgresql.conf files in order for the change to take effect. Whether you require a restart or a reload depends on the parameter. See the Server Configuration Parameters reference for more information about the server configuration parameters.

To show the currenly set values for a parameter across the system, 
use the -s option.

gpconfig uses the following environment variables to connect to the Greenplum Database master instance and obtain system configuration information: 


-c | --change <param_name>

Changes a configuration parameter setting by adding the new setting to the bottom of the postgresql.conf files.

-v | --value value

The value to use for the configuration parameter you specified with the -c option. By default, this value is applied to all segments, their mirrors, the master, and the standby master.

-m | --mastervalue master_value

The master value to use for the configuration parameter you specified with the -c option. If specified, this value only applies to the master and standby master. The option can only be used with -v. 


When specified, gpconfig only edits the master postgresql.conf file.

-r | --remove <param_name>

Specifies the configuration parameter name to unset or remove by commenting out the entry in the postgresql.conf files.

-l | --list

Lists all configuration parameters supported by the gpconfig utility.

-s | --show <param_name>

Shows the value for a configuration parameter used on all instances (master and segments) in the Greenplum Database system. If there is a discrepancy in a parameter value between segment instances, the gpconfig utility displays an error message.

Note that the gpconfig utility reads parameter values directly from the database, and not the postgresql.conf file. If you are using gpconfig to set configuration parameters across all segments, then running gpconfig -s to verify the changes, you might still see the previous (old) values. You must reload the configuration files (gpstop -u) or restart the system (gpstop -r) for changes to take effect.


Overrides the system validation checks of gpconfig and allows you to operate on any server configuration parameter, including hidden parameters and restricted parameters that cannot be changed by gpconfig. When used with the -l option (list), it shows the list of restricted parameters. This option should only be used to set parameters when directed by Greenplum Customer Support.


Displays additional log information during gpconfig command execution.


Sets logging output to debug level. 

-? | -h | --help

Displays the online help.


Set the work_mem setting to 120MB on the master only:

gpconfig -c work_mem -v 120MB --masteronly

Set the max_connections setting to 100 on all segments and 10 on the master:

gpconfig -c max_connections -v 100 -m 10

Comment out all instances of the default_statistics_target configuration parameter, and restore the system default:

gpconfig -r default_statistics_target

List all configuration parameters supported by gpconfig:

gpconfig -l

Show the values of a particular configuration parameter across the system:

gpconfig -s max_connections



Troubleshoot and fix gpfdist process dying abruptly when multiple gpload session starts from Informatica power center

posted Oct 11, 2014, 8:27 AM by Sachchida Ojha

Debugging little more and looking at the log we found that Informatica is launching 3 gpload in parallel and each gpload process is launching have 10 gpfdist processes. The gpload processes and their associated gpfdist processes will all run on the same server that happens to be running the PowerCenter job. Informatica writes to a named pipe and gpfdist reads from the named pipe. We suspect the 30 gpload processes are stressing the kernel or network buffers/devices which is causing the connection fault. Typically when loading data with this many processes we would spread the gpfdist process across multiple servers. We may only need to tweak the kernel OS params but we need the logs to know for sure.

To get the complete debugging click here

Greenplum database maintenance and monitoring - do it yourself script

posted Oct 9, 2014, 8:07 AM by Sachchida Ojha

A custom designed, must have  script for all greenplum DBA. To get the script email us

==Greenplum database maintenance and monitoring Option ==
 Please read the menu options carefully before selection as our menu option may have changed. #
Choose one of the following option:
1  - Run Daily Maintenance
2  - Run Weekly Maintenance
3  - Run Catalog Maintenance
4  - Check and fix catalog corruption
5  - DCA Health Report
6  - Database State Monitoring Report
7  - Database Alert Log Monitoring Report
8  - DCA Space Monitoring Report
9  - Missing Table Statistics Report
10 - Bloat Check 
11 - Skew Check 
12 - Check locks on table
13 - Check tables with RANDOM distribution policy
14 - Check GUC parameter value
15 - Backup database using ddboost
16 - Archive log files older than threshold limit
17 - Cleanup orphaned temp schema
18 - Display DCA master Disk Usage
19 - Display DCA master CPU Utilization
20 - Display DCA master Memory Utilization
21 - Schema Space Monitoring Report 
22 - DCA, Database, OS version Report 
23 - Check tables distribution policy
24 - Check table skew
25 - Check running queries
26 - Check resource queue settings
27 - TOP 20 resource intensive SQL 2 days
28 - Most active users
29 - Most used resource queues
97 - Help and Support
98 - Show configuration setting
99 - Logout 
Enter your option: 

Reading QUERY PLAN output in Greenplum

posted Feb 9, 2014, 12:54 PM by Sachchida Ojha

Query plans are a tree plan of nodes. Each node in the plan represents a single operation, such as table scan, join, aggregation or a sort. Plans should be read from the bottom up as each node feeds rows into the node directly above it. The bottom nodes of a plan are usually table scan operations (sequential, index or bitmap index scans). If the query requires joins, aggregations, or sorts (or other operations on the raw rows) then there will be additional nodes above the scan nodes to perform these operations. The topmost plan nodes are usually the Greenplum Database motion nodes (redistribute, explicit redistribute, broadcast, or gather motions). These are the operations responsible for moving rows between the segment instances during query processing.

*** Read more about different type of motion such as redistribute, explicit redistribute, broadcast, or gather motions, visit

The output of EXPLAIN has one line for each node in the plan tree, showing the basic node type plus the following cost estimates that the planner made for the execution of that plan node:

1. cost - measured in units of disk page fetches; that is, 1.0 equals one sequential disk page read. The first estimate is the start-up cost (cost of getting to the first
row) and the second is the total cost (cost of getting all rows). Note that the total cost assumes that all rows will be retrieved, which may not always be the case (if
using LIMIT for example).

2. rows - the total number of rows output by this plan node. This is usually less than the actual number of rows processed or scanned by the plan node, reflecting the
estimated selectivity of any WHERE clause conditions. Ideally the top-level nodes estimate will approximate the number of rows actually returned, updated, or
deleted by the query.

3. width - total bytes of all the rows output by this plan node.

It is important to note that the cost of an upper-level node includes the cost of all its child nodes. The topmost node of the plan has the estimated total execution cost for the plan. This is this number that the planner seeks to minimize. It is also important to realize that the cost only reflects things that the query planner cares about. In particular, the cost does not consider the time spent transmitting result rows to the client.

If a query is performing poorly, looking at its query plan can help identify problem areas. Here are some things to look for:

1. Is there one operation in the plan that is taking exceptionally long? When looking through the query plan, is there one operation that is consuming the majority of the query processing time? For example, if an index scan is taking longer than expected, perhaps the index is out-of-date and needs to be reindexed. You could also temporarily experiment with the enable_ parameters to see if you can force the planner to choose a different (and potentially better) plan by disabling a particular query plan operator for that query.

2. Are the planner’s estimates close to reality? Run an EXPLAIN ANALYZE and see if the number of rows estimated by the planner is close to the number of rows actually returned by the query operation. If there is a huge discrepancy, you may need to collect more statistics on the relevant columns. 

3.Are selective predicates applied early in the plan? The most selective filters should be applied early in the plan so that less rows move up the plan tree. If the query plan is not doing a good job at estimating the selectivity of a query predicate, you may need to collect more statistics on the relevant columns. You can also try reordering the WHERE clause of your SQL statement.

4. Is the planner choosing the best join order? When you have a query that joins multiple tables, make sure that the planner is choosing the most selective join
order. Joins that eliminate the largest number of rows should be done earlier in the plan so that less rows move up the plan tree. If the plan is not choosing the optimal join order, you can set join_collapse_limit=1 and use explicit JOIN syntax in your SQL statement to force the planner to the specified join order. You can also collect more statistics on the relevant join columns.

5. Is the planner selectively scanning partitioned tables? If you are using table partitioning, is the planner selectively scanning only the child tables required to satisfy the query predicates? Do scans of the parent tables return 0 rows (they should, since the parent tables should not contain any data).

6. Is the planner choosing hash aggregate and hash join operations where applicable? Hash operations are typically much faster than other types of joins or aggregations. Row comparison and sorting is done in memory rather than reading/writing from disk. In order for hash operations to be chosen, there has to be sufficient work memory available to hold the number of estimated rows. Try increasing work memory to see if you can get better performance for a given query. If possible run an EXPLAIN ANALYZE for the query, which will show you which plan operations spilled to disk, how much work memory they used, and how much was required to not spill to disk. For example:

Work_mem used: 23730K bytes avg, 23470K bytes max (seg0).
Work_mem wanted: 23659K bytes avg, 23659K bytes max (seg0) to lessen workfile I/O affecting 2 workers.

Note that the bytes wanted message from EXPLAIN ANALYZE is only a hint, based on the amount of data written to work files and is not exact. The minimum work_mem needed could be more or less than the suggested value.

Identifying Statistics Problems in Query Plans
When looking at the query plan for a query using EXPLAIN or EXPLAIN ANALYZE, it helps to know your data in order to identify possible statistics problems. Check the plan for the following indicators of inaccurate statistics:
1. Are the planner’s estimates close to reality? Run an EXPLAIN ANALYZE and see if the number of rows estimated by the planner is close to the number of rows actually returned by the query operation.
2. Are selective predicates applied early in the plan? The most selective filters should be applied early in the plan so that less rows move up the plan tree.
3. Is the planner choosing the best join order? When you have a query that joins multiple tables, make sure that the planner is choosing the most selective join order. Joins that eliminate the largest number of rows should be done earlier in the plan so that less rows move up the plan tree.

Reclaim all expired row space in Greenplum

posted Feb 9, 2014, 12:39 PM by Sachchida Ojha

A VACUUM FULL will reclaim all expired row space, but is a very expensive operation and may take an unacceptably long time to finish on large, distributed Greenplum Database tables. If you do get into a situation where the free space map has overflowed, it may be more timely to recreate the table with a CREATE TABLE AS statement and drop the old table. A VACUUM FULL is not recommended in Greenplum Database.

Option 1.
This leaves any indexes etc. in tact but rewrites the underlying table and, hence, eliminates all dead data.

Option 2
SET gp_auto_stats_mode = none;

This option writes the data one time.

Option 3
Create temp table t1 as select * from <Table Name>;
Truncate <Table Name>;
set gp_autostats_mode = none;
Insert into <Table Name> select * from t1;
Drop table t1;
analyze <Table Name>;

This option writes the data 2 times.

What happens when you run a SQL statement in Greenplum?

posted Feb 8, 2014, 7:46 AM by Sachchida Ojha

When a user runs a SQL query in greenplum, it is the master who receives the SQL. Master then parse the query , optimize the query, creates execution plan of the query also called query plan. 

Query plan can be either parallel or targeted. Master then dispatched the plan to the segments for execution. Each segment is then responsible for executing local database operations on its own particular set of data. 

Most database operations—such as table scans, joins, aggregations, and sorts—execute in parallel across the segments simultaneously. Each operation is performed on a segment database independent of the data associated with the other segment databases.

There are certain queries that may only access data on a single segment, such as single-row INSERT, UPDATE, DELETE or SELECT operations, or queries that return a small set of rows and filter on the table distribution key column(s). In queries such as these, the query plan is not dispatched to all segments, but is targeted to the segment that contains the affected row(s).

A query plan is the set of operations that Greenplum Database will perform to produce the answer to a given query. Each node or step in the plan represents a database operation such as a table scan, join, aggregation or sort. Plans are read and executed from bottom to top.

In addition to the typical database operations (tables scans, joins, etc.), Greenplum Database has an additional operation type called a motion. 

A motion operation involves moving tuples between the segments during query processing. Note that not every query requires a motion. For example, a query of the system catalog tables on the master does not require data to move across the interconnect.

In order to achieve maximum parallelism during query execution, Greenplum divides the work of the query plan into slices.

A slice is a portion of the plan that can be worked on independently at the segment-level. A query plan is sliced wherever a motion operation occurs in the plan, one slice on each side of the motion.

Some commonly used terms in greenplum query execution plan.

1. Gather motion: A gather motion is when the segments send results back up to the master for presentation to the client.

2. Redistribute motion: redistribute motion moves tuples between the segments in order to complete the join.

3. Query dispatcher (QD): This process is created in the master. Query dispatcher also called query worker process is responsible for creating and dispatching the query plan, and for accumulating and presenting the final results.

4. Query executor (QE): This process is created on the segments. Query executor also called query worker process is responsible for completing its portion of work and communicating its intermediate results to the other worker processes. For each slice of the query plan there is at least one worker process assigned. A worker process works on its assigned portion of the query plan independently. During query execution, each segment will have a number of processes working on the query in parallel.

5. Gangs: Related processes that are working on the same portion of the query plan are referred to as gangs.

6. Interconnect: As a portion of work is completed, tuples flow up the query plan from one gang of processes to the next. This inter-process communication between the segments is what is referred to as the interconnect component of Greenplum Database.

1-10 of 14