Greenplum Database - GPDB

Greenplum Database is a massively parallel processing (MPP) database server based on PostgreSQL open-source technology. MPP (also known as a shared nothing architecture) refers to systems with two or more processors which cooperate to carry out an operation - each processor with its own memory, operating system and disks.

Greenplum is a distributed database system. This means that the data stored in the database system is physically located on more than one database server (referred to as segments in Greenplum). These individual database servers are connected by a communications network (referred to as the interconnect in Greenplum). An essential feature of a true distributed database is that users and client programs work as if they were accessing one single database on a local machine (in Greenplum, this entry-point database is referred to as the master). The fact that the database is distributed across several machines is seamless to the users of the system.


Greenplum leverages this high-performance system architecture to distribute the load of multi-terabyte data warehouses, and is able to use all of a system’s resources in parallel to process a query.

High-Level Greenplum Database Architecture

Greenplum Database is essentially several PostgreSQL database instances acting together as one cohesive database management system. It is based on PostgreSQL 8.2.15, and in most cases is very similar to PostgreSQL with regards to SQL support, features, configuration options, and end-user functionality. Database users interact with Greenplum Database as they would with a regular PostgreSQL DBMS. The internals of PostgreSQL have been modified or supplemented to support the parallel structure of Greenplum Database. For example, the system catalog, query planner, optimizer, query executor, and transaction manager components have been modified and enhanced to be able to execute queries in parallel across all of the PostgreSQL database instances at once. The Greenplum interconnect (the networking layer) enables communication between the distinct PostgreSQL instances and allows the system to behave as one logical database.

Greenplum Database also includes features designed to optimize PostgreSQL for business intelligence (BI) workloads. For example, Greenplum has added parallel data loading (external tables), resource management, query optimizations and storage enhancements which are not found in regular PostgreSQL. Many features and optimizations developed by Greenplum do make their way back into the PostgreSQL community. For example, table partitioning is a feature developed by Greenplum which is now in standard PostgreSQL.



SQL query is performing slowly in Greenplum Database - what to look for?

posted Sep 14, 2012, 5:00 PM by Sachchida Ojha   [ updated Sep 14, 2012, 5:00 PM ]

If your SQL query is performing slowly, looking at its query plan can help identify problem areas. Here are some things to look for:

1. Is there one operation in the plan that is taking exceptionally long? When looking through the query plan, is there one operation that is consuming the majority of the query processing time? For example, if an index scan is taking longer than expected, perhaps the index is out-of-date and needs to be reindexed. You could also temporarily experiment with the enable_ parameters to see if you can force the planner to choose a different (and potentially better) plan by disabling a particular query plan operator for that query (see the sketch after this list).

2. Are the planner’s estimates close to reality? Run an EXPLAIN ANALYZE and see if the number of rows estimated by the planner is close to the number of rows actually returned by the query operation. If there is a huge discrepancy, you may need to collect more statistics on the relevant columns. 

3. Are selective predicates applied early in the plan? The most selective filters should be applied early in the plan so that fewer rows move up the plan tree. If the query plan is not doing a good job of estimating the selectivity of a query predicate, you may need to collect more statistics on the relevant columns. You can also try reordering the WHERE clause of your SQL statement.

4. Is the planner choosing the best join order? When you have a query that joins multiple tables, make sure that the planner is choosing the most selective join order. Joins that eliminate the largest number of rows should be done earlier in the plan so that fewer rows move up the plan tree. If the plan is not choosing the optimal join order, you can set join_collapse_limit=1 and use explicit JOIN syntax in your SQL statement to force the planner to the specified join order. You can also collect more statistics on the relevant join columns.

5. Is the planner selectively scanning partitioned tables? If you are using table partitioning, is the planner selectively scanning only the child tables required to satisfy the query predicates? Do scans of the parent tables return 0 rows? (They should, since the parent tables should not contain any data.)

6. Is the planner choosing hash aggregate and hash join operations where applicable? Hash operations are typically much faster than other types of joins or aggregations. Row comparison and sorting are done in memory rather than by reading from and writing to disk. In order for hash operations to be chosen, there has to be sufficient work memory available to hold the estimated number of rows. Try increasing work memory to see if you can get better performance for a given query. If possible, run an EXPLAIN ANALYZE for the query, which will show you which plan operations spilled to disk, how much work memory they used, and how much work memory would have been required to avoid spilling to disk. For example:

Work_mem used: 23430K bytes avg, 23430K bytes max (seg0).
Work_mem wanted: 33649K bytes avg, 33649K bytes max (seg0) to lessen workfile I/O affecting 2 workers.

Note that the bytes wanted message from EXPLAIN ANALYZE is only a hint, based on the amount of data written to work files and is not exact. The minimum work_mem needed could be more or less than the suggested value.
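The checklist above maps to a handful of session-level experiments you can run while tuning. The following is a minimal sketch: the table and column names (big_fact, small_dim, dim_id, region) are hypothetical and the values shown are only illustrative; enable_nestloop, join_collapse_limit, work_mem, and per-column statistics targets are standard planner settings in Greenplum/PostgreSQL.

-- Item 1: disable a plan operator for this session only and re-check the plan.
SET enable_nestloop = off;
EXPLAIN ANALYZE
SELECT d.region, count(*)
FROM big_fact f JOIN small_dim d ON f.dim_id = d.id
WHERE d.region = 'EMEA'
GROUP BY d.region;
RESET enable_nestloop;

-- Item 4: force the planner to use the join order written in the FROM clause.
SET join_collapse_limit = 1;

-- Items 2 and 3: collect more statistics on a selective filter/join column.
ALTER TABLE big_fact ALTER COLUMN dim_id SET STATISTICS 200;
ANALYZE big_fact;

-- Item 6: give hash joins and hash aggregates more work memory for this session.
SET work_mem = '128MB';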


Reading EXPLAIN ANALYZE Output in Greenplum database

posted Sep 14, 2012, 4:53 PM by Sachchida Ojha

EXPLAIN ANALYZE causes the statement to be actually executed, not only planned. The EXPLAIN ANALYZE plan shows the actual results along with the planner’s estimates. This is useful for seeing whether the planner’s estimates are close to reality. In addition to the information shown in the EXPLAIN plan, EXPLAIN ANALYZE will show the following additional information:

1. The total elapsed time (in milliseconds) that it took to run the query.

2. The number of workers (segments) involved in a plan node operation. Only segments that return rows are counted.

3. The maximum number of rows returned by the segment that produced the most rows for an operation. If multiple segments produce an equal number of rows, the one with the longest time to end is the one chosen.

4. The segment id number of the segment that produced the most rows for an operation.

5. For relevant operations, the work_mem used by the operation. If work_mem was not sufficient to perform the operation in memory, the plan will show how much data was spilled to disk and how many passes over the data were required for the lowest performing segment. For example:

Work_mem used: 64K bytes avg, 64K bytes max (seg0).
Work_mem wanted: 90K bytes avg, 90K bytes max (seg0) to abate workfile I/O affecting 2 workers.
[seg0] pass 0: 488 groups made from 488 rows; 263 rows written to workfile
[seg0] pass 1: 263 groups made from 263 rows

6. The time (in milliseconds) it took to retrieve the first row from the segment that produced the most rows, and the total time taken to retrieve all rows from that segment. The <time> to first row may be omitted if it is the same as the <time> to end.

EXPLAIN ANALYZE Example

To illustrate how to read an EXPLAIN ANALYZE query plan, we will use the same simple query used in the “EXPLAIN Example” section below. Notice that there is some additional information in this plan that is not in a regular EXPLAIN plan. The lines showing the actual timing and the rows returned for each plan node are the information added by EXPLAIN ANALYZE:

EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';
QUERY PLAN
------------------------------------------------------------
Gather Motion 2:1 (slice1) (cost=0.00..20.88 rows=1 width=13)
  recv: Total 1 rows with 0.305 ms to first row, 0.537 ms to end.
  -> Seq Scan on 'names' (cost=0.00..20.88 rows=1 width=13)
       Total 1 rows (seg0) with 0.255 ms to first row, 0.486 ms to end.
       Filter: name::text ~~ 'Joelle'::text
22.548 ms elapsed

Reading the plan from the bottom up, you will see some additional information for each plan node operation. The total elapsed time it took to run this query was 22.548 milliseconds.

The sequential scan operation had only one segment (seg0) that returned rows, and it returned just 1 row. It took 0.255 milliseconds to find the first row and 0.486 milliseconds to scan all rows. Notice that this is pretty close to the planner’s estimate: the query planner estimated that it would return one row for this query, which it did. The gather motion operation (the segments sending rows up to the master) then received 1 row. The total elapsed time for this operation was 0.537 milliseconds.

Reading EXPLAIN output in Greenplum database

posted Sep 14, 2012, 4:36 PM by Sachchida Ojha

A query plan is a tree of nodes. Each node in the plan represents a single operation, such as a table scan, join, aggregation, or sort.

Plans should be read from the bottom up as each node feeds rows into the node directly above it. The bottom nodes of a plan are usually table scan operations (sequential, index or bitmap index scans). If the query requires joins, aggregations, or sorts (or other operations on the raw rows) then there will be additional nodes above the scan nodes to perform these operations. The topmost plan nodes are usually the Greenplum Database motion nodes (redistribute, explicit redistribute, broadcast, or gather motions). These are the operations responsible for moving rows between the segment instances during query processing.

The output of EXPLAIN has one line for each node in the plan tree, showing the basic node type plus the following cost estimates that the planner made for the execution of that plan node:

1. cost - measured in units of disk page fetches; that is, 1.0 equals one sequential disk page read. The first estimate is the start-up cost (cost of getting to the first row) and the second is the total cost (cost of getting all rows). Note that the total cost assumes that all rows will be retrieved, which may not always be the case (if using LIMIT for example).

2. rows - the total number of rows output by this plan node. This is usually less than the actual number of rows processed or scanned by the plan node, reflecting the estimated selectivity of any WHERE clause conditions. Ideally, the top-level node’s estimate will approximate the number of rows actually returned, updated, or deleted by the query.

3. width - total bytes of all the rows output by this plan node.

It is important to note that the cost of an upper-level node includes the cost of all its child nodes. The topmost node of the plan has the estimated total execution cost for the plan. This is the number that the planner seeks to minimize. It is also important to realize that the cost only reflects things that the query planner cares about. In particular, the cost does not consider the time spent transmitting result rows to the client.

EXPLAIN Example

To illustrate how to read an EXPLAIN query plan, consider the following example for a very simple query:
EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
QUERY PLAN
------------------------------------------------------------
Gather Motion 2:1 (slice1) (cost=0.00..20.88 rows=1 width=13)
  -> Seq Scan on 'names' (cost=0.00..20.88 rows=1 width=13)
       Filter: name::text ~~ 'Joelle'::text

Reading the plan from the bottom up, we see that the query planner starts by doing a sequential scan of the names table. Notice that the WHERE clause is being applied as a filter condition. This means that the scan operation checks the condition for each row it scans and outputs only the ones that pass the condition.

The results of the scan operation are passed up to a gather motion operation. In Greenplum Database, a gather motion is when segments send rows up to the master. In this case we have 2 segment instances sending to 1 master instance (2:1). This operation is working on slice1 of the parallel query execution plan. In Greenplum Database a query plan is divided into slices so that portions of the query plan can be worked on in parallel by the segments.

The estimated start-up cost for this plan is 0.00 (no cost) and the total cost is 20.88 disk page fetches. The planner is estimating that this query will return one row.

Query Profiling in Greenplum database

posted Sep 14, 2012, 4:32 PM by Sachchida Ojha

Greenplum Database devises a query plan for each query it is given. Choosing the right query plan to match the query and data structure is absolutely critical for good performance. A query plan defines how the query will be executed in Greenplum Database’s parallel execution environment. By examining the query plans of poorly performing queries, you can identify possible performance tuning opportunities.

The query planner uses the database statistics it has to choose a query plan with the lowest possible cost. Cost is measured in disk I/O and CPU effort (shown as units of disk page fetches). The goal is to minimize the total execution cost for the plan.

You can view the plan for a given query using the EXPLAIN command. This will show the query planner’s estimated plan for the query. 

For example:

EXPLAIN SELECT * FROM names WHERE id=32;

EXPLAIN ANALYZE causes the statement to be actually executed, not only planned. This is useful for seeing whether the planner’s estimates are close to reality. 

For example:

EXPLAIN ANALYZE SELECT * FROM names WHERE id=32;

Troubleshooting Performance Problems in Greenplum

posted Sep 14, 2012, 4:29 PM by Sachchida Ojha

1. Checking System State:  

The gpstate utility can be used to identify failed segments. A Greenplum Database system will incur performance degradation when it has segment instances down because it requires some hosts to pick up the processing responsibilities of the downed segments.

Failed segments can be an indicator of some type of hardware failure, such as a failed disk drive or network card. Greenplum Database provides the hardware verification tool gpcheckperf to help identify segment hosts that have hardware issues.

2. Checking for Active Sessions (Workload): 

The pg_stat_activity system catalog view shows one row per server process, including the database OID, database name, process ID, user OID, user name, current query, time at which the current query began execution, time at which the process was started, and the client address and port number. Querying this view can provide more information about the current workload on the system. For example:

SELECT * FROM pg_stat_activity;

This view should be queried as the database superuser to obtain the most information possible. Also note that the information does not update instantaneously.
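To narrow the view to active work, a query along the following lines (a sketch using the pg_stat_activity columns available in Greenplum Database 4.x, which is based on PostgreSQL 8.2) lists the non-idle sessions with the longest-running query first:

SELECT procpid, usename, client_addr, query_start, current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'     -- skip idle sessions
ORDER BY query_start;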

3. Checking for Locks (Contention): 

If a transaction is holding a lock on an object, there may be other queries that are waiting for that lock to be released before they can continue. This may appear to the user as if their query is hanging. The pg_locks system catalog view allows you to view information about outstanding locks. Examining pg_locks for ungranted locks can help identify contention between database client sessions. pg_locks provides a global view of all locks in the database system, not only those relevant to the current database. Although its relation column can be joined against pg_class.oid to identify locked relations (such as tables), this will only work correctly for relations in the current database. The pid column can be joined to pg_stat_activity.procpid to get more information on the session holding or waiting to hold a lock. For example:

SELECT locktype, database, c.relname, l.relation, l.transactionid, l.transaction, l.pid, l.mode, l.granted, a.current_query
FROM pg_locks l, pg_class c, pg_stat_activity a WHERE l.relation=c.oid AND l.pid=a.procpid
ORDER BY c.relname;
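
To focus specifically on contention, the same views can be filtered to ungranted lock requests. This is a small sketch using only the columns already referenced above (pg_locks.granted, pg_locks.pid and pg_stat_activity.procpid); as noted, relation names resolve correctly only for the current database:

SELECT w.pid AS waiting_pid, w.locktype, w.mode, c.relname, a.usename, a.current_query
FROM pg_locks w
JOIN pg_stat_activity a ON w.pid = a.procpid
LEFT JOIN pg_class c ON w.relation = c.oid
WHERE NOT w.granted      -- only lock requests still waiting
ORDER BY c.relname;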

If you are using resource queues for workload management, queries that are waiting in a queue will also show in pg_locks. To see how many queries are waiting to run from a particular resource queue, query the gp_toolkit.gp_resqueue_status view (the older pg_resqueue_status view is deprecated). For example:
SELECT * FROM gp_toolkit.gp_resqueue_status;


4. Checking Query Status and System Utilization

System monitoring utilities such as ps, top, iostat, vmstat, netstat and so on can be used to monitor database activity on the hosts in your Greenplum Database array. These tools can help identify Greenplum Database processes (postgres processes) currently running on the system and the most resource-intensive tasks with regards to CPU, memory, disk I/O, or network activity. Looking at these system statistics can help identify queries that are overloading the system by consuming excessive resources and thereby degrading database performance. Greenplum Database comes with a management utility called gpssh, which allows you to run these system monitoring commands on several hosts at once.

The Greenplum Performance Monitor also collects query and system utilization metrics. See the Greenplum Database Performance Monitor Guide for information on enabling Greenplum Performance Monitor.

Checking for Failed Segments

posted Sep 13, 2012, 7:42 PM by Sachchida Ojha

With mirroring enabled, you may have failed segments in the system without interruption of service or any indication that a failure has occurred. One way to verify the status of your system is to use the gpstate utility. This utility provides the status of each individual component of a Greenplum Database system (primary segments, mirror segments, master, and standby master).

To check for failed segments

1. On the master, run the gpstate utility with the -e option. This will show any segments with error conditions:
$ gpstate -e

2. Segments that are in Change Tracking mode indicate that the corresponding mirror segment is down.
A segment that is not in its preferred role means that the segment is not operating in the role it was assigned at system initialization time. This means that the system is in a potentially unbalanced state, as some segment hosts may have more active segments than is optimal for top system performance. See “To return all segments to their preferred role” in the Greenplum Database administrator guide for instructions on how to fix this situation.

3. To get detailed information about a failed segment, look it up in the gp_segment_configuration catalog table. For example:
$ psql -c "SELECT * FROM gp_segment_configuration WHERE status='d';"

4. For failed segment instances, note the host, port, preferred role, and data directory. This will help you determine the host and segment instances to troubleshoot.

5. To see the primary-to-mirror segment instance mapping, run:
$ gpstate -m

Knowing When a Segment is Down in Greenplum System

posted Sep 13, 2012, 7:41 PM by Sachchida Ojha

If mirroring is enabled, Greenplum Database will automatically fail over to a mirror segment when a primary segment goes down. As long as one segment instance is alive per portion of data, it will not typically be apparent to users of the system that a segment instance is down. If a transaction is in progress when a fault occurs, the in-progress transaction will roll back and then automatically restart on the reconfigured set of segments.

If the entire Greenplum Database system becomes nonoperational due to a segment failure (for example, if mirroring is not enabled or there are not enough segments alive to serve all the portions of user data), users will see errors when trying to connect to a database. The errors returned to the client program may give some indication of the failure. For example:

ERROR: All segment databases are unavailable

Enabling Alerts and Notifications

Greenplum Database administrators can enable email and/or SNMP alerts to be notified on system events such as segment failures.

Viewing GPDB Settings of Server Configuration Parameters

posted Sep 13, 2012, 1:18 PM by Sachchida Ojha

The SHOW SQL command allows you to see the settings of the server configuration parameters used by the Greenplum Database system. For example, to see the settings for all parameters:

$ psql -c 'SHOW ALL;'
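
To see the setting of a single parameter on the master, name it in the SHOW command. For example:

$ psql -c 'SHOW max_connections;'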

Running SHOW will show the settings for the master instance only. If you want to see the value of a particular parameter across the entire system (master and all segments), you can use the gpconfig utility. For example:

$ gpconfig --show max_connections

Setting GPDB Parameters in a Session

posted Sep 13, 2012, 1:17 PM by Sachchida Ojha

Any session parameter can also be set in an active database session using the SET command. That parameter setting is then valid for the rest of that session (or until a RESET command is issued). Settings at the session level override those at the role level. For example:

=# SET work_mem TO '200MB';
=# RESET work_mem;
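
To confirm the value currently in effect for your session (for example, after a SET or RESET), use SHOW or the current_setting() function; both are standard PostgreSQL features inherited by Greenplum:

=# SHOW work_mem;
=# SELECT current_setting('work_mem');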

Setting GPDB Parameters at the Role Level

posted Sep 13, 2012, 1:16 PM by Sachchida Ojha

When a session parameter is set at the role level, every session initiated by that role will pick up that parameter setting. Settings at the role level override those at the database level. Use the ALTER ROLE command to set a parameter at the role level. For example:

=# ALTER ROLE bob SET search_path TO bobschema;
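
To review which parameters have been set for a role, you can query the rolconfig column of the pg_roles system view (present in the PostgreSQL 8.2 base that Greenplum builds on). A small sketch, using the role bob from the example above:

=# SELECT rolname, rolconfig FROM pg_roles WHERE rolname = 'bob';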
