GPDB FAQ

What are GPDB Segment Hosts?

posted Sep 12, 2012, 9:01 AM by Sachchida Ojha

In Greenplum Database, the segments are where the database data is stored and where the majority of query processing takes place. User-defined tables and their indexes are distributed across the available number of segments in the Greenplum Database system, each segment containing a distinct portion of the data. Segment instances are the database server processes that serve segments. Users and administrators do not interact directly with the segments in a Greenplum Database system, but do so through the master.
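To make the idea of "each segment containing a distinct portion of the data" concrete, here is a minimal Python sketch. It assumes rows are placed by hashing a distribution-key column. The function names, the CRC32 hash, and the four-segment count are illustrative assumptions, not Greenplum's actual hash function or API.

```python
import zlib


def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution-key value to a segment index.

    Assumption: CRC32 stands in for the real hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, key_column, num_segments):
    """Group rows by the segment that would store them."""
    segments = {i: [] for i in range(num_segments)}
    for row in rows:
        target = segment_for(str(row[key_column]), num_segments)
        segments[target].append(row)
    return segments


# Hypothetical table with 12 rows spread over 4 segments.
rows = [{"id": i, "amount": i * 10} for i in range(12)]
placement = distribute(rows, "id", num_segments=4)
for seg, seg_rows in placement.items():
    print(seg, [r["id"] for r in seg_rows])
```

Every row lands on exactly one segment, and the same key always maps to the same segment, so each segment holds a distinct portion of the table.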

Data Redundancy - Mirror Segments

Greenplum Database provides data redundancy by deploying mirror segments. Mirror segments allow database queries to fail over to a backup segment if the primary segment becomes unavailable. A mirror segment always resides on a different host than its corresponding primary segment. A Greenplum Database system can remain operational if a segment host, network interface or interconnect switch goes down as long as all portions of data are available on the remaining active segments.
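The availability rule above can be sketched in a few lines of Python. The layout below places each mirror on the next host in the list, which is only one hypothetical arrangement. The host names, class name, and function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentInstance:
    content_id: int  # which portion of the data this instance holds
    role: str        # "primary" or "mirror"
    host: str


def build_layout(hosts, segments_per_host):
    """Put each primary on one host and its mirror on the next host.

    Requires at least two hosts so a mirror never shares a host
    with its primary.
    """
    if len(hosts) < 2:
        raise ValueError("mirroring needs at least two hosts")
    layout = []
    content_id = 0
    for index, host in enumerate(hosts):
        mirror_host = hosts[(index + 1) % len(hosts)]
        for _ in range(segments_per_host):
            layout.append(SegmentInstance(content_id, "primary", host))
            layout.append(SegmentInstance(content_id, "mirror", mirror_host))
            content_id += 1
    return layout


def is_operational(layout, failed_hosts):
    """True if every portion of data survives on some remaining host."""
    all_ids = {s.content_id for s in layout}
    surviving = {s.content_id for s in layout if s.host not in failed_hosts}
    return surviving == all_ids


layout = build_layout(["sdw1", "sdw2", "sdw3", "sdw4"], segments_per_host=2)
print(is_operational(layout, {"sdw1"}))
print(is_operational(layout, {"sdw1", "sdw2"}))
```

In this layout, losing one host leaves every portion of data reachable. Losing a host together with the host that carries its mirrors does not.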

During database operations, only the primary segment is active. Changes to a primary segment are copied over to its mirror using a file block replication process. Until a failure occurs on the primary segment, there is no live segment instance running on the mirror host -- only the replication process.

In the event of a segment failure, the file replication process is stopped and the mirror segment is automatically brought up as the active segment instance. All database operations then continue using the mirror. While the mirror is active, it is also logging all transactional changes made to the database. When the failed segment is ready to be brought back online, administrators initiate a recovery process to bring it back into operation.
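The failover sequence described above can be modelled as a small state machine. This is a rough sketch only. The class and method names are hypothetical, and the assumption that recovery replays the changes logged by the mirror is a simplification of whatever the real recovery process does.

```python
class SegmentPair:
    """Toy model of one primary/mirror pair."""

    def __init__(self):
        self.active = "primary"     # which instance serves queries
        self.replicating = True     # file replication primary -> mirror
        self.pending_changes = []   # changes logged while the mirror is active

    def write(self, change):
        """Record a change; the mirror logs it only when it is active."""
        if self.active == "mirror":
            self.pending_changes.append(change)

    def primary_failed(self):
        """Replication stops and the mirror becomes the active instance."""
        self.replicating = False
        self.active = "mirror"

    def recover(self):
        """Administrator-initiated recovery (assumed to replay logged changes).

        The sketch does not model switching roles back afterwards.
        """
        replayed = list(self.pending_changes)
        self.pending_changes.clear()
        self.replicating = True
        return replayed


pair = SegmentPair()
pair.primary_failed()
pair.write("UPDATE 1")
pair.write("UPDATE 2")
print(pair.active, pair.recover())
```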

What are GPDB Master Hosts?

posted Sep 12, 2012, 9:00 AM by Sachchida Ojha

The master is the entry point to the Greenplum Database system from the public LAN. It is the database process that accepts client connections and processes the SQL commands issued by the users of the system. Users connect to Greenplum Database through the master using PostgreSQL-compatible client programs such as psql, or through ODBC. For systems that use automated master server failover, a virtual IP is configured, and client tools should point to this IP.
The master maintains the system catalog (a set of system tables that contain metadata about the Greenplum Database system itself); however, the master does not contain any user data. Data resides only on the segments. The master does the work of authenticating client connections, processing and planning the incoming SQL commands, distributing the workload between the segments, coordinating the results returned by each of the segments, and presenting the final results to the client program.
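The master's role of distributing work and combining results can be pictured as a scatter/gather pattern. The Python sketch below is an analogy only, assuming a simple SUM query. The function names and the thread pool are illustrative and do not reflect how Greenplum dispatches query plans.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_sum(rows):
    """Work done on one segment: sum its local portion of the data."""
    return sum(row["amount"] for row in rows)


def master_total(segments):
    """The master holds no user data; it only combines partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        partials = list(pool.map(segment_sum, segments))
    return sum(partials)


# Hypothetical data already distributed over three segments.
segments = [
    [{"amount": 10}, {"amount": 20}],
    [{"amount": 5}],
    [{"amount": 7}, {"amount": 8}],
]
print(master_total(segments))
```

Each segment computes a partial sum over its own rows in parallel, and the master adds the partial sums before returning the result to the client.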

Master Redundancy - The Standby Master

Greenplum DCA also has a standby master host to serve as a backup in case the primary master becomes inoperable. The standby master can be set up to promote itself automatically to primary master in the event of a failure. By default, automatic master server failover is turned off.
The standby master is kept up to date by a transaction log replication process, which runs on the standby master host and keeps the data between the primary and standby master hosts synchronized. If the primary master fails, the log replication process is shut down, and the standby master can be activated in its place. Upon activation of the standby master, the replicated logs are used to reconstruct the state of the master host at the time of the last successfully committed transaction.
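Reconstructing state "at the time of the last successfully committed transaction" can be illustrated with a toy log replay. The record format and function name below are invented for the example and bear no relation to the actual transaction log format.

```python
def replay(log):
    """Rebuild state from a replicated log, applying only committed work.

    Hypothetical record shapes:
      ("set", txid, key, value)  a change made inside transaction txid
      ("commit", txid)           transaction txid committed
    """
    state = {}
    pending = {}  # txid -> changes not yet committed
    for record in log:
        kind, txid = record[0], record[1]
        if kind == "set":
            _, _, key, value = record
            pending.setdefault(txid, []).append((key, value))
        elif kind == "commit":
            for key, value in pending.pop(txid, []):
                state[key] = value
    return state


log = [
    ("set", 1, "table_a", "created"),
    ("commit", 1),
    ("set", 2, "table_b", "created"),  # never committed before the failure
]
print(replay(log))
```

Changes from transaction 2 are dropped because no commit record reached the log before the primary master failed.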

What is GPDB?

posted Sep 12, 2012, 8:58 AM by Sachchida Ojha

Greenplum Database is a massively parallel processing (MPP) database management system (DBMS). Greenplum Database 4.2 uses MPP as the backbone of its database architecture. MPP refers to a distributed system that has two or more individual servers, which carry out an operation in parallel. Each server has its own processor(s), memory, operating system and storage. All servers communicate with each other over a common network. In this way, a single database system can effectively use the combined computational performance of all individual MPP servers to provide a powerful, scalable database system. Greenplum uses this high-performance system architecture to distribute the load of multi-terabyte data warehouses, and is able to use all of a system’s resources in parallel to process a query.

Greenplum Database is based on PostgreSQL 8.2.14, and in most cases is very similar to PostgreSQL with regard to SQL support, features, configuration options, and end-user functionality. Database users interact with Greenplum Database as they would with a regular PostgreSQL DBMS.

Greenplum Database is able to handle the storage and processing of large amounts of data by distributing the load across several servers or hosts. The master is the entry point to the Greenplum Database system. It is the database instance where clients connect and submit SQL statements. Greenplum DCA comes with two master hosts — one primary master and a standby master.

The master coordinates the work across the other database instances in the system, the segments, which handle data processing and storage. Greenplum DCA comes with a configurable number of segment hosts. Each segment host serves 6 primary and 6 mirror Greenplum segment instances.
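As a quick worked example of the per-host figures above, the following Python snippet computes the total number of segment instances. The 16-host count is a hypothetical value chosen for illustration, not a statement about any particular DCA configuration.

```python
PRIMARIES_PER_HOST = 6  # from the text above
MIRRORS_PER_HOST = 6    # from the text above


def segment_counts(num_segment_hosts):
    """Total primary, mirror, and overall segment instances."""
    primaries = num_segment_hosts * PRIMARIES_PER_HOST
    mirrors = num_segment_hosts * MIRRORS_PER_HOST
    return {
        "primaries": primaries,
        "mirrors": mirrors,
        "instances": primaries + mirrors,
    }


# Hypothetical system with 16 segment hosts.
print(segment_counts(16))
```

With 16 segment hosts, the data would be split across 96 primary segments, each backed by a mirror on another host.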

The segments communicate with each other and with the master over the interconnect, which is the networking layer of Greenplum Database. The DCA interconnect is configured on a private LAN and utilizes two high-speed network switches, offering each segment host 20 Gb non-blocking duplex bandwidth. The Greenplum primary and mirror segments are configured to use different interconnect switches in order to provide redundancy in the event of a single switch failure.

In addition to the interconnect switches, Greenplum DCA comes with an additional administration switch. Each master and segment server has a dedicated interface for remote system administration, provided by a controller that has its own processor, memory, battery, and network connection. This allows administrators to access the individual Greenplum DCA servers as if they were at the local console (terminal).
