What is GPDB?

posted Sep 12, 2012, 8:58 AM by Sachchida Ojha
Greenplum Database is a massively parallel processing (MPP) database management system (DBMS). Greenplum Database 4.2 uses MPP as the backbone of its database architecture. MPP refers to a distributed system in which two or more individual servers carry out an operation in parallel. Each server has its own processor(s), memory, operating system, and storage. All servers communicate with each other over a common network. In this way, a single database system can use the combined computational performance of all the individual MPP servers to provide a powerful, scalable database system. Greenplum uses this high-performance architecture to distribute the load of multi-terabyte data warehouses, and it can use all of a system's resources in parallel to process a query.
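To make the MPP idea concrete, here is a minimal Python sketch of the pattern: rows are spread across several workers, each worker computes a partial result on its own share, and the partial results are combined at the end. It is an illustration only, not Greenplum's implementation. The segment count, the CRC32-based hash, and every function name are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_SEGMENTS = 4  # hypothetical segment count, chosen only for illustration


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution key to a segment index with a stable hash.

    CRC32 is an assumption for this sketch, not Greenplum's actual hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments=NUM_SEGMENTS):
    """Split (key, amount) rows into one list per segment."""
    segments = [[] for _ in range(num_segments)]
    for key, amount in rows:
        segments[segment_for(key, num_segments)].append((key, amount))
    return segments


def partial_sum(segment_rows):
    """Work done independently on one segment: sum its local rows."""
    return sum(amount for _, amount in segment_rows)


def parallel_total(rows, num_segments=NUM_SEGMENTS):
    """Run the per-segment work concurrently, then combine the partial results."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(partial_sum, segments))
    return sum(partials)


if __name__ == "__main__":
    rows = [(f"cust{i}", i) for i in range(100)]
    print(parallel_total(rows))
```

Threads in one process stand in for separate servers here; in a real MPP system each share of the data lives on a different host with its own CPU, memory, and disks.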

Greenplum Database is based on PostgreSQL 8.2.14, and in most cases it is very similar to PostgreSQL with regard to SQL support, features, configuration options, and end-user functionality. Database users interact with Greenplum Database as they would with a regular PostgreSQL DBMS.

Greenplum Database is able to handle the storage and processing of large amounts of data by distributing the load across several servers or hosts. The master is the entry point to the Greenplum Database system. It is the database instance where clients connect and submit SQL statements. Greenplum DCA comes with two master hosts — one primary master and a standby master.

The master coordinates the work across the other database instances in the system, the segments, which handle data processing and storage. Greenplum DCA comes with a configurable number of segment hosts. Each segment host serves 6 primary and 6 mirror Greenplum segment instances.
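As a quick worked example of the sizing above, the following Python sketch computes how many segment instances a DCA runs for a given number of segment hosts, using the 6 primary and 6 mirror instances per host stated in the text. The function name and the host count in the usage example are hypothetical.

```python
PRIMARIES_PER_HOST = 6  # from the text: 6 primary segment instances per host
MIRRORS_PER_HOST = 6    # from the text: 6 mirror segment instances per host


def segment_counts(segment_hosts):
    """Return primary, mirror, and total segment instance counts."""
    if segment_hosts < 1:
        raise ValueError("segment_hosts must be at least 1")
    primaries = segment_hosts * PRIMARIES_PER_HOST
    mirrors = segment_hosts * MIRRORS_PER_HOST
    return {"primary": primaries, "mirror": mirrors, "total": primaries + mirrors}


if __name__ == "__main__":
    # Hypothetical configuration with 4 segment hosts.
    print(segment_counts(4))  # {'primary': 24, 'mirror': 24, 'total': 48}
```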

The segments communicate with each other and with the master over the interconnect, which is the networking layer of Greenplum Database. The DCA interconnect is configured on a private LAN and utilizes two high-speed network switches, offering each segment host 20 Gb non-blocking duplex bandwidth. The Greenplum primary and mirror segments are configured to use different interconnect switches in order to provide redundancy in the event of a single switch failure.
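The redundancy argument can be checked with a small model. The Python sketch below assumes a simplified layout in which each primary/mirror pair is split across the two interconnect switches, and then confirms that every pair still has one reachable instance after either switch fails. The switch names, the alternating assignment, and the function names are assumptions for illustration; the actual DCA wiring is not described here.

```python
SWITCHES = ("switch-1", "switch-2")  # hypothetical switch names


def assign_switches(num_pairs):
    """Place each pair's primary and mirror on opposite switches.

    Alternating the primary's switch from pair to pair is an assumption
    of this sketch, used only to spread primaries over both switches.
    """
    layout = []
    for i in range(num_pairs):
        layout.append({
            "pair": i,
            "primary": SWITCHES[i % 2],
            "mirror": SWITCHES[(i + 1) % 2],
        })
    return layout


def survives_failure(layout, failed_switch):
    """True if every pair keeps at least one instance on a working switch."""
    return all(
        pair["primary"] != failed_switch or pair["mirror"] != failed_switch
        for pair in layout
    )


if __name__ == "__main__":
    layout = assign_switches(6)
    for switch in SWITCHES:
        print(switch, "fails ->", survives_failure(layout, switch))
```

If a pair had both instances on the same switch, `survives_failure` would return `False` for that switch, which is the situation the split configuration avoids.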

In addition to the interconnect switches, Greenplum DCA comes with an administration switch. Each master and segment server has a dedicated interface for remote system administration. The controller behind this interface has its own processor, memory, battery, and network connection. This allows administrators to access the individual Greenplum DCA servers as if they were at the local console (terminal).