GREENPLUM DATABASE ARCHITECTURE

posted Jul 27, 2012, 11:52 AM by Sachchida Ojha
DATABASE ARCHITECTURE 

Core Massively Parallel Processing Architecture - The Greenplum Database architecture automatically parallelizes both data and queries: all data is partitioned across all nodes of the system, and queries are planned and executed using all nodes working together in a highly coordinated fashion.
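As a rough illustration of the idea, the sketch below spreads rows across a few segments by hashing a key, lets each segment compute a partial result, and then combines the partials. The segment count, the CRC32 hash, and the function names are assumptions made for this example; they are not Greenplum's actual hashing scheme or API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for the example


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment using a stable hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Spread (key, amount) rows across segments by hashing the key."""
    segments = defaultdict(list)
    for key, amount in rows:
        segments[segment_for(key, num_segments)].append((key, amount))
    return segments


def parallel_sum(segments):
    """Each segment sums its own rows; a coordinator adds up the partial sums."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(
            lambda part: sum(amount for _, amount in part),
            segments.values(),
        )
        return sum(partials)


rows = [(f"customer-{i}", i) for i in range(1000)]
segments = distribute(rows)
total = parallel_sum(segments)
print(f"{len(segments)} segments hold {len(rows)} rows; total = {total}")
```

Because the hash depends only on the key, the same key always lands on the same segment, which is what lets each node work on its own slice independently.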

Petabyte-Scale Loading - High-performance loading uses MPP Scatter/Gather Streaming technology. Loading speeds scale with each additional node to greater than 10 terabytes per hour, per rack. 

Polymorphic Data Storage and Execution - Using Greenplum's Polymorphic Data Storage technology, the DBA can select the storage, execution, and compression settings that suit the way each table will be accessed. With this feature, customers have the choice of row- or column-oriented storage and processing for any table or partition.

Anywhere Data Access - Anywhere data access enables queries to be executed from the database against external data sources, returning data in parallel regardless of location, format, or storage medium.

In-Database Compression - In-database compression uses industry-leading compression technology to increase performance and dramatically reduce the space required to store data. Customers can expect a 3x to 10x reduction in disk space, with a corresponding increase in effective I/O performance.

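The arithmetic behind that claim can be sketched as follows. The 3,000 GB table and the 200 MB/s disk rate are made-up inputs, and the model assumes decompression costs no CPU time, which is a simplification.

```python
def stored_size_gb(raw_gb: float, compression_ratio: float) -> float:
    """Disk space needed for `raw_gb` of data at a given compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_gb / compression_ratio


def effective_scan_rate(disk_mb_per_s: float, compression_ratio: float) -> float:
    """Logical MB/s delivered when each physical MB expands to `compression_ratio` MB.

    Assumption: decompression is free, so the disk is the only bottleneck.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return disk_mb_per_s * compression_ratio


# Hypothetical inputs: a 3,000 GB table on disks that read 200 MB/s.
for ratio in (3, 10):
    size = stored_size_gb(3000, ratio)
    rate = effective_scan_rate(200, ratio)
    print(f"{ratio}x: {size:.0f} GB on disk, {rate:.0f} MB/s effective scan rate")
```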
Multi-level Partitioning - Flexible partitioning of tables is based on date, range, or value.
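A minimal sketch of two-level partitioning, assuming a first level by month (a date range) and a second level by region (a value). The table layout, column names, and helper functions are hypothetical and only show why a query that filters on both levels needs to read a single partition.

```python
from collections import defaultdict
from datetime import date


def partition_key(sale_date: date, region: str) -> tuple:
    """Two-level key: the month (a date range) first, then the region (a value)."""
    return (sale_date.year, sale_date.month), region


def build_partitions(rows):
    """Route each (sale_date, region, amount) row to its partition."""
    partitions = defaultdict(list)
    for sale_date, region, amount in rows:
        partitions[partition_key(sale_date, region)].append((sale_date, region, amount))
    return partitions


def scan(partitions, month: tuple, region: str):
    """Partition pruning: read only the one partition that can hold matching rows."""
    return partitions.get((month, region), [])


rows = [
    (date(2012, 6, 30), "emea", 10.0),
    (date(2012, 7, 1), "emea", 20.0),
    (date(2012, 7, 15), "apac", 30.0),
    (date(2012, 7, 27), "emea", 40.0),
]
partitions = build_partitions(rows)
july_emea = scan(partitions, (2012, 7), "emea")
print(f"{len(partitions)} partitions; July/EMEA rows read: {len(july_emea)}")
```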

DATABASE MANAGEMENT TOOLS 

Online System Expansion - Add servers to increase storage capacity, processing performance, and loading performance. The database can remain online and fully available while the expansion process takes place in the background.

Workload Management - With administrative control over system resources and their allocation to queries, users can be assigned to resource queues that manage the inflow of work to the database. 
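The admission-control idea can be sketched with a small class: a queue admits a limited number of statements at a time and holds the rest until a slot frees up. The class name, the `active_limit` parameter, and the FIFO policy are assumptions for illustration, not the product's implementation.

```python
from collections import deque


class ResourceQueue:
    """Admits at most `active_limit` queries at once; the rest wait in FIFO order."""

    def __init__(self, name: str, active_limit: int):
        if active_limit < 1:
            raise ValueError("active_limit must be at least 1")
        self.name = name
        self.active_limit = active_limit
        self.running = set()
        self.waiting = deque()

    def submit(self, query_id: str) -> bool:
        """Return True if the query starts now, False if it has to wait."""
        if len(self.running) < self.active_limit:
            self.running.add(query_id)
            return True
        self.waiting.append(query_id)
        return False

    def finish(self, query_id: str):
        """Free a slot and start the next waiting query, if any.

        Returns the id of the query that was started, or None.
        """
        self.running.remove(query_id)
        if self.waiting:
            next_query = self.waiting.popleft()
            self.running.add(next_query)
            return next_query
        return None


# Hypothetical queue for ad hoc users, limited to two concurrent queries.
adhoc = ResourceQueue("adhoc", active_limit=2)
started = [adhoc.submit(q) for q in ("q1", "q2", "q3")]
promoted = adhoc.finish("q1")
print(started, promoted, sorted(adhoc.running))
```

Assigning each user to one such queue is what lets an administrator cap the work a group of users can push into the system at once.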

Dynamic Query Prioritization - Greenplum's Advanced Workload Management is extended with patent-pending technology that provides continuous real-time balancing of the resources of the entire cluster across all running queries. 

Database Performance Monitor Tool - Data collection agents in the Greenplum Database Performance Monitor gather metrics that help administrators analyze the network patterns of a Greenplum Database system.

Simple and Fast Parallel Installation - The parallel installation utility allows system administrators to install the Greenplum Database software on multiple hosts at once.

HIGH AVAILABILITY SUPPORT 

Self-Healing Fault Tolerance - Traditional MPP database fault-tolerance techniques were suitable for environments with fewer than 100 servers, but total cost of ownership (TCO) increases dramatically beyond that scale.

Post-Recovery Online Segment Rebalancing - After segment recovery, the EMC Greenplum Database segments can be rebalanced while the system is online. All client sessions remain connected, so there is no downtime. The database remains functional while the system is restored to an optimal state.