DCA - EMC Greenplum Data Computing Appliance

EMC Greenplum DCA advantages: (a) the fastest data loading in the industry; (b) price/performance leadership and the lowest total cost of ownership; (c) private-cloud ready, as a first step toward a virtualized data warehouse and analytics infrastructure.

Combine a shared-nothing, MPP relational database with enterprise-class Apache Hadoop through the revolutionary modular architecture of the industry's first complete Big Data analytics platform. 

What's new in Greenplum Data Computing Appliance 1.2.1.0

posted May 9, 2013, 10:21 AM by Sachchida Ojha

EMC Greenplum Data Computing Appliance (DCA) is a self-contained data warehouse solution that integrates all of the database software, servers, and switches necessary to perform big data analytics. It is a turn-key, easy-to-install data warehouse solution that provides rapid query and loading performance for analyzing large data sets stored in the Greenplum Database.
DCA software 1.2.1.0 and earlier runs on hardware from Dell, Broadcom, and Allied Telesis. DCA 1.x.x.x software does not run on DCA UAP Edition hardware; DCA 2.0.x.x software supports DCA UAP Edition hardware.

Resolved Issues in DCA Software 1.2.1.0
This section lists issues that are resolved in DCA software 1.2.1.0.

DCA-5198 Version 1.2.0.1 upgrade
Upgrades to version 1.2.0.1 were not supported on Hadoop servers.
The 1.2.0.1 DCA software upgrade was not supported on DCAs that contained servers imaged and configured as Hadoop servers. The upgrade pre-check failed if Hadoop servers were specified in the hostfile during the upgrade.

DCA-5537, Missing mailx rpm component

DCA-5044 The 1.2.0.0 upgrade removed the mailx rpm (software component) from the distribution. This resulted in errors when using the gpcrondump utility with mail notification. The package has been re-added to the distribution with this release.

DCA-5387 Hadoop / Java JRE version incompatibility issue. After the Java JRE was upgraded to version 1.6.0_31, there was an incompatibility with Hadoop. The Hadoop configuration file has been updated to use the correct Java JRE version (1.6.0_31).

Known Issues in DCA 1.2.1.0

DCA-6096 Pre- and post-login banners are not ported to new hosts. After setting the pre- and post-login banners (Security Settings), if you then expand the DCA, those messages are not automatically set on the new hosts.

DCA-6081 Misleading URL given for new Command Center instance.

6138 After successfully setting up a new instance of Command Center, you will see a message that includes the URL for the Command Center Console. The URL will not work; the correct URL for your new instance must use a fully qualified domain name.

DCA-6133 Error when setting up a new instance of Command Center. In rare circumstances, when keys have been manually removed or altered in the known hosts file, attempts to set up a new instance of Command Center fail. Workaround: Perform Exchange Keys from the DCA_SETUP menu before setting up the new instance.

DCA-6009, Incompatibility with Greenplum Database 

MPP-19551 Due to the upgrade to RHEL 5.9, this release of DCA is not compatible with Greenplum Database version 4.2.1; you must upgrade to Greenplum Database version 4.2.4.

DCA-6078 Enhanced security login feature only applies to new users. The new dca_setup feature for enhanced security logins applies only to new users; the change in settings does not affect existing users.

For further details, see the EMC Greenplum Data Computing Appliance 1.2.1.0 Release Notes.


What are the key features of the EMC Greenplum DCA?

posted Sep 14, 2012, 7:34 AM by Sachchida Ojha   [ updated Sep 21, 2012, 12:23 PM ]

The EMC Greenplum Data Computing Appliance (DCA) is a purpose-built appliance that delivers a fast-loading, highly scalable, parallel computing platform for next-generation data warehousing and analytics. The appliance architecturally integrates database, computing, storage, and network resources into an enterprise-class, easy-to-implement system.

The DCA offers the power of MPP architecture, the fastest data-loading capacity in the industry, and the best price-to-performance ratio, without the complexity and constraints of proprietary hardware.

Key Features of the DCA

1. The DCA uses Greenplum Database software, which is based on an MPP architecture. MPP harnesses the combined power of all available compute servers to ensure maximum performance.

2. Greenplum database software supports incremental growth (scale-out) of the data warehouse through its ability to automatically redistribute existing data across newly added computing resources.

3. The base architecture of the Greenplum DCA is designed with scalability and growth in mind. This enables organisations to easily extend their DW/BI capability in a modular way; linear gains in capability and performance are achieved with minimal downtime. All modules are linked via a high-speed, high-performance, low-latency interconnect.

4. The DCA employs a high-speed interconnect bus that provides database-level communications between all servers in the DCA. It is designed to accommodate rapid backup and recovery and high data load (ingest) rates.

5. Excellent performance is provided by effective use of the combined power of server, software, network, and storage resources.

6. The DCA can be installed and available on-site within 24 hours of the customer receiving delivery.

7. The DCA uses cutting-edge, industry-standard commodity hardware rather than specialized or proprietary hardware.

8. The DCA is offered in multiple rack-appliance configurations to achieve maximum flexibility and scalability for organisations faced with terabyte- to petabyte-scale data opportunities.
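The distribution and scale-out behavior described in features 1 and 2 can be sketched conceptually. This is an illustrative model only, not Greenplum's actual distribution algorithm; the function names and the use of MD5 are assumptions for the sketch.

```python
# Conceptual sketch of MPP hash distribution and scale-out redistribution.
# NOT actual Greenplum Database code; hashing scheme is illustrative.
import hashlib

def segment_for(key: str, num_segments: int) -> int:
    """Map a distribution key to a segment using a stable hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_segments

def distribute(rows, num_segments):
    """Place each row on its segment; returns {segment: [rows]}."""
    placement = {s: [] for s in range(num_segments)}
    for row in rows:
        placement[segment_for(row, num_segments)].append(row)
    return placement

rows = [f"order-{i}" for i in range(1000)]

# Quarter-rack scale: 4 segment servers share the scan work.
before = distribute(rows, 4)

# Scale-out to 8 segment servers: the same rows are redistributed so each
# row lands on the segment its key now hashes to; no data is lost.
after = distribute(rows, 8)

assert sum(len(v) for v in before.values()) == len(rows)
assert sum(len(v) for v in after.values()) == len(rows)
```

Because every segment holds roughly an equal slice of the data, adding segment servers yields the near-linear capacity and performance gains described above.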


High Capacity DCA Configurations

posted Sep 12, 2012, 12:41 PM by Sachchida Ojha   [ updated Sep 12, 2012, 12:48 PM ]

                                 GP10C Quarter Rack    GP100C Half Rack     GP1000C Full Rack
Master Servers                   2                     2                    2
Segment Servers                  4                     8                    16
Total CPU cores                  48                    96                   192
Total Memory                     192 GB                384 GB               768 GB
Segment HDDs (SAS)               48                    96                   192
Usable Capacity (uncompressed)   31 TB                 62 TB                124 TB
Usable Capacity (compressed)     124 TB                248 TB               496 TB
Scan Rate                        3.5 GB/sec            7 GB/sec             14 GB/sec
Data Load Rate                   2.5 TB/hour           5 TB/hour            10 TB/hour
Weight                           940 lbs (427 kg)      1,200 lbs (545 kg)   1,700 lbs (773 kg)
Power (VA)                       2,478                 3,980                6,980
Cooling (BTU/hr)                 8,450                 13,600               23,800

Physical dimensions (all configurations): Height 75 in (190 cm), Width 24 in (61 cm), Depth 39.3 in (100 cm).
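The figures in the table above are internally consistent: capacity and scan rate scale linearly with the number of segment servers, and compressed capacity is 4x uncompressed (a ratio implied by the table itself, not a guarantee for all workloads). A quick sanity check of that arithmetic:

```python
# Sanity-check the High Capacity DCA table: per-server capacity is constant
# across configurations, and compressed capacity assumes ~4:1 compression.
configs = {
    "GP10C":   {"segments": 4,  "usable_tb": 31,  "scan_gbps": 3.5},
    "GP100C":  {"segments": 8,  "usable_tb": 62,  "scan_gbps": 7.0},
    "GP1000C": {"segments": 16, "usable_tb": 124, "scan_gbps": 14.0},
}

COMPRESSION_RATIO = 4  # implied by 31 TB -> 124 TB, 62 -> 248, 124 -> 496

derived = {
    name: {
        "per_server_tb": c["usable_tb"] / c["segments"],
        "compressed_tb": c["usable_tb"] * COMPRESSION_RATIO,
    }
    for name, c in configs.items()
}

# Every configuration works out to 7.75 TB usable per segment server.
assert all(d["per_server_tb"] == 7.75 for d in derived.values())
```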

DCA CONFIGURATIONS

posted Sep 12, 2012, 12:32 PM by Sachchida Ojha   [ updated Sep 12, 2012, 12:33 PM ]

                                 GP10 Quarter Rack     GP100 Half Rack      GP1000 Full Rack
Master Servers                   2                     2                    2
Segment Servers                  4                     8                    16
Total CPU cores                  48                    96                   192
Total Memory                     192 GB                384 GB               768 GB
Segment HDDs (SAS)               48                    96                   192
Usable Capacity (uncompressed)   9 TB                  18 TB                36 TB
Usable Capacity (compressed)     36 TB                 72 TB                144 TB
Scan Rate                        6 GB/sec              12 GB/sec            24 GB/sec
Data Load Rate                   2.5 TB/hour           5 TB/hour            10 TB/hour
Weight                           940 lbs (427 kg)      1,200 lbs (545 kg)   1,700 lbs (773 kg)
Power (VA)                       2,478                 3,980                6,980
Cooling (BTU/hr)                 8,450                 13,600               23,800

Physical dimensions (all configurations): Height 75 in (190 cm), Width 24 in (61 cm), Depth 39.3 in (100 cm).

DCA FAMILY SPECIFICATIONS OVERVIEW

posted Sep 12, 2012, 12:20 PM by Sachchida Ojha


                                 DCA GP1000    High Capacity DCA GP1000C
Master Servers                   2             2
Segment Servers                  16            16
Total CPU cores                  192           192
Total Memory                     768 GB        768 GB
Segment HDDs/SSDs                192           192
Usable Capacity (uncompressed)   36 TB         124 TB
Usable Capacity (compressed)     144 TB        496 TB
Maximum Expansion                6 racks       6 racks
Scan Rate                        24 GB/sec     14 GB/sec
Data Load Rate                   10 TB/hour    10 TB/hour
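The table above shows the core trade-off in the family: the High Capacity GP1000C holds far more data but scans it more slowly. A back-of-envelope comparison of the time to scan the full uncompressed usable capacity (assuming 1 TB = 1024 GB; these are derived figures, not published specs):

```python
# Compare full-capacity scan time for GP1000 vs. GP1000C using the
# table's usable capacity and scan rate figures.
def full_scan_hours(capacity_tb: float, scan_gb_per_sec: float) -> float:
    """Hours to read the entire usable capacity at the rated scan rate."""
    return capacity_tb * 1024 / scan_gb_per_sec / 3600

gp1000  = full_scan_hours(36, 24)    # ~0.43 hours
gp1000c = full_scan_hours(124, 14)   # ~2.52 hours

# The high-capacity model takes roughly 6x longer to scan everything.
assert gp1000c > gp1000
```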
   


SNMP on the DCA

posted Sep 12, 2012, 10:23 AM by Sachchida Ojha

The Greenplum DCA has an SNMP version 2 management information base (MIB). Enterprise monitoring systems can use the MIB to identify issues with components and services in the DCA.


gppkg

posted Sep 12, 2012, 10:06 AM by Sachchida Ojha

The Greenplum Package Manager (gppkg) utility installs Greenplum Database extensions, along with any dependencies, on all hosts across a cluster. It will also automatically install extensions on new hosts in the case of system expansion and segment recovery.

First, download one or more of the available packages from the EMC Download Center, then copy them to the master host. Use the Greenplum Package Manager to install each package using the options described below.

Usage:

gppkg [-i package | -u package | -r name-version | -c]
[-d master_data_directory] [-a] [-v]
gppkg --migrate GPHOME_1 GPHOME_2 [-a] [-v]
gppkg [-q | --query] query_option
gppkg -? | --help | -h
gppkg --version

Note: After a major upgrade to Greenplum Database, you must download and install all extensions again.
The following packages are available for download from the EMC Download Center.
•PostGIS
•PL/Java
•PL/R
•PL/Perl
•Pgcrypto


Options
-a (do not prompt)
Do not prompt the user for confirmation.
-c | --clean
Reconciles the package state of the cluster to match the state of the master host. Running this option after a failed or partial install/uninstall ensures that the package installation state is consistent across the cluster.
-d master_data_directory
The master data directory. If not specified, the value set for $MASTER_DATA_DIRECTORY will be used.
-i package | --install=package
Installs the given package. This includes any pre/post installation steps and installation of any dependencies.
--migrate GPHOME_1 GPHOME_2
Migrates packages from a separate $GPHOME. Carries over packages from one version of Greenplum Database to another.
For example: gppkg --migrate /usr/local/greenplum-db-4.2.0.1 /usr/local/greenplum-db-4.2.1.0
This option is automatically invoked by the installer during minor upgrades. This option is given here for cases when the user wants to migrate packages manually.
Migration can only proceed if gppkg is executed from the installation directory to which packages are being migrated. That is, GPHOME_2 must match the $GPHOME from which the currently executing gppkg is being run.

-q | --query query_option
Provides information specified by query_option about the installed packages. Only one query_option can be specified at a time. The following table lists the possible values for query_option. <package_file> is the name of a package.
=====================================================================
query_option            Returns
<package_file>          Whether the specified package is installed.
--info <package_file>   The name, version, and other information about
                        the specified package.
--list <package_file>   The file contents of the specified package.
--all                   List of all installed packages.
=====================================================================

-r name-version | --remove=name-version
Removes the specified package.
-u package | --update=package
Updates the given package.
--version (show utility version)
Displays the version of this utility.
-v | --verbose
Sets the logging level to verbose.
-? | -h | --help
Displays the online help.

dca_shutdown

posted Sep 12, 2012, 10:02 AM by Sachchida Ojha

The dca_shutdown utility safely powers down all servers in a DCA. Run with no parameters, it uses the system inventory generated by DCA Setup during an installation or a Regenerate DCA Config Files operation. If the utility is run with a hostfile or hostname specified, only those hosts are shut down. This utility does not shut down the administration, Interconnect, or aggregation switches.
The utility should be run as the user root. Before running dca_shutdown, perform the following steps to ensure a clean shutdown:
1.Stop Greenplum Database:
$ gpstop -af
2.Stop Command Center:
$ gpcmdr --stop
3.Stop health monitoring as the user root:
$ su -
# dca_healthmon_ctl -d


Usage:

dca_shutdown { -f hostfile | -h hostname } [ --ignoredb ] [ --password=password ] [ --passfile=password_file ] [ --statusonly ]
dca_shutdown
dca_shutdown --help


Options
-?, --help
Print usage and help information
-i, --ignoredb
Do not check if Greenplum Database, health monitoring or Command Center are running. Shut down all servers immediately.
-h, --host hostname
Perform a shutdown on the host specified.
-f, --hostfile hostfile
Perform a shutdown on the hosts listed in the hostfile. This option cannot be used with the --host option.
-p, --password password
Specify a password to connect to the server’s IPMI (iDRAC) to perform the shutdown. The password is originally set during installation with DCA Setup; if an installation through DCA Setup has never been run, the user is prompted for a password.
-s, --passfile password_file
Specify a file containing the password to use to connect to the server’s IPMI (iDRAC) to perform the shutdown. This file is generated during installation with DCA Setup, and is located in /opt/dca/etc/ipmipasswd.
-o, --statusonly
Print the power status (ON | OFF) of all servers. This will not power off any servers.

Examples
Shut down all servers in a DCA:
dca_shutdown
Shut down servers listed in the file hostfile:
dca_shutdown -f /home/gpadmin/gpconfigs/hostfile

dcaperfcheck

posted Sep 12, 2012, 9:59 AM by Sachchida Ojha

The dcaperfcheck utility tests the performance of the hardware in a DCA. Run this test to validate that the network, disks, and memory are performing as expected. It is a useful tool for detecting hardware failures or mis-cabling. The utility can be run as the user gpadmin or root; if run as gpadmin, that user must have permission to read from and write to the test directory.

Usage:

dcaperfcheck { -f hostfile | -h hostname } { -r [d | s | n | N | M ] } [-B size ] [ -S size ] {-d test_dir | --device } {-v | -V } [ -D ] [ --duration seconds ] [ --netperf ]
dcaperfcheck -?

Options

-d test_directory
Directory where data will be written to and read from. Multiple -d flags may be specified for multiple directories on each host. During network and memory tests, this can be the /tmp directory. During disk tests, use operating system mount points that will exercise each drive.
-v
Enable verbose output.
-V
Enable very verbose output.
-D
Print statistics for each host. The default output will print only the hosts with lowest and highest values.
-rd, -rs, -rn, -rN, -rM
Specify the type of test to run: d - disk, s - stream (memory), n - serial netperf, N - parallel netperf, or M - full matrix netperf. These options can be combined, for example, -rds. The default is dsn. Typically, disk and network tests are separated, because disk tests require more test directories to be specified, whereas network tests only require a single temporary directory.
-B size
Specify the block size for disk performance tests. The default is 32KB. Examples: 1KB, 4MB.
-S size
Specify the file size for disk performance tests. The default is 2x server memory. On a DCA server, there is 48GB of memory, so the default is 96GB. Examples: 500MB, 16GB.
-h hostname
Specify a host to run the utility. Multiple hosts can be specified.
-f hostfile
Specify a file with a list of hosts on which to run the utility. The hostfile will differ based on the test (disk or network) you are running. A network test will typically be run against one interconnect, so the hostnames should reflect only interfaces on that interconnect.
--duration seconds
Specify a length of time to run the network test. Time specified is in seconds.
--netperf
Use the netperf network test instead of gpnetbenchServer/gpnetbenchClient. This option can only be run if the network test is specified.
--device
Use a raw device instead of a test directory, for example /dev/sda1, /dev/sda2. Multiple devices may be specified. This option requires dcaperfcheck to be run as the user root. WARNING: THIS WILL CAUSE DATA LOSS ON THE SPECIFIED DEVICES.
-?
Print online help.

Examples
Run a parallel network and stream test on Interconnect 1:
# dcaperfcheck -f /home/gpadmin/gpconfigs/hostfile_gpdb_ic1 -rsN -d /tmp
Run a disk test, using all the data directories on a segment server, sdw1:

dcacheck

posted Sep 12, 2012, 9:14 AM by Sachchida Ojha

The dcacheck utility validates DCA operating system and hardware configuration settings. The dcacheck utility can use a host file or a file previously created with the --zipout option to validate settings. At the end of a successful validation process, a DCACHECK_NORMAL message displays. If DCACHECK_ERROR displays, one or more validation checks failed. You can also use dcacheck to gather and view platform settings on hosts without running validation checks.
Greenplum recommends that you run dcacheck as the user root. If you do not run dcacheck as root, the utility displays a warning message and will not be able to validate all configuration settings; only some of the settings will be validated.

If dcacheck is run with no parameters, it will validate settings in the following file:
/opt/dca/etc/dcacheck/dcacheck_config

The configuration parameters that are validated differ by DCA software release.

Usage

dcacheck { -f hostfile | -h hostname } { --stdout | --zipout } [ --config config_file ]
dcacheck --zipin dcacheck_zipfile
dcacheck -?


--config config_file
The name of a configuration file to use instead of the default file /opt/dca/etc/dcacheck/dcacheck_config.
-f hostfile
The name of a file that contains a list of hosts on which dcacheck validates settings. This file should contain the host names of all hosts in the DCA.
-h hostname
The name of a host on which dcacheck will validate platform-specific settings.
--stdout
Display collected host information from dcacheck. No checks or validations are performed.
--zipout
Save all collected data to a .zip file in the current working directory. dcacheck automatically creates the .zip file and names it dcacheck_timestamp.tar.gz. No checks or validations are performed.
--zipin file
Use this option to decompress and check a .zip file created with the --zipout option. dcacheck performs validation tasks against the file you specify in this option.
-?
Print the online help.

Examples
Verify and validate the DCA settings on specific servers:
# dcacheck -f /home/gpadmin/gpconfigs/hostfile
Verify custom settings on all DCA servers:
# dcacheck --config my_config_file
