EMC Greenplum DCA Advantages: a) Fastest data loading in the industry. b) Price/performance leadership and the lowest total cost of ownership. c) Private Cloud ready, as a first step toward a virtualized data warehouse and analytics infrastructure. The DCA combines a shared-nothing, MPP relational database with enterprise-class Apache Hadoop through the modular architecture of the industry's first complete Big Data analytics platform. |
DCA - EMC Greenplum Data Computing Appliance
What's new in Greenplum Data Computing Appliance 1.2.1.0
EMC Greenplum Data Computing Appliance (DCA) is a self-contained data warehouse solution that integrates all of the database software, servers, and switches necessary to perform big data analytics. It is a turnkey, easy-to-install data warehouse solution that provides rapid query and loading performance for analyzing large data sets stored in the Greenplum Database. DCA 1.2.1.0 software and earlier run on hardware from Dell, Broadcom, and Allied Telesis. DCA 1.x.x.x software does not run on DCA UAP Edition hardware; DCA 2.0.x.x software supports DCA UAP Edition hardware.

Resolved Issues in DCA Software 1.2.1.0
This section lists issues that are resolved in DCA software 1.2.1.0.
DCA-5198 - Version 1.2.0.1 upgrade: Upgrades to version 1.2.0.1 were not supported on Hadoop servers. The 1.2.0.1 DCA software upgrade was not supported on DCAs that contained servers imaged and configured as Hadoop servers; the upgrade pre-check failed if Hadoop servers were specified in the hostfile during the upgrade.
DCA-5537, DCA-5044 - Missing mailx rpm component: The 1.2.0.0 upgrade removed the mailx rpm (software component) from the distribution, which resulted in errors when using the gpcrondump utility with mail notification. The package has been re-added to the distribution in this release.
DCA-5387 - Hadoop / Java JRE version incompatibility: After the Java JRE was upgraded to 1.6.0_31, there was an incompatibility issue with Hadoop. The Hadoop configuration file has been updated to use the correct Java JRE version (1.6.0_31).

Known Issues in DCA 1.2.1.0
DCA-6096 - Pre- and post-login banners are not ported to new hosts: After setting the pre- and post-login banners (Security Settings), if you then expand the DCA, those messages are not automatically set on the new hosts.
DCA-6081, DCA-6138 - Misleading URL given for new Command Center instance: After successfully setting up a new instance of Command Center, you will see a message that includes the URL of the Command Center Console. That URL will not work; the correct URL for your new instance must use a fully qualified domain name.
DCA-6133 - Error when setting up a new instance of Command Center: In rare circumstances, when keys have been manually removed or altered in the known hosts file, attempts to set up a new instance of Command Center will fail. Workaround: Perform Exchange Keys from the DCA_SETUP menu before setting up the new instance.
DCA-6009, MPP-19551 - Incompatibility with Greenplum Database: Due to the upgrade to RHEL 5.9, this release of DCA is not compatible with Greenplum Database version 4.2.1; you must upgrade to Greenplum Database version 4.2.4.
DCA-6078 - Enhanced security login feature only applies to new users: The new dca_setup feature for enhanced security logins applies only to new users. The change in settings does not affect existing users.
For further details, read the EMC Greenplum Data Computing Appliance 1.2.1.0 Release Notes. |
What are the key features of the EMC Greenplum DCA?
The EMC Greenplum Data Computing Appliance (DCA) is a purpose-built appliance that delivers a fast-loading, highly scalable, parallel computing platform for next-generation data warehousing and analytics. The appliance architecturally integrates database, computing, storage, and network resources into an enterprise-class, easy-to-implement system. The DCA offers the power of an MPP architecture, delivers the fastest data loading capacity in the industry, and provides the best price-to-performance ratio without the complexity and constraints of proprietary hardware.

Key Features of the DCA
1. The DCA uses Greenplum Database software, which is based on an MPP architecture. MPP harnesses the combined power of all available compute servers to ensure maximum performance.
2. Greenplum Database software supports incremental growth (scale-out) of the data warehouse through its ability to automatically redistribute existing data across newly added computing resources.
3. The base architecture of the Greenplum DCA is designed with scalability and growth in mind. This enables organisations to easily extend their DW/BI capability in a modular way; linear gains in capability and performance are achieved with minimal downtime. All modules are linked via a high-speed, high-performance, low-latency interconnect.
4. The DCA employs a high-speed interconnect bus that provides database-level communications between all servers in the DCA. It is designed to accommodate rapid backup and recovery and high data load (ingest) rates.
5. Excellent performance is delivered through effective use of the combined power of server, software, network, and storage resources.
6. The DCA can be installed and available on-site within 24 hours of the customer receiving delivery.
7. The DCA uses cutting-edge, industry-standard commodity hardware rather than specialized or proprietary hardware.
8. The DCA is offered in multiple-rack appliance configurations to achieve maximum flexibility and scalability for organisations faced with terabyte- to petabyte-scale data opportunities. |
High Capacity DCA Configurations
|
DCA CONFIGURATIONS
|
DCA FAMILY SPECIFICATIONS OVERVIEW
|
SNMP on the DCA
The Greenplum DCA has an SNMP version 2 management information base (MIB). The MIB can be used by enterprise monitoring systems to identify issues with components and services in the DCA. |
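Enterprise monitoring tools typically poll such a MIB with standard SNMP v2c queries. As a minimal sketch only: the community string, host name, and OID subtree below are placeholder assumptions, not values taken from this document; substitute the values from your DCA configuration.

```shell
# Walk the DCA's MIB from a monitoring host using SNMP v2c.
# "public", the host "mdw", and the enterprise OID subtree are
# hypothetical placeholders; consult the DCA documentation for
# the actual community string and DCA MIB OID.
snmpwalk -v2c -c public mdw 1.3.6.1.4.1
```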
gppkg
The Greenplum Package Manager (gppkg) utility installs Greenplum Database extensions, along with any dependencies, on all hosts across a cluster. It also automatically installs extensions on new hosts in the case of system expansion and segment recovery. First, download one or more of the available packages from the EMC Download Center, then copy them to the master host. Use the Greenplum Package Manager to install each package using the options described below.

Usage:
gppkg [-i package | -u package | -r name-version | -c] [-d master_data_directory] [-a] [-v]
gppkg --migrate GPHOME_1 GPHOME_2 [-a] [-v]
gppkg [-q | --query] query_option
gppkg -? | --help | -h
gppkg --version

Note: After a major upgrade to Greenplum Database, you must download and install all extensions again.

The following packages are available for download from the EMC Download Center:
•PostGIS
•PL/Java
•PL/R
•PL/Perl
•Pgcrypto

Options
-a (do not prompt)
Do not prompt the user for confirmation.
-c | --clean
Reconciles the package state of the cluster to match the state of the master host. Running this option after a failed or partial install/uninstall ensures that the package installation state is consistent across the cluster.
-d master_data_directory
The master data directory. If not specified, the value set for $MASTER_DATA_DIRECTORY is used.
-i package | --install=package
Installs the given package. This includes any pre/post installation steps and installation of any dependencies.
--migrate GPHOME_1 GPHOME_2
Migrates packages from a separate $GPHOME, carrying packages over from one version of Greenplum Database to another. For example:
gppkg --migrate /usr/local/greenplum-db-4.2.0.1 /usr/local/greenplum-db-4.2.1.0
This option is invoked automatically by the installer during minor upgrades. It is provided here for cases when the user wants to migrate packages manually.
Migration can only proceed if gppkg is executed from the installation directory to which packages are being migrated. That is, GPHOME_2 must match the $GPHOME from which the currently executing gppkg is being run.
-q | --query query_option
Provides information specified by query_option about the installed packages. Only one query_option can be specified at a time. The following table lists the possible values for query_option. <package_file> is the name of a package.

query_option            Returns
<package_file>          Whether the specified package is installed.
--info <package_file>   The name, version, and other information about the specified package.
--list <package_file>   The file contents of the specified package.
--all                   List of all installed packages.

-r name-version | --remove=name-version
Removes the specified package.
-u package | --update=package
Updates the given package.
--version (show utility version)
Displays the version of this utility.
-v | --verbose
Sets the logging level to verbose.
-? | -h | --help
Displays the online help. |
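The options above can be combined into a typical install-and-verify sequence on the master host. This is a sketch only: the .gppkg file name below is a hypothetical placeholder, not a file named in this document.

```shell
# On the master host, as gpadmin: install a downloaded extension package
# and then confirm it is registered across the cluster.
# "postgis-1.0-rhel5-x86_64.gppkg" is a hypothetical file name.
gppkg -i postgis-1.0-rhel5-x86_64.gppkg
gppkg --query --all    # list all installed packages on the cluster
```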
dca_shutdown
The dca_shutdown utility safely powers down all servers in a DCA. The utility can be run with no parameters, in which case it uses the system inventory generated by DCA Setup during an installation or a Regenerate DCA Config Files operation. If the utility is run with a hostfile or hostname specified, only those hosts are shut down. This utility does not shut down the administration, Interconnect, or aggregation switches. The utility should be run as the user root.

Before running dca_shutdown, perform the following steps to ensure a clean shutdown:
1. Stop Greenplum Database: $ gpstop -af
2. Stop Command Center: $ gpcmdr --stop
3. Stop health monitoring as the user root: $ su - # dca_healthmon_ctl -d

Usage:
dca_shutdown { -f hostfile | -h hostname } [ --ignoredb ] [ --password=password ] [ --passfile=password_file ] [--statusonly]
dca_shutdown
dca_shutdown --help

Options
-?, --help
Print usage and help information.
-i, --ignoredb
Do not check whether Greenplum Database, health monitoring, or Command Center are running. Shut down all servers immediately.
-h, --host hostname
Perform a shutdown on the specified host.
-f, --hostfile hostfile
Perform a shutdown on the hosts listed in the hostfile. This option cannot be used with the --host option.
-p, --password password
Specify a password to connect to the servers' IPMI (iDRAC) to perform the shutdown. The password is originally set during installation with DCA Setup; if an installation through DCA Setup has never been run, the user is prompted for a password.
-s, --passfile password_file
Specify a file containing the password used to connect to the servers' IPMI (iDRAC) to perform the shutdown. This file is generated during installation with DCA Setup and is located in /opt/dca/etc/ipmipasswd.
-o, --statusonly
Print the power status (ON | OFF) of all servers. This does not power off any servers.
Examples
Shut down all servers in a DCA:
dca_shutdown
Shut down the servers listed in the file hostfile:
dca_shutdown -f /home/gpadmin/gpconfigs/hostfile |
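The preparatory steps and the shutdown itself can be sketched as a single root session. This is a sketch of the sequence described above, not an official script; it assumes Greenplum Database and Command Center run under the gpadmin account, as in the steps listed earlier.

```shell
# Clean-shutdown sequence, run as root on the master host.
su - gpadmin -c "gpstop -af"       # 1. stop Greenplum Database
su - gpadmin -c "gpcmdr --stop"    # 2. stop Command Center
dca_healthmon_ctl -d               # 3. disable health monitoring
dca_shutdown --statusonly          # optional: confirm current power state
dca_shutdown                       # power down all servers
```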
dcaperfcheck
The dcaperfcheck utility tests the performance of the hardware in a DCA. It is run to validate that network, disk, and memory are performing as expected, and is a useful tool for detecting hardware failures or mis-cabling. The utility can be run as the user gpadmin or root. If it is run as gpadmin, that user must have permission to read from and write to the test directory.

Usage:
dcaperfcheck { -f hostfile | -h hostname } { -r [d | s | n | N | M ] } [-B size ] [ -S size ] {-d test_dir | --device } {-v | -V } [ -D ] [ --duration seconds ] [ --netperf ]
dcaperfcheck -?

Options
-d test_directory
Directory where data will be written to and read from. Multiple -d flags may be specified for multiple directories on each host. During network and memory tests, this can be the /tmp directory. During disk tests, use operating system mount points that exercise each drive.
-v
Enable verbose output.
-V
Enable very verbose output.
-D
Print statistics for each host. The default output prints only the hosts with the lowest and highest values.
-rd, -rs, -rn, -rN, -rM
Specify the type of test to run: d - disk, s - stream (memory), n - serial netperf, N - parallel netperf, or M - full matrix netperf. These options can be combined, for example, -rds. The default is dsn. Typically, disk and network tests are run separately, because disk tests require more test directories to be specified, whereas network tests only require a single temporary directory.
-B size
Specify the block size for disk performance tests. The default is 32KB. Examples: 1KB, 4MB.
-S size
Specify the file size for disk performance tests. The default is 2x server memory. On a DCA server with 48GB of memory, the default is 96GB. Examples: 500MB, 16GB.
-h hostname
Specify a host on which to run the utility. Multiple hosts can be specified.
-f hostfile
Specify a file with a list of hosts on which to run the utility. The hostfile will differ based on the test (disk or network) you are running.
A network test will typically be run against one interconnect, so hostnames should reflect only interfaces on that interconnect.
--duration seconds
Specify the length of time, in seconds, to run the network test.
--netperf
Use the netperf network test instead of gpnetbenchServer/gpnetbenchClient. This option can only be used if the network test is specified.
--device
Use a raw device instead of a test directory, for example /dev/sda1, /dev/sda2. Multiple devices may be specified. This option requires that dcaperfcheck be run as the user root. WARNING: THIS WILL CAUSE DATA LOSS ON THE SPECIFIED DEVICES.
-?
Print online help.

Examples
Run a parallel network and stream test on Interconnect 1:
# dcaperfcheck -f /home/gpadmin/gpconfigs/hostfile_gpdb_ic1 -rsN -d /tmp
Run a disk test, using all the data directories on a segment server, sdw1: |
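The command for the disk-test example is missing from the text above. As a hedged illustration only, a disk test targeting a single segment server might look like the following; the /data1 and /data2 mount points are hypothetical placeholders for that server's data directories, not paths confirmed by this document.

```shell
# Hypothetical disk test against segment server sdw1, as root.
# The -d mount points are placeholders; list the actual data-directory
# mount points so each drive is exercised.
# dcaperfcheck -h sdw1 -rd -d /data1/primary -d /data2/primary
```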
dcacheck
The dcacheck utility validates DCA operating system and hardware configuration settings. The utility can use a host file or a file previously created with the --zipout option to validate settings. At the end of a successful validation process, a DCACHECK_NORMAL message displays. If DCACHECK_ERROR displays, one or more validation checks failed. You can also use dcacheck to gather and view platform settings on hosts without running validation checks. Greenplum recommends that you run dcacheck as the user root. If you do not run dcacheck as root, the utility displays a warning message and will not be able to validate all configuration settings; only some of the settings will be validated. If dcacheck is run with no parameters, it validates settings in the following file: /opt/dca/etc/dcacheck/dcacheck_config. Different configuration parameters are validated depending on the DCA software release.

Usage
dcacheck { -f hostfile | -h hostname } { --stdout | --zipout } [ --config config_file ]
dcacheck --zipin dcacheck_zipfile
dcacheck -?

Options
--config config_file
The name of a configuration file to use instead of the default file /opt/dca/etc/dcacheck/dcacheck_config.
-f hostfile
The name of a file that contains a list of hosts whose settings dcacheck validates. This file should contain a single host name for all hosts in the DCA.
-h hostname
The name of a host whose platform-specific settings dcacheck validates.
--stdout
Display collected host information from dcacheck. No checks or validations are performed.
--zipout
Save all collected data to an archive file in the current working directory. dcacheck automatically creates the file and names it dcacheck_timestamp.tar.gz. No checks or validations are performed.
--zipin file
Use this option to decompress and check an archive file created with the --zipout option. dcacheck performs validation tasks against the file you specify in this option.
-?
Print the online help.
Examples
Verify and validate the DCA settings on specific servers:
# dcacheck -f /home/gpadmin/gpconfigs/hostfile
Verify custom settings on all DCA servers:
# dcacheck --config my_config_file |
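The --zipout and --zipin options described above support a capture-now, validate-later workflow. A sketch of that sequence follows; the archive name uses the dcacheck_timestamp.tar.gz pattern described above, with an illustrative timestamp that is an assumption, not a value from this document.

```shell
# 1. Capture settings from all hosts without validating them.
# dcacheck -f /home/gpadmin/gpconfigs/hostfile --zipout
# 2. Later (for example, after a maintenance window), validate the
#    captured snapshot; the timestamped file name is illustrative.
# dcacheck --zipin dcacheck_20130115.tar.gz
```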