Informatica Server Configuration for Greenplum (Connect to greenplum from informatica using ODBC)

posted Sep 11, 2012, 1:45 PM by Sachchida Ojha
Greenplum Databases is different from regular Oracle, SQL Server databse, so it requires special considerations in order to configure Informatica perform read/write on greenplum database.
 
Please see below a small write up how to connect and perfom read/write to greenplum database.  This document is applicable for Greenplum database (running on a Linux OS) and Informatica  (running on a Windows )
 
Database Connectivity : The standard database interface ODBC can be used to establish connectivity between Informatica and Greenplum database. ODBC Drivers for Green Plum are available at: http://www.postgresql.org/ftp/odbc/
 
Steps to configuring ODBC connectivity:
 A) Download and install the driver
1. Download the driver from URL  http://www.postgresql.org/ftp/odbc/ 
2. ODBC driver needs to be installed in the system which hosts Informatica Power Center Client as well as the one which holds the Server.   For instance we have downloaded “psqlodbc-09_01_0200” (for Greenplum) and deployed the PostgreSQL ODBC driver in the Client and Server machines.
3. Credentials like Database name, IP of the Server that hosts Greenplum software, Port number, Username and Password would be needed at the time of creating DSN entries.
4. Greenplum database pg_hba.conf file should be updated to accept the connection from Informatica client/server. (See Appendix A for more details)
 
 B) Creating DSN entries
1. The ODBC Data Source Administrator in Windows is to be used to create a DSN entry in the system that hosts Informatica Client, using PostgreSQL ANSI / PostgreSQL Unicode driver – for Greenplum.
2. A DSN entry is to be created in the system that hosts Informatica PowerCenter Server as well.
Once the above steps are done we are all set to use Informatica to extract or load information into Greenplum database.
 
C) Accessing Greenplum objects within Informatica
All features in Informatica PowerCenter client that works with other databases holds good for these databases as well, like:
1. Using “Import from Database” option in Source Analyzer we can import Greenplum , Views similar to what we do for other databases. The DSN entry that we created in the Power Center Client instance to be used for the same.
2. The same is applicable for importing Views / Tables into Target Designer.
3. It is also possible to create a table in these databases using a table structure defined in Informatica Target Designer using the “Generate and Execute” option.
 
D) Executing Informatica workflows with Greenplum objects
1. Relational Connection Browser in workflow manager is to be used for creating Connection entries. Connections are to be created under the type ODBC. Access credentials Username / password for the Greenplum database and Connect string would be needed for this.
2. Connect String here refers to the DSN entry that we created in the Informatica PowerCenter Server machine.

 

Appendix A

 Allowing Connections to Greenplum Database

Client access and authentication is controlled by a configuration file named pg_hba.conf (the standard PostgreSQL host-based authentication file).


In Greenplum Database, the pg_hba.conf file of the master instance controls client access and authentication to your Greenplum system. The segments also have pg_hba.conf files, but these are already correctly configured to only allow client connections from the master host. The segments never accept outside client connections, so there is no need to alter the pg_hba.conf file on your segments.

The general format of the pg_hba.conf file is a set of records, one per line. Blank lines are ignored, as is any text after the # comment character. A record is made up of a number of fields which are separated by spaces and/or tabs. Fields can contain white space if the field value is quoted. Records cannot be continued across lines. Each remote client access record is in the format of:

 

host database role CIDR-address authentication-method

 

Each UNIX-domain socket access record is in the format of:


local database role authentication-method


The meaning of the fields is as follows:

Field             Description
local
            Matches connection attempts using UNIX-domain sockets. Without a record of this type, UNIX-domain socket connections are disallowed.
host            Matches connection attempts made using TCP/IP. Remote TCP/IP connections will not be possible unless the server is started with an appropriate value for the listen_addresses server configuration parameter.
hostssl        Matches connection attempts made using TCP/IP, but only when the connection is made with SSL encryption. SSL must be enabled at server start
                    time by setting the ssl configuration parameter
hostnossl    Matches connection attempts made over TCP/IP that do not use SSL.
database    Specifies which database names this record matches. The value all specifies that it matches all databases. Multiple database names can be supplied by separating them with commas. A separate file containing database names can  

                   be specified by preceding the file name with @.
role             Specifies which database role names this record matches. The value all specifies that it matches all roles. If the specified role is a group and you want all members of that group to be included, precede the role name with a +.

                   Multiple role names can be supplied by separating them with commas. A separate file containing role names can be specified by preceding the file name with @.
CIDR-address Specifies the client machine IP address range that this record matches. It contains an IP address in standard dotted decimal notation and a CIDR mask length. IP addresses can only be specified numerically, not as domain or host names. The mask length indicates the number of high-order bits of the client IP address that must match. Bits to the right of this must be zero in the given IP address. There must not be any white space between the IP address, the /, and the CIDR mask length.  Typical examples of a CIDR-address are 172.20.143.89/32 for a single host, or 172.20.143.0/24 for a small network, or 10.6.0.0/16 for a larger one. To specify a single host, use a CIDR mask of 32 for IPv4 or 128 for IPv6. In a network address, do not omit trailing zeroes.
IP-address/
IP-mask
          These fields can be used as an alternative to the CIDR-address notation. Instead of specifying the mask length, the actual mask is
                     specified in a separate column. For example, 255.0.0.0 represents an IPv4 CIDR mask length of 8, and 255.255.255.255 represents a CIDR mask length of 32. These fields only apply to host, hostssl, and hostnossl records.
authentication-method Specifies the authentication method to use when connecting.

Comments