Creating External table in Greenplum database - examples

posted Sep 13, 2012, 6:46 AM by Sachchida Ojha
The CREATE EXTERNAL TABLE command defines an external table and specifies the location and format of the external table data files with the LOCATION and FORMAT parameters. This command does not load data into the table. The following examples show how to connect to external data with different protocols. Each CREATE EXTERNAL TABLE command can contain only one protocol. Details about each protocol are found in the sections following these examples.

Example 1 - Starting the Greenplum file server (gpfdist)
Before creating external tables with the gpfdist protocol, the gpfdist file server program must be running. The following code starsts the gpfdist file server program in the background on port 8081 serving files from directory /var/data/staging:
gpfdist -p 8081 -d /var/data/staging -l /home/gpadmin/log &

Example 2—Single Greenplum file server (gpfdist) instance on multiple NIC machine
Creates a readable external table named ext_expenses using the gpfdist protocol. The files are formatted with a pipe ( | ) as the column delimiter.
=# CREATE EXTERNAL TABLE ext_expenses ( name text,
date date, amount float4, category text, desc1 text )
LOCATION ('gpfdist://etlhost-1:8081/*',
'gpfdist://etlhost-2:8081/*')
FORMAT 'TEXT' (DELIMITER '|');

Example 3—Multiple Greenplum file server (gpfdist) instances
Creates a readable external table named ext_expenses using the gpfdist protocol from all files with the txt extension. The files are formatted with a pipe ( | ) as the column delimiter and an empty space as null.
=# CREATE EXTERNAL TABLE ext_expenses ( name text,
date date, amount float4, category text, desc1 text )
LOCATION ('gpfdist://etlhost-1:8081/*.txt',
'gpfdist://etlhost-2:8082/*.txt')
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ') ;

Example 4—Single Greenplum file server (gpfdist) instance with error logging
Creates a readable external table named ext_expenses using the gpfdist protocol from all files with the txt extension. The files are formatted with a pipe ( | ) as the column delimiter and an empty space as null.
The external table is accessed in single row error isolation mode. An error log file is specified. If the error count is greater than five, the
=# CREATE EXTERNAL TABLE ext_expenses ( name text,
date date, amount float4, category text, desc1 text )
LOCATION ('gpfdist://etlhost-1:8081/*.txt',
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
LOG ERRORS INTO err_customer SEGMENT REJECT LIMIT 5;
Create the same readable external table definition as above, but with CSV formatted files:
=# CREATE EXTERNAL TABLE ext_expenses ( name text,
date date, amount float4, category text, desc1 text )
LOCATION ('gpfdist://etlhost-1:8081/*.txt',
FORMAT 'CSV' ( DELIMITER ',' );
LOG ERRORS INTO err_customer SEGMENT REJECT LIMIT 5;

Example 5—Hadoop Distributed File Server (HDFS)
Creates a readable external table named ext_expenses using the gphdfs protocol. The files are formatted with a pipe ( | ) as the column delimiter.
=# CREATE EXTERNAL TABLE ext_expenses ( name text,
date date, amount float4, category text, desc1 text )
LOCATION ('gphdfs://hdfshost-1:8081/data/filename.txt')
FORMAT 'TEXT' (DELIMITER '|');
Note: Only one file is permitted with gphdfs. The file must be in TEXT format.

Example 6—Multiple file protocols in CSV format with header rows

Creates a readable external table named ext_expenses using the file protocol.The wildcard specifications are not the same for all the files. The files are formatted in CSV format and have a header row.
=# CREATE EXTERNAL TABLE ext_expenses ( name text,
date date, amount float4, category text, desc1 text )
LOCATION ('file://filehost:5432/data/international/*',
'file://filehost:5432/data/regional/*'
'file://filehost:5432/data/supplement/*.csv')
FORMAT 'CSV' (HEADER);

Example 7—Readable Web External Table with Script
Create a readable web external table that executes a script once per segment host:
CREATE EXTERNAL WEB TABLE log_output (linenum int, message text) EXECUTE '/var/load_scripts/get_log_data.sh' ON HOST
FORMAT 'TEXT' (DELIMITER '|');

Example 8—Writable External Table that Writes to a File
Create a writable external table named sales_out that uses gpfdist to write output data to a file named sales.out. The files are formatted with a pipe (|) as the column delimiter and an empty space as null.
CREATE WRITABLE EXTERNAL TABLE sales_out (LIKE sales)
LOCATION ('gpfdist://etl1:8081/sales.out')
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
DISTRIBUTED BY (txn_id);

Example 9 – Writable External Web Table with Script
Create a writable external web table that pipes output data received by the segments to an executable script named to_adreport_etl.sh:
CREATE WRITABLE EXTERNAL WEB TABLE campaign_out
(LIKE campaign)
EXECUTE '/var/unload_scripts/to_adreport_etl.sh'
FORMAT 'TEXT' (DELIMITER '|');
Use the writable external table defined above to unload selected data:
INSERT INTO campaign_out SELECT * FROM campaign WHERE customer_id=123;
Comments