Test unloading data over gpfdist
-- Table Structure
mydb=> \d tab1
Table "schema.tab1"
Column | Type | Modifiers
--------------------------------+-----------------------------+-----------
column_01 | bytea | not null
column_02 | bytea | not null
column_03 | smallint | not null
column_04 | date | not null
column_05 | bytea | not null
column_06 | date | not null
column_07 | date | not null
column_08 | bytea |
column_09 | smallint | not null
column_10 | smallint | not null
column_11 | numeric(15,2) | not null
column_12 | date | not null
column_13 | smallint | not null
column_14 | smallint | not null
column_15 | smallint | not null
column_16 | character varying(14) | not null
column_17 | character varying(8) | not null
column_18 | timestamp without time zone | not null
column_19 | smallint | not null
column_20 | bytea |
Distributed by: (column_02)
-- Record Count
mydb=> select count(*) from tab1;
count
---------
1000000
(1 row)
Time: 182.207 ms
-- Table Size
mydb=> select pg_size_pretty(pg_total_relation_size('tab1'));
pg_size_pretty
----------------
218 MB
(1 row)
Time: 52.392 ms
--- Check Skew
mydb=> select max(c), min(c), avg(c), count(*) from (select gp_segment_id, count(*) c from tab1 group by gp_segment_id) a;
max | min | avg | count
-------+-------+------------------+-------
21086 | 20559 | 20833.3333333333 | 48
(1 row)
Time: 400.475 ms
-- No Skew in Data
--- Start two gpfdist Instances (ssl and no ssl) on server etlserver (connected to the 10GB VLAN on the DCA)
gpfdist -p 8080 -d $PWD -l $PWD/gpfdist_ssl.log -m 4000000 --ssl /certs_dir &
gpfdist -p 8081 -d $PWD -l $PWD/gpfdist_nossl.log -m 4000000 &
-- Dump Data over gpfdist SSL
mydb=> create writable external table ext_w_tab1_ssl (like tab1) location('gpfdists://etlserver:8080/tab1_data_ssl.txt') format 'text';
NOTICE: Table doesn't have 'distributed by' clause, defaulting to distribution columns from LIKE table
CREATE EXTERNAL TABLE
mydb=> insert into ext_w_tab1_ssl select * from tab1;
INSERT 0 1000000
Time: 811513.791 ms
-- Data Transfer Rate
mydb=> select 218.0 / (811513.791/1000) "Data Transfer MB/Sec";
Data Transfer MB/Sec
------------------------
0.26863375880694059579
(1 row)
-- Dump data over gpfdist (no SSL)
mydb=> create writable external table ext_w_tab1_nossl (like tab1) location('gpfdist://etlserver:8081/tab1_data_nossl.txt') format 'text';
NOTICE: Table doesn't have 'distributed by' clause, defaulting to distribution columns from LIKE table
CREATE EXTERNAL TABLE
mydb=> insert into ext_w_tab1_nossl select * from tab1;
INSERT 0 1000000
Time: 8147.634 ms
-- Data Transfer Rate
mydb=> select 218.0 / (8147.634/1000) "Data Transfer MB/Sec";
Data Transfer MB/Sec
----------------------
26.7562337728965243
(1 row)
====================================================
Test loading the same data over gpfdist
====================================================
-- Create table
mydb=> create table tab2 (like tab1);
NOTICE: Table doesn't have 'distributed by' clause, defaulting to distribution columns from LIKE table
CREATE TABLE
-- Load over gpfdist SSL
mydb=> create external table ext_r_tab1_ssl (like tab2) location('gpfdists://etlserver:8080/tab1_data_ssl.txt') format 'text';
CREATE EXTERNAL TABLE
mydb=> insert into tab2 select * from ext_r_tab1_ssl;
INSERT 0 1000000
Time: 12490.419 ms
-- Data Transfer Rate
mydb=> select 218.0 / (12490.419/1000) "Data Transfer MB/Sec";
Data Transfer MB/Sec
----------------------
17.4533776649126022
(1 row)
-- Load over gpfdist No SSL
mydb=> create external table ext_r_tab1_nossl (like tab2) location('gpfdist://etlserver:8081/tab1_data_ssl.txt') format 'text';
CREATE EXTERNAL TABLE
mydb=> insert into tab2 select * from ext_r_tab1_nossl;
INSERT 0 1000000
Time: 6578.678 ms
-- Data Transfer Rate
mydb=> select 218.0 / (6578.678/1000) "Data Transfer MB/Sec";
Data Transfer MB/Sec
----------------------
33.1373567759358339
(1 row)
My observation shows that data download over gpfdists is very slow . This is what I have found. 1. Data download using writable external table using gpfdists protocol (gpfdist with ssl) is about a 100 times slower than downloading over gpfdist (no ssl). 2. Data download speed does not increase while using gpfdists by using multiple gpfdist instances / multiple download files. The speed increases almost linearly when using plain gpfdist. 3. Data upload over gpfdists is slower than gpfdist, but by a much smaller margin (about 2 times). 4. Some observations: Data download speed using gpfdists: 0.27 MB/Sec Data download speed using gpfdist: 27 MB/Sec Data upload speed using gpfdists: 17.45 MB/Sec Data upload speed using gpfdist: 33.14 MB/Sec
|