gpfdists (a secured version of gpfdist protocol)

The gpfdists protocol is a secure version of gpfdist, which enables encrypted communication and secure identification of the file server and the Greenplum Database to protect against attacks such as eavesdropping and man-in-the-middle attacks.
  • The protocol implements SSL security in a client/server scheme, with the following notable features:
  • Client certificates are required.
  • Multi-lingual certificates are not supported.
  • A Certificate Revocation List (CRL) is not supported.
  • The TLSv1 protocol is used with the TLS_RSA_WITH_AES_128_CBC_SHA encryption algorithm. These SSL parameters cannot be changed.
  • SSL renegotiation is supported.
  • The SSL ignore host mismatch parameter is set to false.
  • Private keys containing a passphrase are not supported for the gpfdist file server (server.key) and for the Greenplum Database (client.key).
  • Issuing certificates that are appropriate for the operating system in use is the users responsibility. Generally, converting certificates as shown in is supported.
- There may be a little performance impact seen on the gpfdists process, but i would suggest to test it on your cluster for a similar load while using gpfdist and gpfdists. Apologies, but we do not have a benchmark numbers, and the variation percentage may vary based on the volume of data, cluster size etc.

- Below are the steps which are required to implement gpfdists. 

Step 1: Create a folder under the segment data directory & master data directory with a name called gpfdists and move the below files already created:
- The client certificate file, client.crt
- The client private key file, client.key
- The trusted certificate authorities, root.crt
Note: You can identify the segment data directory loaction using the below sql:
select fselocation,hostname from pg_filespace_entry pf, gp_segment_configuration gp where pf.fsedbid=gp.dbid;
Step 2 : Create an external table with gpfdists protocol. Example:
CREATE EXTERNAL TABLE ext_expenses ( name text, 
date date, amount float4, category text, desc1 text ) 
LOCATION ('gpfdists://etlhost-1:8081/*.txt', 
Step 3: Put data under the path specified. Example:
Create a file call a1.txt under /var/load_files with the required delimiters

Step 4: Execute gpfdist service:
gpfdist -d /var/load_files -p 8081 --ssl $MASTER_DATA_DIRECTORY/gpfdists

Step 5: Fetch data from external table. 

Note: You can also use gpload with ssl option true, More details on YAML structure on the administration guide.

- security is a feature / option provided by gpfdist service. We will not be able to block the usage of execution of gpfdist. Use having the priviliges to execute gpfdist can run the service without ssl options. Security must be implemented at user level as well.

Note: You may further use iptables to strengthen the access to IP / port on which data is served via gpfdist.

Testing gpfdists

On the server side:
1) Change the ssl = off to ssl = on in postgresql.conf on the master node.

2) Create self-signed certs for the master node and for the test end user. It won’t be necessary for your env since your env will apply digital cert for the master node

3) Copy root.crt (you will need to extract the root cert from the master node cert once you get the cert), the master node cert, and the private key of the maste node cert (again, you may have to extract the private key from master cert you are getting) under the data directory of the master node

4) Change the permissions of the cert related files in 3) to 600

5) To force the client to use client side cert, add an hostssl entry in pg_hba.conf as “hostssl <database_name> <user_name> <client_host_ip> cert clientcert=1”. If not forcing client side cert, make it “hostssl <database_name> <user_name> <client_host_ip> password clientcert=0”.

6) Restart the server.
On the client (DIA in my env):

1) Create a self-signed cert for the end user.

2) Under the end-user’s home dir, create a directory “.postgresql”

3) Deploy the end user’s cert, the private key, and the root.crt under the .postgresql directory

4) Change the permissions of the above cert related files to 600

5) Connect to the server using the following command: psql “sslmode=verify-full” –h <master_node_ip/name> -U <user_name>