tablespace in Greenplum database

posted Apr 28, 2017, 4:27 PM by Sachchida Ojha
GP uses tablespace and filespace to maintain corresponding space. Including commands for your reference (took out additional text):

[gpadmin@mdw sachi]$ gpfilespace -o gpfilespace_config
Enter a name for this filespace
> sachi_disk1

Checking your configuration:
Your system has 4 hosts with 6 primary and 6 mirror segments per host.
Your system has 1 hosts with 0 primary and 0 mirror segments per host.

Configuring hosts: [sdw4, sdw1, sdw2, sdw3]

primary location 1> /data1
primary location 2> /data1
primary location 3> /data1
primary location 4> /data1
primary location 5> /data1
primary location 6> /data1

Please specify 6 locations for the mirror segments, one per line:

mirror location 1> /data1
mirror location 2> /data1
mirror location 3> /data1
mirror location 4> /data1
mirror location 5> /data1
mirror location 6> /data1

Configuring hosts: [mdw]

Enter a file system location for the master
master location> /data
Creating configuration file... [created]

To add this filespace to the database please run the command:

gpfilespace --config /home/gpadmin/sachi/gpfilespace_config

[gpadmin@mdw sachi]$ ls -ltr

total 4

-rw-rw-r-- 1 gpadmin gpadmin 1118 Jul 17 18:26 gpfilespace_config

[gpadmin@mdw sachi]$  gpfilespace --config /home/gpadmin/sachi/gpfilespace_config

A tablespace requires a file system location to store its database files. A filespace is a collection of file system locations for all components in a Greenplum system (primary segment, mirror segment and master instances). Once a filespace is created, it can be used by one or more tablespaces.

Reading Configuration file: '/home/gpadmin/sachi/gpfilespace_config'

Performing validation on paths


Connecting to database
Filespace "sachi_disk1" successfully created

[gpadmin@mdw sachi]$ psql -d postgres
psql (8.2.15)
Type "help" for help.

postgres=# create tablespace sachi_tbsp filespace sachi_disk1;


postgres=# grant create on tablespace sachi_tbsp to gpadmin;
postgres=# create table ummc_1 ( x integer) tablespace sachi_tbsp;

NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'x' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.

Most of the above is self explanatory, attaching a PDF doc for your reference (pages 72 and 76).

Database partitioning for UMMC

Connection to database is based on user, defined by Postgres role. Role is synonymous to user. pg_hba covers external connectivity from database point of view and similar to a simple firewall from database perspective. First record in pg_hba.conf determines connections, not the last entry. We are probably a bit familiar with pg_hba.conf file by now.

User is authenticated by password or as a Linux OS user or through SSL certificate. There is a corresponding backend process for each connection and only serves one active session at a time. Access is based on:

  1.  Host access (pg_hba.conf)
  2.  Database connect privileges (at user level)
  3.  User (or group, if the user belongs to this group) to be given usage on schema
  4.  Need to have rights to use relations, etc.
User need to have login for accessing GP (Postgres) and connect for reaching a database. Users and groups are roles in Postgres, the major difference is that user can have login access. 

General recommendation:
  1. Create a user with login privileges (password). Make sure connect permission is set.
  2. Add an entry in pg_hba.conf (local   all     user_xxx        md5  (OR) host     all         user_xxx            trust). Make sure that user is able to login without pg_hba.conf error.
  3. Create a group (group_abc) and add above user to this group. Grant all on schema schema_xyz to group_abc.
  4. Give individual permissions to the group. Easier to manage as a group, instead of as a user. In other words, we can include number of users to this group and thus easier to manage.
  5. Maintain relations in a non-public schema.