It is essential to ensure that Greenplum Database host file system usage does not exceed 70% of the available space. Disk space monitoring helps protect database performance and avoid failures caused by a full disk.

General Monitoring

gp_toolkit Administrative Schema: Greenplum Database includes the gp_toolkit schema, which collects system information through log files and operating system commands. You can query gp_toolkit to quickly view free disk space. Results are shown in bytes.

=# SELECT DISTINCT dfhostname, dfspace, dfdevice FROM gp_toolkit.gp_disk_free ORDER BY dfhostname;

This returns results like the following:

postgres=# select distinct dfhostname, dfdevice, dfspace from gp_toolkit.gp_disk_free;
 dfhostname | dfdevice  |  dfspace
------------+-----------+-----------
 fips4      | /dev/sdb1 | 145284456
 fips3      | /dev/sdb1 | 145282396
(2 rows)

Learn more about the gp_toolkit schema in the Greenplum Database Administration Guide: The gp_toolkit Administrative Schema, available on Powerlink.

Shell Commands: Check free disk space with the shell command df -h. From the master server, run the gpssh (broadcast) utility to check free disk space on all hosts in your cluster:

$ gpssh -f hostfile df -h
[rh55-qavm62] Filesystem  Size  Used Avail Use% Mounted on
[rh55-qavm62] /dev/sda1   9.7G  2.9G  6.4G  32% /
[rh55-qavm62] /dev/sdb1   140G  137G  3.5G  98% /data
...
[rh55-qavm62]             916G  421G  448G  49% /home/malshp

Data Computing Appliance Disk Monitoring

ConnectEMC Dial-Home: The Data Computing Appliance (DCA) supports dial-home functionality through the ConnectEMC support utility, which collects and sends event data indicating system errors to EMC Global Services.

Greenplum Performance Monitor: Performance Monitor includes a web-based user interface that displays query performance and health metrics. The "Health" tab displays disk utilization information. Greenplum Performance Monitor is pre-installed on every DCA.
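The 70% guideline above can be checked automatically. Here is a minimal sketch (the function name and the threshold default are my own, not part of any Greenplum tooling) that scans `df -P` output on stdin and warns about any file system over the limit:

```shell
# check_disk_usage: read `df -P` output on stdin and warn when the
# Use% column (field 5) exceeds a threshold (first argument, default 70).
check_disk_usage() {
  limit="${1:-70}"
  awk -v limit="$limit" '
    NR > 1 {                      # skip the df header line
      pct = $5
      sub(/%$/, "", pct)          # strip the trailing percent sign
      if (pct + 0 > limit)
        printf "WARNING: %s (%s) at %s%% used\n", $6, $1, pct
    }'
}
```

Run it per host as `df -P | check_disk_usage 70`. Note that piping `gpssh -f hostfile df -P` output through it directly will not work as-is, because the `[host]` prefix shifts every column by one; strip the prefix first.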
DCA Health Monitoring: Every DCA running DCA Software version 1.1.0.0 or later includes extensive health monitoring software. This software collects Greenplum Database and system hardware information, including disk utilization. When free disk space reaches certain thresholds, the software sends the following data to EMC Support:

Disk Space Warning: If a host's file system usage exceeds 80% of capacity, the system sends a message with a severity of Warning.

Disk Space Error: If a host's file system usage exceeds 90% of capacity, the system sends a message with a severity of Error.

Note: The DCA health monitoring software checks all of the hosts' file systems. (DCAs have multiple file systems.)

==========================================================================
Disk space history, using the diskspace_history table in the gpperfmon database
==========================================================================
select "month", "year", "day",
       -- hostname,
       -- filesystem,
       sum(round((bytes_used / power(1024,4))::numeric, 2)) used_tb,
       sum(round((bytes_available / power(1024,4))::numeric, 2)) available_tb,
       sum(round((total_bytes / power(1024,4))::numeric, 2)) total_tb
from (
    select date_trunc('hour', ctime) ctime_hour,
           to_char(ctime, 'mm')   as "month",
           to_char(ctime, 'yyyy') as "year",
           to_char(ctime, 'ddd')  as "day",
           to_char(ctime, 'Day')  as weekday,
           to_char(ctime, 'hh24') as "hour",
           hostname,
           filesystem,
           bytes_used,
           bytes_available,
           total_bytes,
           row_number() over (partition by to_char(ctime, 'yyyy'), to_char(ctime, 'ddd'), hostname
                              order by total_bytes desc) space_used_rank
    from diskspace_history
    where ctime >= '2010-01-01'::timestamp
    -- ctime between date_trunc('day', localtimestamp - interval '6 day') and date_trunc('day', localtimestamp)
      and filesystem in ('/data1', '/data2')
      and hostname not like 'etl%'
) a
where space_used_rank = 1
group by weekday, day, month, year
order by year, month, weekday, day;

==========================================================================
Sample disk space data for a full DCA
==========================================================================
select hostname, filesystem,
       round(total_bytes/power(1024,4)::numeric, 2) totbyteinTB,
       round(bytes_used/power(1024,4)::numeric, 2) usedinTB,
       round(bytes_available/power(1024,4)::numeric, 2) availinTB
from diskspace_now
where filesystem in ('/data1', '/data2')
order by round(substr(hostname,4,3)::numeric, 2), filesystem;

"sdw1";"/data1";2.68;1.29;1.39
"sdw1";"/data2";2.68;1.29;1.39
"sdw2";"/data1";2.68;1.29;1.39
"sdw2";"/data2";2.68;1.30;1.38
"sdw3";"/data1";2.68;1.29;1.39
"sdw3";"/data2";2.68;1.28;1.40
"sdw4";"/data1";2.68;1.30;1.38
"sdw4";"/data2";2.68;1.29;1.39
"sdw5";"/data1";2.68;1.31;1.37
"sdw5";"/data2";2.68;1.29;1.39
"sdw6";"/data1";2.68;1.33;1.35
"sdw6";"/data2";2.68;1.30;1.38
"sdw7";"/data1";2.68;1.34;1.34
"sdw7";"/data2";2.68;1.31;1.37
"sdw8";"/data1";2.68;1.30;1.38
"sdw8";"/data2";2.68;1.38;1.30
"sdw9";"/data1";2.68;1.28;1.40
"sdw9";"/data2";2.68;1.29;1.39
"sdw10";"/data1";2.68;1.28;1.40
"sdw10";"/data2";2.68;1.28;1.40
"sdw11";"/data1";2.68;1.28;1.40
"sdw11";"/data2";2.68;1.29;1.39
"sdw12";"/data1";2.68;1.30;1.38
"sdw12";"/data2";2.68;1.28;1.40
"sdw13";"/data1";2.68;1.28;1.39
"sdw13";"/data2";2.68;1.28;1.40
"sdw14";"/data1";2.68;1.30;1.38
"sdw14";"/data2";2.68;1.29;1.39
"sdw15";"/data1";2.68;1.28;1.40
"sdw15";"/data2";2.68;1.28;1.40
"sdw16";"/data1";2.68;1.28;1.40
"sdw16";"/data2";2.68;1.30;1.38

================================================================================================
Note: The Greenplum DCA manuals say that a full-rack DCA has 4 GPDB modules, and each module has:
1. 9 TB or 31 TB capacity (uncompressed)
2. 4 servers per module

Each server contains:
1. 2 sockets / 12 cores, 48 GB RAM
2. 12 x 600 GB or 2 TB storage

If you sum the sizes of the /data1 and /data2 mounts across all segment servers and divide by 2 (each server holds 8 primary and 8 mirror segments), the primary capacity comes to 42.88 TB. Greenplum reserves approximately 20% of the space for transaction logs, spill files, etc., so the usable database capacity is approximately 36 TB.
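The 42.88 TB figure in the note above can be reproduced directly from the sample output: sum the totbyteinTB column across all rows and halve it. A small sketch (the function name is mine) over the semicolon-separated rows:

```shell
# primary_capacity_tb: sum the total-bytes-in-TB column (field 3) of the
# semicolon-separated diskspace_now sample rows on stdin, then halve it,
# since the mounts hold primary and mirror segments in equal shares.
primary_capacity_tb() {
  awk -F';' '
    { total += $3 }                       # field 3 is totbyteinTB
    END { printf "%.2f\n", total / 2 }    # keep only the primary half
  '
}
```

Feeding it all 32 sample rows above prints 42.88.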
================================================================================================
The scripts below work per database (you need to specify the database name), and you will need to give the appropriate path for your primary segment directories.

. /usr/local/greenplum/greenplum-db/greenplum_path.sh

Disk Cleanup Activity - Finding space used by backup, database and log files
========================================================
The following will help in finding out what is taking the space. Do all of this while logged in to mdw as gpadmin.

1. If you have used gpssh before, you may already have a seghosts file; if not, create one from the output of the SQL below. The file should contain the names of the segment hosts, one per line.

select distinct hostname
from gp_segment_configuration
where content >= 0
order by replace(hostname, 'sdw', '')::numeric;

In the subsequent scripts I will assume this file is $GPHOME/hosts.seg.

2. Find the total space used by the primary segment databases (excluding log files and local backup files):

[gpadmin@sachi ~]$ gpssh -f $GPHOME/hosts.seg "du -h --exclude=*pg_log* --exclude=*db_dump* -s /data[12]/primary/gpseg*"

3. Find the total space used by the log files of the primary segment databases:

[gpadmin@sachi ~]$ gpssh -f $GPHOME/hosts.seg "du -h -s /data[12]/primary/gpseg*/pg_log*"

4. Find the total space used by the backup files of the primary segment databases:

[gpadmin@sachi ~]$ gpssh -f $GPHOME/hosts.seg "du -h -s /data[12]/primary/gpseg*/db_dumps*"

5. Find the total space used by un-cleaned temporary spool files. You can check using:
gpssh -f ~/hostfile "du -h -s /data[12]/primary/gpseg*/base/*/pgsql_tmp"

Sample output:

$ gpssh -f ~/hostfile "du -h -s /data[12]/primary/gpseg*/pg_log*"
..............
[sdw10] 40M /data1/primary/gpseg54/pg_log
[sdw10] 40M /data1/primary/gpseg55/pg_log
[sdw10] 40M /data1/primary/gpseg56/pg_log
[sdw10] 40M /data2/primary/gpseg57/pg_log
................
[ sdw1] 39M /data1/primary/gpseg1/pg_log
[ sdw1] 40M /data1/primary/gpseg2/pg_log
..............

[gpadmin@ms00597 ~]$ gpssh -f ~/hostfile "du -h -s /data[12]/primary/gpseg*/db_dumps*"
[sdw16] 20K /data1/primary/gpseg90/db_dumps
[sdw16] 20K /data1/primary/gpseg91/db_dumps
[sdw16] 28K /data1/primary/gpseg92/db_dumps
[sdw16] 28K /data2/primary/gpseg93/db_dumps
.........
[ sdw9] 24K /data2/primary/gpseg52/db_dumps
[ sdw9] 28K /data2/primary/gpseg53/db_dumps
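`du -h` output is human-readable but awkward to total. If you run the same commands with `du -sk` instead, the per-host totals can be aggregated straight from the gpssh output. A sketch (the function name is mine) that also handles gpssh's padded host prefixes like `[ sdw1]`:

```shell
# sum_du_per_host: read gpssh-prefixed `du -sk` output lines on stdin,
# e.g. "[ sdw1] 40960 /data1/primary/gpseg1/pg_log", and print the
# total space used per host in megabytes.
sum_du_per_host() {
  awk '
    {
      host = $0
      sub(/\].*/, "", host); sub(/^\[ */, "", host)   # extract host name
      line = $0
      sub(/^[^\]]*\] */, "", line)                    # drop "[host] " prefix
      split(line, f, /[ \t]+/)
      kb[host] += f[1]          # du -sk reports kilobytes in the first field
    }
    END {
      for (h in kb) printf "%s %.1f MB\n", h, kb[h] / 1024
    }
  '
}
```

Usage: `gpssh -f ~/hostfile "du -sk /data[12]/primary/gpseg*/pg_log*" | sum_du_per_host`.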
2. There is catalog corruption resulting in tables at the segment level that are not detected by the master.
3. The gp_toolkit SQL was not run as the gpadmin database user, so it failed to capture all tables and schemas.
4. There are multiple databases and we analyzed the space of only one database.

sachi=# \l+
                                       List of databases
   Name    |  Owner  | Encoding |  Access privileges    |  Size  | Tablespace |        Description
-----------+---------+----------+-----------------------+--------+------------+---------------------------
 gpadmin   | gpadmin | UTF8     |                       | 28 MB  | pg_default |
 gpperfmon | gpadmin | UTF8     | gpadmin=CTc/gpadmin   | 283 MB | pg_default |
           |         |          | : =c/gpadmin          |        |            |
 himanshu  | gpadmin | UTF8     |                       | 28 MB  | pg_default |
 postgres  | gpadmin | UTF8     |                       | 28 MB  | pg_default |
 sachi     | gpadmin | UTF8     | =Tc/gpadmin           | 30 MB  | pg_default |
           |         |          | : gpadmin=CTc/gpadmin |        |            |
           |         |          | : sachi=CTc/gpadmin   |        |            |
           |         |          | : gpuser=CTc/gpadmin  |        |            |
 template0 | gpadmin | UTF8     | =c/gpadmin            | 27 MB  | pg_default |
           |         |          | : gpadmin=CTc/gpadmin |        |            |
 template1 | gpadmin | UTF8     | =c/gpadmin            | 28 MB  | pg_default | Default template database
           |         |          | : gpadmin=CTc/gpadmin |        |            |
(7 rows)

===============================================================================================
List of databases and their sizes
===============================================================================================
sachi=# \l
             List of databases
   Name    |  Owner  | Encoding |  Access privileges
-----------+---------+----------+-----------------------
 gpadmin   | gpadmin | UTF8     |
 gpperfmon | gpadmin | UTF8     | gpadmin=CTc/gpadmin
           |         |          | : =c/gpadmin
 himanshu  | gpadmin | UTF8     |
 postgres  | gpadmin | UTF8     |
 sachi     | gpadmin | UTF8     | =Tc/gpadmin
           |         |          | : gpadmin=CTc/gpadmin
           |         |          | : sachi=CTc/gpadmin
           |         |          | : gpuser=CTc/gpadmin
 template0 | gpadmin | UTF8     | =c/gpadmin
           |         |          | : gpadmin=CTc/gpadmin
 template1 | gpadmin | UTF8     | =c/gpadmin
           |         |          | : gpadmin=CTc/gpadmin
(7 rows)

sachi=# \l+
                                       List of databases
   Name    |  Owner  | Encoding |  Access privileges    |  Size  | Tablespace |        Description
-----------+---------+----------+-----------------------+--------+------------+---------------------------
 gpadmin   | gpadmin | UTF8     |                       | 28 MB  | pg_default |
 gpperfmon | gpadmin | UTF8     | gpadmin=CTc/gpadmin   | 283 MB | pg_default |
           |         |          | : =c/gpadmin          |        |            |
 himanshu  | gpadmin | UTF8     |                       | 28 MB  | pg_default |
 postgres  | gpadmin | UTF8     |                       | 28 MB  | pg_default |
 sachi     | gpadmin | UTF8     | =Tc/gpadmin           | 30 MB  | pg_default |
           |         |          | : gpadmin=CTc/gpadmin |        |            |
           |         |          | : sachi=CTc/gpadmin   |        |            |
           |         |          | : gpuser=CTc/gpadmin  |        |            |
 template0 | gpadmin | UTF8     | =c/gpadmin            | 27 MB  | pg_default |
           |         |          | : gpadmin=CTc/gpadmin |        |            |
 template1 | gpadmin | UTF8     | =c/gpadmin            | 28 MB  | pg_default | Default template database
           |         |          | : gpadmin=CTc/gpadmin |        |            |
(7 rows)

===============================================================================================
From the master, an example session for tracking down where the space has gone (here checking for leftover DD Boost directories and old gp_dump files):

$ ssh sdw1
[gpadmin@sdw1 ~]$ du -h -s /data1
[gpadmin@sdw1 ~]$ du -h -s /data1/primary
[gpadmin@sdw1 ~]$ du -h -s /data1/mirror
[gpadmin@sdw1 ~]$ du -h -s /data1/primary/gpseg*
[gpadmin@sdw1 ~]$ du -h -s /data1/primary/gpseg0/*
[gpadmin@sdw1 ~]$ ls -lR /data1/primary/gpseg0/--ddboost
[gpadmin@sdw1 ~]$ du -h -s /data1/primary/gpseg0/--ddboost
[gpadmin@ms001 ~]$ ssh sdw1 du -h -s /data1/primary/gpseg*/gp_dump*
[gpadmin@ms0017 ~]$ ls -l *host*
[gpadmin@ms0017 ~]$ cat seg_hostfile
[gpadmin@ms0017 ~]$ gpssh -f seg_hostfile "du -h -s /data1/primary/gpseg*/--ddboost"
[gpadmin@ms0017 ~]$ gpssh -f seg_hostfile "du -h -s /data1/primary/gpseg*/gp_dump*20130415131834*"
[gpadmin@sdw1 ~]$ du -h -s /data1/primary/gpseg0/*
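Before deleting anything found by the session above, it is safer to list deletion candidates first. A hedged sketch for locating stale dump files (the function name, the `gp_dump*` pattern, and the 7-day default age are my assumptions; check your backup retention policy before removing anything):

```shell
# list_old_dumps: list gp_dump backup files older than a given number of
# days under a directory tree. Nothing is removed; this only prints
# candidates for manual review.
list_old_dumps() {
  dir="$1"
  days="${2:-7}"
  find "$dir" -type f -name 'gp_dump*' -mtime +"$days"
}
```

To apply the same idea across the cluster, you could broadcast the find with gpssh, e.g. `gpssh -f seg_hostfile "find /data[12]/primary/gpseg*/db_dumps -type f -name 'gp_dump*' -mtime +7"`.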