How do I remove a segment node from Greenplum cluster while database is up and running?

posted Apr 28, 2017, 4:59 PM by Sachchida Ojha
How to remove a data node from Greenplum cluster without down time.

There are situations when data node/segment node experiences a hardware issue and requires shutdown for some maintenance. In these situations Greenplum can handle fault tolerance without taking database down time.
Follow these steps before taking a segment node OFF from a running Greenplum cluster. This example assumes that there is one master, one standby, and  six segments nodes in the Greenplum cluster. It is assumed sdw6 must be removed from the Greenplum Database for hardware maintenance.

mdw - Master
smdw- Standby master
sdw1 -Segment node1
...
sdw6 -Segment node6.

Make sure all the primary and mirror segments of sdw6 are valid in other servers.
 
Following query should not return any segment status as "d."  All segments mode should show "s" or "c" and status "u."

 psql> select content,role,preferred_role,mode,status,hostname from gp_segment_configuration where content in (select content from gp_segment_configuration where hostname like 'sdw6') and hostname not like 'sdw6' order by 1;
 content | role | preferred_role | mode | status | hostname
---------+------+----------------+------+--------+----------
      24 | p    | p              | s    | u      | sdw5
      25 | p    | p              | s    | u      | sdw5
      26 | p    | p              | s    | u      | sdw5
      27 | p    | p              | s    | u      | sdw5
      28 | p    | p              | s    | u      | sdw5
      29 | p    | p              | s    | u      | sdw5
      30 | m    | m              | s    | u      | sdw7
      31 | m    | m              | s    | u      | sdw7
      32 | m    | m              | s    | u      | sdw7
      33 | m    | m              | s    | u      | sdw7
      34 | m    | m              | s    | u      | sdw7
      35 | m    | m              | s    | u      | sdw7

Note: Replace sdw6 with your server name.
 
If Step 2 is verified, you can take sdw6 out of the cluster without stopping the database.

Note: All current active queries will be failed and they will be rolled back depending upon their transaction status.
 
Now sdw6 is not part of the Greenplum Database cluster.
 
Perform the OS/hardware maintenance.
 
After maintenance, follow these steps to join the "sdw6" to the Greenplum Database cluster without stopping  the database:

6a.  Make sure sdw6 is reachable to all Greneplum cluster nodes.
6b.  In master node run the gprecoverseg utility to join sdw6 in Greenplum Database cluster.
6c.  Run gpstate -e to verify recovery status from master.
 
Any issues reported during this process contact Greenplum Technical Support. 

Notes: To perform this operation with Greenplum 3.x versions, if the GUC "gp_fault_action"  setting  is set to "read-only," the database goes into read-only mode. If "gp_fault_action" is set to "continue," the  database continues, but when you join sdw6 (in the previous example) after maintenance, the database requires a stop and start operation.
Notes: If you don't want active queries to be failed and need this to be handled in scheduled maintenance, shut down the database: Take sdw6 off (shut down) --> Start database. The database will detect sdw6 and fails over on mirrors.

Comments