Fault Detection and Recovery

posted Apr 28, 2017, 3:34 PM by Sachchida Ojha
The Greenplum Database server (postgres) subprocess named ftsprobe handles fault detection. ftsprobe monitors the Greenplum array; it connects to and scans all segments and database processes at intervals that you can configure.

If ftsprobe cannot connect to a segment, it marks the segment as “down” in the Greenplum Database system catalog. The segment remains nonoperational until an administrator initiates the recovery process.

With mirroring enabled, Greenplum Database automatically fails over to a mirror copy if a primary copy becomes unavailable. The system is operational if a segment instance or host fails provided all data is available on the remaining active segments.

To recover failed segments, a Greenplum administrator runs the gprecoverseg recovery utility. This utility locates the failed segments, verifies they are valid, and compares the transactional state with the currently active segment to determine changes made while the segment was offline. gprecoverseg synchronizes the changed database files with the active segment and brings the segment back online. 

Administrators perform the recovery while Greenplum Database is up and running.

With mirroring disabled, the system automatically shuts down if a segment instance fails. Administrators manually recover all failed segments before operations resume.