DR Options in a Greenplum Environment

posted Apr 28, 2017, 4:07 PM by Sachchida Ojha
The first solution is an EMC Data Domain (DD) appliance at the primary site, with an identical DD box at the second site. The primary-site backup kicks off, and 10 minutes after it finishes (or 1 hour after it starts, whichever comes first; these are the DD defaults) the first DD box starts replicating the backup to the second site.

The second site needs some kind of Greenplum (GP) instance available to restore to. This can be a second DCA (in which case it is often used for dev and test), repurposed x86 servers, or VMs, optionally provisioned on demand in a cloud-based scenario, which can make the cost of DR very small. It depends on how the customer wants to do it. If the primary site fails, the offsite DD backup is restored to the DR site's local GP instance; the customer then flips the IP addresses and the DW is back up. If backups run overnight, the DW may lose up to one day of data, which is acceptable to most organisations.
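The replication trigger described above reduces to taking the earlier of two times. The sketch below encodes the 10-minute and 1-hour values exactly as the post quotes them; they are not read from any Data Domain API, and the function and constant names are hypothetical, so check the values against your own appliance's replication settings.

```python
from datetime import datetime, timedelta

# Replication trigger values as quoted in the post (assumed defaults; verify on your appliance).
DELAY_AFTER_BACKUP_ENDS = timedelta(minutes=10)
MAX_DELAY_AFTER_BACKUP_STARTS = timedelta(hours=1)


def replication_start(backup_start: datetime, backup_end: datetime) -> datetime:
    """Return when the primary DD box begins replicating to the DR site:
    10 minutes after the backup ends or 1 hour after it starts, whichever is earlier."""
    if backup_end < backup_start:
        raise ValueError("backup_end must not precede backup_start")
    return min(backup_end + DELAY_AFTER_BACKUP_ENDS,
               backup_start + MAX_DELAY_AFTER_BACKUP_STARTS)


if __name__ == "__main__":
    night = datetime(2017, 4, 28, 1, 0)
    # Short backup (30 min): replication starts 10 minutes after it ends.
    print(replication_start(night, night + timedelta(minutes=30)).strftime("%H:%M"))  # 01:40
    # Long backup (2 h): under the stated rule, the 1-hour cap fires first.
    print(replication_start(night, night + timedelta(hours=2)).strftime("%H:%M"))  # 02:00
```

Read literally, the rule means a long-running backup starts shipping to the DR site before it completes, so the offsite copy is not necessarily finished the moment replication begins.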

The second option uses SQLFire to create a true active-active DW across two sites. This is similar to dual-load ETL but is simpler and more elegant. A SQLFire instance is needed at both sites (these are cheap: two servers each, or even better, a new DIA at each site). Data writes go to the primary site's instance and are replicated to the second SQLFire instance in under a second. (There is a synchronous and an asynchronous mode; talk with your local GemFire/SQLFire expert for a briefing on the nuances.) Both SQLFire instances write to their respective local GP instances, and end users can then query either GP box (in the simplest scenario, different groups of users hit their nearest GP box). Ideally you would put a load-balancing server in the middle to distribute queries and provide seamless HA, but I'm not sure if or how this is done.
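The write path and query routing described above can be modelled in a few lines. This is a hypothetical in-memory simulation, not the SQLFire or Greenplum API: `Site`, `DualSiteCluster`, `replicate_pending` and `route_query` are illustrative names, each `Site` stands in for a SQLFire instance writing through to its local GP instance, and the failover-aware `route_query` stands in for the load balancer whose real-world setup the post leaves open.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class Site:
    """One data center: an in-memory tier (standing in for SQLFire) that
    writes through to the site's local warehouse (standing in for GP)."""
    name: str
    warehouse_rows: List[dict] = field(default_factory=list)
    healthy: bool = True

    def apply(self, row: dict) -> None:
        # Write-through: every row accepted by the in-memory tier lands in the local warehouse.
        self.warehouse_rows.append(row)


class DualSiteCluster:
    """Hypothetical model of the active-active layout; not a real SQLFire API."""

    def __init__(self, primary: Site, secondary: Site, synchronous: bool = True):
        self.primary = primary
        self.secondary = secondary
        self.synchronous = synchronous
        self._backlog: Deque[dict] = deque()  # rows not yet shipped to the secondary

    def write(self, row: dict) -> None:
        # All writes enter at the primary site.
        self.primary.apply(row)
        if self.synchronous:
            # Synchronous mode: the secondary receives the row before write() returns.
            self.secondary.apply(row)
        else:
            # Asynchronous mode: queue the row and ship it later.
            self._backlog.append(row)

    def replicate_pending(self) -> int:
        """Ship queued rows to the secondary (async mode); return how many were sent."""
        sent = 0
        while self._backlog:
            self.secondary.apply(self._backlog.popleft())
            sent += 1
        return sent

    def route_query(self, nearest: str) -> Site:
        """Send a query to the user's nearest healthy site, else fail over to the other one."""
        sites: Dict[str, Site] = {self.primary.name: self.primary,
                                  self.secondary.name: self.secondary}
        preferred = sites[nearest]
        if preferred.healthy:
            return preferred
        other = self.secondary if preferred is self.primary else self.primary
        if other.healthy:
            return other
        raise RuntimeError("no healthy site available")


if __name__ == "__main__":
    a, b = Site("site_a"), Site("site_b")
    cluster = DualSiteCluster(a, b, synchronous=False)
    cluster.write({"id": 1})
    print(len(a.warehouse_rows), len(b.warehouse_rows))  # 1 0
    cluster.replicate_pending()
    print(len(b.warehouse_rows))  # 1
    a.healthy = False
    print(cluster.route_query("site_a").name)  # site_b
```

In this model, asynchronous mode leaves the secondary behind until the backlog is shipped, so rows still queued when the primary fails are lost; synchronous mode closes that gap at the cost of every write waiting on the second site.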