Creating a Pilot-Light Disaster Recovery Architecture in OCI

If a large-scale outage affects your production applications, you need the ability to restore the workloads quickly. Your business continuity plan should include a DR strategy that meets your recovery point, recovery time, and budget objectives. A pilot-light topology offers a balance between cost and recovery requirements.

The term pilot light refers to a small flame that is always lit in devices such as gas-powered heaters and can be used to start the devices quickly when required. In the context of DR, a pilot-light environment contains the core components of a given workload, with the latest configuration and critical data, running at a minimal scale at a location remote from the primary site. In a disaster at the primary site, you can quickly use the pilot-light components at the remote location to restore a production-scale environment.

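This architecture provides a multi-tier topology with redundant resources distributed across two Oracle Cloud Infrastructure (OCI) regions: a primary region that runs the production workload and a standby region that hosts the pilot-light environment.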

List of Components

Regions
Availability Domains
Fault Domains
Virtual Cloud Networks (VCNs) and Subnets
Bastion Host
Load Balancer
Internet Gateway
Dynamic Routing Gateway
NAT Gateway
Service Gateway
Compute Instances
Block Volumes
File Storage
Object Storage
Database

Recommendations for Pilot-Light DR Topology

Virtual Cloud Network

When you create each VCN, determine how many IP addresses your cloud resources in each subnet require. Using Classless Inter-Domain Routing (CIDR) notation, specify a subnet mask and a network address range large enough for the required IP addresses. Use an address range within the standard private IP address space (RFC 1918).
Select an address range that doesn't overlap with any other network (in Oracle Cloud Infrastructure, in your on-premises data center, or in another cloud provider) with which you intend to set up private connections.
After you create a VCN, you can't change its address range.
When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.
Use regional subnets. A regional subnet spans all the availability domains in its region, so you can place the resources of a tier in any availability domain without creating a separate subnet for each one.
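
The sizing and non-overlap rules above can be checked before anything is created. The following is a minimal sketch using Python's standard ipaddress module. It assumes OCI's convention of reserving three addresses in every subnet (the first two and the last), and the network names and CIDRs in the usage line are hypothetical examples.

```python
import ipaddress

# OCI reserves the first two addresses and the last address of every subnet CIDR.
RESERVED_PER_SUBNET = 3


def smallest_prefix_for(hosts_needed: int) -> int:
    """Return the longest IPv4 prefix (smallest subnet) with enough usable addresses."""
    for prefix in range(30, -1, -1):
        usable = 2 ** (32 - prefix) - RESERVED_PER_SUBNET
        if usable >= hosts_needed:
            return prefix
    raise ValueError("too many hosts for a single IPv4 subnet")


def check_plan(vcn_cidrs: dict[str, str], other_networks: dict[str, str]) -> list[str]:
    """Flag VCN ranges outside private space and any overlap between networks."""
    problems = []
    nets = {name: ipaddress.ip_network(cidr)
            for name, cidr in {**vcn_cidrs, **other_networks}.items()}
    for name in vcn_cidrs:
        # is_private covers the RFC 1918 ranges (among other reserved ranges).
        if not nets[name].is_private:
            problems.append(f"{name} is outside private address space")
    names = list(nets)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if nets[a].overlaps(nets[b]):
                problems.append(f"{a} overlaps {b}")
    return problems
```

For example, `smallest_prefix_for(10)` returns 28 (16 addresses minus 3 reserved leaves 13 usable), and `check_plan({"primary-vcn": "10.0.0.0/16", "standby-vcn": "10.1.0.0/16"}, {"on-prem": "10.0.128.0/17"})` reports that the primary VCN overlaps the on-premises range, which would block a private connection between them.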

Security Lists

Configure the required security lists to allow cross-region replication of the database and file storage. Note that replication of the boot and block volumes doesn’t require communication between the hosts to which the volumes are attached.
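
To reason about whether a set of ingress rules admits the replication traffic, the following sketch models a rule as a source CIDR plus a TCP port range. This is a simplified, hypothetical model, not the OCI SDK's security rule type. It assumes that database replication (for example, Data Guard redo transport) uses Oracle Net on its default port 1521 and that the primary VCN uses the hypothetical range 10.0.0.0/16. It covers only the database case.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical, simplified model of a TCP ingress rule (not the OCI SDK type)."""
    source_cidr: str
    port_min: int
    port_max: int


def allows(rules: list[IngressRule], source_ip: str, port: int) -> bool:
    """True if any rule admits TCP traffic from source_ip to the destination port."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        addr in ipaddress.ip_network(rule.source_cidr)
        and rule.port_min <= port <= rule.port_max
        for rule in rules
    )


# Standby database subnet: admit Oracle Net (default port 1521) only from the primary VCN.
standby_db_rules = [IngressRule(source_cidr="10.0.0.0/16", port_min=1521, port_max=1521)]
```

With these rules, `allows(standby_db_rules, "10.0.2.15", 1521)` is true, while the same source on port 22, or any source outside 10.0.0.0/16, is rejected. Scoping the source to the peer VCN's CIDR rather than 0.0.0.0/0 keeps the replication path open without exposing the port more widely.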

Block Volumes Backup Policy

Configure a policy to take backups of the block volumes as frequently as necessary to meet your Recovery Point Objective (RPO).
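
Backup frequency and cross-region copy time together bound the worst-case data loss. The following is a minimal sketch of that arithmetic. The candidate intervals (hourly, daily, weekly) and the copy durations are hypothetical inputs, not values taken from an OCI policy.

```python
from typing import Optional

# Hypothetical candidate backup intervals, in hours: hourly, daily, weekly.
CANDIDATE_INTERVALS_H = [1, 24, 168]


def worst_case_rpo_h(interval_h: float, copy_h: float) -> float:
    """Worst-case data loss in the standby region, in hours.

    If a disaster strikes just before the next backup finishes copying, only the
    previous copy exists in the standby region; it is one interval plus one copy old.
    """
    return interval_h + copy_h


def longest_interval_meeting(rpo_h: float, copy_h: float) -> Optional[int]:
    """Return the least frequent candidate interval that still meets the RPO, or None."""
    fitting = [i for i in CANDIDATE_INTERVALS_H if worst_case_rpo_h(i, copy_h) <= rpo_h]
    return max(fitting) if fitting else None
```

For example, with a 24-hour RPO and a 2-hour copy, a daily backup falls short (26 hours worst case), so `longest_interval_meeting(24, 2)` returns 1 (hourly). With an RPO of 2 hours and a copy time of 1.5 hours, no candidate fits and the function returns None, which signals that backups alone can't meet that objective.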

Considerations

Performance

When planning the Recovery Point Objective (RPO) and Recovery Time Objective (RTO), consider the time required for volume backups to be copied across regions.
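
One way to account for copy time in the RTO is to list the failover steps with measured durations and compare their total to the target. This is a sketch that assumes the steps run one after another. The step names and minute values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class RecoveryStep:
    name: str
    minutes: float


def estimated_rto_minutes(steps: list[RecoveryStep]) -> float:
    """Steps are assumed to run sequentially, so the estimate is their sum."""
    return sum(step.minutes for step in steps)


def meets_rto(steps: list[RecoveryStep], rto_minutes: float) -> bool:
    return estimated_rto_minutes(steps) <= rto_minutes


# Hypothetical durations; replace them with timings measured during DR drills.
failover_plan = [
    RecoveryStep("wait for in-flight cross-region backup copy", 15),
    RecoveryStep("restore volumes from the copied backups", 30),
    RecoveryStep("scale up compute and database", 20),
    RecoveryStep("switch DNS steering and wait for TTL expiry", 5),
]
```

Here the plan totals 70 minutes, so it meets a 90-minute RTO but not a 60-minute one. The first step is where cross-region copy time enters the RTO.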

Availability

A DNS traffic steering policy can redirect client traffic to the current production region after a failover.
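
OCI Traffic Management steering policies include a Failover template. The following sketch models only the decision such a policy makes: answer with the first healthy endpoint in priority order. It is not the service's API. The region identifiers are examples and the addresses come from documentation-only IP ranges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegionEndpoint:
    region: str
    address: str
    healthy: bool  # in practice, set by a health check against the endpoint


def steer(endpoints: list[RegionEndpoint]) -> Optional[str]:
    """Failover-style steering: return the first healthy endpoint, in priority order."""
    for endpoint in endpoints:
        if endpoint.healthy:
            return endpoint.address
    return None


# Primary first, standby second (example regions and addresses).
endpoints = [
    RegionEndpoint("us-ashburn-1", "203.0.113.10", healthy=False),
    RegionEndpoint("us-phoenix-1", "198.51.100.10", healthy=True),
]
```

With the primary marked unhealthy, `steer(endpoints)` answers with the standby address. Because resolvers cache answers for the record's TTL, a short TTL shortens the time clients take to follow the failover.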

If you use compute shapes that provide locally attached NVMe devices, note that block volume backups don't cover these devices. Back up the data on them by using traditional backup solutions that store the backups in Object Storage.
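
A traditional file-level backup of an NVMe-backed directory can be as simple as an archive plus a checksum. The following is a minimal sketch using only the Python standard library. The directory and archive paths are hypothetical, and the upload to Object Storage is left to a separate step.

```python
import hashlib
import tarfile
from pathlib import Path


def archive_with_checksum(src_dir: str, archive_path: str) -> str:
    """Pack src_dir into a gzip-compressed tarball and return its SHA-256 hex digest."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under the directory's own name rather than its absolute path.
        tar.add(src_dir, arcname=Path(src_dir).name)
    digest = hashlib.sha256()
    with open(archive_path, "rb") as f:
        # Hash in 1 MiB chunks so large archives don't have to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The archive can then be uploaded with the OCI CLI, for example `oci os object put --bucket-name <bucket> --file <archive>`, and the stored digest lets you verify the file after restoring it in the standby region. Keep the archive outside the directory being backed up.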

Cost

In the event of a failover from the primary to the standby region, you can quickly provision the required infrastructure by using Terraform scripts. Because you can resize the database systems after provisioning them, specify the minimum shape required initially and change to a larger shape after the failover.
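
One way to keep a single Terraform configuration for both states is to switch only its input values. The following sketch writes `terraform.tfvars.json`, which Terraform loads automatically on `plan` and `apply`. The variable names and sizes are hypothetical and would have to match the variables your configuration actually declares.

```python
import json
from pathlib import Path

# Hypothetical variable names and sizes; align them with your Terraform variables.
PILOT_LIGHT = {"app_instance_count": 1, "db_cpu_core_count": 2}
PRODUCTION = {"app_instance_count": 4, "db_cpu_core_count": 16}


def write_tfvars(failover: bool, out_dir: str = ".") -> Path:
    """Write terraform.tfvars.json for either the pilot-light or production sizing."""
    values = PRODUCTION if failover else PILOT_LIGHT
    path = Path(out_dir) / "terraform.tfvars.json"
    path.write_text(json.dumps(values, indent=2))
    return path
```

During normal operation the standby region runs with the pilot-light values. On failover, `write_tfvars(failover=True)` followed by `terraform apply` scales the same resources up, so the scale-up path is exercised by the same code that built the standby environment.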

Conclusion

A well-defined disaster recovery (DR) strategy is crucial for any organization that relies on production applications. A pilot-light topology is a cost-efficient way to balance recovery requirements against budget objectives. This architecture distributes redundant resources across two Oracle Cloud Infrastructure regions, so the workload can be restored in the standby region if the primary region fails. When designing the topology, plan the virtual cloud networks, subnets, security lists, and block volume backup policies, and weigh the performance, availability, and cost considerations. By following these recommendations, organizations can restore their workloads quickly after a disaster, minimizing downtime and supporting business continuity.

Connect with experienced IT system integrators, such as Tangenz, an Oracle Preferred Partner, to learn more about OCI and get help implementing it.
