Introduction

If a large-scale outage affects your production applications, you need the ability to restore the workloads quickly. Your business continuity plan should include a disaster recovery (DR) strategy that meets your recovery point, recovery time, and budget objectives. A pilot-light topology offers a balance between cost and recovery requirements.

The term pilot light refers to a small flame that is always lit in devices such as gas-powered heaters and can be used to start the devices quickly when required. In the context of DR, a pilot-light environment contains the core components of a given workload, with the latest configuration and critical data, running at a minimal scale at a location that’s remote from the primary site. In the event of a disaster at the primary site, you can use the pilot-light components at the remote location to restore a production-scale environment quickly.

This architecture provides a multi-tier topology with redundant resources distributed across two Oracle Cloud Infrastructure regions.

List of Components

  • Regions
  • Availability Domains
  • Fault Domains
  • Virtual Cloud Networks (VCN) & Subnets
  • Bastion Host
  • Load Balancer
  • Internet Gateway
  • Dynamic Routing Gateway
  • NAT Gateway
  • Service Gateway
  • Compute Instances
  • Block Volumes
  • File Storage
  • Object Storage
  • Database

Recommendations for pilot-light DR topology

Virtual Cloud Network

When you create each VCN, determine how many IP addresses your cloud resources in each subnet require. Using the Classless Inter-Domain Routing (CIDR) notation, specify a subnet mask and a network address range that’s large enough for the required IP addresses. Use an address range that’s within the standard private IP address space.
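The sizing step above can be sketched with Python's standard `ipaddress` module. This is a minimal illustration, not a prescribed method: the VCN range `10.0.0.0/16`, the tier names, the per-tier address counts, and the `RESERVED_PER_SUBNET` value of 3 are all assumptions made for the example. Check how many addresses your platform reserves in each subnet before relying on the result.

```python
import ipaddress

# Assumption: addresses the platform reserves in every subnet
# (for example network, gateway, and broadcast addresses).
RESERVED_PER_SUBNET = 3


def smallest_prefix(required_ips, reserved=RESERVED_PER_SUBNET):
    """Return the longest IPv4 prefix whose block holds required_ips plus reserved."""
    total = required_ips + reserved
    host_bits = (total - 1).bit_length()  # smallest n such that 2**n >= total
    return 32 - host_bits


def plan_subnets(vcn_cidr, tiers):
    """Carve one subnet per tier out of the VCN range, largest tier first."""
    vcn = ipaddress.ip_network(vcn_cidr)
    cursor = vcn.network_address
    plan = {}
    # Allocating the largest blocks first keeps every block aligned to its size.
    for name, count in sorted(tiers.items(), key=lambda kv: kv[1], reverse=True):
        subnet = ipaddress.ip_network((cursor, smallest_prefix(count)))
        if not subnet.subnet_of(vcn):
            raise ValueError(f"VCN {vcn} is too small for tier {name!r}")
        plan[name] = subnet
        cursor = subnet.broadcast_address + 1
    return plan


# Hypothetical tiers and address requirements.
plan = plan_subnets("10.0.0.0/16", {"web": 20, "app": 100, "db": 10})
for name, subnet in plan.items():
    print(f"{name}: {subnet} ({subnet.num_addresses} addresses)")
```

Rounding each tier up to the next power of two leaves some headroom for growth. If you expect a tier to grow substantially, pass a larger count than you need today.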

Select an address range that doesn’t overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data centre, or in another cloud provider) that you intend to set up private connections to.
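One way to check for overlapping ranges before you create the networks is sketched below with the standard `ipaddress` module. The network names and CIDR blocks are hypothetical. Substitute the ranges of your own VCNs, on-premises networks, and other cloud networks.

```python
import ipaddress
from itertools import combinations


def find_overlaps(networks):
    """Return every pair of network names whose address ranges overlap."""
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(parsed.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical address plan: the on-premises range collides with the primary VCN.
candidate_plan = {
    "primary-vcn": "10.0.0.0/16",
    "standby-vcn": "10.1.0.0/16",
    "on-premises": "10.0.128.0/17",
}

for name_a, name_b in find_overlaps(candidate_plan):
    print(f"Overlap: {name_a} and {name_b}")
```

In a pilot-light topology, the primary and standby VCNs are typically connected privately, so the two of them belong in the same check.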

After you create a VCN, you can change, add, and remove its CIDR blocks.

When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

Use regional subnets.

Security lists

To allow cross-region replication of the database and file storage, configure the required security lists. Replication of the boot volumes and block volumes doesn’t require communication between the hosts to which the volumes are attached.

Block volumes backup policy

Configure a policy to take backups of the block volumes as frequently as necessary to meet your Recovery Point Objective (RPO).

Considerations

Performance

When planning the Recovery Point Objective (RPO) and Recovery Time Objective (RTO), consider the time required for volume backups to be copied across regions.
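The effect of copy time on the RPO can be estimated with a simplified model, sketched below. It assumes that a backup captures the volume state at the moment the backup starts, and that the backup becomes usable in the standby region only after the backup and the cross-region copy both finish. Under those assumptions, the worst-case data loss is the backup interval plus the backup duration plus the copy duration. The function names and all durations are hypothetical. Measure the real values in your own environment.

```python
from datetime import timedelta


def worst_case_rpo(interval, backup_duration, copy_duration):
    """Worst-case data loss if the primary region fails just before
    the newest backup finishes copying to the standby region."""
    return interval + backup_duration + copy_duration


def max_backup_interval(rpo_target, backup_duration, copy_duration):
    """Longest backup interval that still meets the RPO target in this model."""
    slack = rpo_target - backup_duration - copy_duration
    if slack <= timedelta(0):
        raise ValueError("RPO target is shorter than the backup plus copy time")
    return slack


# Hypothetical measurements.
rpo_target = timedelta(hours=4)
backup_duration = timedelta(minutes=20)
copy_duration = timedelta(minutes=40)

interval = max_backup_interval(rpo_target, backup_duration, copy_duration)
print("Back up at least every:", interval)
print("Worst-case data loss:", worst_case_rpo(interval, backup_duration, copy_duration))
```

If the computed interval is shorter than your backup policy allows, the RPO target can't be met with volume backups alone in this model.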

Availability

You can use DNS traffic management steering policies to redirect client traffic to the current production region after a failover.

If you use compute shapes that provide locally attached NVMe devices, you can back up the data on these devices by using traditional backup solutions that use object storage.

Cost

In the event of a failover from the primary to the standby region, you can provision the required infrastructure quickly by using Terraform scripts. You can resize the database systems after provisioning them. To keep standby costs low, specify the minimum shape required initially, and change to a larger shape after the failover.