SOC 2 · 24 April 2026 · 8 min read

SOC 2 Business Continuity and Disaster Recovery Requirements

SOC 2 Availability criteria (A1.2, A1.3) require business continuity and disaster recovery plans. Learn what auditors look for and how to build compliant BC/DR controls.

Key Takeaways

Business continuity and DR requirements apply when you include the Availability Trust Service Criteria (A1.1–A1.3) in your SOC 2 scope.
A1.2 requires environmental protections, data backup processes, and recovery infrastructure (redundancy, failover, backups) capable of restoring systems within the timeframes committed to in your service agreements (RPO/RTO).
A1.3 requires that recovery plan procedures are tested.
Auditors generally expect DR plans to be tested at least annually. A plan that has never been tested is considered a design gap, not just an operational gap.
Recovery Time Objective (RTO) and Recovery Point Objective (RPO) must be defined, documented, and achievable by your actual backup and recovery infrastructure.

In this guide

Availability Criteria Overview
Defining RTO and RPO
Backup Requirements
Redundancy Architecture
Disaster Recovery Plan
DR Testing Requirements
Evidence Auditors Collect

Availability Criteria Overview

The Availability Trust Service Criteria contain three criteria. A1.1 requires that current processing capacity and usage are maintained, monitored, and evaluated to manage capacity demand. A1.2 requires that environmental protections, software, data backup processes, and recovery infrastructure are designed, implemented, operated, and monitored to meet the entity's availability objectives. A1.3 requires that the recovery plan procedures supporting system recovery are tested. Business continuity and disaster recovery are primarily addressed by A1.2 and A1.3.

The Availability criteria are optional — they apply only if you have included Availability in your SOC 2 scope. However, many SaaS companies add Availability because their customers have uptime requirements in service level agreements (SLAs), and customers want audit evidence that those commitments are backed by real controls. If your product has a 99.9% or higher uptime SLA, you will likely need to include Availability in your SOC 2 scope.
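To make an uptime commitment concrete, it helps to translate the percentage into a downtime budget. The following is a minimal Python sketch; the function name `downtime_budget` and the choice of a 30-day month are illustrative assumptions, not something SOC 2 prescribes.

```python
from datetime import timedelta


def downtime_budget(sla_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime an uptime SLA allows over a period."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    allowed_fraction = 1 - sla_percent / 100
    return period * allowed_fraction


# Assumption: the SLA is measured over a 30-day month.
month = timedelta(days=30)
for sla in (99.0, 99.9, 99.99):
    minutes = downtime_budget(sla, month).total_seconds() / 60
    print(f"{sla}% over 30 days allows about {minutes:.1f} minutes of downtime")
```

Under these assumptions, a 99.9% SLA leaves roughly 43 minutes of downtime in a 30-day month, which is the budget your recovery procedures have to fit inside.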

Even for companies that scope to Security only, some CC7 criteria overlap with availability concerns: CC7.4 (incident response, including containment and remediation) and CC7.5 (recovery from identified incidents) apply to availability incidents as well as security incidents. A major outage that required incident response will be examined under CC7 even in a Security-only scope.

Defining RTO and RPO

Recovery Time Objective (RTO) is the maximum acceptable time from a disaster event to the restoration of normal service. Recovery Point Objective (RPO) is the maximum acceptable data loss measured in time — how old can the most recent backup be and still be acceptable? These two metrics define the performance requirements your DR infrastructure must meet.

Define RTO and RPO for each system tier in your scope: production application, production database, critical internal tools. Different tiers may have different objectives — your customer-facing application may have an RTO of 4 hours and RPO of 1 hour, while your internal analytics database may have an RTO of 24 hours and RPO of 24 hours. Document these objectives in your BC/DR policy.

Your RTO and RPO must be achievable with your current backup and redundancy architecture, not aspirational. If you state an RPO of 1 hour but your database backups run daily, your stated objective is not achievable. Auditors will test this by reviewing your backup configuration and asking you to walk through the recovery steps for a hypothetical DR scenario.
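The achievability check described above can be sketched in a few lines of Python. This is an illustration only: the `TierObjective` class, the tier names, and the backup intervals are hypothetical, and the check uses the simple rule that worst-case data loss is roughly the time between consecutive backups.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class TierObjective:
    name: str
    rto: timedelta
    rpo: timedelta
    backup_interval: timedelta  # time between consecutive backups (hypothetical values below)


def rpo_gaps(tiers):
    """Return the names of tiers whose backup interval exceeds the stated RPO."""
    return [t.name for t in tiers if t.backup_interval > t.rpo]


tiers = [
    TierObjective("production-app", timedelta(hours=4), timedelta(hours=1), timedelta(minutes=15)),
    TierObjective("production-database", timedelta(hours=4), timedelta(hours=1), timedelta(hours=24)),
    TierObjective("analytics-database", timedelta(hours=24), timedelta(hours=24), timedelta(hours=24)),
]

for name in rpo_gaps(tiers):
    print(f"{name}: stated RPO is not achievable with the current backup schedule")
```

Here the production database is flagged because a daily backup cannot support a 1-hour RPO, which is the same mismatch an auditor would look for in your backup configuration.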

Backup Requirements

A1.2 requires that data backup processes are implemented, and A1.3 requires that the recovery procedures relying on them are tested. For AWS-hosted SaaS, standard backup controls include: automated daily snapshots of RDS databases (with point-in-time recovery enabled for 7–35 days), S3 object versioning for critical data buckets, EBS snapshot schedules for EC2 instances with critical state, and regular exports of critical data to a separate AWS account or region to protect against account-level compromise.

Backups must be stored separately from the primary data. An RDS snapshot stored in the same AWS account as the primary database does not protect against an account compromise or a misconfigured deletion policy. Best practice: replicate database snapshots to a separate AWS account (a dedicated backup account with restricted access) and verify that the backup account is not accessible from the production account's IAM configuration.
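One way to keep an eye on this is to check a backup inventory for resources whose copies all live in the production account. The sketch below is pure Python over a hand-built inventory; the `BackupCopy` record, the resource names, and the account IDs are hypothetical, and in practice you would populate the inventory from your own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    resource: str    # e.g. a database identifier
    account_id: str  # AWS account holding this copy
    region: str


def resources_without_offsite_copy(copies, production_account):
    """Return resources whose backup copies all sit in the production account."""
    by_resource = {}
    for copy in copies:
        by_resource.setdefault(copy.resource, []).append(copy)
    return sorted(
        resource
        for resource, resource_copies in by_resource.items()
        if all(c.account_id == production_account for c in resource_copies)
    )


# Hypothetical inventory: 111111111111 is production, 222222222222 is the backup account.
inventory = [
    BackupCopy("orders-db", "111111111111", "us-east-1"),
    BackupCopy("orders-db", "222222222222", "us-west-2"),
    BackupCopy("billing-db", "111111111111", "us-east-1"),
]

print(resources_without_offsite_copy(inventory, "111111111111"))
```

In this example only `billing-db` is reported, because its single copy sits next to the primary data.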

Backup integrity must be periodically verified. A backup that cannot be restored is not a backup. Test your backup restoration process at least annually — ideally in a staging environment — and document the test: what was restored, how long it took, and whether the restored data was complete and consistent. This test doubles as your DR exercise for the database tier.
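A restoration test is easier to document if the checks are scripted. The following is a rough sketch, assuming you record per-table row counts when the backup is taken and again after the restore; the function name, table names, counts, and timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta


def verify_restore(expected_counts, restored_counts, started, finished, rto):
    """Compare a restored database against row counts recorded at backup time."""
    missing_tables = sorted(set(expected_counts) - set(restored_counts))
    mismatched = {
        table: (expected, restored_counts[table])
        for table, expected in expected_counts.items()
        if table in restored_counts and restored_counts[table] != expected
    }
    duration = finished - started
    return {
        "duration": duration,
        "within_rto": duration <= rto,
        "missing_tables": missing_tables,
        "mismatched_counts": mismatched,
        "data_complete": not missing_tables and not mismatched,
    }


result = verify_restore(
    expected_counts={"users": 1200, "invoices": 5400},
    restored_counts={"users": 1200, "invoices": 5398},
    started=datetime(2026, 3, 10, 9, 0),
    finished=datetime(2026, 3, 10, 11, 30),
    rto=timedelta(hours=4),
)
for key, value in result.items():
    print(f"{key}: {value}")
```

Row counts are a coarse completeness signal. Checksums or application-level consistency checks would be stronger, but even this level of detail gives the test record something concrete to cite.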

Redundancy Architecture

A1.2 requires environmental protections that prevent or minimize service disruptions. For cloud-hosted SaaS, this translates to architectural redundancy: multi-AZ (Availability Zone) deployment for databases and application servers, load balancing across multiple instances, auto-scaling groups that replace failed instances automatically, and health check monitoring with automatic traffic rerouting.

Multi-AZ RDS deployment means your database has a standby replica in a separate AWS Availability Zone. If the primary instance fails, RDS automatically fails over to the standby, typically within 60–120 seconds and without data loss (the synchronous replication means RPO ≈ 0 for AZ failures). Document your multi-AZ configuration as evidence of A1.2 environmental protections.

For stricter availability requirements that call for multi-region resilience, Aurora Global Database provides cross-region replication from a single primary region to read-only secondary regions. AWS cites a typical RPO of about one second and an RTO of under a minute for region failures. This architecture is appropriate for companies with strict SLAs and global customer bases. Document the replication lag metrics and failover procedures in your DR plan regardless of which architecture you use.

Disaster Recovery Plan

A documented disaster recovery plan is required for A1.2 and A1.3. The plan should cover: (1) scope — which systems and services are covered; (2) disaster scenarios — natural disasters, cloud provider outages, data corruption, ransomware, human error; (3) decision criteria for declaring a disaster and activating the DR plan; (4) recovery procedures for each system tier with step-by-step instructions; (5) roles and responsibilities during a DR event; (6) communication procedures for internal and external stakeholders; and (7) recovery success criteria and return-to-normal procedures.

The DR plan does not need to be a lengthy document. A well-structured runbook of 10–20 pages covering the above elements, stored in a location accessible to all relevant staff (and not dependent on the system that just went down), is more valuable than a 200-page comprehensive plan that nobody has read. Store a copy of your DR runbook outside your primary cloud environment — in a secondary account, a SaaS document system, or an offline location.

Contact information for all relevant parties (AWS support, your cloud infrastructure engineers, your executive team, your key customers) should be in the DR plan and verified annually. A DR plan that lists a phone number for a CTO who has since left is a planning gap that tabletop exercises will reveal.

DR Testing Requirements

A1.3 explicitly requires that recovery procedures are tested. An untested DR plan is considered a design gap. The criteria do not prescribe a test frequency, but auditors generally expect DR testing at least annually. The test must cover restoration from backup: not just a verification that backups exist, but an actual restoration exercise that validates your stated RTO and RPO objectives.

DR test approaches range from a full failover test (simulate the primary environment going down and execute the DR plan to full recovery in a staging environment) to a tabletop exercise with restoration test components (walk through the DR scenarios with the team while also testing database restoration in an isolated environment). Full failover tests are more conclusive but require more preparation. Tabletop-plus-restoration tests are more common for first-time SOC 2 companies.

Document the DR test results: date, participants, scenario tested, time to recovery (actual vs. target RTO), data completeness of restored backup (actual vs. target RPO), issues identified, and action items for improvements. This documentation is the primary evidence for A1.3 DR testing. Auditors will ask whether the test results were within your stated RTO/RPO objectives and what happened when they were not.
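The fields listed above map naturally onto a small structured record, which makes it easy to compare actual results against targets. This is a sketch, not a required format: the `DrTestReport` class, its field names, and the sample values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class DrTestReport:
    test_date: date
    scenario: str
    participants: list[str]
    target_rto: timedelta
    actual_recovery_time: timedelta
    target_rpo: timedelta
    actual_data_loss: timedelta
    issues: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)

    def met_rto(self) -> bool:
        return self.actual_recovery_time <= self.target_rto

    def met_rpo(self) -> bool:
        return self.actual_data_loss <= self.target_rpo

    def summary(self) -> str:
        if self.met_rto() and self.met_rpo():
            return "within objectives"
        if self.issues and self.action_items:
            return "objectives missed, gap documented with action items"
        return "objectives missed, follow-up not documented"


report = DrTestReport(
    test_date=date(2026, 3, 10),
    scenario="Primary database lost, restore from latest backup",
    participants=["on-call engineer", "engineering manager"],
    target_rto=timedelta(hours=4),
    actual_recovery_time=timedelta(hours=3, minutes=30),
    target_rpo=timedelta(hours=1),
    actual_data_loss=timedelta(minutes=20),
)
print(report.summary())
```

Keeping the targets and the measured values in the same record means the question auditors ask, whether results were within the stated objectives and what happened when they were not, can be answered directly from the report.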

Evidence Auditors Collect

For Availability criteria, auditors typically request: (1) your BC/DR policy with defined RTO/RPO objectives; (2) backup configuration screenshots showing frequency, retention, and cross-account storage; (3) AWS console screenshots showing multi-AZ configuration for databases and load balancing; (4) DR test report from the most recent annual test; (5) uptime monitoring data for the observation period (CloudWatch, UptimeRobot, Datadog) showing availability metrics against your SLA commitments; and (6) any availability incident records with post-incident reviews.

Uptime monitoring evidence is often overlooked. If your SLA commits to 99.9% availability, the auditor will want to see that you actually measured and tracked uptime during the observation period. A dashboard screenshot from your monitoring tool showing uptime percentage over the full observation period is straightforward evidence that the monitoring you rely on for A1.1 is operating.
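If your monitoring tool exports outage windows rather than a ready-made percentage, the uptime figure can be derived with simple arithmetic. The sketch below assumes non-overlapping outage intervals and a 30-day measurement window; the function name and the sample outages are hypothetical.

```python
from datetime import datetime, timedelta


def uptime_percent(period_start, period_end, outages):
    """Compute uptime over a period from (start, end) outage intervals.

    Assumes the outage intervals do not overlap each other.
    """
    total_seconds = (period_end - period_start).total_seconds()
    if total_seconds <= 0:
        raise ValueError("period_end must be after period_start")
    down_seconds = 0.0
    for outage_start, outage_end in outages:
        # Clip each outage to the measurement period.
        start = max(outage_start, period_start)
        end = min(outage_end, period_end)
        if end > start:
            down_seconds += (end - start).total_seconds()
    return 100.0 * (1 - down_seconds / total_seconds)


period_start = datetime(2026, 3, 1)
period_end = period_start + timedelta(days=30)
outages = [
    (datetime(2026, 3, 4, 2, 0), datetime(2026, 3, 4, 2, 30)),      # 30 minutes
    (datetime(2026, 3, 18, 14, 0), datetime(2026, 3, 18, 14, 20)),  # 20 minutes
]

measured = uptime_percent(period_start, period_end, outages)
sla_target = 99.9
print(f"measured uptime: {measured:.3f}% (target {sla_target}%)")
print("SLA met" if measured >= sla_target else "SLA missed")
```

In this example, 50 minutes of downtime in a 30-day window falls just short of a 99.9% commitment, which is the kind of result that should have a matching incident record and post-incident review.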

For SaaS companies that added Availability to their scope, the most common gap is the absence of a tested DR plan. Many companies have backups in place and multi-AZ architecture, but have never formally tested and documented the recovery process. Scheduling and executing a DR test as part of audit preparation is high-impact work that is also operationally valuable regardless of its audit implications.

Frequently Asked Questions

Do I need the Availability TSC if my product has an SLA?

If your service agreement commits to specific availability levels (e.g., 99.9% monthly uptime), adding the Availability TSC to your SOC 2 scope is strongly recommended. Without it, customers cannot independently verify through the audit that your availability commitments are backed by real controls. Many enterprise customers specifically request that Availability be in scope for SaaS products they depend on for business-critical functions.

Can we use a cloud provider's native backup as our DR strategy?

Yes. AWS automated backups (RDS automated snapshots, S3 versioning, EBS snapshots) are acceptable for SOC 2 DR purposes if they are properly configured with retention that meets your RPO requirements and cross-account or cross-region storage. Document the backup configuration and demonstrate that restoration has been tested. A pure reliance on cloud-native backups with no cross-account redundancy creates a risk of simultaneous backup loss in an account-level event, which should be documented in your risk register.

How is BC/DR different from incident response?

Incident response (CC7.3–CC7.5) addresses the security response lifecycle: detection, containment, remediation. Business continuity and DR (A1.2–A1.3) addresses the operational recovery lifecycle: restoring systems and services to normal operation after a disruption. They overlap in major incidents that have both security and availability dimensions (e.g., ransomware), where the incident response and DR plans must coordinate. Treat them as complementary programs with distinct triggers and procedures.

Our startup runs entirely on AWS. Does that mean our DR is AWS's responsibility?

No. AWS operates under a shared responsibility model. AWS is responsible for the availability of its infrastructure services (its hardware, network, and hypervisors). You are responsible for the availability of your application, your data, and your recovery procedures within AWS. A failure to recover from an AWS region outage is your DR failure, not AWS's. Your DR plan must address how you respond to regional outages, including whether you maintain cross-region redundancy.

How long should our DR test take and what does "passing" look like?

A passing DR test is one where you restored a backup to a functional state within your stated RTO and the data lost was within your stated RPO. A 4-hour RTO test that completed in 3.5 hours passes. A test that exceeded the RTO is not an automatic failure if you documented the gap, identified the cause, and created an action plan. Auditors evaluate whether the test was conducted, documented, and followed up, not whether it was perfect.