How-To · 15 May 2026 · 9 min read

SOC 2 CloudWatch Alerts: Monitoring Configuration That Satisfies Auditors

Build a SOC 2-ready CloudWatch monitoring setup — covering metric alarms for CC7, log-based anomaly detection, alert routing to PagerDuty or SNS, and evidence collection for auditors.

Key Takeaways

Create metric alarms for the five critical SOC 2 signals covered in this guide: authentication failure rate, 5xx error rate, unauthorised API calls, CPU utilisation, and root account usage.
Use CloudWatch Logs metric filters and Anomaly Detection to catch anomalous patterns in logs and metrics, send the resulting alarms to SNS for automated alerting, and keep Logs Insights queries ready for investigation.
Route alerts to PagerDuty or OpsGenie with a documented escalation policy — auditors want to see that alerts reach humans.
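Enable CloudWatch Contributor Insights on access logs held in CloudWatch Logs (API Gateway can log there directly; ALB access logs are delivered to S3 and must be forwarded first) to identify top error-producing sources.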
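Export alarm history to S3 on a recurring schedule (CloudWatch keeps alarm history for only a limited period, so a once-a-quarter pull can miss state changes) and bundle the exports quarterly as audit evidence for CC7.2 and CC7.3.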

In this guide

CloudWatch and SOC 2 CC7 requirements
Five critical metric alarms to create
Log-based anomaly detection
Alert routing and escalation policies
Contributor Insights for access analysis
Collecting alarm evidence for auditors
CloudWatch SOC 2 checklist

CloudWatch and SOC 2 CC7 requirements

SOC 2 CC7 (System Operations) requires that you monitor your system for security events, performance degradation, and anomalous activity. CC7.2 specifically requires monitoring designed to identify potential security events. CC7.3 requires evaluation of security events and determination of their nature.

CloudWatch is the primary monitoring layer for AWS-hosted systems. Auditors do not require a specific number of alarms, but they do expect: coverage of authentication and authorisation failures, error rates above normal thresholds, and availability metrics.

The most common CC7 finding: alarms exist but are not routed to an on-call rotation. An alarm that fires and emails a shared mailbox that nobody reads is not a functioning control. Auditors verify that alarms reach humans through documented escalation policies.

Five critical metric alarms to create

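1. Authentication failure rate: CloudWatch metric filter on CloudTrail logs for failed console sign-ins. CloudTrail records these as ConsoleLogin events with the error message "Failed authentication"; there is no separate ConsoleLoginFailure event. Publish the count as a metric such as ConsoleLoginFailures in the CloudTrailMetrics namespace and alarm when count > 5 in 5 minutes. This detects brute-force and credential-stuffing attacks against the console.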

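2. 5xx error rate: ALB metric HTTPCode_Target_5XX_Count (errors returned by your backends) or HTTPCode_ELB_5XX_Count (errors generated by the load balancer itself) > 10 in 1 minute, or the API Gateway 5XXError metric. This alerts on systemic failures before customers report them.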

3. UnauthorizedApiCalls: CloudTrail metric filter with the pattern `{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }`. Alarm when count > 10 in 5 minutes. Detects misconfigured services or lateral movement attempts.

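4. High CPU (for EC2/ECS): CPUUtilization > 80% for 5 consecutive minutes. On EC2 this needs detailed monitoring for 1-minute datapoints; with basic monitoring, evaluate a single 5-minute period instead. Sustained high CPU indicates capacity issues or runaway processes, and the alarm serves as an availability control under A1.1.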

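5. RootAccountUsage: CloudTrail metric filter for userIdentity.type = "Root" (commonly narrowed to exclude events that AWS services generate on the account's behalf). Alarm on any root account activity, not only console logins, because root should never be used in normal operations.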
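
The three CloudTrail-based alarms in this list could be set up along the following lines. This is a minimal sketch with boto3, not a definitive implementation: the log group name, SNS topic ARN, metric names, and the `notBreaching` missing-data setting are assumptions for illustration, and the filter patterns follow the commonly used CIS-benchmark style (failed console sign-ins are matched as ConsoleLogin events with a "Failed authentication" error message).

```python
"""Sketch: metric-filter and alarm parameters for the CloudTrail-based alarms."""

# Hypothetical names: replace with your CloudTrail log group and SNS topic.
LOG_GROUP = "cloudtrail-log-group"
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:soc2-alerts"
NAMESPACE = "CloudTrailMetrics"

# metric name -> (filter pattern, alarm threshold, period in seconds)
SIGNALS = {
    "ConsoleLoginFailures": (
        '{ ($.eventName = "ConsoleLogin") && ($.errorMessage = "Failed authentication") }',
        5, 300),
    "UnauthorizedApiCalls": (
        '{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }',
        10, 300),
    "RootAccountUsage": (
        '{ ($.userIdentity.type = "Root") && ($.userIdentity.invokedBy NOT EXISTS)'
        ' && ($.eventType != "AwsServiceEvent") }',
        0, 300),  # threshold 0 with "greater than" means any root activity alarms
}


def metric_filter_params(name):
    """Arguments for logs.put_metric_filter: each matching event counts as 1."""
    pattern, _, _ = SIGNALS[name]
    return {
        "logGroupName": LOG_GROUP,
        "filterName": name,
        "filterPattern": pattern,
        "metricTransformations": [{
            "metricName": name,
            "metricNamespace": NAMESPACE,
            "metricValue": "1",
        }],
    }


def alarm_params(name):
    """Arguments for cloudwatch.put_metric_alarm on the filtered metric."""
    _, threshold, period = SIGNALS[name]
    return {
        "AlarmName": name,
        "Namespace": NAMESPACE,
        "MetricName": name,
        "Statistic": "Sum",
        "Period": period,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # Assumption: quiet periods (no matching events) count as healthy.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [SNS_TOPIC_ARN],
    }


def apply(names=tuple(SIGNALS)):
    """Create the filters and alarms; needs AWS credentials and boto3."""
    import boto3  # imported lazily so the builders above work offline

    logs = boto3.client("logs")
    cloudwatch = boto3.client("cloudwatch")
    for name in names:
        logs.put_metric_filter(**metric_filter_params(name))
        cloudwatch.put_metric_alarm(**alarm_params(name))
```

Keeping the thresholds in one table makes it easy to hand auditors a single source for what each alarm is supposed to do.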

Log-based anomaly detection

Use CloudWatch Logs metric filters to convert log patterns into metrics. Create a metric filter for your application audit log with the pattern `{ ($.audit IS TRUE) && ($.outcome = "FAILURE") }` (this assumes `audit` is logged as a JSON boolean) and publish it as the metric AuditFailureCount. Alarm when AuditFailureCount > 20 in 10 minutes.

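Enable CloudWatch Anomaly Detection on key metrics (request count, error count, latency). Anomaly Detection trains a model on the metric's history to establish an expected band. The detector does not alert on its own: create an alarm that compares the metric against the band so that it fires when the metric moves outside it. Use `aws cloudwatch put-anomaly-detector` to create a detector on an existing metric.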
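
An alarm on an anomaly band is defined with metric math rather than a fixed threshold. The sketch below builds the arguments for boto3's `put_metric_alarm` under some assumptions: the alarm name suffix, the 5-minute period, the band width of 2, and the "2 of 3 datapoints" setting are illustrative choices, not recommendations from the source.

```python
"""Sketch: an alarm that fires when a metric rises above its anomaly band."""


def anomaly_alarm_params(namespace, metric_name, dimensions, sns_topic_arn, band_width=2):
    """Arguments for cloudwatch.put_metric_alarm using ANOMALY_DETECTION_BAND."""
    return {
        "AlarmName": f"{metric_name}-anomaly",  # hypothetical naming scheme
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "EvaluationPeriods": 3,
        "DatapointsToAlarm": 2,
        # The alarm compares metric "m1" against the band computed in "band".
        "ThresholdMetricId": "band",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {
                        "Namespace": namespace,
                        "MetricName": metric_name,
                        "Dimensions": dimensions,
                    },
                    "Period": 300,
                    "Stat": "Sum",
                },
            },
            {
                "Id": "band",
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "Label": f"{metric_name} (expected)",
                "ReturnData": True,
            },
        ],
    }


def create(params):
    """Create the alarm; needs AWS credentials and boto3."""
    import boto3  # lazy import keeps the builder usable offline

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

For example, `anomaly_alarm_params("AWS/ApplicationELB", "RequestCount", [{"Name": "LoadBalancer", "Value": "app/my-alb/0123456789abcdef"}], topic_arn)` would watch ALB request volume; the load balancer value here is a placeholder.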

For security-critical logs held in CloudWatch Logs (CloudTrail, VPC Flow Logs, and ALB access logs if you forward them from S3), make CloudWatch Logs Live Tail and saved Logs Insights queries part of your incident response runbooks. Document the specific Logs Insights query for each alert type so on-call engineers can investigate immediately.
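
One way to document a query per alert type is to keep them in code next to the alarm definitions. The sketch below is illustrative only: the log group name and the choice of fields are assumptions, and the alarm names are the ones used in this guide.

```python
"""Sketch: one Logs Insights query per alert type for the on-call runbook."""

# Hypothetical log group name for CloudTrail events in CloudWatch Logs.
CLOUDTRAIL_LOG_GROUP = "cloudtrail-log-group"

RUNBOOK_QUERIES = {
    "UnauthorizedApiCalls": (
        "fields @timestamp, eventSource, eventName, errorCode, sourceIPAddress, userIdentity.arn\n"
        "| filter errorCode like /AccessDenied|UnauthorizedOperation/\n"
        "| sort @timestamp desc\n"
        "| limit 50"
    ),
    "RootAccountUsage": (
        "fields @timestamp, eventName, sourceIPAddress, userAgent\n"
        '| filter userIdentity.type = "Root"\n'
        "| sort @timestamp desc\n"
        "| limit 50"
    ),
}


def start_query_params(alarm_name, now_epoch, lookback_minutes=60):
    """Arguments for logs.start_query covering the window before the alert."""
    return {
        "logGroupName": CLOUDTRAIL_LOG_GROUP,
        "startTime": now_epoch - lookback_minutes * 60,  # epoch seconds
        "endTime": now_epoch,
        "queryString": RUNBOOK_QUERIES[alarm_name],
    }


def run(alarm_name, now_epoch):
    """Start the query and return its ID; needs AWS credentials and boto3."""
    import boto3  # lazy import keeps the builder usable offline

    response = boto3.client("logs").start_query(**start_query_params(alarm_name, now_epoch))
    return response["queryId"]
```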

Alert routing and escalation policies

Route alarms to SNS topics, then subscribe PagerDuty or OpsGenie to those topics. PagerDuty SNS integration: create an integration in PagerDuty, copy the HTTPS endpoint URL, and add it as an HTTPS subscription on your SNS topic.
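
The subscription step can also be scripted. In this minimal sketch the topic ARN and the integration URL are placeholders; use the URL that PagerDuty shows for your integration.

```python
"""Sketch: subscribe a paging tool's HTTPS endpoint to an SNS topic."""
from urllib.parse import urlparse


def pagerduty_subscription_params(topic_arn, integration_url):
    """Arguments for sns.subscribe; rejects endpoints that are not HTTPS."""
    if urlparse(integration_url).scheme != "https":
        raise ValueError("SNS HTTPS subscriptions need an https:// endpoint")
    return {"TopicArn": topic_arn, "Protocol": "https", "Endpoint": integration_url}


def subscribe(topic_arn, integration_url):
    """Create the subscription; needs AWS credentials and boto3."""
    import boto3  # lazy import keeps the builder usable offline

    sns = boto3.client("sns")
    return sns.subscribe(**pagerduty_subscription_params(topic_arn, integration_url))
```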

Define escalation tiers in PagerDuty: Tier 1 (on-call engineer, 5-minute acknowledgment SLA), Tier 2 (engineering lead, 15-minute escalation), Tier 3 (VP Engineering, 30-minute escalation). Document this escalation policy in your Incident Response Policy document.
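
Writing the tiers down as data makes the policy easy to review and test. The sketch below is one reading of the tiers above, stated as an assumption: Tier 1 is paged immediately and should acknowledge within 5 minutes, Tier 2 is added once an alert has gone 15 minutes unacknowledged, and Tier 3 at 30 minutes.

```python
"""Sketch: the escalation policy as data, under the interpretation stated above."""

# (tier, who is paged, minutes unacknowledged before this tier is paged)
ESCALATION_TIERS = [
    (1, "on-call engineer", 0),
    (2, "engineering lead", 15),
    (3, "VP Engineering", 30),
]
TIER1_ACK_SLA_MINUTES = 5


def tiers_paged(minutes_unacknowledged):
    """Everyone who should have been paged by this point."""
    return [who for _, who, after in ESCALATION_TIERS if minutes_unacknowledged >= after]


def ack_sla_met(minutes_to_ack):
    """True when the first acknowledgment arrived within the Tier 1 SLA."""
    return minutes_to_ack <= TIER1_ACK_SLA_MINUTES
```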

Auditors will ask for evidence of alert handling during the audit period. Export PagerDuty incident history for the period showing, for each incident, when the alert fired, when it was acknowledged, when it was resolved, and a root cause note. This supports CC7.3 by documenting that security events were evaluated and followed through.
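
Once the incident history is exported, a short script can turn it into the figures auditors tend to ask about. This sketch assumes a hypothetical record shape (`id`, `fired_at`, `acknowledged_at`, `resolved_at`, `root_cause`, with ISO 8601 timestamps); a real PagerDuty export will use different field names that need mapping first.

```python
"""Sketch: summarise exported incidents (hypothetical field names)."""
from datetime import datetime


def summarise(incidents, ack_sla_minutes=5):
    """Return per-incident response times and whether evidence is complete."""
    rows = []
    for inc in incidents:
        fired = datetime.fromisoformat(inc["fired_at"])
        acked = datetime.fromisoformat(inc["acknowledged_at"])
        resolved = datetime.fromisoformat(inc["resolved_at"])
        minutes_to_ack = (acked - fired).total_seconds() / 60
        rows.append({
            "id": inc["id"],
            "minutes_to_ack": minutes_to_ack,
            "minutes_to_resolve": (resolved - fired).total_seconds() / 60,
            "ack_within_sla": minutes_to_ack <= ack_sla_minutes,
            "has_root_cause": bool(inc.get("root_cause", "").strip()),
        })
    return rows
```

Incidents with `ack_within_sla` or `has_root_cause` set to false are the ones worth reviewing before the auditor finds them.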

Contributor Insights for access analysis

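Enable CloudWatch Contributor Insights on your access logs to identify the top IPs (up to 100) by request count and error count. Contributor Insights rules read from CloudWatch Logs log groups, and ALB access logs are delivered to S3, so forward them into a log group first. A high error count from a single IP can indicate a scanning or attack pattern. Create the rule with: `aws cloudwatch put-insight-rule --rule-name TopErrorIPs --rule-definition file://rule.json`.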
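
The rule definition passed as `rule.json` is a JSON document. The sketch below generates one possible definition; it assumes the access logs arrive in the log group as JSON events with hypothetical `client_ip` and `status` fields, so adjust the field paths to match your own log format.

```python
"""Sketch: generate a Contributor Insights rule definition for top error IPs."""
import json


def top_error_ips_rule(log_group, ip_field="$.client_ip", status_field="$.status"):
    """Rank client IPs by the number of log events with a 4xx/5xx status."""
    return {
        "Schema": {"Name": "CloudWatchLogRule", "Version": 1},
        "LogGroupNames": [log_group],
        "LogFormat": "JSON",
        "Contribution": {
            "Keys": [ip_field],  # one contributor per distinct IP
            "Filters": [{"Match": status_field, "GreaterThan": 399}],
        },
        "AggregateOn": "Count",  # count matching events per contributor
    }


def write_rule_file(path, log_group):
    """Write the definition to disk for use with --rule-definition file://..."""
    with open(path, "w") as fh:
        json.dump(top_error_ips_rule(log_group), fh, indent=2)
```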

Create a Contributor Insights rule for your application logs to find the top user IDs by failed authentication attempts. Ranking by user ID surfaces distributed password-guessing against specific accounts that a per-IP count alarm might miss (for example, 1 failure per IP across many IPs).

View Contributor Insights results in the CloudWatch console under Insights > Contributor Insights. Screenshot the dashboard weekly, save the screenshots to your security monitoring evidence folder, and include them in your quarterly security review.

Collecting alarm evidence for auditors

Export your CloudWatch alarm configuration as JSON: `aws cloudwatch describe-alarms --output json > alarms.json`. This file shows auditors every alarm: metric name, threshold, period, and SNS topic target.

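Pull alarm history for the audit period: `aws cloudwatch describe-alarm-history --alarm-name RootAccountUsage --history-item-type StateUpdate --start-date 2025-01-01 --end-date 2025-12-31 --output json`. This shows every state change (OK to ALARM to OK) with timestamps. CloudWatch keeps alarm history for only a limited period, so a single year-end pull will not cover a full audit year; run the export on a recurring schedule and archive each result.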
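
A scheduled export could look like the following. It is a sketch under assumptions: the output shape (`alarm`, `timestamp`, `summary`) and file layout are illustrative, and uploading the file to S3 is left out.

```python
"""Sketch: collect alarm state changes with boto3's paginator and save them."""
import json


def collect_state_changes(cloudwatch, alarm_name, start, end):
    """Return state-change history items for one alarm as plain dicts."""
    paginator = cloudwatch.get_paginator("describe_alarm_history")
    pages = paginator.paginate(
        AlarmName=alarm_name,
        HistoryItemType="StateUpdate",
        StartDate=start,
        EndDate=end,
    )
    items = []
    for page in pages:
        for item in page["AlarmHistoryItems"]:
            items.append({
                "alarm": item["AlarmName"],
                "timestamp": item["Timestamp"].isoformat(),
                "summary": item["HistorySummary"],
            })
    return items


def export(alarm_name, start, end, path):
    """Write the history to a JSON file; needs AWS credentials and boto3."""
    import boto3  # lazy import so collect_state_changes can be tested offline

    items = collect_state_changes(boto3.client("cloudwatch"), alarm_name, start, end)
    with open(path, "w") as fh:
        json.dump(items, fh, indent=2)
    return len(items)
```

Passing the client in as an argument keeps the collection logic testable with a stub, which is useful evidence in itself that the export job works.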

Organise alarm evidence by Trust Service Criteria: RootAccountUsage + UnauthorizedApiCalls + AuthFailureRate → CC7.2. 5XX rate + CPUUtilization → A1.1. PagerDuty incident history → CC7.3. This mapping helps auditors locate evidence without digging through your full archive.

CloudWatch SOC 2 checklist

Before your audit:
(1) Five critical alarms created and in OK or ALARM state (not INSUFFICIENT_DATA).
(2) All alarms route to SNS → PagerDuty/OpsGenie.
(3) Escalation policy documented with acknowledgment SLAs.
(4) Metric filters for auth failures and unauthorised API calls.
(5) Anomaly Detection enabled on top 3 metrics.
(6) Contributor Insights enabled on ALB access logs.
(7) Alarm history exported for audit period.
(8) Log retention 12 months minimum on all monitored log groups.

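Common gaps: alarms exist but the SNS topic has no subscriptions (the alarm fires but nobody is paged); alarms stuck in INSUFFICIENT_DATA because the metric has no data (common for authentication failure alarms when no failures have occurred; treating missing data as not breaching, or giving the metric filter a default value of 0, avoids this); log groups left at the default Never Expire retention instead of an explicit period that matches your documented retention policy.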
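
These gaps can be checked automatically before the audit. The sketch below works on data shaped like the output of `describe_alarms` and `describe_log_groups`; it assumes every alarm action is an SNS topic and that you have already counted subscriptions per topic, both of which are simplifications.

```python
"""Sketch: flag the common monitoring gaps from describe-* style output."""


def alarm_gaps(alarms, subscriptions_by_topic):
    """Return (alarm name, problem) pairs for alarms that would not page anyone."""
    gaps = []
    for alarm in alarms:
        name = alarm["AlarmName"]
        if alarm.get("StateValue") == "INSUFFICIENT_DATA":
            gaps.append((name, "state is INSUFFICIENT_DATA"))
        actions = alarm.get("AlarmActions", [])
        if not actions or not alarm.get("ActionsEnabled", True):
            gaps.append((name, "no enabled alarm action"))
        for topic in actions:  # assumption: every action is an SNS topic ARN
            if subscriptions_by_topic.get(topic, 0) == 0:
                gaps.append((name, f"topic {topic} has no subscriptions"))
    return gaps


def retention_gaps(log_groups, min_days=365):
    """Return (log group, problem) pairs for missing or short retention."""
    gaps = []
    for group in log_groups:
        days = group.get("retentionInDays")  # key is absent for Never Expire
        if days is None:
            gaps.append((group["logGroupName"], "retention is Never Expire"))
        elif days < min_days:
            gaps.append((group["logGroupName"], f"retention is {days} days"))
    return gaps
```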

Frequently Asked Questions

How many CloudWatch alarms do I need for SOC 2?

There is no fixed number. Auditors assess coverage, not count. The five alarms in this guide (authentication failure, 5xx rate, unauthorised API calls, CPU, root account usage) cover the most common audit findings. Add alarms for any additional systems in scope (RDS, ElastiCache, custom application metrics).

Can I use Datadog instead of CloudWatch for SOC 2 monitoring?

Yes — Datadog is fully acceptable for SOC 2 monitoring. The controls are technology-agnostic. If you use Datadog, the same requirements apply: defined alert thresholds, routing to an on-call rotation, documented escalation policy, and retained alert history. Many auditors are familiar with Datadog dashboards.

What is the difference between a CloudWatch Alarm and a CloudWatch Event (EventBridge)?

CloudWatch Alarms trigger based on metric threshold breaches. EventBridge (formerly CloudWatch Events) triggers based on event patterns (API calls, state changes, scheduled times). Both are useful for SOC 2 monitoring: alarms for metric-based detection, EventBridge for event-driven detection (e.g., new IAM user created, security group changed).
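
For the event-driven side, an EventBridge rule is defined by an event pattern rather than a threshold. The sketch below shows patterns for the two examples mentioned; the rule names and the list of security group API calls are illustrative assumptions, and API-call events like these are only delivered when CloudTrail is recording management events.

```python
"""Sketch: EventBridge patterns for IAM user creation and security group changes."""
import json

IAM_USER_CREATED = {
    "source": ["aws.iam"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {"eventSource": ["iam.amazonaws.com"], "eventName": ["CreateUser"]},
}

SECURITY_GROUP_CHANGED = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": [
            "AuthorizeSecurityGroupIngress",
            "AuthorizeSecurityGroupEgress",
            "RevokeSecurityGroupIngress",
            "RevokeSecurityGroupEgress",
        ],
    },
}


def rule_params(name, pattern):
    """Arguments for events.put_rule; the pattern is passed as a JSON string."""
    return {"Name": name, "EventPattern": json.dumps(pattern), "State": "ENABLED"}


def create_rule(name, pattern, sns_topic_arn):
    """Create the rule and send matches to SNS; needs AWS credentials and boto3.

    The topic's access policy must also allow EventBridge to publish to it.
    """
    import boto3  # lazy import keeps the builders usable offline

    events = boto3.client("events")
    events.put_rule(**rule_params(name, pattern))
    events.put_targets(Rule=name, Targets=[{"Id": "sns", "Arn": sns_topic_arn}])
```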

How do I handle alarm flapping (alarm flipping ALARM/OK rapidly)?

Use the evaluation-periods and datapoints-to-alarm settings together (an "M out of N" alarm). For example, require 3 of the last 5 datapoints to be in breach before the alarm triggers. This reduces false positives from transient spikes. Document these settings in your monitoring runbook.
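
The effect of an "M out of N" setting is easiest to see on a small series. The function below is a simplified model written for illustration, not CloudWatch's actual evaluation logic: it ignores missing data and the INSUFFICIENT_DATA state, and it treats each value as one evaluation period.

```python
"""Sketch: a simplified model of M-out-of-N alarm evaluation."""


def alarm_states(values, threshold, evaluation_periods, datapoints_to_alarm):
    """State after each datapoint, looking back over the last N datapoints."""
    states = []
    for i in range(len(values)):
        window = values[max(0, i - evaluation_periods + 1): i + 1]
        breaching = sum(1 for v in window if v > threshold)
        states.append("ALARM" if breaching >= datapoints_to_alarm else "OK")
    return states


def transitions(states):
    """Number of times the state changes; a rough measure of flapping."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# A spiky CPU series: every other datapoint is above the 80% threshold.
cpu = [90, 10, 90, 10, 90, 10, 10, 10]
one_of_one = alarm_states(cpu, 80, evaluation_periods=1, datapoints_to_alarm=1)
three_of_five = alarm_states(cpu, 80, evaluation_periods=5, datapoints_to_alarm=3)
print(transitions(one_of_one), transitions(three_of_five))  # prints "5 2"
```

With a 1-of-1 setting the alarm changes state on every spike; with 3-of-5 it enters ALARM once, after the third breaching datapoint, and then recovers.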

Does CloudWatch Container Insights satisfy SOC 2 monitoring for ECS/EKS?

Not on its own. Container Insights provides CPU, memory, network, and disk metrics for containers, so enable it for ECS and EKS workloads. It does not provide application-level logging (you still need structured logs from your application). Use both Container Insights and application-level CloudWatch Logs for full coverage.