How-To · 11 min read

SOC 2 Datadog Setup: Monitoring Alerts and SLO Evidence

SOC 2 Datadog setup guide covering monitors, SLOs, security signal rules, log management retention, and how to export Datadog data as CC7.2 and A1.1 audit evidence.

Key Takeaways
  • Datadog monitors with PagerDuty routing satisfy CC7.2 threat and anomaly detection requirements.
  • Datadog SLOs provide A1.1 availability evidence with percentage uptime reports exportable for auditors.
  • Datadog Log Management with 15-month retention satisfies logging evidence requirements across CC7 criteria.
  • Cloud SIEM security signal rules detect unauthorized access patterns and satisfy CC7.3 monitoring.
  • Datadog Audit Trail captures all Datadog user actions, providing evidence of monitoring system integrity.
  • Synthetic tests provide proactive A1.1 availability monitoring from external vantage points.

Datadog's Role in SOC 2 Evidence

Datadog is an observability platform covering metrics, logs, traces, and security. For SOC 2, it contributes evidence across three TSC categories: CC7 (monitoring and response), A1 (availability), and to a lesser degree CC6 (through security signal detection). Datadog holds a SOC 2 Type II report, available from their trust page, covering their platform infrastructure.

Your responsibility is using Datadog correctly: meaningful monitors with routing to on-call, log retention settings that cover your audit period, SLOs that track your availability commitments, and security signal rules that detect relevant threats. This guide covers all four with specific Datadog configuration paths.

Security Monitors (CC7.2, CC7.3)

Create Datadog monitors for security-relevant events. Navigate to Monitors → New Monitor. Key monitors to create: (1) Failed authentication spike — `sum(last_5m):sum:nginx.requests{status:401}.as_count() > 100` — alerts when 401 errors spike above 100 in 5 minutes, indicating a brute force attempt. Set notification to PagerDuty with `@pagerduty-security-high` for immediate on-call response.

(2) CloudTrail unauthorized API calls — if you have AWS integration, create a log monitor on CloudTrail logs: `@evt.name:ConsoleLogin @userAgent:signin.amazonaws.com @additional_event_data.MFAUsed:(No OR null)` — alerts on console logins without MFA. (3) New admin user created — `@evt.name:(CreateUser OR AddUserToGroup) @requestParameters.groupName:*admin*` — alerts on IAM admin provisioning. Route both to your security Slack channel and PagerDuty.
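The failed-authentication monitor above can also be created programmatically, which makes the configuration itself reviewable evidence. A minimal sketch follows, shaped like a create-monitor payload for Datadog's Monitors API (`POST /api/v1/monitor`); the runbook URL and PagerDuty handle are placeholders for your own values.

```python
# Sketch: a create-monitor payload for the 401-spike alert from this section.
# Field names follow Datadog's public Monitors API; the runbook URL is a
# placeholder, not a real address.

def build_failed_auth_monitor() -> dict:
    """Return a Monitors API payload for the failed-authentication spike alert."""
    return {
        "type": "query alert",
        "query": "sum(last_5m):sum:nginx.requests{status:401}.as_count() > 100",
        "name": "[Security] Failed authentication spike",
        "message": (
            "More than 100 HTTP 401s in 5 minutes - possible brute force.\n"
            "Runbook: https://wiki.example.com/runbooks/failed-auth\n"  # placeholder
            "@pagerduty-security-high"
        ),
        "options": {
            "thresholds": {"critical": 100},
            "notify_no_data": False,
        },
        "tags": ["soc2:cc7.2", "team:security"],
    }

payload = build_failed_auth_monitor()
# POST this as JSON to https://api.datadoghq.com/api/v1/monitor with
# DD-API-KEY and DD-APPLICATION-KEY headers.
```

Keeping monitors defined as code (or Terraform) also strengthens the Audit Trail story later in this guide: changes to detection logic become diffs, not silent UI edits.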

Set monitor notification messages to include runbook links. Every security alert should link to a runbook that tells the on-call engineer what to investigate and what constitutes a true positive. Runbooks serve double duty: they enable faster incident response and provide evidence that your monitoring is backed by documented procedures (CC7.3 response process).

Service Level Objectives (A1.1)

A1.1 requires that the system maintains agreed-upon performance and availability. Define SLOs in Datadog for your customer-facing services. Navigate to Service Mgmt → SLOs → New SLO. Create metric-based SLOs: SLI = `(sum:trace.web.request.hits{http.status_code:2xx,3xx} / sum:trace.web.request.hits{*})` — this measures the percentage of requests that return a successful response. Set the SLO target to 99.9% over a 30-day rolling window.

Add an error budget monitor: create a monitor that alerts when the error budget is 50% consumed within a 7-day window. This gives your team early warning before the SLO is breached. Navigate to the SLO → Create Alerting → Error budget alert. This satisfies A1.1 by demonstrating proactive availability management.
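The arithmetic behind that 50%-consumed alert is worth making concrete. For a 99.9% target, the error budget is 0.1% of requests; the alert fires when observed failures have eaten half of that. A small sketch:

```python
# Sketch: error-budget consumption for a request-based SLO. For a 99.9%
# target, the budget is 0.1% of requests over the window.

def error_budget_consumed(target: float, good: int, total: int) -> float:
    """Fraction of the error budget used (0.0 = untouched, 1.0 = SLO breached)."""
    budget = 1.0 - target            # e.g. 0.001 for a 99.9% target
    bad_ratio = 1.0 - good / total   # observed failure ratio
    return bad_ratio / budget

# 1,000,000 requests, 99.95% successful, against a 99.9% SLO:
consumed = error_budget_consumed(0.999, good=999_500, total=1_000_000)
# -> ~0.5, i.e. exactly the 50% threshold the alert above watches for
```

At 50% consumption with half the window remaining, the service is trending toward a breach at the current failure rate, which is why it makes a good early-warning threshold.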

Export SLO reports for your audit period. Go to the SLO → Status and History → select the date range → Export to PDF. Include the SLO definition (target percentage, window, SLI formula) and the historical compliance report. If you have documented SLAs with customers in your contracts, align the SLO target with the SLA to demonstrate that your monitoring is calibrated to your availability commitments.

Log Management and Retention (CC7.2)

CC7.2 requires that security events are logged and retained. In Datadog Log Management, configure pipelines that parse and index security-relevant log sources: application logs, AWS CloudTrail, Kubernetes audit logs, Nginx access logs, and authentication service logs. Navigate to Logs → Pipelines to configure parsing rules.

Set index retention to 15 months for security-relevant logs. Navigate to Logs → Configuration → Indexes → [index] → Retention. 15 months covers your 12-month audit period plus buffer for the start of the next year. For high-volume logs (CDN access logs, application debug logs), use Log Archives — ship to S3 with indefinite retention and rehydrate to Datadog when needed for investigation.
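The "retention covers the audit period" requirement reduces to a date comparison you can sanity-check during audit prep. A minimal sketch, with retention expressed in days as index settings express it:

```python
# Sketch: check that an index's retention setting covers the audit period,
# i.e. the oldest still-retained log predates the audit period start.

from datetime import date, timedelta

def retention_covers_audit(retention_days: int, audit_start: date, today: date) -> bool:
    """True if logs from the start of the audit period are still queryable."""
    oldest_retained = today - timedelta(days=retention_days)
    return oldest_retained <= audit_start

# ~15 months (456 days) of retention against a 12-month audit period:
ok = retention_covers_audit(456, audit_start=date(2024, 1, 1), today=date(2025, 1, 1))
# -> True; a 12-month (365-day) setting would fail the same check once the
#    audit runs past the period end
```

Run a check like this whenever retention settings change, since a quiet reduction mid-period silently destroys evidence you will need months later.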

Create log-based metrics for key security indicators: `security.failed_auth.rate` (count of 401/403 responses), `security.admin_actions.count` (count of admin API calls), `security.secret_access.count` (count of secrets manager API calls). These metrics feed into dashboards and monitors, providing a security posture overview that you can screenshot monthly as CC7.2 evidence.

Cloud SIEM Security Signals (CC7.3)

Datadog Cloud SIEM (formerly Security Monitoring) correlates log events into security signals using detection rules. Navigate to Security → Cloud SIEM → Detection Rules. Enable the OOTB rules for your log sources: AWS CloudTrail rules (impossible travel, root account usage, MFA disable), GitHub rules (branch protection disabled, secret exposed), and Kubernetes rules (privileged pod created, exec into container).

Create custom detection rules for your application. Example: `@evt.category:authentication @outcome:failure` with a threshold of 10 events from the same IP in 5 minutes creates a credential stuffing signal. Detection rules fire security signals that appear in the Signals Explorer and can be routed to PagerDuty or Slack via notification rules.
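Datadog evaluates that threshold server-side, but modeling the logic makes the rule concrete: a signal fires when one IP accumulates 10 or more failures inside a 5-minute rolling window. A sketch of that evaluation:

```python
# Sketch: the thresholding logic behind the credential-stuffing rule - 10 or
# more auth failures from one source IP within a 5-minute rolling window.
# This is an illustrative model, not how Datadog implements it internally.

from collections import defaultdict

WINDOW_SECONDS = 300   # 5-minute rolling window
THRESHOLD = 10         # failures per IP before a signal fires

def find_signals(events: list[tuple[float, str]]) -> set[str]:
    """events: (unix_timestamp, source_ip) pairs for auth failures."""
    by_ip: dict[str, list[float]] = defaultdict(list)
    signaled = set()
    for ts, ip in sorted(events):
        # keep only failures still inside the window ending at this event
        window = [t for t in by_ip[ip] if ts - t < WINDOW_SECONDS] + [ts]
        by_ip[ip] = window
        if len(window) >= THRESHOLD:
            signaled.add(ip)
    return signaled

# 12 failures from one IP in two minutes -> one signal; a single stray
# failure from another IP stays quiet
events = [(i * 10.0, "203.0.113.7") for i in range(12)] + [(50.0, "198.51.100.9")]
signals = find_signals(events)
# -> {"203.0.113.7"}
```

Tuning the threshold and window is the real work: too tight and on-call drowns in noise, too loose and slow-drip credential stuffing slides under the rule.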

Configure a triage process for Cloud SIEM signals: every Critical signal must be investigated within 1 hour, High within 4 hours, Medium within 24 hours. Document this SLA in your incident response policy. Export the Signals Explorer for your audit period (filter by status: all, date: audit period) as CC7.3 response evidence. The export shows every signal, its severity, when it was triaged, and who handled it.
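Those triage SLAs are easy to verify mechanically against the exported signal list. A sketch of the policy as a lookup table plus a breach check:

```python
# Sketch: the triage SLA from the incident response policy as a lookup,
# plus a breach check you could run over an exported Signals Explorer CSV.

from datetime import datetime, timedelta

TRIAGE_SLA = {                      # severity -> maximum time to first triage
    "critical": timedelta(hours=1),
    "high": timedelta(hours=4),
    "medium": timedelta(hours=24),
}

def sla_breached(severity: str, fired_at: datetime, triaged_at: datetime) -> bool:
    """True if the signal was triaged later than the policy allows."""
    return triaged_at - fired_at > TRIAGE_SLA[severity]

fired = datetime(2025, 3, 1, 9, 0)
sla_breached("high", fired, fired + timedelta(hours=3))      # -> False, within 4h
sla_breached("critical", fired, fired + timedelta(hours=2))  # -> True, past 1h
```

Running a check like this quarterly, rather than discovering breaches during the audit, turns the SLA from a policy statement into demonstrable operating effectiveness.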

Synthetic Tests and Uptime (A1.1)

Datadog Synthetic Tests send real HTTP requests to your application from global PoPs every minute, providing external availability measurement. Navigate to Synthetic Monitoring → New Test → API Test. Create tests for your critical endpoints: health check (`GET /health` → expect 200 in <500ms), login (`POST /auth/login` with test credentials → expect 200 and token in response), and key user flows (create a multistep browser test for your primary user journey).
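The health-check test above translates into a small test definition. A sketch shaped like Datadog's Synthetics API test format (assertions on status code and response time); the URL, locations, and tags are placeholders for your own values:

```python
# Sketch: the GET /health synthetic test as a Synthetics API test definition.
# The assertion shape follows Datadog's Synthetics API; URL and locations
# are placeholders.

health_check_test = {
    "name": "API health check",
    "type": "api",
    "subtype": "http",
    "config": {
        "request": {"method": "GET", "url": "https://app.example.com/health"},
        "assertions": [
            {"type": "statusCode", "operator": "is", "target": 200},
            {"type": "responseTime", "operator": "lessThan", "target": 500},
        ],
    },
    "locations": ["aws:us-east-1", "aws:eu-west-1"],  # example PoPs
    "options": {"tick_every": 60},  # run every minute
    "tags": ["soc2:a1.1"],
}
```

Spreading locations across regions matters for A1.1: a test that only runs from the same region as your infrastructure can miss outages that customers elsewhere experience.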

Configure global uptime dashboards using synthetic test metrics: `synthetics.test_runs` and `synthetics.http.response_time`. Add an uptime SLO based on the synthetic test: SLI = synthetic test pass rate over 30 days. This external SLO is more valuable for SOC 2 evidence than an internal metric because it measures what customers actually experience, not what the internal metrics show.

Datadog Audit Trail

Datadog Audit Trail (available in Enterprise plans) records all user actions within Datadog: API key creation, dashboard modifications, monitor changes, log pipeline edits, and user access. Navigate to Organization Settings → Audit Trail. Enable it and set retention to 90 days (the maximum within Datadog). For longer retention, configure Audit Trail archiving to S3.

The Audit Trail is important for SOC 2 because it proves that your monitoring system itself has not been tampered with. An auditor can verify that the monitors and log pipelines you show as evidence were configured at the start of your audit period and were not modified to retroactively improve the evidence. Export the Audit Trail for your audit period as a meta-evidence artifact showing monitoring system integrity.

Exporting Evidence for Auditors

Compile a Datadog evidence package for each audit period. Include: (1) SLO Status and History reports (PDF export) for all customer-facing services — shows A1.1 compliance. (2) Monitor configuration exports — use the Datadog API (`GET /api/v1/monitor`) to export all monitor definitions as JSON, proving the monitors existed and were configured correctly. (3) Cloud SIEM signal history for the audit period — shows CC7.3 detection and response activity.

(4) Log index retention configuration screenshot — shows CC7.2 logging is retained for the required period. (5) Synthetic test results (exported from Synthetic Monitoring → CI Results) — shows external availability measurement. Label each artifact with the SOC 2 criterion it satisfies and organize chronologically. Auditors reviewing a well-organized Datadog evidence package can complete their testing in hours rather than days.

Frequently Asked Questions

Which Datadog plan do we need for SOC 2 compliance?
The Pro plan covers Metrics, APM, Log Management, and Synthetic Monitoring — sufficient for most SOC 2 requirements. Cloud SIEM requires the Enterprise plan or an add-on. SLOs are included in Pro. Audit Trail is an Enterprise feature. Assess whether Cloud SIEM and Audit Trail justify the Enterprise cost versus using a separate SIEM tool.
How long should we retain logs in Datadog for SOC 2?
Retain security-relevant logs for at least 13 months to cover a 12-month audit period plus buffer. Datadog Log Management supports retention up to 15 months depending on your index configuration. For logs beyond 15 months or high-volume logs, use Datadog Log Archives to S3 with indefinite retention and rehydrate to Datadog when needed.
Can Datadog SLO reports replace uptime monitoring tools for SOC 2?
Yes, if the SLOs are based on external synthetic tests or customer-facing metrics. An internal SLO based purely on server-side metrics that shows 100% uptime while customers are experiencing errors would be misleading. Synthetic tests from Datadog PoPs provide external customer-perspective uptime measurement that is credible for SOC 2 A1.1 evidence.
Should we use Datadog as our SIEM or a dedicated SIEM like Splunk?
Datadog Cloud SIEM is suitable for companies that already use Datadog for observability and want a unified platform. It covers the most common log sources and detection use cases. Dedicated SIEMs like Splunk, Microsoft Sentinel, or Elastic SIEM offer more advanced correlation and threat hunting capabilities. For a SOC 2-only use case, Datadog Cloud SIEM is typically sufficient and avoids managing a separate tool.
What monitors are auditors most likely to ask about?
Auditors typically ask about: (1) unauthorized access attempts (failed auth monitoring), (2) privileged access usage (admin action logging), (3) availability monitoring (SLOs, uptime), and (4) security incident detection (SIEM signals). Prepare to demonstrate each of these monitors live during the audit walkthrough, not just show screenshots.

Automate your compliance today

AuditPath runs 86+ automated checks across AWS, GitHub, Okta, and 14 more integrations. Supports SOC 2 and the DPDP Act. Free plan available.

Start for free