SOC 2 Data Classification: How to Classify and Protect Data
Implement SOC 2 data classification. Define data categories, classification controls, and how to map classification levels to AWS controls for CC6 compliance.
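- Data classification underpins CC6.1 (logical access security over protected information assets), CC6.3 (role-based access and least privilege), and CC6.7 (protecting information during transmission, movement, and removal) by defining what data needs which controls.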
- A four-tier classification (Public, Internal, Confidential, Restricted) maps well to SOC 2 and DPDP Act requirements.
- Classification drives control selection — Restricted data requires encryption, tightest access controls, and shortest retention.
- Amazon Macie can automatically discover and classify sensitive data in S3.
- Every data type in your system description should have a classification and documented handling requirements.
Why Data Classification Matters for SOC 2
SOC 2 CC6.1 covers logical access security over protected information assets, and its points of focus include identifying and inventorying those assets and restricting access to them. Without classification, you cannot demonstrate that you are applying appropriate controls to sensitive data, or that you even know what sensitive data you have.
Classification also enables proportional controls — not every piece of data needs the same level of protection, and over-protecting everything is expensive and creates usability friction. Classification defines which data gets which controls, making your security program efficient and explainable to auditors.
Data Classification Tiers
Public: Information intended for public disclosure. No read-access restrictions required, though changes should still be limited to authorized publishers. Examples: marketing materials, public documentation, published blog posts.
Internal: Information intended for employees only, not confidential but not public. Basic access controls (authentication required). Examples: internal product roadmaps, HR announcements, meeting recordings.
Confidential: Sensitive business information. Encryption at rest and in transit, role-based access, access logging. Examples: source code, financial reports, customer contracts, internal security policies.
Restricted: The most sensitive data. Maximum controls: encryption with customer-managed KMS keys, tightest access controls, access logging with review, shortest retention period. Examples: customer PII, payment data, health records, authentication credentials.
Mapping Classification to Controls
Each classification tier should map to a defined set of controls: access control requirements (who can access), encryption requirements (at rest, in transit), storage restrictions (approved storage systems), retention requirements, and disposal requirements.
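One way to keep this mapping reviewable is to express the control matrix as data. The sketch below is illustrative only: the `Tier` and `Controls` names, the storage descriptions, and the retention figures are assumptions made for the example, not values prescribed by SOC 2.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List, Optional


class Tier(IntEnum):
    """Classification tiers, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Controls:
    access: str                        # who can access
    encryption: str                    # at-rest and in-transit requirements
    storage: str                       # approved storage systems
    max_retention_days: Optional[int]  # None means no limit is defined
    disposal: str


# Storage names and retention figures are placeholders, not recommendations.
CONTROL_MATRIX: Dict[Tier, Controls] = {
    Tier.PUBLIC: Controls(
        "anyone", "none required", "any system", None, "standard deletion"),
    Tier.INTERNAL: Controls(
        "authenticated employees", "none required",
        "company-managed systems", 1825, "standard deletion"),
    Tier.CONFIDENTIAL: Controls(
        "role-based, logged", "at rest and in transit",
        "approved encrypted storage", 1095, "verified deletion"),
    Tier.RESTRICTED: Controls(
        "named roles, logged and reviewed",
        "at rest and in transit, customer-managed KMS keys",
        "encrypted S3 buckets or RDS instances", 365,
        "deletion or cryptographic erasure"),
}


def controls_for(tier: Tier) -> Controls:
    """Look up the handling requirements for a tier."""
    return CONTROL_MATRIX[tier]


def unmapped_tiers() -> List[Tier]:
    """Return tiers that have no row in the matrix (should be empty)."""
    return [tier for tier in Tier if tier not in CONTROL_MATRIX]
```

Keeping the matrix in one structure means the policy document, internal tooling, and audit evidence can all be generated from the same source.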
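For Restricted data in AWS: store it only in S3 buckets or RDS instances encrypted with customer-managed KMS keys; restrict access to named roles via IAM policies; log access via CloudTrail data events (for S3 object-level activity); define a maximum retention period and enforce it with S3 lifecycle policies; and dispose of data either by deleting it or by cryptographic erasure, that is, scheduling deletion of the KMS key that encrypts it. KMS enforces a waiting period of 7 to 30 days before a key is actually deleted.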
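As a rough sketch of two of these controls, the functions below build the default-encryption and lifecycle configurations in the shape that boto3's `put_bucket_encryption` and `put_bucket_lifecycle_configuration` accept. The function names, the rule ID, and the one-day noncurrent-version window are assumptions for illustration; the code only builds dictionaries and makes no AWS calls.

```python
from typing import Any, Dict


def restricted_encryption_config(kms_key_arn: str) -> Dict[str, Any]:
    """Default bucket encryption using a customer-managed KMS key."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # S3 Bucket Keys reduce the number of KMS requests.
                "BucketKeyEnabled": True,
            }
        ]
    }


def restricted_lifecycle_config(max_retention_days: int) -> Dict[str, Any]:
    """Expire every object once the maximum retention period has passed."""
    if max_retention_days < 1:
        raise ValueError("max_retention_days must be at least 1")
    return {
        "Rules": [
            {
                "ID": "restricted-max-retention",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "Expiration": {"Days": max_retention_days},
                # On versioned buckets, also remove old versions promptly.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
            }
        ]
    }


# Applying the configurations (requires boto3 and AWS credentials):
#   s3 = boto3.client("s3")
#   s3.put_bucket_encryption(
#       Bucket=bucket,
#       ServerSideEncryptionConfiguration=restricted_encryption_config(arn))
#   s3.put_bucket_lifecycle_configuration(
#       Bucket=bucket,
#       LifecycleConfiguration=restricted_lifecycle_config(365))
```

The retention period passed in should come from your classification policy rather than being hard-coded per bucket.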
Document the classification policy and the control matrix. When auditors ask "what controls protect customer PII?", the answer should be a direct reference to your Restricted classification tier and its control requirements.
Amazon Macie for Automated Classification
Amazon Macie uses machine learning and pattern matching to automatically discover sensitive data in S3 buckets. It identifies PII (such as names, email addresses, phone numbers, SSNs, and credit card numbers), credentials (such as AWS secret access keys and private keys), and financial data.
To enable Macie, open the AWS console, go to Macie, and choose Enable Macie. Then configure a sensitive data discovery job to scan your S3 buckets. Macie generates sensitive data findings for objects that contain sensitive data, and policy findings for bucket-level issues such as bucket policy violations and encryption misconfigurations.
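The same setup can be scripted. The sketch below builds the request for a one-time discovery job in the shape that boto3's `macie2` client expects for `create_classification_job`; the function name and default job name are assumptions for illustration, and the code itself only builds a dictionary.

```python
import uuid
from typing import Any, Dict, Sequence


def macie_job_request(account_id: str,
                      buckets: Sequence[str],
                      name: str = "restricted-data-discovery") -> Dict[str, Any]:
    """Build keyword arguments for a one-time Macie classification job."""
    if not buckets:
        raise ValueError("at least one bucket is required")
    return {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "jobType": "ONE_TIME",             # "SCHEDULED" is the alternative
        "name": name,
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(buckets)}
            ]
        },
    }


# Submitting the job (requires boto3 and AWS credentials):
#   macie = boto3.client("macie2")
#   macie.enable_macie()  # only needed if Macie is not yet enabled
#   macie.create_classification_job(**macie_job_request(account_id, buckets))
```

A one-time job suits an initial inventory; recurring scans would need a scheduled job with a schedule frequency, which is left out here.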
Macie can publish its findings to AWS Security Hub, providing a centralized view of data sensitivity risk. Policy findings are published by default; sensitive data findings need to be turned on in Macie's publication settings. Use Macie to validate that your classification assumptions are correct: if Macie finds PII in a bucket you classified as Internal, there is a classification gap to investigate.
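This gap check can be sketched as a comparison between Macie findings and the tier you declared for each bucket. The field names used here (`category`, `type`, `resourcesAffected.s3Bucket.name`) reflect my understanding of the Macie finding format and should be verified against real findings; the function name and the rule that any sensitive data finding below Restricted counts as a gap are assumptions for illustration.

```python
from enum import IntEnum
from typing import Any, Dict, Iterable, List, Set


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def classification_gaps(findings: Iterable[Dict[str, Any]],
                        declared_tiers: Dict[str, Tier]) -> Dict[str, List[str]]:
    """Map each under-classified bucket to the finding types seen in it.

    A bucket is reported when Macie found sensitive data in it and the
    bucket is either missing from the inventory or declared below Restricted.
    """
    gaps: Dict[str, Set[str]] = {}
    for finding in findings:
        # Skip policy findings; only sensitive data findings matter here.
        if finding.get("category") != "CLASSIFICATION":
            continue
        bucket = finding["resourcesAffected"]["s3Bucket"]["name"]
        tier = declared_tiers.get(bucket)
        if tier is None or tier < Tier.RESTRICTED:
            gaps.setdefault(bucket, set()).add(finding["type"])
    return {bucket: sorted(types) for bucket, types in gaps.items()}
```

Each bucket in the result is either a misclassified bucket or misplaced data, and the investigation outcome is itself useful audit evidence.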
Classification and DPDP Act Alignment
India's Digital Personal Data Protection (DPDP) Act creates obligations around personal data. Aligning your SOC 2 Restricted classification tier with the DPDP Act's definition of "personal data" means the same controls can address both sets of compliance requirements.
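The DPDP Act, 2023 does not itself define a separate "sensitive personal data" category; that distinction comes from earlier Indian rules and draft bills, which singled out categories such as financial data, health data, biometric data, sexual orientation, and religious beliefs. If you want to treat these categories more strictly, map them to a Restricted+ sub-tier with additional controls. Consent management, purpose limitation, and data minimization apply to all personal data under the Act, so build them into the Restricted tier itself.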
A classification policy that addresses both SOC 2 and DPDP Act creates a single governance framework rather than two overlapping ones, reducing compliance overhead.
Data Classification Evidence
(1) Data classification policy defining tiers, handling requirements, and control mapping.
(2) Data inventory (data map) listing data types, classification tier, storage location, and data owner.
(3) Amazon Macie findings report showing sensitive data discovery.
(4) Control implementation evidence for Restricted data: KMS key policies, S3 bucket encryption, IAM access policies.
(5) Employee training records showing data handling training was completed.
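The data inventory is the piece of evidence most likely to drift out of date, so a simple completeness check is worth running before an audit. The sketch below assumes a hypothetical `InventoryEntry` record with the four fields named above; how the inventory is actually stored (spreadsheet, database, YAML) is up to you.

```python
from dataclasses import dataclass
from typing import Iterable, List

VALID_TIERS = {"Public", "Internal", "Confidential", "Restricted"}


@dataclass
class InventoryEntry:
    data_type: str
    tier: str
    storage_location: str
    owner: str


def inventory_problems(entries: Iterable[InventoryEntry]) -> List[str]:
    """Return one message per missing or invalid field in the inventory."""
    problems: List[str] = []
    for entry in entries:
        if entry.tier not in VALID_TIERS:
            problems.append(f"{entry.data_type}: unknown tier {entry.tier!r}")
        if not entry.storage_location.strip():
            problems.append(f"{entry.data_type}: missing storage location")
        if not entry.owner.strip():
            problems.append(f"{entry.data_type}: missing data owner")
    return problems
```

An empty result means every listed data type has a valid tier, a storage location, and an owner; it does not prove the inventory covers every data type in the system.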
Frequently Asked Questions
Does SOC 2 require a formal data classification policy?
How do we classify data that spans multiple categories?
Can we use a two-tier classification (sensitive vs non-sensitive) for simplicity?
Do we need to classify data in our SaaS application customer database?
How do we handle data classification for data we receive from customers (uploaded files, etc.)?