Controls | 5 May 2026 | 6 min read

SOC 2 Data Classification: How to Classify and Protect Data

Implement SOC 2 data classification. Learn how to define data categories and classification controls, and how to map classification levels to AWS controls for CC6 compliance.

Key Takeaways

Data classification underpins CC6.1 (logical access controls, including least privilege) and CC6.7 (protecting information during transmission, movement, and removal) by defining which data needs which controls.
A four-tier classification (Public, Internal, Confidential, Restricted) maps well to SOC 2 and DPDP Act requirements.
Classification drives control selection: Restricted data requires encryption, the tightest access controls, and the shortest retention.
AWS Macie can automatically discover and classify sensitive data in S3.
Every data type in your system description should have a classification and documented handling requirements.

In this guide

Why Data Classification Matters for SOC 2
Data Classification Tiers
Mapping Classification to Controls
AWS Macie for Automated Classification
Classification and DPDP Act Alignment
Data Classification Evidence

Why Data Classification Matters for SOC 2

SOC 2 CC6.1 is generally read as requiring that access to data be restricted based on the nature and sensitivity of the information. Without classification, you cannot demonstrate that you are applying appropriate controls to sensitive data, or that you even know what sensitive data you have.

Classification also enables proportional controls — not every piece of data needs the same level of protection, and over-protecting everything is expensive and creates usability friction. Classification defines which data gets which controls, making your security program efficient and explainable to auditors.

Data Classification Tiers

Public: Information intended for public disclosure. No access controls required. Examples: marketing materials, public documentation, published blog posts.

Internal: Information intended for employees only, not confidential but not public. Basic access controls (authentication required). Examples: internal product roadmaps, HR announcements, meeting recordings.

Confidential: Sensitive business information. Encryption at rest and in transit, role-based access, access logging. Examples: source code, financial reports, customer contracts, internal security policies.

Restricted: The most sensitive data. Maximum controls: encryption with customer-managed KMS keys, tightest access controls, access logging with review, shortest retention period. Examples: customer PII, payment data, health records, authentication credentials.

Mapping Classification to Controls

Each classification tier should map to a defined set of controls: access control requirements (who can access), encryption requirements (at rest, in transit), storage restrictions (approved storage systems), retention requirements, and disposal requirements.
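One way to keep the control matrix reviewable is to express it as data. The sketch below is a minimal illustration, not a prescribed format: the field names, the storage system labels, and the retention numbers are placeholder assumptions that you would replace with the values in your own policy.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple


class Tier(IntEnum):
    """Classification tiers, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Controls:
    access: str                         # who can access
    encryption_at_rest: str
    encryption_in_transit: bool
    approved_storage: Tuple[str, ...]   # approved storage systems
    max_retention_days: Optional[int]   # None = no limit defined
    disposal: str


# Placeholder values (assumptions for illustration only).
CONTROL_MATRIX = {
    Tier.PUBLIC: Controls(
        access="none required",
        encryption_at_rest="not required",
        encryption_in_transit=False,
        approved_storage=("any",),
        max_retention_days=None,
        disposal="standard deletion",
    ),
    Tier.INTERNAL: Controls(
        access="authenticated employees",
        encryption_at_rest="not required",
        encryption_in_transit=True,
        approved_storage=("company drive", "s3"),
        max_retention_days=None,
        disposal="standard deletion",
    ),
    Tier.CONFIDENTIAL: Controls(
        access="role-based, with access logging",
        encryption_at_rest="provider-managed key",
        encryption_in_transit=True,
        approved_storage=("s3", "rds"),
        max_retention_days=2555,
        disposal="standard deletion",
    ),
    Tier.RESTRICTED: Controls(
        access="named roles only, with logged and reviewed access",
        encryption_at_rest="customer-managed KMS key",
        encryption_in_transit=True,
        approved_storage=("s3", "rds"),
        max_retention_days=365,
        disposal="cryptographic erasure",
    ),
}


def controls_for(tier: Tier) -> Controls:
    """Look up the required controls for a classification tier."""
    return CONTROL_MATRIX[tier]
```

Because the tiers are an ordered enum, comparisons such as `Tier.RESTRICTED > Tier.CONFIDENTIAL` work directly, which is convenient when a check needs "at least Confidential".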

For Restricted data in AWS, this typically means: storage only in encrypted S3 buckets or RDS instances that use customer-managed KMS keys; access restricted to named roles via IAM policies; access logging via CloudTrail data events; a maximum retention period defined and enforced via S3 lifecycle policies; and disposal via cryptographic erasure (scheduling deletion of the KMS key that protects the data).
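As a rough sketch of the S3 part of that list, the helper below builds the default-encryption and lifecycle settings for a Restricted bucket as plain dictionaries. The function name, the rule ID, and the example values are hypothetical; the dictionary shapes follow what boto3's `put_bucket_encryption` and `put_bucket_lifecycle_configuration` accept, and the actual API calls are left as comments so the snippet runs without AWS credentials.

```python
def restricted_bucket_settings(kms_key_arn: str, max_retention_days: int) -> dict:
    """Build S3 settings for a Restricted-tier bucket (illustrative helper)."""
    if max_retention_days <= 0:
        raise ValueError("max_retention_days must be positive")

    # Default encryption with a customer-managed KMS key.
    encryption = {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                "BucketKeyEnabled": True,
            }
        ]
    }

    # Expire every object once the maximum retention period has passed.
    lifecycle = {
        "Rules": [
            {
                "ID": "restricted-max-retention",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},          # empty prefix = whole bucket
                "Expiration": {"Days": max_retention_days},
            }
        ]
    }

    return {
        "ServerSideEncryptionConfiguration": encryption,
        "LifecycleConfiguration": lifecycle,
    }


# Applying the settings (requires boto3 and credentials, so shown as comments):
#   import boto3
#   s3 = boto3.client("s3")
#   settings = restricted_bucket_settings(key_arn, 365)
#   s3.put_bucket_encryption(
#       Bucket=bucket_name,
#       ServerSideEncryptionConfiguration=settings["ServerSideEncryptionConfiguration"],
#   )
#   s3.put_bucket_lifecycle_configuration(
#       Bucket=bucket_name,
#       LifecycleConfiguration=settings["LifecycleConfiguration"],
#   )
```

On a versioned bucket, an `Expiration` rule only adds a delete marker to the current version, so a separate rule for noncurrent versions would be needed to actually remove the data.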

Document the classification policy and the control matrix. When auditors ask "what controls protect customer PII?", the answer should be a direct reference to your Restricted classification tier and its control requirements.

AWS Macie for Automated Classification

AWS Macie (officially Amazon Macie) uses machine learning and pattern matching to automatically discover sensitive data in S3 buckets. It identifies PII (names, email addresses, phone numbers, SSNs, credit card numbers), credentials (such as API keys and passwords), and financial data.

Enable Macie: AWS console > Macie > Enable Macie. Then configure a sensitive data discovery job to scan your S3 buckets. Macie generates sensitive data findings for objects that contain sensitive data, and policy findings for bucket-level issues such as public access or disabled default encryption.
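The same setup can be scripted. The sketch below only builds the request for a one-time discovery job; the helper name, account ID, bucket names, and job name are hypothetical, and the boto3 calls (`enable_macie` and `create_classification_job` on the `macie2` client) are shown as comments because they need credentials and an account where Macie is available.

```python
import uuid


def build_discovery_job_request(account_id, bucket_names, job_name):
    """Build parameters for a one-time Macie sensitive data discovery job."""
    if not bucket_names:
        raise ValueError("at least one bucket is required")
    return {
        # Idempotency token so a retried request does not create a second job.
        "clientToken": str(uuid.uuid4()),
        "jobType": "ONE_TIME",
        "name": job_name,
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(bucket_names)}
            ]
        },
    }


# Submitting the job (requires boto3 and credentials, so shown as comments):
#   import boto3
#   macie = boto3.client("macie2")
#   macie.enable_macie()  # only needed if Macie is not yet enabled
#   request = build_discovery_job_request(
#       "111122223333", ["customer-uploads"], "restricted-tier-scan"
#   )
#   macie.create_classification_job(**request)
```

A one-time job is the simplest starting point; a scheduled job is an option once you know how much data the scan covers and what it costs.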

Macie findings can feed into AWS Security Hub, providing a centralized view of data sensitivity risk. Policy findings are published there by default, while publishing sensitive data findings is a setting you may need to turn on. Use Macie to validate that your classification assumptions are correct: if Macie finds PII in a bucket you classified as Internal, there is a classification gap to investigate.
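That validation step can be sketched as a comparison between Macie findings and your data inventory. The example below assumes findings have already been exported as dictionaries and keeps only two fields from the Macie finding structure (`type` and `resourcesAffected.s3Bucket.name`); the inventory mapping and the function name are hypothetical.

```python
SENSITIVE_PREFIX = "SensitiveData:"


def find_classification_gaps(findings, inventory):
    """Return (bucket, documented_tier, finding_type) for buckets where
    Macie found sensitive data but the inventory tier is not Restricted."""
    gaps = set()
    for finding in findings:
        finding_type = finding.get("type", "")
        if not finding_type.startswith(SENSITIVE_PREFIX):
            continue  # policy findings are not about data content
        bucket = (
            finding.get("resourcesAffected", {})
            .get("s3Bucket", {})
            .get("name")
        )
        if bucket is None:
            continue
        # Buckets missing from the inventory are gaps too.
        tier = inventory.get(bucket, "Unclassified")
        if tier != "Restricted":
            gaps.add((bucket, tier, finding_type))
    return sorted(gaps)
```

Each returned tuple is a bucket to investigate: either the data should move, or the bucket's documented tier should be raised.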

Classification and DPDP Act Alignment

India's Digital Personal Data Protection (DPDP) Act creates obligations around personal data. Aligning your SOC 2 Restricted classification tier with DPDP Act "personal data" lets the same controls address both sets of compliance requirements.

The DPDP Act itself does not define a separate "sensitive personal data" category; that term comes from earlier Indian rules and draft bills, which covered data such as financial data, health data, biometric data, sexual orientation, and religious beliefs. If you hold such data, you can still map it to a Restricted+ sub-tier with additional controls. Note that DPDP obligations such as consent management, purpose limitation, and data minimization apply to all personal data, not only to that sub-tier.

A classification policy that addresses both SOC 2 and DPDP Act creates a single governance framework rather than two overlapping ones, reducing compliance overhead.

Data Classification Evidence

(1) Data classification policy defining tiers, handling requirements, and control mapping.
(2) Data inventory (data map) listing data types, classification tier, storage location, and data owner.
(3) AWS Macie findings report showing sensitive data discovery.
(4) Control implementation evidence for Restricted data: KMS key policies, S3 bucket encryption, IAM access policies.
(5) Employee training records showing data handling training was completed.
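The data inventory is easier to keep audit-ready if it is checked automatically. The sketch below assumes a simple record with the four attributes named above; the class name, field names, and tier labels as strings are illustrative choices, not a required schema.

```python
from dataclasses import dataclass

VALID_TIERS = ("Public", "Internal", "Confidential", "Restricted")


@dataclass
class InventoryEntry:
    data_type: str
    tier: str
    storage_location: str
    owner: str


def inventory_problems(entries):
    """Return human-readable problems found in a data inventory."""
    problems = []
    for entry in entries:
        if entry.tier not in VALID_TIERS:
            problems.append(f"{entry.data_type}: unknown tier {entry.tier!r}")
        if not entry.storage_location.strip():
            problems.append(f"{entry.data_type}: missing storage location")
        if not entry.owner.strip():
            problems.append(f"{entry.data_type}: missing data owner")
    return problems
```

An empty result means every entry has a recognized tier, a storage location, and an owner; it says nothing about whether the assigned tier is the right one.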

Frequently Asked Questions

Does SOC 2 require a formal data classification policy?

Yes, implicitly. CC6.1 requires access controls based on the sensitivity of information, which requires knowing what data is sensitive. Most auditors ask to see a data classification policy and a data inventory as baseline CC6 evidence. Without classification, you cannot demonstrate that your access controls are calibrated to data sensitivity.

How do we classify data that spans multiple categories?

Apply the highest applicable classification. If a database table contains both public (product names) and Restricted (user PII) data, classify the entire table as Restricted. The protection level applied to any data container (S3 bucket, RDS database, Google Drive folder) should match the most sensitive data it contains.
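The "highest applicable classification" rule is simple enough to express in a few lines. In this sketch the tier names come from this guide, while the function name and the column names in the usage note are made up for illustration.

```python
# Ordered from least to most sensitive.
TIER_ORDER = ["Public", "Internal", "Confidential", "Restricted"]


def container_classification(field_tiers):
    """Classify a container (table, bucket, folder) by its most sensitive field.

    field_tiers maps each field or data type in the container to its tier.
    """
    if not field_tiers:
        raise ValueError("container has no classified fields")
    return max(field_tiers.values(), key=TIER_ORDER.index)
```

For a table described as `{"product_name": "Public", "user_email": "Restricted"}`, the function returns `"Restricted"`, so the whole table inherits the Restricted controls.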

Can we use a two-tier classification (sensitive vs non-sensitive) for simplicity?

Yes, a two-tier system is simpler and may be appropriate for small companies. The key is that sensitive data has defined, enforced controls. However, a four-tier system allows more nuanced control application and leaves room to separate personal data covered by the DPDP Act from other sensitive business data. Use whatever tier count you can operationalize consistently.

Do we need to classify data in our SaaS application customer database?

Yes. Your customers' data is typically your most sensitive data. Classify it as Restricted, define what customer data types you collect (profile PII, usage data, payment information), document who has access, and ensure maximum controls are applied. The DPDP Act also requires reasonable security safeguards for personal data, so these controls support that obligation as well.

How do we handle data classification for data we receive from customers (uploaded files, etc.)?

Apply your Restricted classification to any customer-uploaded content until you can verify its sensitivity. Scan the upload bucket: use Macie for sensitive data discovery, and GuardDuty Malware Protection for S3 if you also want malware scanning. Define in your data handling policy that customer-uploaded data is treated as Restricted by default.