AWS provides a rich set of governance tools, but they require deliberate configuration for AI workloads. This guide covers the practical setup for governing AI systems on AWS.

Account Structure with AWS Organizations

Establish a multi-account structure that separates AI workloads by environment and sensitivity. A typical structure includes a management account (billing and Organizations management only), a security account (centralized logging and security tooling), a shared services account (model registry, artifact stores), and separate accounts for AI development, staging, and production. Use Organizational Units (OUs) to group AI accounts and apply policies consistently.

Service Control Policies for AI

SCPs restrict what services and actions are available in member accounts. For AI workloads, implement SCPs that restrict SageMaker, Bedrock, and EC2 GPU instances to approved regions (supporting data sovereignty requirements), prevent creation of public S3 buckets containing training data, require encryption on all SageMaker endpoints, prevent disabling of CloudTrail and GuardDuty, and restrict which foundation models can be accessed through Bedrock.

Tagging Strategy

Implement a mandatory tagging policy for all AI resources. Required tags should include project, environment, cost-center, data-classification, model-name, and owner. Use AWS Config rules to detect untagged resources. Tag enforcement is critical for AI workloads because GPU instances are expensive and orphaned resources accumulate costs quickly.

Cost Governance

AI workloads can generate significant costs from GPU training instances, Bedrock API calls, and large data storage. Implement AWS Budgets with alerts at 50%, 80%, and 100% of monthly targets. Use Cost Anomaly Detection to catch unexpected spending spikes from runaway training jobs. Configure SageMaker managed spot training to reduce training costs by up to 90%. Set up automatic shutdown of idle notebook instances and training jobs with maximum runtime limits.

Security Controls

IAM policies - Implement least-privilege access for AI resources. Data scientists should have access to development SageMaker notebooks but not production endpoints. Model deployment should require approval through a CI/CD pipeline. VPC configuration - Run SageMaker training and inference in private subnets with no direct internet access. Use VPC endpoints for S3, SageMaker, and Bedrock. Encryption - Enable default encryption on S3 buckets, EBS volumes, and SageMaker storage using KMS keys managed by the security team. Logging - Enable CloudTrail for all API calls, SageMaker experiment tracking, and Bedrock model invocation logging.

Data Governance

Configure S3 bucket policies and IAM roles to enforce data access controls. Use AWS Lake Formation for fine-grained access control on training data in data lakes. Implement S3 Object Lock for training data that must be retained for compliance. Use Macie to automatically discover and classify sensitive data in training datasets.

Compliance Monitoring

Deploy AWS Config rules to continuously monitor compliance. Key rules include checking that SageMaker endpoints use encryption, S3 buckets are not public, VPC flow logs are enabled, and IAM roles follow least-privilege principles. Use Security Hub to aggregate findings from Config, GuardDuty, Inspector, and Macie into a single dashboard. Map findings to specific compliance requirements (GDPR, NIS2, industry standards).

Automation

Define all governance controls as Infrastructure as Code using CDK or Terraform. This ensures consistency across accounts and environments. Use AWS Service Catalog to provide pre-approved AI infrastructure templates that include all required governance controls, allowing AI teams to provision compliant infrastructure without manually configuring security settings.