Cost Estimation for AWS AI Services
How to estimate and manage costs for AI workloads on AWS, covering Bedrock, SageMaker, compute, storage, and strategies for cost optimization.
AWS AI service costs are notoriously hard to predict. Pricing models vary by service (per-token, per-hour, per-request), costs scale non-linearly with usage, and hidden charges (data transfer, storage, logging) add up quickly. This guide covers how to estimate costs accurately and avoid budget surprises.
Cost Components
Foundation Model Inference (Amazon Bedrock)
Bedrock pricing is per-token for on-demand usage:
Input tokens are charged at one rate, output tokens at a higher rate. For Claude models on Bedrock, input tokens cost roughly $3-$15 per million tokens and output tokens $15-$75 per million tokens, depending on the model variant. Pricing changes frequently, so always check current rates.
Estimation approach:
- Estimate average input size (tokens per request)
- Estimate average output size (tokens per request)
- Estimate daily request volume
- Calculate: (input_tokens x input_price + output_tokens x output_price) x daily_volume x 30
Example: 1,000 requests/day, average 500 input tokens and 200 output tokens, using Claude Sonnet:
- Monthly input cost: 500 x 1,000 x 30 x $3/1M = $45
- Monthly output cost: 200 x 1,000 x 30 x $15/1M = $90
- Total: ~$135/month
For higher volumes, Provisioned Throughput offers predictable pricing but requires commitment.
Model Training (SageMaker)
Training costs depend on instance type, training duration, and data volume:
Instance costs. ml.p3.2xlarge (single GPU) runs approximately $3.83/hour. ml.p4d.24xlarge (8 GPUs) runs approximately $37.69/hour. Multi-GPU training is faster but more expensive per hour.
Estimation approach: Benchmark a small training run, extrapolate to full dataset size, and add 30% buffer for hyperparameter tuning and failed runs.
Spot instances can reduce training costs by 60-90% for fault-tolerant training jobs. Configure checkpointing to avoid losing progress when instances are reclaimed.
Inference Hosting (SageMaker Endpoints)
Real-time endpoints run continuously and charge by instance-hour:
Right-sizing matters. An ml.m5.xlarge ($0.23/hour, ~$170/month) may be sufficient for low-traffic models. An ml.g5.xlarge ($1.41/hour, ~$1,015/month) is needed for GPU-dependent models.
Serverless inference charges per-request and eliminates idle costs, but has cold start latency. Suitable for low-traffic or latency-tolerant workloads.
Auto-scaling adjusts capacity based on traffic, but you pay for the minimum instance count at all times.
Data Storage and Transfer
S3 storage: $0.023/GB/month for standard tier. Training datasets and model artifacts add up. A 100GB dataset costs ~$2.30/month.
Data transfer: Transfer between AWS services in the same region is free for most services. Transfer out to the internet is $0.09/GB. Transfer between regions is $0.02/GB.
EBS volumes: SageMaker notebook instances and training jobs use EBS storage. 100GB gp3 volume costs ~$8/month.
Supporting Services
CloudWatch logging: $0.50/GB ingested. AI workloads that log model inputs and outputs can generate significant log volume. A system logging 1,000 requests/day with 1KB average per request: ~1.5GB/month, ~$0.75/month.
Lambda (for preprocessing/postprocessing): $0.20 per million requests plus $0.0000166667 per GB-second. Usually a small fraction of total cost.
OpenSearch (for vector search/RAG): A t3.medium.search instance costs $0.07/hour ($50/month). For production RAG, plan for 2-3 instances ($100-$150/month minimum).
Cost Estimation Template
| Component | Estimation Method | Monthly Cost Range |
|---|---|---|
| LLM inference (Bedrock) | Token volume x price per token | $50 - $10,000+ |
| Model training (SageMaker) | Instance-hours x instance price | $100 - $5,000+ |
| Inference hosting | Instance count x instance price x 730 hours | $170 - $3,000+ |
| Storage (S3, EBS) | GB stored x price per GB | $5 - $100 |
| Vector database | Instance type x instance count | $50 - $500 |
| Monitoring and logging | Log volume x ingestion price | $5 - $50 |
| Data transfer | GB transferred x transfer price | $0 - $100 |
Cost Optimization Strategies
Start with on-demand, move to commitments. Use on-demand pricing during development to understand actual usage patterns. Once usage is stable, commit to Savings Plans or Reserved Instances for 30-60% savings.
Use spot instances for training. Training jobs that support checkpointing can use spot instances safely. Set up automatic retry on interruption.
Right-size inference endpoints. Monitor CPU/GPU utilization on inference endpoints. Most are over-provisioned. An endpoint running at 10% utilization can likely use a smaller instance.
Implement caching. Cache LLM responses for repeated queries. If 20% of queries are repeated, caching saves 20% of inference costs.
Batch inference for non-real-time. Instead of real-time endpoints, use batch transform for workloads that can tolerate latency. Batch processing is significantly cheaper.
Monitor continuously. Set up AWS Cost Explorer alerts for AI-related services. Review costs weekly during development and monthly in production. Cost surprises almost always indicate a misconfiguration or unexpected usage pattern.
The most common cost mistake in AI projects is not monitoring costs during development. A developer who leaves a large training instance running over a weekend can generate thousands of dollars in charges. Implement tagging, alerts, and automatic shutdown policies from day one.
Need help implementing this?
Turn this knowledge into a working prototype. Our structured workshop methodology takes you from idea to deployed AI solution in three sessions.
Explore AI Workshops