Cost-Optimization

19 articles

FinOps for AI: Controlling the Cost of LLMs and GPUs · Guides · New
A practical guide to controlling AI spend. Learn the cost drivers behind …

Added 29 Jun · Upd 29 Jun · 9 min

Multi-Model Routing · Guides · New
How to route each query to the right LLM to cut cost and add …

Added 23 Jun · Upd 23 Jun · 9 min

AWS Previews a FinOps Agent for Cloud Cost · News · New
AWS introduced a FinOps Agent in preview: an agentic assistant for …

Added 15 Jun · Upd 16 Jun · 3 min

AI Cost Optimization Patterns · Patterns
Model selection by task, caching strategies, batch vs real-time …

Added 24 Mar · Upd 30 May · 3 min

Auto-Scaling · Glossary
What auto-scaling is, how it adjusts capacity dynamically, and how to …

Added 28 Mar · Upd 30 May · 2 min

Batch Inference Patterns for AI Workloads · Patterns
Processing large volumes of AI inference requests efficiently. Queue …

Added 28 Mar · Upd 30 May · 3 min

Capacity Planning for AI Inference · Guides
How to right-size GPU and TPU clusters, configure autoscaling for …

Added 28 Mar · Upd 30 May · 4 min

Cost Estimation for AWS AI Services · Guides
How to estimate and manage costs for AI workloads on AWS, covering …

Added 28 Mar · Upd 30 May · 4 min

Cost Optimization (Well-Architected Pillar) · Glossary
The Well-Architected pillar covering right-sizing, reserved capacity, …

Added 26 Mar · Upd 30 May · 5 min

GPU Pooling · Patterns
Shared GPU infrastructure with intelligent scheduling: maximizing GPU …

Added 28 Mar · Upd 30 May · 3 min

LLM Routing · Glossary
Architectures that direct each request to one of several available …

Added 8 May · Upd 30 May · 5 min

Model Distillation Patterns for Production AI · Patterns
Using large model outputs to train smaller, cheaper, faster models for …

Added 28 Mar · Upd 30 May · 3 min

Model Tier Routing · Patterns
Matching Request Complexity to Model Cost

Added 28 Mar · Upd 30 May · 3 min

Multi-Model Routing Patterns · Patterns
Strategies for routing requests to different AI models based on task …

Added 28 Mar · Upd 30 May · 4 min

Plan-and-Execute Pattern · Patterns
Separating Planning from Execution in AI Agents

Added 28 Mar · Upd 30 May · 4 min

Prompt Caching · Glossary
Server-side caching of attention key/value tensors for repeated prompt …

Added 8 May · Upd 30 May · 5 min

Reducing LLM Inference Costs in Production · Guides
Practical strategies for reducing LLM API and hosting costs without …

Added 28 Mar · Upd 30 May · 3 min

Semantic Caching for AI Applications · Patterns
Caching AI model responses based on semantic similarity rather than …

Added 28 Mar · Upd 30 May · 3 min

Token Budget · Glossary
The maximum number of tokens allocated for an LLM request or workflow, …

Added 28 Mar · Upd 30 May · 2 min
