AI Systems Are Software Systems
Why production AI requires the same engineering discipline as any distributed system, and how this wiki covers the full stack of AI …
Centralized gateway for routing, caching, rate limiting, and observability across multiple AI model providers. A single control plane for …
The testing pyramid adapted for AI systems: unit tests for deterministic logic, integration tests with mocked models, evaluation tests with …
Automatically generate and update architecture diagrams by having AI analyze codebases, infrastructure-as-code, and service dependencies.
AI predicts optimal cache TTLs and invalidation timing based on access patterns and data change frequency, solving the 'two hard problems' …
Using ADRs and architecture evaluation methods like ATAM to document and assess architecture decisions in AI/ML systems.
Processing large volumes of AI inference requests efficiently. Queue design, throughput optimization, error handling, and cost management …
Comparing batch and real-time inference patterns for ML models, covering architecture, cost, latency, and when to use each approach.
What a bounded context is, how it defines model boundaries in DDD, and how it guides microservice decomposition.
A practical guide to building production AI chatbots, covering architecture, conversation design, context management, guardrails, and …
What clean architecture is, how dependency inversion organizes code layers, and when this structure benefits AI applications.
An AI architecture that combines multiple models, retrievers, tools, and programmatic logic to solve tasks that exceed the capabilities of …
How compound AI systems combine multiple models, retrievers, tools, and control logic to achieve capabilities beyond what single models can …
What CQRS is, how it separates read and write models, and when this pattern improves AI application architecture.
The foundational pattern: user input goes to a model API, model response comes back. When this is enough and when you need something more.
What domain-driven design is, how it aligns software architecture with business domains, and when to invest in DDD.
Architecture pattern for building machine learning training and inference pipelines that satisfy GDPR requirements for data minimization, …
Maintaining service quality when AI components fail or degrade. Fallback strategies, feature flags, cached responses, and partial …
What hexagonal architecture is, how ports and adapters decouple business logic from infrastructure, and practical implementation guidance.
Using Conway's Law strategically to design AI team structures that produce the desired system architecture, avoiding accidental complexity.
How to design a centralized LLM access layer that handles routing, rate limiting, cost tracking, caching, and logging across multiple model …
Architectural patterns for giving AI systems memory across conversations, from sliding context windows to persistent vector stores and user …
Comparing microservice and monolithic architectures for AI applications, covering deployment patterns, team structure implications, and …
Combining multiple models for improved accuracy, reliability, and coverage. Voting, cascading, and specialization ensemble strategies.
Route AI requests to different model tiers based on complexity, cost sensitivity, and quality requirements. Reduce spend without sacrificing …
Strategies for routing requests to different AI models based on task complexity, cost constraints, and latency requirements. Router design, …
Architecture pattern for deploying AI systems across multiple regions while respecting data sovereignty requirements, covering data …
Serving multiple customers from shared AI infrastructure while maintaining data isolation, fair resource allocation, and per-tenant …
An orchestrator LLM decomposes complex tasks and delegates subtasks to specialized worker models or agents, coordinating results into a …
What the ports and adapters pattern is, how it structures application boundaries, and its relationship to hexagonal architecture.
How to design and implement prompt chains for complex AI tasks, covering chain architecture, error handling, optimization, and practical …
Decision framework for choosing between real-time and batch AI processing. Latency requirements, cost tradeoffs, hybrid architectures, and …
Implementing streaming responses from LLMs for improved perceived latency. Server-sent events, chunked processing, and progressive …
Comparing REST and GraphQL API designs for AI applications, covering streaming support, query patterns, caching, and practical …
When to use a single AI agent versus a multi-agent system, covering complexity, reliability, cost, and practical decision criteria.
Architecture decisions, ADRs, and trade-offs for AI systems covering serving patterns, training infrastructure, and system decomposition.
What the twelve-factor methodology is, how it guides cloud-native application design, and which factors matter most in practice.
The AWS ML Lens extends the Well-Architected Framework to cover ML lifecycle phases, ML pipeline automation, model security, inference …
Semantic caching, Anthropic prompt caching, response caching, and embedding caching for AI applications. Cost savings analysis and …
The Well-Architected pillar covering right-sizing, reserved capacity, spot instances, and cost allocation - and how it applies to AI …
How to build an AI video processing pipeline that spans on-premises storage and AWS cloud using FSx for NetApp ONTAP as a hybrid bridge, …
The Well-Architected pillar covering runbooks, automation, observability, incident response, and continuous improvement - and how it applies …
The Well-Architected pillar covering compute selection, storage, database, and networking choices - and how it applies to AI workloads …
The Well-Architected pillar covering fault tolerance, disaster recovery, health checks, and scaling - and how it applies to AI workloads …
The Well-Architected pillar added in 2021 covering efficient resource usage, managed services, and data lifecycle management - and how it …
What the Well-Architected Framework is, its origins at AWS, how Azure and GCP adopted it, its six pillars, and why it matters especially for …
Apply cheap analysis first, score results, then apply expensive analysis only to candidates that pass a threshold. Reduces AI API costs by …
How AI system architecture evolves from monolithic single-model deployments through microservices to collaborative multi-agent systems, with …
What the circuit breaker pattern is, why AI services need it for handling model timeouts and rate limits, and how to implement it with AWS …
Handling model failures gracefully in production AI systems: fallback strategies, degraded mode operation, retry with backoff, and …
What event sourcing is, why it matters for AI audit trails and pipeline replay, its relationship to CQRS, and when to apply it in AI …
Using event-driven architecture patterns for AI data pipelines: immutable event logs, replay capability, audit trails, and CQRS for …
How to decompose AI systems into independent services with clear boundaries, API contracts, and independent deployability - treating AI …
How to gradually replace manual processes and legacy rule-based systems with AI using the strangler fig pattern: routing traffic …
How each of the original twelve-factor app principles applies to AI and LLM-based systems: model configuration, artifact management, vector …
Model selection by task, caching strategies, batch vs real-time processing, and tiered inference with Haiku, Sonnet, and Opus.
Model cards, decision logging, bias detection, approval workflows, audit trails, compliance documentation, and EU AI Act considerations.
Architecture guide for an end-to-end AI video pipeline: S3 ingest, Lambda trigger, Rekognition analysis, Bedrock processing, FFmpeg editing, …
Architecture and lessons from building a production AI pipeline that processes, indexes, and makes searchable a large library of broadcast …
A practical introduction to multi-agent AI architectures: when to use them, how they work, and which frameworks are production-ready.
Robert C. Martin's architectural pattern for organizing software so that business logic is independent of frameworks, databases, and …
The Gang of Four catalog of reusable solutions to recurring object-oriented design problems. Patterns are not code to copy - they are …
Eric Evans's approach to software design that aligns the structure and language of code with the business domain. Bounded contexts, …
Five principles of object-oriented class design formulated by Robert C. Martin. A foundational framework for writing code that is easy to …
The cloud architecture review methodology used by AWS, Azure, and Google Cloud to evaluate workloads against proven best practices across …