There is a persistent misconception in the AI industry: that building AI is a different discipline from building software. This misconception is reinforced by how AI is taught, marketed, and discussed. Tutorials end at “call the API.” Notebooks are presented as deliverables. The gap between prototype and production is treated as someone else’s problem.

It is not. AI systems are distributed software systems with additional complexity. They have the same operational requirements as any production system - reliability, observability, deployment automation, configuration management, security - plus a layer of complexity unique to non-deterministic behavior, data dependencies, and model lifecycle management. This wiki is built on that premise.

The Misconception and Its Cost

Most AI tutorials are optimized for the shortest path to a working demo. That is useful for learning the capability. It is not useful for building a system that operates reliably at 3am when something breaks.

The pattern is consistent: a developer completes a tutorial, builds a prototype, and discovers that getting to production requires solving a different class of problem entirely. Prompt engineering is not enough when you need to version your prompts across environments. Calling an API is not enough when you need reproducible deployments, regression testing, secrets management, and cost monitoring.

The cost of this misconception is measured in failed deployments, production incidents, and AI projects that never escape the prototype stage. Industry estimates suggest 80-90% of AI proofs of concept fail to reach production. The technical capability is rarely the limiting factor. The surrounding engineering is.

The Four-Layer Model

Every production AI system operates across four distinct layers. Understanding these layers - and the work required at each - is the foundation for building systems that actually work.

Layer 1 - AI Capabilities

This is the domain covered by most AI tutorials: large language models, embedding models, agents, multimodal models, fine-tuning, evaluation, and prompt engineering. The capabilities themselves are genuinely remarkable and relatively accessible. Hosted model APIs have dramatically reduced the barrier to experimentation.

This is also where most tutorials stop.

AI capability is necessary but not sufficient for a production system. A model that generates correct output in a notebook and a model that generates correct output reliably under production load, across diverse inputs, with versioned prompts, measurable quality, and controllable cost, are separated by an engineering problem.

Layer 2 - Engineering Patterns

Engineering patterns are the translation layer between AI capabilities and software systems. This is where prototypes become systems. The key patterns include:

Retrieval-Augmented Generation (RAG) - Architecture for grounding model output in private knowledge without fine-tuning. RAG introduces its own engineering concerns: document ingestion pipelines, chunking strategies, embedding model selection, vector store management, retrieval quality evaluation, and context window management.

Orchestration - Managing multi-step AI workflows, tool calling, agent loops, and state. Orchestration frameworks reduce boilerplate but introduce dependency and versioning concerns that need to be managed deliberately.

Prompt versioning - Prompts are code. They need version control, testing, staged rollout, and rollback capability. Teams that treat prompts as configuration text embedded in application code discover this when a prompt change breaks production.

Observability patterns - LLM-specific observability: token usage tracking, latency distribution, output quality sampling, cost attribution, and anomaly detection on model behavior. Standard application monitoring is necessary but not sufficient.

Dataset lifecycle - Training and evaluation data requires the same rigor as production code: versioning, lineage tracking, quality validation, and access control.

Layer 3 - Software Engineering Foundations

This layer is the most frequently overlooked in AI-specific content, because it is not AI-specific. It is the software engineering discipline that makes any system maintainable and reproducible over time.

Version control workflows - Branching strategies, commit discipline, pull request review processes for AI projects where experiments and production code coexist.

Repository structure - Organizing code so that data science work, application code, infrastructure definitions, and evaluation pipelines can coexist without becoming unmaintainable.

Configuration management - Separating configuration from code, managing environment-specific values, and ensuring reproducibility across development, staging, and production environments.

CI/CD for AI - Automated testing and deployment pipelines that account for AI-specific concerns: model artifact versioning, evaluation gates before promotion, and rollback strategies for model updates.

Testing strategies - Unit tests, integration tests, and AI-specific evaluation frameworks working together. Testing non-deterministic systems requires different approaches than testing deterministic software.

Secrets and credentials management - API keys, model credentials, and data access tokens handled securely through a secrets management system, not hardcoded in notebooks or environment files committed to version control.

Dependency management - Reproducible environments across the team and across deployment targets. Python package management for AI projects has specific challenges around GPU dependencies and model artifact compatibility.

Layer 4 - Infrastructure

Infrastructure is what makes systems scalable and reliable under real operating conditions.

Compute patterns - Serverless functions for low-throughput inference, GPU workloads for high-throughput or latency-sensitive applications, and the tradeoffs between managed and self-managed compute.

Storage architecture - Object storage for model artifacts and training data, structured storage for evaluation results and metadata, and caching layers for repeated inference requests.

Data pipelines - Ingestion, transformation, and quality validation for the data that feeds AI systems. Batch and streaming patterns for different latency requirements.

Event-driven architectures - Asynchronous processing for inference workloads that do not require synchronous responses. Queue-based decoupling for resilience and cost management.

Why the Hard Part Is Not the Model

In 2015, researchers at Google published “Hidden Technical Debt in Machine Learning Systems” (Sculley et al.). The paper introduced a diagram that has since become foundational to ML engineering: a small box labeled “ML Code” surrounded by much larger boxes representing everything else - data collection, feature extraction, data verification, configuration, monitoring, serving infrastructure, and process management.

The paper’s finding was that ML code - the model itself - is a small fraction of the total code in a production ML system. The surrounding infrastructure dwarfs the model. This observation has become more true, not less, as the industry has shifted toward large foundation models accessed via APIs. The model code is now almost entirely abstracted away. The infrastructure complexity remains.

The hard part of building AI systems is not calling the model API. The hard part is operating the system that calls the model API: maintaining prompt quality across versions, detecting output degradation before users report it, managing costs as usage scales, keeping dependencies current, handling model provider incidents gracefully, and shipping changes to a production system that processes real requests.

Martin Kleppmann’s “Designing Data-Intensive Applications” (2017) characterizes production systems by three requirements: reliability (functioning correctly under adversity), scalability (handling growth in load), and maintainability (the ability for engineering teams to work on the system over time without increasing difficulty). AI systems are held to the same standard. Non-deterministic model behavior makes each of these harder, not easier.

What This Wiki Covers, and Why

This wiki covers all four layers because that is what building real AI solutions requires.

When you see an article about version control workflows or .gitignore configuration, it is not generic developer content placed here by accident. It is part of the AI delivery lifecycle. Reproducibility starts with disciplined version control. Teams that skip this step rediscover it when they cannot reproduce a result, cannot trace which prompt version caused a regression, or cannot deploy a rollback.

When you see articles about API design, caching strategies, or infrastructure patterns, those are the patterns you need when your AI system leaves the notebook. The article on RAG systems links to the articles on vector storage, embedding selection, and CI/CD because those topics are not separable in practice.

The content is organized to reflect the reality that AI engineering is a full-stack discipline, not a specialization limited to model selection and prompt writing.

The Gap This Wiki Fills

Most AI content is written from one of three positions: by data scientists without production engineering backgrounds, by marketers producing discovery content, or by researchers focused on model capability. Each produces useful content for its purpose. None produces the content a team needs to take an AI system to production and operate it reliably.

The result is a consistent pattern: a team builds an agent in ten minutes following a tutorial, and then spends three months discovering everything the tutorial omitted. Repository structure. Reproducible deployment. Configuration management across environments. Secrets handling. Prompt versioning. Regression testing. Cost attribution. Incident response.

These are not advanced topics that can be deferred until scale demands them. They are the foundations on which reliable systems are built. Robert Martin’s “Clean Architecture” (2017) argues that the goal of software architecture is to minimize the human resources required to build and maintain a system over time. That goal applies directly to AI systems, which tend toward complexity and operational surprise when their engineering foundations are weak.

How to Use This Wiki

The recommended path through this wiki follows the natural dependency order:

Start with your AI capability need. If you are building an agent, start with the agent guides. If you are building a RAG system, start there. These articles cover the AI-specific concerns first.

Follow the links to engineering patterns. Each capability article links to the patterns required to operationalize it. RAG links to vector store management, prompt versioning, and evaluation. Agents link to orchestration, tool design, and observability.

Then follow the links to foundations. The engineering pattern articles link to the software engineering foundations they depend on. Prompt versioning links to version control workflows. Deployment links to CI/CD configuration.

The wiki is structured so each article connects upward to the business problem it solves and downward to the implementation detail it depends on. A team starting with a specific problem can follow links to discover the full set of concerns that problem entails. A team building a systematic capability can read layer by layer.

The premise throughout is that the gap between prototype and production is an engineering problem, and engineering problems have solutions. This wiki is a map of those solutions, organized for the teams that need to apply them.


References

  • Sculley, D. et al. “Hidden Technical Debt in Machine Learning Systems.” Advances in Neural Information Processing Systems 28 (2015).
  • Kleppmann, M. Designing Data-Intensive Applications. O’Reilly Media, 2017.
  • Martin, R. Clean Architecture: A Craftsman’s Guide to Software Structure and Design. Prentice Hall, 2017.