API Versioning Strategies for AI Services

How to version AI APIs as models evolve: URL path versioning, header versioning, model version pinning, backward compatibility, and deprecation policies.

Added 28 Mar 2026 4 min read Updated 30 May 2026

#api-versioning #api-design #backward-compatibility #deprecation #ai-engineering

Learn this your way

Read Guided course

AI APIs change more frequently than traditional APIs. Model updates alter output quality, new features add response fields, prompt templates evolve, and response formats are refined. Without a versioning strategy, these changes break consumers. With a poor versioning strategy, you accumulate maintenance debt supporting too many versions. This guide covers practical versioning approaches for AI services.

Why AI APIs Need Explicit Versioning

Traditional API versioning handles structural changes: new fields, removed endpoints, changed data types. AI APIs add a second dimension: behavioural changes. The same API contract with a new model version may return different predictions, different confidence scores, or different text. Consumers depend on specific behaviours, not just schemas.

Examples of AI API changes that require versioning:

Model update - A new model version produces better outputs overall but changes behaviour for specific input categories
Response format change - Adding a confidence field or restructuring the metadata object
New capability - Adding streaming support or new model parameters
Prompt template change - Modifying the system prompt changes output style and content
Embedding model change - Switching embedding models makes old vectors incompatible

Versioning Approaches

URL Path Versioning

The most common and explicit approach. The version is part of the URL:

POST /v1/predict
POST /v2/predict

Advantages:

Unambiguous - version is visible in every request
Easy to route different versions to different backends
Simple to monitor and rate-limit per version
Cache-friendly (different URLs are different cache keys)

Disadvantages:

URL changes require client updates
Can lead to code duplication between versions

This is the recommended approach for most AI APIs.

Header Versioning

The version is specified in a request header:

POST /predict
X-API-Version: 2

Advantages:

URL stays stable
Cleaner URL structure

Disadvantages:

Version is invisible in logs, bookmarks, and curl examples unless you inspect headers
Harder to route at the load balancer level

Model Version Pinning

For AI APIs specifically, allow consumers to pin to a model version independently of the API version:

POST /v1/predict
X-Model-Version: gpt-4-2025-12-01

This separates two concerns:

API version - Controls the request/response schema
Model version - Controls inference behaviour

Consumers that need deterministic output pin to a specific model version. Consumers that want the latest quality improvements use the default (latest) model version.

Managing Breaking vs Non-Breaking Changes

Non-Breaking Changes (No Version Bump)

Adding an optional request field with a default value
Adding a new field to the response
Improving model quality without changing the response schema
Adding a new endpoint

Breaking Changes (Require Version Bump)

Removing or renaming a response field
Changing a field’s data type
Changing default model behaviour in ways consumers depend on
Removing an endpoint
Changing authentication mechanism

Gray Area: Model Behaviour Changes

Model updates are the hardest to classify. A model that produces “better” outputs overall may produce worse outputs for a specific consumer’s use case. Strategies:

Default to latest - New model versions are the default. Consumers who need stability pin to a specific version.
Notify and migrate - Announce model version changes in advance. Provide a migration period where both versions are available.
Shadow testing - Run new model versions in shadow mode, comparing outputs against the current version before switching.

Deprecation Policy

Define a clear deprecation lifecycle:

Active - Fully supported, receiving updates
Deprecated - Still functional but no new features. Deprecation header included in responses: Sunset: Sat, 01 Nov 2026 00:00:00 GMT
Retired - Returns 410 Gone with a message pointing to the replacement version

Timelines:

Announce deprecation at least 90 days before retirement
Monitor usage of deprecated versions and proactively reach out to remaining consumers
Provide migration guides with specific changes needed

Implementation Pattern

python

from fastapi import FastAPI, Header, HTTPException

app = FastAPI()

# Version router
v1_app = FastAPI()
v2_app = FastAPI()

app.mount("/v1", v1_app)
app.mount("/v2", v2_app)

@v1_app.post("/predict")
async def predict_v1(request: PredictRequestV1):
    # V1 behaviour: returns flat prediction
    return {"prediction": model.predict(request)}

@v2_app.post("/predict")
async def predict_v2(
    request: PredictRequestV2,
    x_model_version: str = Header(default="latest"),
):
    # V2 behaviour: returns prediction with metadata
    model = get_model(x_model_version)
    result = model.predict(request)
    return {
        "prediction": result.value,
        "confidence": result.confidence,
        "model_version": model.version,
        "metadata": result.metadata,
    }

Keep version-specific code isolated. Shared logic lives in a common layer. Version-specific serialisation and behaviour live in version-specific modules. This prevents changes to one version from accidentally affecting another.

Open source projects

Freelancer Templates Contracts, proposals, SOWs

Freelancer Automation Workflow recipes, AI playbooks

Work with Linda

Workshop Series €2,000/mo x 3

1:1 Consulting 60 min session