MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It covers experiment tracking (logging parameters, metrics, and artifacts during training), a model registry (versioning and staging trained models), model deployment (serving models as REST endpoints), and project packaging (reproducible ML workflows). For AI projects, MLflow provides the operational backbone that takes ML from notebook experiments to production systems with governance and reproducibility.

Official documentation: https://mlflow.org/docs/latest/

Core Components

MLflow Tracking - Logs experiment metadata during model training. You log parameters (hyperparameters, configuration), metrics (accuracy, loss, latency), and artifacts (model weights, charts, data samples) with a few API calls:

import mlflow

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 50)
    mlflow.log_metric("accuracy", 0.94)
    mlflow.log_metric("f1_score", 0.91)
    mlflow.sklearn.log_model(model, "model")

The tracking UI displays experiments as sortable, filterable tables, making it straightforward to compare runs and identify the best-performing configurations. This is essential when training dozens or hundreds of model variants.

Model Registry - A centralized model store with versioning, stage transitions, and annotations. Models move through stages: None, Staging, Production, Archived. Stage transitions can require approval, providing governance for model promotion. Each model version links back to its training run, preserving full lineage from data to deployed model.

MLflow Models - A standard format for packaging ML models. The MLmodel format includes the model weights, a conda/pip environment specification, and a prediction interface. This format enables deployment to multiple targets: REST API (MLflow serving), SageMaker, Azure ML, Docker containers, and Spark UDFs.

MLflow Projects - A convention for packaging ML code as reproducible, reusable projects. A project includes the code, dependencies, and entry points. Anyone can reproduce a training run by running the project with the same parameters.

LLM Tracking

MLflow has extended its tracking capabilities for LLM applications. The MLflow LLM tracking features include:

Prompt logging - Log prompts and responses for each LLM call, enabling debugging and quality analysis.

LLM evaluation - Built-in evaluation metrics for LLM outputs: toxicity, relevance, faithfulness, and custom LLM-as-judge metrics. Run evaluations against datasets and compare across model versions or prompt configurations.

Tracing - Distributed tracing for multi-step LLM applications (RAG pipelines, agent workflows). Each step is logged with inputs, outputs, latency, and token usage. Integrates with LangChain, LlamaIndex, and OpenAI.

Deployment Patterns

MLflow Serving - Deploy a model as a REST API with a single command. MLflow serves the model using the standard inference interface, handling request parsing and response formatting. Suitable for development and moderate-traffic production.

SageMaker deployment - MLflow can deploy models directly to SageMaker endpoints. The mlflow sagemaker CLI handles container building, model uploading, and endpoint creation. This combines MLflow’s tracking and registry with SageMaker’s production serving infrastructure.

Batch inference - Use the model’s predict function in Spark (via mlflow.pyfunc.spark_udf) for large-scale batch predictions. This pattern processes millions of records by distributing inference across a Spark cluster.

MLflow on AWS

MLflow can be self-hosted on EC2, ECS, or EKS with S3 as the artifact store and RDS (PostgreSQL or MySQL) as the tracking backend. For managed options, SageMaker includes MLflow integration through SageMaker with MLflow, providing a managed tracking server with SageMaker Studio integration.

Comparison with Weights and Biases

MLflow and Weights and Biases (W&B) both provide experiment tracking and model management. MLflow is open-source and self-hostable with no licensing cost. W&B offers a more polished UI, better real-time collaboration features, and hosted infrastructure, but requires a subscription for team use. Choose MLflow when self-hosting is a requirement or budget is constrained. Choose W&B when team collaboration and UI quality are priorities.

Pricing

MLflow is open-source (Apache 2.0 license) and free. Costs are infrastructure only: compute for the tracking server, storage for artifacts (S3), and a database for metadata (RDS). Databricks offers a managed MLflow service included in the Databricks platform.