A model that performed well at deployment will eventually degrade. The world changes, user behavior shifts, and the data your model sees in production drifts away from what it was trained on. Drift detection is the practice of monitoring for these changes and responding before they cause business impact.

Types of Drift

Data drift (covariate shift) occurs when the statistical distribution of input features changes. A recommendation model trained on summer browsing patterns sees different input distributions in winter. The model itself has not changed, but its inputs have moved outside the distribution it learned from.

Concept drift occurs when the relationship between inputs and outputs changes. Customer preferences shift, fraud techniques evolve, or regulatory changes alter what constitutes a correct prediction. The same inputs now map to different correct outputs.

Label drift occurs when the distribution of target values changes. A support ticket classifier trained when most tickets were billing-related may underperform when product issues become dominant.

Prediction drift occurs when the distribution of model outputs changes, even if individual predictions might still be correct. A sudden shift in prediction distribution often signals an upstream data or model issue.

Detection Methods

Statistical tests compare distributions between a reference dataset (typically training or validation data) and recent production data. Common approaches include:

  • Population Stability Index (PSI) for measuring overall distribution shift
  • Kolmogorov-Smirnov test for continuous features
  • Chi-squared test for categorical features
  • Jensen-Shannon divergence for probability distributions

Performance monitoring tracks model accuracy against ground truth labels. This is the most reliable signal but requires labels, which often arrive with delays. Track metrics on a rolling window and alert when they drop below thresholds.

Prediction distribution monitoring tracks the distribution of model outputs over time. Changes in prediction confidence, class balance, or score distribution can indicate problems before ground truth is available.

Feature importance monitoring tracks whether the model’s reliance on features changes over time. A feature that becomes highly influential when it was not important during training may indicate adversarial manipulation or data pipeline issues.

Setting Up Drift Monitoring

Choose your reference dataset. The training dataset or a recent validation dataset serves as the baseline. Update the reference periodically as you retrain models.

Select features to monitor. You cannot monitor every feature with equal intensity. Prioritize features with high model importance, features known to be volatile, and features derived from external sources.

Define detection windows. Compare production data in windows (hourly, daily, weekly) against the reference. Window size trades off between detection speed and statistical power. Short windows catch sudden shifts quickly but produce more false alarms.

Set alert thresholds. Not all drift is harmful. Small distribution shifts are normal. Set thresholds based on historical variation and validated impact on model performance. Start with loose thresholds and tighten as you learn what matters.

Responding to Drift

Triage. Not all drift requires action. Determine whether the drift is transient (a one-time event) or sustained, and whether it affects model performance. Monitor performance metrics alongside drift metrics to assess impact.

Short-term mitigations. For acute drift, consider falling back to a rule-based system, switching to a previous model version, or adjusting prediction thresholds. These buy time while you prepare a more permanent solution.

Retraining. For sustained drift, retrain the model on recent data. If you have a continuous training pipeline, drift detection can automatically trigger retraining. Validate that the retrained model actually performs better on recent data before deploying.

Architecture changes. Persistent drift in the same direction may indicate that your model architecture or feature set is no longer adequate. This requires deeper investigation and potentially redesigning the model.

Tooling

SageMaker Model Monitor, Evidently AI, WhyLabs, and Arize provide drift detection out of the box. For custom implementations, build on statistical libraries (scipy, alibi-detect) and your existing monitoring infrastructure. The key is integrating drift alerts into your existing on-call and incident response workflows rather than creating a separate monitoring silo.