
Model Drift Explained: What It Is and How It Works

Model drift is when a production model's accuracy degrades because real-world data has shifted away from the training data. Data drift is when input feature distributions change; concept drift is when the relationship between inputs and outputs changes. Both require monitoring and retraining to fix; neither resolves on its own.

Data Drift vs Concept Drift

Drift types and what changes

DATA DRIFT (covariate shift):
  Training: P(X)  = age:35±5, income:60k±15k
  Production: P(X) = age:28±4, income:45k±12k  ← shifted
  Effect: model sees ages/incomes it never trained on

CONCEPT DRIFT:
  Training: P(Y|X) = high_income → low churn risk
  Production: P(Y|X) = high_income → high churn risk ← shifted
  Effect: the world changed; features same, labels differ

PREDICTION DRIFT  (early warning proxy):
  Training avg score: 0.23
  Production avg score: 0.61  ← something changed

Types of Drift

Data Drift

Feature distributions shift

The statistical distribution of input features changes. User demographics shift, market conditions change, or data collection processes change. The model still works — but on inputs it was not trained for.
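A shift like this is straightforward to check with a two-sample Kolmogorov-Smirnov test. The sketch below uses SciPy and synthetic data matching the income numbers from the example above; the sample sizes are illustrative:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
training_income = rng.normal(60_000, 15_000, 5_000)    # income: 60k±15k at training time
production_income = rng.normal(45_000, 12_000, 5_000)  # income: 45k±12k in production

stat, p = ks_2samp(training_income, production_income)
print(p < 0.05)  # True: the feature distribution has drifted
```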

Concept Drift

Input-output relationship changes

The relationship between features and labels changes. A fraud model trained pre-pandemic fails post-pandemic even if transaction features look the same — behavior patterns changed.
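Concept drift typically surfaces as a rising error rate once delayed labels arrive. A minimal sketch of the Page-Hinkley test over a stream of per-sample errors follows; the delta and threshold values are illustrative tuning choices, not standards:

```python
def page_hinkley(errors, delta=0.005, threshold=2.0):
    """Flag the sample index where the running error mean rises persistently."""
    mean, cum, min_cum = 0.0, 0.0, 0.0
    for t, err in enumerate(errors, 1):
        mean += (err - mean) / t        # running mean of the error stream
        cum += err - mean - delta       # cumulative deviation above the mean
        min_cum = min(min_cum, cum)     # lowest point seen so far
        if cum - min_cum > threshold:
            return t                    # drift detected at sample t
    return None

print(page_hinkley([0.1] * 200))                  # None: stable error rate
print(page_hinkley([0.1] * 100 + [0.9] * 100))    # detects drift shortly after t=100
```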

Label Drift

Target distribution changes

The proportion of target classes shifts — e.g. churn rate drops from 15% to 5% after a product improvement. A model calibrated for 15% churn becomes poorly calibrated.
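Once production labels are available, a base-rate shift like this can be checked with a chi-square test on class counts. A sketch using SciPy and the churn numbers from the example (sample sizes are illustrative):

```python
import numpy as np
from scipy.stats import chi2_contingency

def label_drift_pvalue(ref_labels, cur_labels):
    """Chi-square test on class counts: has the target base rate shifted?"""
    table = np.array([
        [np.sum(ref_labels == 1), np.sum(ref_labels == 0)],
        [np.sum(cur_labels == 1), np.sum(cur_labels == 0)],
    ])
    _, p, _, _ = chi2_contingency(table)
    return p

ref = np.array([1] * 300 + [0] * 1700)  # 15% churn at training time
cur = np.array([1] * 100 + [0] * 1900)  # 5% churn in production
print(label_drift_pvalue(ref, cur) < 0.05)  # True: the base rate shifted
```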

Detection Methods

Test                              Best for                          Threshold
Population Stability Index (PSI)  Categorical + binary features     PSI > 0.2 = drift
Kolmogorov-Smirnov (KS) test      Continuous feature distributions  p-value < 0.05
Chi-square test                   Categorical feature frequency     p-value < 0.05
Jensen-Shannon divergence         Prediction distribution shift     JS > 0.1 = alert
CUSUM / Page-Hinkley              Gradual concept drift detection   Custom threshold
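PSI from the table takes only a few lines of NumPy. This is an illustrative sketch using the age numbers from the example above; the quantile binning and 1e-6 smoothing are implementation choices, not part of a standard:

```python
import numpy as np

def psi(reference, current, bins=10):
    """Population Stability Index: how far `current` has drifted from `reference`."""
    # Bin edges from reference quantiles, so each reference bin holds ~1/bins of the data
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    # Clip production values into the reference range so out-of-range
    # values land in the outermost bins instead of being dropped
    current = np.clip(current, edges[0], edges[-1])
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    ref_pct = np.clip(ref_pct, 1e-6, None)  # avoid log(0) on empty bins
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(42)
training_age = rng.normal(35, 5, 10_000)    # age: 35±5, as in the example above
production_age = rng.normal(28, 4, 10_000)  # age: 28±4, shifted
print(psi(training_age, training_age))      # 0.0: identical samples
print(psi(training_age, production_age))    # far above the 0.2 drift threshold
```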

Detecting Drift with Evidently AI

Evidently AI runs statistical tests across all features and produces a drift report. Integrate into an Airflow DAG to run nightly.

from evidently.report import Report   # legacy Evidently API; Report moved in later releases
from evidently.metrics import (
    DatasetDriftMetric,
    DataDriftTable,
)

report = Report(metrics=[
    DatasetDriftMetric(),    # overall drift Y/N
    DataDriftTable(),         # per-feature drift scores
])

report.run(
    reference_data=training_df,   # from training
    current_data=production_df,   # last 7 days
)

report.save_html('drift_report.html')
result = report.as_dict()
print(f'Drift detected: {result["metrics"][0]["result"]["dataset_drift"]}')

Common Mistakes

Only monitoring infrastructure metrics

A model can serve 99.9% uptime while its predictions are completely wrong. Monitoring latency and error rates tells you nothing about model accuracy. Add prediction distribution and feature drift metrics.

Alerting on every individual feature drift

With 50 features, a 5% significance threshold means ~2.5 false-positive alerts per monitoring run by chance alone. Use dataset-level drift metrics (PSI across all features) and require a minimum share of drifted features before alerting.
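The gating described above can be sketched as a dataset-level check that alerts only when a minimum share of features drift individually. The `min_share` value and KS-based per-feature test are illustrative choices:

```python
import numpy as np
import pandas as pd
from scipy import stats

def dataset_drift(reference, current, alpha=0.05, min_share=0.3):
    """Alert only when a minimum share of features drift, not on any single test."""
    drifted = [
        col for col in reference.columns
        if stats.ks_2samp(reference[col], current[col]).pvalue < alpha
    ]
    share = len(drifted) / len(reference.columns)
    return share >= min_share, share, drifted

rng = np.random.default_rng(0)
ref = pd.DataFrame({f"f{i}": rng.normal(0, 1, 2_000) for i in range(10)})
cur = ref.copy()
cur["f0"] += 3.0  # only 1 of 10 features shifts: below the 30% share
print(dataset_drift(ref, cur))  # (False, 0.1, ['f0']): no dataset-level alert
```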

Waiting for labels before detecting drift

Production labels are often delayed by days or weeks. Detect data drift (input distribution shifts) and prediction drift (output distribution shifts) as early-warning proxies — don't wait for labeled ground truth.
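The prediction-drift proxy can be computed without any labels by comparing score histograms with Jensen-Shannon distance, as in the table above. A sketch using SciPy; the bin count, smoothing constant, and synthetic beta-distributed scores are illustrative:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def prediction_drift(ref_scores, cur_scores, bins=20, threshold=0.1):
    """Jensen-Shannon distance between score histograms; needs no labels."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    ref_hist = np.histogram(ref_scores, bins=edges)[0] + 1e-9  # smooth empty bins
    cur_hist = np.histogram(cur_scores, bins=edges)[0] + 1e-9
    js = jensenshannon(ref_hist, cur_hist)  # normalizes counts to probabilities
    return js, bool(js > threshold)

rng = np.random.default_rng(7)
training_scores = rng.beta(2, 8, 5_000)    # avg score near 0.2, as in the example above
production_scores = rng.beta(6, 4, 5_000)  # avg score near 0.6: something changed
print(prediction_drift(training_scores, training_scores))    # distance near 0, no alert
print(prediction_drift(training_scores, production_scores))  # well above the 0.1 threshold
```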

Fixed retraining schedules instead of drift-triggered

Retraining every Monday wastes compute when data is stable and misses rapid drift between runs. Trigger retraining on drift signals. If drift is rare, scheduled retraining wastes money; if it's frequent, fixed schedules miss it.
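One way to combine drift signals into a retraining trigger with a staleness backstop for slow drift; the signal names and thresholds here are illustrative, not a standard recipe:

```python
def should_retrain(share_drifted_features, prediction_js, days_since_train,
                   share_thresh=0.3, js_thresh=0.1, max_staleness_days=90):
    """Drift-triggered retraining with a staleness backstop, not a fixed schedule."""
    if share_drifted_features >= share_thresh:
        return True   # input distributions moved
    if prediction_js > js_thresh:
        return True   # output distribution moved
    return days_since_train > max_staleness_days  # backstop for drift too slow to detect

print(should_retrain(0.5, 0.0, 1))    # True: enough features drifted
print(should_retrain(0.0, 0.2, 1))    # True: prediction distribution shifted
print(should_retrain(0.0, 0.0, 10))   # False: stable and fresh, skip the run
print(should_retrain(0.0, 0.0, 120))  # True: stale model, retrain anyway
```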

FAQ

What is model drift?
Model drift is when a production model's accuracy degrades because real-world data has shifted from training data. It's the primary reason models need ongoing monitoring and retraining.
What is the difference between data drift and concept drift?
Data drift: input feature distributions change (users skew younger, income changes). Concept drift: the relationship between inputs and outputs changes (same features, different correct labels). Both degrade accuracy but require different fixes.
How do you detect model drift?
Use PSI for categorical features, KS test for continuous features, Chi-square for categorical frequency. Tools: Evidently AI, Whylogs, NannyML. Monitor prediction distribution as an early proxy when labels are delayed.
How do you fix model drift?
Retrain on recent data. For data drift, include new distribution in training window. For concept drift, re-examine feature engineering and labeling. Use drift-triggered retraining pipelines for automation.
