How to Set Up an MLOps Pipeline

A production MLOps pipeline needs 5 layers: experiment tracking → data versioning → model registry → CI/CD deployment → drift monitoring. The minimum viable stack — MLflow + DVC + GitHub Actions + Evidently AI — is open-source and deployable in a day.

Steps

1. Set Up Experiment Tracking with MLflow

Every training run should log hyperparameters, metrics, and artifacts. MLflow gives you a UI to compare runs and pick the best model.

# Install and start MLflow
pip install mlflow
mlflow ui  # → http://localhost:5000

# Instrument your training script
import mlflow

with mlflow.start_run(run_name='xgboost-v1'):
    mlflow.log_params(params)
    mlflow.log_metrics({'auc': auc, 'f1': f1})
    mlflow.sklearn.log_model(model, 'model')

2. Version Your Data with DVC

Every model must be reproducible from a specific data version. DVC tracks large files in Git metadata and stores them in S3/GCS.

pip install dvc dvc-s3
dvc init

# Add data to DVC tracking
dvc add data/train.parquet
git add data/train.parquet.dvc .gitignore
git commit -m 'track training data v1'

# Configure remote and push
dvc remote add -d myremote s3://my-bucket/dvc
dvc push

3. Register and Promote Models

The model registry is the single source of truth for which model is in production. Promote to Production only after passing an evaluation gate.

import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Register model from best run
mv = mlflow.register_model(
    f'runs:/{run_id}/model',
    'churn-classifier'
)

# Promote to Production after eval gate passes
if new_auc > baseline_auc:
    client.transition_model_version_stage(
        name='churn-classifier', version=mv.version,
        stage='Production'
    )

4. Deploy with CI/CD

A GitHub Actions workflow, triggered manually or by a webhook fired on model registry promotion, builds a container with BentoML and deploys it to Kubernetes.

# .github/workflows/deploy-model.yml
on:
  workflow_dispatch:
    inputs:
      model_version:
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build BentoML container
        run: bentoml build && bentoml containerize
      - name: Deploy to Kubernetes
        run: kubectl set image deployment/model-api ...

5. Monitor Drift and Trigger Retraining

Run Evidently AI on a schedule to compare production feature distributions against the training reference. Alert on drift and trigger automated retraining.

from evidently.report import Report
from evidently.metrics import DatasetDriftMetric

report = Report(metrics=[DatasetDriftMetric()])
report.run(
    reference_data=training_df,
    current_data=last_7_days_df
)

if report.as_dict()['metrics'][0]['result']['dataset_drift']:
    retrain_pipeline.trigger()  # placeholder: your Airflow/Prefect deployment handle

Common Issues

MLflow runs not reproducible

Log ALL hyperparameters including random seeds. Use mlflow.autolog() for sklearn/XGBoost to capture everything automatically. Pin library versions in requirements.txt and log as an artifact.
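The reproducibility fix can be sketched with stdlib tools alone. The helper names (`set_global_seeds`, `snapshot_environment`) are illustrative, not MLflow APIs; the MLflow logging calls are shown as comments where they would go in your training script.

```python
import random
import importlib.metadata

def set_global_seeds(seed: int) -> dict:
    """Seed every RNG the training code touches and return the
    params that must be logged alongside the run."""
    random.seed(seed)
    # np.random.seed(seed); torch.manual_seed(seed)  # if you use numpy/torch
    return {'random_seed': seed}

def snapshot_environment() -> str:
    """Pin the installed package versions so the run can be rebuilt."""
    lines = sorted(
        f'{d.metadata["Name"]}=={d.version}'
        for d in importlib.metadata.distributions()
        if d.metadata["Name"]
    )
    return '\n'.join(lines)

params = set_global_seeds(42)
requirements = snapshot_environment()
# mlflow.log_params(params)
# mlflow.log_text(requirements, 'requirements.txt')  # stored as a run artifact
```

Logging the seed as a param and the pinned requirements as an artifact means any run in the MLflow UI carries everything needed to rerun it.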

CI/CD pipeline deploys a worse model

Add an evaluation gate: new model must outperform the current production model on a held-out test set before the deployment step runs. Block the pipeline if the gate fails.
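A minimal sketch of such a gate, assuming the two AUC values have already been computed on the same held-out test set (metric names and the `min_improvement` margin are illustrative):

```python
def evaluation_gate(new_auc: float, production_auc: float,
                    min_improvement: float = 0.0) -> bool:
    """Return True only if the candidate beats the current
    production model on the held-out test set."""
    return new_auc > production_auc + min_improvement

# In CI: exit non-zero so the deploy step is skipped when the gate fails.
if not evaluation_gate(new_auc=0.87, production_auc=0.85):
    raise SystemExit('Evaluation gate failed: candidate does not beat production')
```

Running this as its own CI step before the deploy step means a failed gate stops the pipeline without any extra branching logic.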

Drift alerts too noisy

Don't alert on every feature drift individually. Use a dataset-level drift threshold (e.g. >30% of features drifted) and a minimum sample size requirement to avoid false positives from small traffic windows.
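The alert rule can be sketched as a pure function over the drift report's counts; the 30% share threshold and 1,000-sample minimum are illustrative defaults, not Evidently settings:

```python
def should_alert(drifted_features: int, total_features: int,
                 n_samples: int,
                 drift_share_threshold: float = 0.3,
                 min_samples: int = 1000) -> bool:
    """Alert only when a large share of features drifted AND the
    production window has enough samples to trust the statistics."""
    if n_samples < min_samples:  # too little traffic: skip this window
        return False
    return drifted_features / total_features > drift_share_threshold

# 2 of 20 features drifted on a large window: no alert
assert should_alert(2, 20, n_samples=50_000) is False
# 8 of 20 drifted: alert
assert should_alert(8, 20, n_samples=50_000) is True
```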

FAQ

What is the simplest MLOps stack to start with?
MLflow (tracking + registry) + DVC (data versioning) + GitHub Actions (CI/CD) + Docker (packaging). Add Evidently AI for drift monitoring and Feast for a feature store as you scale.
How do I version my training data in MLOps?
Use DVC. It tracks large data files in Git metadata while storing actual data in S3/GCS. Run `dvc add data/train.csv` to track, then `dvc push` to sync. Every model experiment links to the exact dataset version.
How do I automatically retrain a model when drift is detected?
Run Evidently AI on a schedule, compare production features against training reference data, and publish a trigger event to your orchestrator (Airflow/Kubeflow/Prefect) when drift exceeds a threshold. The orchestrator runs training, evaluation gate, and deployment.
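With Airflow as the orchestrator, the trigger step can be sketched against its stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`). The DAG id, URL, and the decision to use bare `urllib` are assumptions; a real deployment would add authentication or use your orchestrator's client library.

```python
import json
import urllib.request
from typing import Optional

def build_dag_run_payload(drift_share: float,
                          threshold: float = 0.3) -> Optional[dict]:
    """Return a dag-run request body when drift exceeds the threshold,
    else None (no retraining needed)."""
    if drift_share <= threshold:
        return None
    return {'conf': {'reason': 'drift', 'drift_share': drift_share}}

def trigger_retraining(airflow_url: str, dag_id: str, payload: dict) -> None:
    """POST the dag-run request to Airflow's stable REST API."""
    req = urllib.request.Request(
        f'{airflow_url}/api/v1/dags/{dag_id}/dagRuns',
        data=json.dumps(payload).encode(),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    urllib.request.urlopen(req)  # add auth headers in a real deployment

payload = build_dag_run_payload(drift_share=0.45)
# if payload: trigger_retraining('http://airflow:8080', 'retrain_churn', payload)
```

Passing the drift details in `conf` lets the retraining DAG log why it was triggered.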
