ADR-003: Tracking + data versioning is MLflow + DVC, not W&B / Neptune / ClearML | PredictFlow Feature Store

Context

The platform must answer two related but distinct questions:

Which experiment produced this model? This covers tracking hyperparameters, metrics, artifacts, the registered model version, and the stage (Staging → Production).
Which data was this model trained on? This covers content-addressable versioning of the training dataset, so that checking out a commit and rerunning reproduces the exact training set.
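
To make "content-addressable" concrete, here is a minimal sketch of the idea. It is not DVC's actual implementation: it only shows that a dataset's identity can be a hash of its bytes, so identical content always gets the same address and any edit gets a different one. The function name content_address and the sample rows are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def content_address(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file's bytes, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample data: two identical files and one edited copy.
    original = Path(tmp) / "customers.csv"
    copy = Path(tmp) / "customers_copy.csv"
    edited = Path(tmp) / "customers_edited.csv"
    original.write_bytes(b"id,churned\n1,0\n2,1\n")
    copy.write_bytes(b"id,churned\n1,0\n2,1\n")
    edited.write_bytes(b"id,churned\n1,0\n2,0\n")

    same = content_address(original) == content_address(copy)
    changed = content_address(original) != content_address(edited)
    print(same, changed)
```

The file name plays no part in the address, which is why a pointer committed to Git can identify a dataset regardless of where it is checked out.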

The classic options:

MLflow + DVC — open-source 2-tool combo. MLflow tracks runs + models; DVC tracks data files via Git-friendly metadata.
Weights & Biases — managed SaaS, polished UI, hosted artifact store.
Neptune — similar shape to W&B, lighter integration scope.
ClearML — open-source single-tool for experiments + data + serve.
Custom — pickle files + a JSON log + Git LFS for data.

Decision

Adopt MLflow + DVC.

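# src/run_experiments.py
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Placeholder data and model for illustration; the real script trains on data/customers.csv.
X_train, y_train = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000)

# Assumes the mlflow_data/ directory already exists.
mlflow.set_tracking_uri("sqlite:///mlflow_data/mlflow.db")
mlflow.set_experiment("churn-prediction")

# Enable autologging before the run starts; log_models=False avoids logging the model twice.
mlflow.sklearn.autolog(log_models=False)

with mlflow.start_run():
    model.fit(X_train, y_train)
    # Log the fitted model and register it as a new version of "churn-predictor".
    mlflow.sklearn.log_model(model, "model", registered_model_name="churn-predictor")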

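# Module 01 data versioning
dvc init
dvc remote add -d s3remote s3://predictflow-dvc-cache
# Track the dataset; DVC writes data/customers.csv.dvc and updates data/.gitignore
dvc add data/customers.csv
git add data/customers.csv.dvc data/.gitignore .dvc/config
git commit -m "Track data/customers.csv with DVC"
# Upload the data itself to the S3 remote
dvc push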

Tradeoffs we accept

Lever	MLflow + DVC (chosen)	W&B	Neptune	ClearML	Custom
Cost (5 users)	$0 self-host	~$50/user/mo	~$30/user/mo	$0 self-host	$0
UI polish	Functional	Best-in-class	Good	Good	None
Local-dev parity	Identical	Cloud-only	Cloud-only	Identical	N/A
Vendor lock-in	None	High	Medium	None	None
Data versioning	DVC (Git-native)	Artifacts (paid)	Artifacts (paid)	Datasets (built-in)	Git LFS
Model registry	MLflow Registry	W&B Registry	Model Registry	Models	None
Stage transitions	Built-in	Built-in	Built-in	Built-in	Build it
Tutorial reproducibility	Pip install + SQLite	Cloud account	Cloud account	Pip install	Build everything
Audit trail	Run history	Run history	Run history	Run history	Build it

We optimize for zero-cost local-dev parity and no vendor lock-in. W&B and Neptune offer a better experience but block tutorial reproducibility, since every learner would need a cloud account. ClearML is a real alternative, with the same shape as MLflow + DVC in one tool; we chose against it because the MLflow + DVC pair is the de facto industry default and learners are more likely to encounter it on the job.
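
The cost row can be turned into annual figures with simple arithmetic. The sketch below uses the approximate per-seat prices from the table, which are estimates and not vendor quotes, and it ignores the infrastructure and operations cost of self-hosting. The names MONTHLY_PRICE_PER_USER and annual_cost are hypothetical.

```python
# Approximate per-seat monthly prices in USD, taken from the tradeoff table.
MONTHLY_PRICE_PER_USER = {"MLflow + DVC": 0, "W&B": 50, "Neptune": 30}


def annual_cost(tool: str, users: int) -> int:
    """Annual licence cost in USD: seats x monthly price x 12 months."""
    return users * MONTHLY_PRICE_PER_USER[tool] * 12


for users in (5, 10):
    for tool in MONTHLY_PRICE_PER_USER:
        print(f"{tool:>12} @ {users:>2} users: ${annual_cost(tool, users):,}/yr")
```

At the 5-user size in the table, the hosted options come to roughly $3,000 per year for W&B and $1,800 per year for Neptune under these assumed prices.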

Consequences (positive)

A learner runs mlflow server + dvc init and has the full tracking stack on their laptop in <10 minutes (Module 01 ships in this budget).
MLflow's autolog covers scikit-learn and XGBoost, so there are no decorators to scatter through training code.
DVC's content-addressable hashing means that data/customers.csv.dvc, once committed to Git, resolves to exactly the same dataset on every checkout.
Model Registry stages (Staging → Production) are recorded state that a CI/CD pipeline can gate on (used in Module 03's GitHub Actions).
Both tools have an exit ramp: pickled models are framework-native, and DVC pointer files live in Git as plain text.
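
As an illustration of gating a pipeline on a registry stage, the following is a minimal sketch and not the Module 03 workflow itself. It assumes an object that behaves like mlflow.tracking.MlflowClient and calls its get_latest_versions method; the helper names production_version and gate, and the FakeClient stand-in, are hypothetical and exist only so the example runs without a tracking server.

```python
from types import SimpleNamespace


def production_version(client, model_name: str):
    """Return the version string currently in Production, or None.

    `client` is expected to behave like mlflow.tracking.MlflowClient.
    """
    versions = client.get_latest_versions(model_name, stages=["Production"])
    return versions[0].version if versions else None


def gate(client, model_name: str) -> int:
    """Return a process exit code: 0 if a Production version exists, else 1."""
    version = production_version(client, model_name)
    if version is None:
        print(f"{model_name}: no Production version, blocking deploy")
        return 1
    print(f"{model_name}: deploying version {version}")
    return 0


class FakeClient:
    """Hypothetical stand-in for MlflowClient used only for this demo."""

    def __init__(self, production_versions):
        self._versions = production_versions

    def get_latest_versions(self, name, stages=None):
        return list(self._versions)


ready = gate(FakeClient([SimpleNamespace(version="3")]), "churn-predictor")
blocked = gate(FakeClient([]), "churn-predictor")
```

In a CI job the same gate function would receive a real MlflowClient pointed at the tracking URI, and the job would exit with the returned code. Note that MLflow 2.9 and later deprecate stages in favor of model version aliases, so a long-lived pipeline may prefer get_model_version_by_alias.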

Consequences (negative)

Two tools, two failure modes. MLflow's SQLite backend can corrupt on hard kills; DVC's S3 remote can silently 403. Mitigation: Module 01 runbook covers the common reset paths.
No native UI for cross-tool browsing. A learner uses the MLflow UI to see runs, then switches to dvc dag for data lineage. Custom-building a unified view is out of scope.
DVC requires a Git workflow. Teams used to git lfs will need a small adjustment. Mitigation: Module 01 explicitly walks through the git add data/customers.csv.dvc + dvc push mental model.
Hosted alternatives are simpler if cost isn't a constraint. A team with a $50/user/mo budget gets a better UX from W&B. The audit recommends evaluating that if the team grows past 10 active users.
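
For the SQLite failure mode, one check that could precede a reset is SQLite's built-in integrity check. This is a sketch under assumptions, not the contents of the Module 01 runbook: the function name sqlite_is_healthy is hypothetical, and the demo builds throwaway database files instead of touching mlflow_data/mlflow.db.

```python
import sqlite3
import tempfile
from pathlib import Path


def sqlite_is_healthy(db_path: Path) -> bool:
    """Return True if SQLite's built-in integrity check reports 'ok'."""
    # Open read-only so a missing file is reported instead of being created.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # Covers unreadable, missing, and non-database files.
        return False
    return rows == [("ok",)]


with tempfile.TemporaryDirectory() as tmp:
    # A valid database with one table.
    good = Path(tmp) / "mlflow.db"
    conn = sqlite3.connect(good)
    conn.execute("CREATE TABLE runs (run_id TEXT)")
    conn.commit()
    conn.close()

    # A file that is not a SQLite database at all.
    bad = Path(tmp) / "corrupt.db"
    bad.write_bytes(b"not a sqlite database" * 64)

    good_ok = sqlite_is_healthy(good)
    bad_ok = sqlite_is_healthy(bad)
    missing_ok = sqlite_is_healthy(Path(tmp) / "absent.db")
    print(good_ok, bad_ok, missing_ok)
```

A False result only signals that the file needs attention; whether to restore from backup or start a fresh tracking database is a decision for the runbook.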

Reversal plan

The MLflow client interface is a thin Python module, so the swap is bounded:

Replace mlflow.set_tracking_uri() calls with the new tool's client.
Translate autolog() to the new tool's framework integration.
Migrate registered models — MLflow's REST API exports per-version metadata; W&B's Registry imports from MLflow.
Migrate run history — optional; usually a forward cutover.
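
The registered-model step above can be sketched as follows. This is an illustrative outline that uses the Python client (which wraps the REST API) instead of raw HTTP calls, and it assumes an object that behaves like mlflow.tracking.MlflowClient with its search_model_versions method. The helper name export_model_versions, the FakeClient stand-in, and the sample run IDs are hypothetical.

```python
import json
from types import SimpleNamespace


def export_model_versions(client, model_name: str) -> list:
    """Collect per-version metadata for one registered model as plain dicts."""
    versions = client.search_model_versions(f"name='{model_name}'")
    records = [
        {
            "name": v.name,
            "version": int(v.version),
            "stage": v.current_stage,
            "run_id": v.run_id,
            "source": v.source,
        }
        for v in versions
    ]
    # Sort by version so the manifest is stable across exports.
    return sorted(records, key=lambda record: record["version"])


class FakeClient:
    """Hypothetical stand-in for MlflowClient used only for this demo."""

    def search_model_versions(self, filter_string):
        return [
            SimpleNamespace(name="churn-predictor", version="2",
                            current_stage="Production", run_id="run-b",
                            source="runs:/run-b/model"),
            SimpleNamespace(name="churn-predictor", version="1",
                            current_stage="Archived", run_id="run-a",
                            source="runs:/run-a/model"),
        ]


manifest = export_model_versions(FakeClient(), "churn-predictor")
print(json.dumps(manifest, indent=2))
```

The resulting JSON manifest is tool-neutral, so it can feed whatever import mechanism the target system offers.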

DVC swap path (to W&B Artifacts or lakeFS):

Walk the *.dvc files; for each, fetch the artifact from the DVC remote.
Upload to the new system; record the new URI.
Drop .dvc/; commit a Cutover PR.
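
The walk-and-upload steps above can be sketched as below. This is a hedged outline, not a migration tool: it assumes dvc pull has already materialised the data next to each pointer, that each pointer is named after its data file with a .dvc suffix (as dvc add produces), and that the caller supplies the upload function. The names find_dvc_pointers and migrate, and the new-store:// URI scheme, are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def find_dvc_pointers(repo_root: Path) -> List[Path]:
    """Return every *.dvc pointer file under repo_root, sorted for stable output."""
    # is_file() skips the .dvc/ directory, which also matches the pattern.
    return sorted(p for p in repo_root.rglob("*.dvc") if p.is_file())


def migrate(repo_root: Path, upload: Callable[[Path], str]) -> Dict[str, str]:
    """Map each tracked data path to the URI returned by `upload`."""
    cutover = {}
    for pointer in find_dvc_pointers(repo_root):
        # data/customers.csv.dvc -> data/customers.csv
        data_path = pointer.with_suffix("")
        if not data_path.exists():
            raise FileNotFoundError(f"{data_path} missing; run `dvc pull` first")
        cutover[data_path.relative_to(repo_root).as_posix()] = upload(data_path)
    return cutover


with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway repository layout for the demo.
    root = Path(tmp)
    (root / ".dvc").mkdir()
    (root / "data").mkdir()
    (root / "data" / "customers.csv").write_text("id,churned\n1,0\n")
    (root / "data" / "customers.csv.dvc").write_text("outs:\n- path: customers.csv\n")

    # Hypothetical uploader: returns where the file would live in the new system.
    plan = migrate(root, lambda path: f"new-store://datasets/{path.name}")
    print(plan)
```

The returned mapping is the record of new URIs that the cutover PR needs; failing fast on a missing data file keeps a partial dvc pull from producing an incomplete migration.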

Estimated effort: 1-2 engineer-weeks for MLflow + DVC → W&B. Reversible.

References

src/run_experiments.py
src/register_model.py
src/load_model.py
src/hyperparameter_tuning.py
src/reproducible_training.py
ADR-001 (Feast — feature store and tracking are orthogonal concerns)
ADR-004 (BentoML — serving uses MLflow Registry as model source)