Context
The platform must answer two related but distinct questions:
- Which experiment produced this model? This covers tracking hyperparameters, metrics, artifacts, the registered model version, and the stage (Staging → Production).
- Which data was this model trained on? This covers content-addressable versioning of the training dataset, so that a checkout plus a rerun reproduces the exact training set.
The classic options:
- MLflow + DVC — open-source 2-tool combo. MLflow tracks runs + models; DVC tracks data files via Git-friendly metadata.
- Weights & Biases — managed SaaS, polished UI, hosted artifact store.
- Neptune — similar shape to W&B, lighter integration scope.
- ClearML — open-source single-tool for experiments + data + serve.
- Custom: `pickle` files + a JSON log + Git LFS for data.
Decision
Adopt MLflow + DVC.
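
    # src/run_experiments.py
    # Assumes `model`, `X_train`, and `y_train` are defined earlier in the script.
    import mlflow
    import mlflow.sklearn

    mlflow.set_tracking_uri("sqlite:///mlflow_data/mlflow.db")
    mlflow.set_experiment("churn-prediction")
    mlflow.sklearn.autolog()  # enable before the run so fit() params and metrics are captured

    with mlflow.start_run():
        model.fit(X_train, y_train)
        # Log explicitly so the model is also registered under a stable name.
        mlflow.sklearn.log_model(model, "model", registered_model_name="churn-predictor")

    # Module 01 data versioning
    dvc init
    dvc remote add -d s3remote s3://predictflow-dvc-cache
    dvc add data/customers.csv    # writes data/customers.csv.dvc and data/.gitignore
    git add data/customers.csv.dvc data/.gitignore .dvc/config
    git commit -m "Track data/customers.csv with DVC"
    dvc push
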
Tradeoffs we accept
| Lever | MLflow + DVC (chosen) | W&B | Neptune | ClearML | Custom |
|---|---|---|---|---|---|
| Cost (5 users) | $0 self-host | ~$50/user/mo | ~$30/user/mo | $0 self-host | $0 |
| UI polish | Functional | Best-in-class | Good | Good | None |
| Local-dev parity | Identical | Cloud-only | Cloud-only | Identical | N/A |
| Vendor lock-in | None | High | Medium | None | None |
| Data versioning | DVC (Git-native) | Artifacts (paid) | Artifacts (paid) | Datasets (built-in) | Git LFS |
| Model registry | MLflow Registry | W&B Registry | Model Registry | Models | None |
| Stage transitions | Built-in | Built-in | Built-in | Built-in | Build it |
| Tutorial reproducibility | Pip install + SQLite | Cloud account | Cloud account | Pip install | Build everything |
| Audit trail | Run history | Run history | Run history | Run history | Build it |
We optimize for zero-cost local-dev parity and no vendor lock-in. W&B and Neptune offer a more polished experience, but both require a cloud account, which blocks tutorial reproducibility. ClearML is a real alternative, covering the same ground as MLflow + DVC in one tool; we chose against it because the MLflow + DVC pair is the de facto industry default and learners are more likely to encounter it on the job.
Consequences (positive)
- A learner runs `mlflow server` + `dvc init` and has the full tracking stack on their laptop in <10 minutes (Module 01 ships in this budget).
- MLflow's autolog covers scikit-learn and XGBoost, so there are no decorators to scatter through training code.
- DVC's content-addressable hashing means that `data/customers.csv.dvc`, pinned in Git, resolves to exactly the same dataset across all checkouts.
- Model Registry stages (`Staging` → `Production`) are real receipts a CI/CD pipeline can gate on (used in Module 03's GitHub Actions).
- Both tools have an exit ramp: pickled models are framework-native; DVC files live in Git as plain text.
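
To illustrate why a pinned `.dvc` file identifies one exact dataset, here is a minimal sketch of content-addressable hashing. It assumes an MD5 digest over the file's raw bytes, which is how DVC identifies single files by default as far as we know. The function name `content_hash` and the sample file names are hypothetical, and this is not DVC's actual implementation.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo() -> tuple[str, str, str]:
    """Hash two identical copies and one edited copy of a tiny CSV."""
    with tempfile.TemporaryDirectory() as tmp:
        a = Path(tmp) / "customers_a.csv"
        b = Path(tmp) / "customers_b.csv"
        c = Path(tmp) / "customers_c.csv"
        a.write_bytes(b"id,churned\n1,0\n2,1\n")
        b.write_bytes(b"id,churned\n1,0\n2,1\n")  # same bytes as a
        c.write_bytes(b"id,churned\n1,0\n2,0\n")  # one value changed
        return content_hash(a), content_hash(b), content_hash(c)


if __name__ == "__main__":
    same_1, same_2, edited = demo()
    print("identical copies share a hash:", same_1 == same_2)
    print("edited copy shares that hash:", same_1 == edited)
```

Because the identifier is derived only from the bytes, the file name and location do not matter: two checkouts that resolve the same hash hold the same data, and any edit produces a different hash.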
Consequences (negative)
- Two tools, two failure modes. MLflow's SQLite backend can corrupt on hard kills; DVC's S3 remote can silently 403. Mitigation: Module 01 runbook covers the common reset paths.
- No native UI for cross-tool browsing. A learner clicks through the MLflow UI to see runs, then switches to `dvc dag` for data lineage. Custom-building a unified view is out of scope.
- DVC requires a Git workflow. Teams used to `git lfs` will have a small adjustment. Mitigation: Module 01 explicitly walks through the `git add data.csv.dvc` + `dvc push` mental model.
- Hosted alternatives are simpler if cost isn't a constraint. A team with a $50/user/mo budget gets a better UX from W&B. The audit recommends evaluating that if the team grows past 10 active users.
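
For the SQLite failure mode mentioned above, one cheap check before reaching for a reset is SQLite's built-in `PRAGMA integrity_check`. The sketch below is an assumption about how such a check could look, not the Module 01 runbook's procedure; `sqlite_is_healthy` is a hypothetical helper, and the demo uses throwaway files rather than the real `mlflow_data/mlflow.db`.

```python
import sqlite3
import tempfile
from pathlib import Path


def sqlite_is_healthy(db_path: str) -> bool:
    """Return True if SQLite's integrity check reports a single 'ok' row."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        # A file that cannot be read as a SQLite database ends up here.
        return False
    return rows == [("ok",)]


def demo() -> tuple[bool, bool]:
    """Check one valid database and one file filled with junk bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.db"
        conn = sqlite3.connect(str(good))
        conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO runs (name) VALUES ('baseline')")
        conn.commit()
        conn.close()

        bad = Path(tmp) / "bad.db"
        bad.write_bytes(b"garbage" * 1000)  # not a SQLite file at all
        return sqlite_is_healthy(str(good)), sqlite_is_healthy(str(bad))


if __name__ == "__main__":
    print(demo())
```

Note that `sqlite3.connect` creates an empty database when the path does not exist, and an empty database passes the check, so a caller would want to confirm the file exists first.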
Reversal plan
The MLflow client interface is a thin Python module; swap is bounded:
- Replace `mlflow.set_tracking_uri()` calls with the new tool's client.
- Translate `autolog()` to the new tool's framework integration.
- Migrate registered models: MLflow's REST API exports per-version metadata; W&B's Registry imports from MLflow.
- Migrate run history: optional; usually a forward cutover.
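
One way to keep this swap bounded is to route tracking calls through a small interface, so that training code never imports the vendor client directly. The sketch below is illustrative only: `ExperimentTracker`, `InMemoryTracker`, and `train_and_track` are hypothetical names, the logged values are placeholders, and nothing here is confirmed to exist in the repository.

```python
from typing import Protocol


class ExperimentTracker(Protocol):
    """The only tracking surface that training code is allowed to touch."""

    def log_params(self, params: dict[str, object]) -> None: ...

    def log_metric(self, name: str, value: float) -> None: ...


class InMemoryTracker:
    """Stand-in backend; a real adapter would delegate to MLflow or another client."""

    def __init__(self) -> None:
        self.params: dict[str, object] = {}
        self.metrics: dict[str, list[float]] = {}

    def log_params(self, params: dict[str, object]) -> None:
        self.params.update(params)

    def log_metric(self, name: str, value: float) -> None:
        self.metrics.setdefault(name, []).append(value)


def train_and_track(tracker: ExperimentTracker) -> None:
    """Training code depends on the interface, not on a specific vendor."""
    tracker.log_params({"n_estimators": 100, "max_depth": 5})
    tracker.log_metric("accuracy", 0.5)  # placeholder value, not a real result


if __name__ == "__main__":
    tracker = InMemoryTracker()
    train_and_track(tracker)
    print(tracker.params, tracker.metrics)
```

With this shape, moving to another tool means writing one new adapter class rather than editing every training script.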
DVC swap path (to W&B Artifacts or LakeFS):
- Walk `*.dvc` files; for each, fetch the artifact from `dvc remote`.
- Upload to the new system; record the new URI.
- Drop `.dvc/`; commit a `Cutover` PR.
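
The first step of the DVC swap, walking the pointer files, can be sketched as follows. This is a simplified illustration: it reads the `md5` and `path` fields with a regular expression instead of a YAML parser, it assumes the single-output layout that `dvc add` writes, and the hash in the demo is made up. Fetching from the remote and uploading to the new system are left out because they depend on the chosen target.

```python
import re
import tempfile
from pathlib import Path

# Matches lines such as "- md5: <hash>" or "  path: customers.csv".
# Simplification: values containing spaces are not handled.
_FIELD = re.compile(r"^\s*-?\s*(md5|path):\s*(\S+)\s*$")


def read_dvc_pointer(dvc_file: Path) -> dict[str, str]:
    """Extract the md5 and path fields from one .dvc pointer file."""
    fields: dict[str, str] = {}
    for line in dvc_file.read_text().splitlines():
        match = _FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def collect_pointers(repo_root: Path) -> list[dict[str, str]]:
    """Walk the repo and return one record per .dvc pointer file."""
    pointers = []
    for dvc_file in sorted(repo_root.rglob("*.dvc")):
        if not dvc_file.is_file():
            continue  # the pattern also matches the .dvc/ directory itself
        entry = read_dvc_pointer(dvc_file)
        entry["pointer_file"] = dvc_file.relative_to(repo_root).as_posix()
        pointers.append(entry)
    return pointers


def demo() -> list[dict[str, str]]:
    """Build a throwaway repo layout with one pointer file and scan it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / ".dvc").mkdir()
        (root / "data").mkdir()
        (root / "data" / "customers.csv.dvc").write_text(
            "outs:\n"
            "- md5: 0123456789abcdef0123456789abcdef\n"  # made-up hash
            "  size: 20\n"
            "  hash: md5\n"
            "  path: customers.csv\n"
        )
        return collect_pointers(root)


if __name__ == "__main__":
    for record in demo():
        print(record)
```

A real migration would more likely parse the files with a YAML library or DVC's own Python API; the point here is only that the pointer files are plain text and easy to enumerate.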
Estimated effort: 1-2 engineer-weeks for MLflow + DVC → W&B. Reversible.
References
- `src/run_experiments.py`
- `src/register_model.py`
- `src/load_model.py`
- `src/hyperparameter_tuning.py`
- `src/reproducible_training.py`
- ADR-001 (Feast: feature store and tracking are orthogonal concerns)
- ADR-004 (BentoML: serving uses MLflow Registry as model source)