Tracking + data versioning is MLflow + DVC, not W&B / Neptune / ClearML

✓ Accepted · PredictFlow Feature Store · 01 Foundation: ML Experimentation & Tracking
By AI-DE Engineering Team·Stakeholders: ML engineer, data scientist, MLOps reviewer

Context

The platform must answer two related but distinct questions:

  1. Which experiment produced this model? — tracking hyperparameters, metrics, artifacts, the registered model version, and the stage (Staging → Production).
  2. Which data was this model trained on? This requires content-addressable versioning of the training dataset, so that a checkout plus a rerun reproduces the exact training set.
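
To make "content-addressable" concrete, here is a minimal sketch of the idea: a dataset version is identified by a hash of its bytes, so two files with the same hash are the same version regardless of name or location. DVC records an MD5 of the file contents in its .dvc pointer files; the helper names below (file_md5, same_dataset) are hypothetical and only illustrate the principle, not DVC's internals.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading in chunks to bound memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_dataset(a: Path, b: Path) -> bool:
    """Treat two files as the same dataset version when their content hashes match."""
    return file_md5(a) == file_md5(b)
```

Because the identifier depends only on the bytes, changing a single row produces a different hash and therefore a different dataset version.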

The classic options:

  1. MLflow + DVC — open-source 2-tool combo. MLflow tracks runs + models; DVC tracks data files via Git-friendly metadata.
  2. Weights & Biases — managed SaaS, polished UI, hosted artifact store.
  3. Neptune — similar shape to W&B, lighter integration scope.
  4. ClearML: open-source single tool for experiments + data + serving.
  5. Custom: pickle files + a JSON log + Git LFS for data.

Decision

Adopt MLflow + DVC.

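# src/run_experiments.py
import mlflow
import mlflow.sklearn

# Store runs and registry metadata in a local SQLite file.
mlflow.set_tracking_uri("sqlite:///mlflow_data/mlflow.db")
mlflow.set_experiment("churn-prediction")

# Enable autologging before the run starts. log_models=False avoids storing
# the model twice, since it is logged and registered explicitly below.
mlflow.sklearn.autolog(log_models=False)

# model, X_train, and y_train are assumed to be defined earlier in the script.
with mlflow.start_run():
    model.fit(X_train, y_train)
    mlflow.sklearn.log_model(model, "model", registered_model_name="churn-predictor")

# Module 01 data versioning
dvc init
dvc remote add -d s3remote s3://predictflow-dvc-cache
dvc add data/customers.csv    # writes data/customers.csv.dvc and data/.gitignore
git add data/customers.csv.dvc data/.gitignore .dvc/config
dvc push                      # uploads the data file to the S3 remote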

Tradeoffs we accept

| Lever | MLflow + DVC (chosen) | W&B | Neptune | ClearML | Custom |
|---|---|---|---|---|---|
| Cost (5 users) | $0 self-host | ~$50/user/mo | ~$30/user/mo | $0 self-host | $0 |
| UI polish | Functional | Best-in-class | Good | Good | None |
| Local-dev parity | Identical | Cloud-only | Cloud-only | Identical | N/A |
| Vendor lock-in | None | High | Medium | None | None |
| Data versioning | DVC (Git-native) | Artifacts (paid) | Artifacts (paid) | Datasets (built-in) | Git LFS |
| Model registry | MLflow Registry | W&B Registry | Model Registry | Models | None |
| Stage transitions | Built-in | Built-in | Built-in | Built-in | Build it |
| Tutorial reproducibility | Pip install + SQLite | Cloud account | Cloud account | Pip install | Build everything |
| Audit trail | Run history | Run history | Run history | Run history | Build it |

We optimize for zero-cost local-dev parity and no vendor lock-in. W&B and Neptune offer a better experience, but they require a cloud account, which blocks tutorial reproducibility. ClearML is a real alternative (the same shape as MLflow + DVC in one tool); we chose against it because the MLflow + DVC pair is the de-facto industry default and learners are more likely to encounter it on the job.

Consequences (positive)

  • A learner runs mlflow server + dvc init and has the full tracking stack on their laptop in <10 minutes (Module 01 ships in this budget).
  • MLflow autologging covers scikit-learn / XGBoost, so there are no decorators to scatter through training code.
  • DVC's content-addressable hashing means a data/customers.csv.dvc file pinned in Git resolves to exactly the same dataset in every checkout.
  • Model Registry stages (Staging → Production) are real receipts a CI/CD pipeline can gate on (used in Module 03's GitHub Actions).
  • Both tools have an exit ramp — pickled models are framework-native; DVC files live in Git as plain text.
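
As an illustration of gating a pipeline on registry stages, the sketch below separates the decision (a pure function that is easy to test) from the registry lookup. It is not the Module 03 workflow: the function names, the roc_auc metric, and the min_gain threshold are hypothetical. The lookup assumes an MLflow release where registry stages and MlflowClient.get_latest_versions are still available; stages are deprecated in newer releases.

```python
from typing import Dict, Mapping


def should_promote(
    candidate: Mapping[str, float],
    production: Mapping[str, float],
    metric: str = "roc_auc",  # hypothetical gating metric
    min_gain: float = 0.0,
) -> bool:
    """Return True when the candidate beats production on `metric` by more than min_gain."""
    if metric not in candidate:
        return False  # a candidate without the gating metric never passes
    if metric not in production:
        return True  # nothing comparable in Production yet, so the candidate passes
    return candidate[metric] - production[metric] > min_gain


def stage_metrics(model_name: str, stage: str) -> Dict[str, float]:
    """Fetch the run metrics behind the latest model version in a registry stage."""
    # Imported lazily so the decision logic above can be tested without MLflow.
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    versions = client.get_latest_versions(model_name, stages=[stage])
    if not versions:
        return {}
    return dict(client.get_run(versions[0].run_id).data.metrics)
```

A CI step could call should_promote(stage_metrics("churn-predictor", "Staging"), stage_metrics("churn-predictor", "Production")) and exit non-zero when it returns False.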

Consequences (negative)

  • Two tools, two failure modes. MLflow's SQLite backend can corrupt on hard kills; DVC's S3 remote can silently 403. Mitigation: Module 01 runbook covers the common reset paths.
  • No native UI for cross-tool browsing. A learner clicks through the MLflow UI to see runs, then switches to dvc dag for data lineage. Custom-building a unified view is out of scope.
  • DVC requires a Git workflow. Teams used to git lfs will need a small adjustment. Mitigation: Module 01 explicitly walks through the git add data.csv.dvc + dvc push mental model.
  • Hosted alternatives are simpler if cost isn't a constraint. A team with a $50/user/mo budget gets a better UX from W&B. The audit recommends evaluating that if the team grows past 10 active users.
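
For the SQLite failure mode in the first bullet, one way to detect a damaged tracking database before starting work is SQLite's built-in integrity check. This is a sketch, not a step taken from the Module 01 runbook; the function name sqlite_is_healthy is hypothetical.

```python
import sqlite3
from pathlib import Path


def sqlite_is_healthy(db_path: Path) -> bool:
    """Return True only if SQLite's integrity check reports 'ok' for the file."""
    if not db_path.exists():
        return False
    try:
        # Open read-only so the check itself cannot modify the database.
        conn = sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError:
        return False  # unreadable or malformed database file
    return rows == [("ok",)]
```

Running sqlite_is_healthy(Path("mlflow_data/mlflow.db")) after a hard kill shows whether a reset is needed.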

Reversal plan

The MLflow client interface is a thin Python module, so the swap is bounded:

  1. Replace mlflow.set_tracking_uri() calls with the new tool's client.
  2. Translate autolog() to the new tool's framework integration.
  3. Migrate registered models — MLflow's REST API exports per-version metadata; W&B's Registry imports from MLflow.
  4. Migrate run history — optional; usually a forward cutover.
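
One way to keep steps 1 and 2 small is to route tracking calls through a narrow interface, so that a swap touches one adapter rather than every training script. The sketch below is an assumption about how such a seam could look, not a description of the project's source files; Tracker, InMemoryTracker, MlflowTracker, and record_run are hypothetical names.

```python
from typing import Dict, Protocol


class Tracker(Protocol):
    """The minimal tracking surface that training code depends on."""

    def log_params(self, params: Dict[str, object]) -> None: ...
    def log_metrics(self, metrics: Dict[str, float]) -> None: ...


class InMemoryTracker:
    """Stand-in backend for tests; keeps everything in dictionaries."""

    def __init__(self) -> None:
        self.params: Dict[str, object] = {}
        self.metrics: Dict[str, float] = {}

    def log_params(self, params: Dict[str, object]) -> None:
        self.params.update(params)

    def log_metrics(self, metrics: Dict[str, float]) -> None:
        self.metrics.update(metrics)


class MlflowTracker:
    """Adapter over MLflow's fluent API; a replacement tool gets its own adapter."""

    def log_params(self, params: Dict[str, object]) -> None:
        import mlflow  # imported lazily; requires mlflow to be installed

        mlflow.log_params(params)

    def log_metrics(self, metrics: Dict[str, float]) -> None:
        import mlflow

        mlflow.log_metrics(metrics)


def record_run(tracker: Tracker, params: Dict[str, object], metrics: Dict[str, float]) -> None:
    """Training code calls this and never imports a tracking library directly."""
    tracker.log_params(params)
    tracker.log_metrics(metrics)
```

With this shape, moving to another tool means writing one new class that satisfies Tracker; callers of record_run stay unchanged.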

DVC swap path (to W&B Artifacts or LakeFS):

  1. Walk *.dvc files; for each, fetch the artifact from dvc remote.
  2. Upload to the new system; record the new URI.
  3. Drop .dvc/; commit a Cutover PR.
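
Step 1 can be sketched as an inventory pass over the repository. The example below assumes the common single-output .dvc layout, where the file contains an md5: line and a path: line, and it parses those two fields with regular expressions instead of a YAML library to stay dependency-free. The function name iter_dvc_pointers is hypothetical, and the fetch and upload steps are deliberately left out because they depend on the target system.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Matches "md5: <32 hex chars>" (optionally with DVC's ".dir" suffix for directories).
_MD5 = re.compile(r"^\s*-?\s*md5:\s*([0-9a-f]{32})(?:\.dir)?\s*$", re.MULTILINE)
# Matches "path: <relative path of the tracked data>".
_PATH = re.compile(r"^\s*-?\s*path:\s*(.+?)\s*$", re.MULTILINE)


def iter_dvc_pointers(repo_root: Path) -> Iterator[Tuple[Path, str]]:
    """Yield (tracked data path, md5) for each single-output .dvc file under repo_root."""
    for dvc_file in sorted(repo_root.rglob("*.dvc")):
        if not dvc_file.is_file():
            continue  # the .dvc/ directory itself also matches the pattern
        text = dvc_file.read_text(encoding="utf-8")
        md5, path = _MD5.search(text), _PATH.search(text)
        if md5 and path:
            # The path in a .dvc file is relative to the .dvc file's own directory.
            yield dvc_file.parent / path.group(1), md5.group(1)
```

The resulting (path, md5) pairs give the migration a checklist: each entry is fetched from the DVC remote, uploaded, and its new URI recorded before .dvc/ is dropped.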

Estimated effort: 1-2 engineer-weeks for MLflow + DVC → W&B. Reversible.

References

  • src/run_experiments.py
  • src/register_model.py
  • src/load_model.py
  • src/hyperparameter_tuning.py
  • src/reproducible_training.py
  • ADR-001 (Feast — feature store and tracking are orthogonal concerns)
  • ADR-004 (BentoML — serving uses MLflow Registry as model source)