DVC Migration
Crab Workflow intentionally keeps the familiar DVC shape: stages, deps, outs,
params, metrics, plots, experiments, queues, and repro-style execution. The
main difference is the storage model. Crab uses your Crab remote and object
store rather than a separate DVC remote.
For the automated migration command, see the full Migrating from DVC guide.
Command Mapping
| DVC command | Crab command |
|---|---|
| dvc stage add | crab stage add |
| dvc repro | crab repro or crab run |
| dvc dag | crab workflow dag |
| dvc status | crab workflow status |
| dvc params show | crab params show |
| dvc params diff | crab params diff |
| dvc metrics show | crab metrics show |
| dvc metrics diff | crab metrics diff |
| dvc plots show | crab plots show |
| dvc plots diff | crab plots diff |
| dvc exp run | crab exp run |
| dvc exp show | crab exp show |
| dvc exp diff | crab exp diff |
| dvc exp apply | crab exp apply |
| dvc exp push | crab exp push |
| dvc exp pull | crab exp pull |
| dvc queue start | crab queue start |
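The mapping above is regular enough that existing scripts can be rewritten mechanically. The following is a minimal sketch, not part of Crab itself: `dvc_to_crab` is a hypothetical helper name, it picks `crab repro` for `dvc repro` (the table also allows `crab run`), and it deliberately leaves any DVC command that is not in the table untouched so it can be reviewed by hand.

```shell
# Sketch: rewrite DVC invocations on stdin to the Crab equivalents from the
# mapping table. dvc_to_crab is a hypothetical helper, not a Crab command.
dvc_to_crab() {
  sed -E \
    -e 's/(^|[[:space:];&|(])dvc (dag|status)/\1crab workflow \2/g' \
    -e 's/(^|[[:space:];&|(])dvc (stage add|repro|params show|params diff|metrics show|metrics diff|plots show|plots diff|exp run|exp show|exp diff|exp apply|exp push|exp pull|queue start)/\1crab \2/g'
}

# Usage: preview the rewritten CI script before replacing the original.
#   dvc_to_crab < ci/build.sh > ci/build.crab.sh
```

Commands such as `dvc push` pass through unchanged, which makes the remaining manual work easy to find with a search for `dvc ` in the output.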
Migration Checklist
- Commit or stash current DVC work.
- Initialize Crab and enable Workflow:
    crab init crab://my-bucket/ml-project
    crab config set workflow.enabled true
- Convert dvc.yaml:
    crab migrate from-dvc
- Validate:
    crab run --validate
    crab workflow dag
- Run once to create crab.lock:
    crab run
- Push git state and stage cache:
    git add crab.yaml crab.lock params.yaml
    git commit -m "migrate workflow to crab"
    crab push
    crab workflow push-cache --all
- Remove DVC files after the Crab workflow is verified.
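The automated middle of the checklist can be sketched as a single shell function. This is an illustration under stated assumptions: `step`, `migrate_to_crab`, and the `DRY_RUN` variable are hypothetical names introduced here, while every `crab` and `git` command is taken from the checklist as written. The first and last checklist items (committing or stashing DVC work, and removing DVC files) are left as manual steps.

```shell
# Print each command, then execute it unless DRY_RUN=1.
step() {
  printf '+ %s\n' "$*"
  [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Run the checklist commands in order, stopping at the first failure.
migrate_to_crab() {
  remote=${1:?usage: migrate_to_crab <crab-remote-url>}
  step crab init "$remote" &&
  step crab config set workflow.enabled true &&
  step crab migrate from-dvc &&
  step crab run --validate &&
  step crab workflow dag &&
  step crab run &&
  step git add crab.yaml crab.lock params.yaml &&
  step git commit -m "migrate workflow to crab" &&
  step crab push &&
  step crab workflow push-cache --all
}

# Usage: preview first, then run for real.
#   DRY_RUN=1 migrate_to_crab crab://my-bucket/ml-project
#   migrate_to_crab crab://my-bucket/ml-project
```

Chaining with `&&` means a failed validation prevents the later commit and push, which matches the order the checklist implies.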
YAML Compatibility
Most DVC fields map directly:
| DVC field | Crab support |
|---|---|
| cmd | Supported as string or command list. |
| deps | Supported for local paths and supported external deps. |
| outs | Supported for path entries and DVC-style path-key maps. |
| params | Supported with dotted keys. |
| metrics | Supported at top level and stage level. |
| plots | Supported at top level and stage level. |
| wdir | Supported. |
| frozen | Supported with crab freeze and crab unfreeze. |
| foreach | Supported. |
| matrix | Supported. |
| vars and ${...} | Supported for templating. |
| always_changed | Supported as DVC-compatible spelling for nondeterministic stages. |
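Before converting, it can help to find stage fields that the table above does not mention. The sketch below is a rough pre-flight check, not a YAML parser: `unlisted_stage_fields` is a hypothetical helper, it assumes a dvc.yaml with two-space indentation, and it only inspects keys directly under each stage (fields nested under `do:` in a `foreach` stage are not scanned). `do` is treated as known because DVC requires it alongside `foreach`.

```shell
# Sketch: print "stage: field" for stage-level keys in a dvc.yaml that are not
# listed in the compatibility table. Assumes two-space YAML indentation.
unlisted_stage_fields() {
  awk '
    BEGIN {
      n = split("cmd deps outs params metrics plots wdir frozen foreach do matrix vars always_changed", names, " ")
      for (i = 1; i <= n; i++) known[names[i]] = 1
    }
    /^stages:/ { in_stages = 1; next }          # entering the stages block
    /^[^ #]/   { in_stages = 0 }                # any other top-level key ends it
    in_stages && /^  [^ #-][^:]*:/ {            # two spaces: a stage name
      stage = $1; sub(/:$/, "", stage); next
    }
    in_stages && /^    [A-Za-z_]+:/ {           # four spaces: a stage field
      key = $1; sub(/:.*$/, "", key)
      if (!(key in known)) print stage ": " key
    }
  ' "$1"
}

# Usage:
#   unlisted_stage_fields dvc.yaml
```

An empty result only means no unlisted field was found at the stage level; a field that is printed (for example `desc` or `meta`) is not necessarily unsupported, just not covered by the table.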
Behavior Differences
- Crab Workflow must be enabled with crab config set workflow.enabled true.
- Stage cache is shared through the Crab remote with crab run --cache-push or crab workflow push-cache --all.
- Large files should be tracked with Crab's file layer, not a DVC remote.
- Local experiment metadata lives under .crab/workflow/exp/.
- Run journals live under .crab/workflow/runs/.
- Split workflow files can use workflow.discover recursive and workflow.lockfile split.
Hydra
Enable Hydra-style composition before migrating Hydra-heavy experiments:
crab config set hydra.enabled true
crab config set hydra.config_dir conf
crab config set hydra.config_name config.yaml

Then run experiments with group and scalar overrides:
crab exp run -S train/model=efficientnet -S train.optimizer.lr=0.0005

See Hydra Workflows for the recommended layout and precedence model.
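The two override forms combine naturally in a small sweep. The following sketch assumes only the `-S` syntax shown above: `train/model=...` selects a config group option and `train.optimizer.lr=...` sets a single scalar. `step`, `sweep_lr`, and `DRY_RUN` are hypothetical names used for illustration.

```shell
# Print each command, then execute it unless DRY_RUN=1.
step() {
  printf '+ %s\n' "$*"
  [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Run one experiment per learning rate for a fixed model group choice.
sweep_lr() {
  model=${1:?usage: sweep_lr <model> <lr>...}
  shift
  for lr in "$@"; do
    # Group override picks the model config; scalar override sets the rate.
    step crab exp run -S "train/model=$model" -S "train.optimizer.lr=$lr" || return
  done
}

# Usage:
#   DRY_RUN=1 sweep_lr efficientnet 0.001 0.0005
```

The loop stops at the first failing run so that a broken configuration does not silently produce a partial sweep.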
CI Migration
Replace DVC cache commands with Crab cache commands:
crab run --validate
crab run --pull --allow-missing --cache-push --jsonl
crab workflow push-cache --all --json

For strict read-only verification:
crab run --cache-only --json

Final Cutover
Only remove DVC files after a clean Crab run, metrics check, and remote cache push:
crab run
crab metrics show
crab plots show
crab workflow push-cache --all

Then remove DVC-specific files and update CI scripts. Keep the DVC branch around until the first Crab-backed release is reproducible from a fresh clone.
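To see what "DVC-specific files" covers in a given repository, a listing step is safer than deleting by pattern. The sketch below is one way to do it and is not a Crab feature: `list_dvc_files` is a hypothetical helper, and the patterns are the usual DVC artifacts (the `.dvc/` directory, `.dvcignore`, `dvc.yaml`, `dvc.lock`, and per-file `*.dvc` pointers). It only prints paths; removal stays a deliberate manual step.

```shell
# Sketch: list DVC-specific files and directories under a repository root.
# Skips .git and does not descend into .dvc once it has been reported.
list_dvc_files() {
  root=${1:-.}
  find "$root" \
    \( -name .git -prune \) -o \
    \( -name .dvc -print -prune \) -o \
    \( \( -name .dvcignore -o -name dvc.yaml -o -name dvc.lock -o -name '*.dvc' \) -print \) |
    sort
}

# Usage: review the list, then remove the entries once the Crab run is verified.
#   list_dvc_files .
```

Running the same listing on a fresh clone of the Crab-backed branch is a quick way to confirm that nothing DVC-specific was left behind.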