Redistributing Data Across Stripes
Crab stores file content as content-addressed xorbs. The size and grouping of these xorbs affect both storage cost and access performance. Restriping rewrites xorbs to match a target size profile optimized for your workload: larger xorbs for ML models (fewer objects, lower overhead), smaller xorbs for code (faster random access).
This is different from crab repack, which consolidates git pack files. Restripe operates on the content-addressed xorb layer.
Built-in Profiles
Three profiles cover common workloads:
| Profile | Target xorb size | Max xorbs/file | Group by | Best for |
|---|---|---|---|---|
| ml | 256 MiB | 4 | File | Large model weights, safetensors |
| dataset | 64 MiB | unlimited | Directory | Training datasets, parquet files |
| code | 16 MiB | unlimited | Hash | Source code, configs, small assets |
Auto-inference
When you omit --profile, Crab scans the file-index and selects a profile based on the median file size (p50):
- p50 > 100 MiB → ml
- p50 ≥ 1 MiB → dataset
- Otherwise → code
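The selection rules above can be sketched in a few lines of Python. This is only an illustration of the thresholds, not Crab's implementation: the function name `infer_profile` is hypothetical, and it assumes the rules are checked top to bottom and that the median is taken over all file sizes in bytes.

```python
from statistics import median

MIB = 1024 * 1024


def infer_profile(file_sizes_bytes):
    """Pick a built-in profile name from the median (p50) file size.

    Hypothetical helper: mirrors the documented thresholds only.
    Raises statistics.StatisticsError if the list is empty.
    """
    p50 = median(file_sizes_bytes)
    if p50 > 100 * MIB:      # strictly greater than 100 MiB
        return "ml"
    if p50 >= 1 * MIB:       # 1 MiB up to and including 100 MiB
        return "dataset"
    return "code"            # everything smaller
```

Note the boundary behavior: a median of exactly 100 MiB falls through to dataset, because the first rule uses a strict comparison.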
Usage
Dry run (estimate without writing)
crab restripe --profile ml --dry-run
Reports source count, estimated destination count, bytes to rewrite, wall-clock estimate, and API cost estimate. Only HEAD and list operations are used.
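For a rough sense of what the destination count might look like before running the command, a back-of-the-envelope estimate can be sketched as below. This is not how Crab computes its estimate. The helper `estimate_destination_xorbs` is hypothetical, and it assumes each file is packed on its own into target-sized xorbs, ignoring deduplication, the max xorbs/file cap, and directory or hash grouping.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def estimate_destination_xorbs(file_sizes_bytes, target_xorb_bytes):
    """Rough estimate: one ceil(size / target) term per non-empty file.

    Hypothetical helper; assumes per-file packing with no dedup.
    """
    total = 0
    for size in file_sizes_bytes:
        if size > 0:
            # Integer ceiling division avoids float rounding on large sizes.
            total += -(-size // target_xorb_bytes)
    return total
```

Under these assumptions, a 1 GiB file and a 300 MiB file at the ml target of 256 MiB come out at 4 + 2 = 6 xorbs. Treat the output of --dry-run as the authoritative figure.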
Apply the restripe
crab restripe --profile ml --apply
The operation is:
- Online — doesn't hold the repo-wide push lock
- Resumable — progress tracked in a SQLite journal
- Crash-safe — SIGINT/SIGTERM finishes the current xorb and exits cleanly
Resume an interrupted run
crab restripe --resume
Abort and clean up
crab restripe --abort
crab gc # reclaim orphan xorbs from the aborted run
Custom Profiles
Define your own in .crab/config.toml:
[restripe.profiles.my-profile]
target_xorb_bytes = 134217728 # 128 MiB
max_xorbs_per_file = 8
group_by = "file"
compression = "zstd:5"
Constraints: target_xorb_bytes must be between 4 MiB and 2 GiB. Profile names must match [a-z][a-z0-9-]{0,30}.
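If you generate profiles programmatically, it can help to check these constraints before writing the config. The following is a minimal sketch, not Crab's validator: `validate_profile` is a hypothetical helper, and it assumes both size bounds are inclusive and that the name pattern must match the whole name.

```python
import re

MIB = 1024 * 1024
GIB = 1024 * MIB

MIN_XORB_BYTES = 4 * MIB   # assumed inclusive lower bound
MAX_XORB_BYTES = 2 * GIB   # assumed inclusive upper bound
PROFILE_NAME_RE = re.compile(r"[a-z][a-z0-9-]{0,30}")


def validate_profile(name, target_xorb_bytes):
    """Return a list of human-readable problems (empty if none found)."""
    errors = []
    # fullmatch ensures the pattern covers the entire name (1 to 31 chars).
    if PROFILE_NAME_RE.fullmatch(name) is None:
        errors.append(f"invalid profile name: {name!r}")
    if not (MIN_XORB_BYTES <= target_xorb_bytes <= MAX_XORB_BYTES):
        errors.append(f"target_xorb_bytes out of range: {target_xorb_bytes}")
    return errors
```

For the example profile above, `validate_profile("my-profile", 134217728)` returns an empty list. A name such as `My_Profile` is rejected because of the uppercase letters and the underscore.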
Tier-Aware Restriping
If source xorbs are in archive storage classes, Crab restores them automatically before processing. The following flags adjust this behavior:
# Skip archived xorbs
crab restripe --profile ml --apply --include-cold=false
# Use bulk restore (cheapest, slowest)
crab restripe --profile ml --apply --restore-tier=bulk
# Write output to a specific storage class
crab restripe --profile ml --apply --output-class=STANDARD_IA
Concurrent Push Safety
Restripe runs safely alongside normal pushes. At the end of the run, a reconciliation step:
- Scopes changes to the pre-run xorb snapshot only.
- Leaves new xorbs from concurrent pushes untouched.
- Uses idempotent ref CAS updates.
Old source xorbs become orphans and are reclaimed by the next crab gc.
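To make the reconciliation properties concrete, here is a small in-memory sketch of snapshot scoping and an idempotent compare-and-swap (CAS) update. It illustrates the general technique only. The function names, the dict used as a ref table, and the set-based xorb IDs are all assumptions for illustration and say nothing about Crab's internal data structures.

```python
def scope_orphans(pre_run_snapshot, rewritten_sources):
    """Only xorbs that existed in the pre-run snapshot can become orphans.

    Anything pushed concurrently is absent from the snapshot and is
    therefore never selected, even if its ID shows up in the input.
    """
    return set(rewritten_sources) & set(pre_run_snapshot)


def cas_update(refs, name, expected, new):
    """Idempotent compare-and-swap on a dict acting as a ref table.

    Returns True if `refs[name]` points at `new` afterwards.
    """
    current = refs.get(name)
    if current == new:
        return True        # already applied, so a retry is a no-op
    if current != expected:
        return False       # the ref moved underneath us; leave it alone
    refs[name] = new
    return True
```

Because a repeated `cas_update` with the same arguments succeeds without changing anything, a resumed run can replay its final updates safely. An update whose expected value no longer matches is refused instead of overwriting a concurrent change.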
Concurrency Constraints
| Combination | Result |
|---|---|
| Two restripe runs | Second fails with RestripeAlreadyInProgress [E0332] |
| GC + restripe | Fails with ConcurrentMaintenance [E0333] |
| Push + restripe | Safe (reconciliation handles it) |
CLI Reference
For complete command syntax and all available flags, see the crab restripe reference.