Local Cache
Every time Crab downloads chunks from cloud storage (during hydration or fetch), it stores them in a local cache. The next time you need those chunks — re-hydrating the same file, switching branches, or a teammate hydrating shared content on a shared machine — the data is served from local disk instead of the network.
How the Cache Works
The cache is a content-addressed store: each chunk is identified by the BLAKE3 hash of its contents. This means:
- Cross-file sharing — If two files share chunks (common in dataset versions), the cache serves both from a single copy.
- Cross-branch sharing — Switching branches and re-hydrating reuses cached chunks from the previous branch.
- Idempotent — Fetching the same chunk twice is a no-op; it's already cached.
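The three properties above all follow from keying chunks by their hash. The following is a minimal sketch of that idea, not Crab's implementation: `ChunkStore` is a hypothetical in-memory class, and it uses BLAKE2b from the Python standard library as a stand-in for BLAKE3, which is not in the standard library.

```python
import hashlib


class ChunkStore:
    """Toy in-memory content-addressed store (hypothetical, not Crab's code)."""

    def __init__(self):
        self.chunks = {}  # hex digest -> chunk bytes

    @staticmethod
    def key(data: bytes) -> str:
        # Stand-in hash: BLAKE2b from the standard library, not BLAKE3.
        return hashlib.blake2b(data, digest_size=32).hexdigest()

    def put(self, data: bytes) -> str:
        digest = self.key(data)
        # setdefault makes a repeated put a no-op: the first copy is kept.
        self.chunks.setdefault(digest, data)
        return digest


store = ChunkStore()
# Two "files" that share their first chunk, as dataset versions often do.
file_v1 = [store.put(b"header"), store.put(b"rows 0-999")]
file_v2 = [store.put(b"header"), store.put(b"rows 0-1999")]
print(len(store.chunks))  # 3 unique chunks back 4 chunk references
```

Because the key depends only on the bytes, it does not matter which file or branch a chunk came from; identical content always maps to the same entry.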
Cache Location
| Priority | Source | Path |
|---|---|---|
| 1 (highest) | $CRAB_CACHE_DIR environment variable | Custom path |
| 2 | cache.dir setting in .crab/config.toml | Custom path |
| 3 | Platform default | ~/.cache/crab/ |
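The precedence in the table can be sketched as a small resolver. This is an illustration only: `resolve_cache_dir` is a hypothetical helper, the config is assumed to be already parsed into a nested dict, and treating an empty environment variable as unset is an assumption rather than documented Crab behavior.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None, config=None) -> Path:
    """Return the cache directory using the precedence in the table above.

    `config` is assumed to be the already-parsed contents of
    .crab/config.toml as a nested dict; this helper is hypothetical.
    """
    env = os.environ if env is None else env
    config = config or {}

    # 1. The environment variable wins when set (assumed: and non-empty).
    from_env = env.get("CRAB_CACHE_DIR")
    if from_env:
        return Path(from_env)

    # 2. Then the cache.dir setting from the project config.
    from_config = config.get("cache", {}).get("dir")
    if from_config:
        return Path(from_config)

    # 3. Otherwise fall back to the platform default from the table.
    return Path.home() / ".cache" / "crab"


# The environment variable overrides the config file setting.
print(resolve_cache_dir(
    {"CRAB_CACHE_DIR": "/mnt/nvme/crab-cache"},
    {"cache": {"dir": "/data/crab-cache"}},
))
```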
To use a fast NVMe drive for the cache:
export CRAB_CACHE_DIR=/mnt/nvme/crab-cache
Cache Size Management
The cache grows as you hydrate files. You can set a maximum size:
# .crab/config.toml
[cache]
max_size = "50GB"
When the cache exceeds this limit, least-recently-used chunks are evicted. Eviction is lazy: it happens during the next cache write, not in the background.
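To make "lazy" least-recently-used eviction concrete, here is a rough sketch under stated assumptions. `LazyLruCache` is a hypothetical in-memory class, not Crab's on-disk format, and keeping the chunk that was just written even when it alone exceeds the limit is an assumption made for the example.

```python
from collections import OrderedDict


class LazyLruCache:
    """Toy size-bounded cache that evicts only during writes (hypothetical)."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self.entries = OrderedDict()  # key -> bytes, least recently used first

    def get(self, key):
        data = self.entries.get(key)
        if data is not None:
            self.entries.move_to_end(key)  # mark as most recently used
        return data

    def put(self, key, data: bytes):
        if key in self.entries:
            self.entries.move_to_end(key)  # already cached: refresh recency only
            return
        self.entries[key] = data
        self.total_bytes += len(data)
        # Eviction happens here, on write, never on a background timer.
        # Assumption: the chunk just written is always kept.
        while self.total_bytes > self.max_bytes and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.total_bytes -= len(evicted)


cache = LazyLruCache(max_bytes=10)
cache.put("a", b"12345")
cache.put("b", b"12345")
cache.get("a")              # "a" is now more recently used than "b"
cache.put("c", b"12345")    # over the limit, so "b" is evicted
print(list(cache.entries))  # ['a', 'c']
```

One consequence of this design is that a cache can sit above its limit after the limit is lowered, until the next write triggers eviction.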
Inspecting the Cache
crab cache stats
Shows total size, number of objects, hit rate, and cache directory path.
Cleaning the Cache
# Remove everything (re-download on next hydrate)
crab cache clean
# Remove only unreferenced objects
crab prune
crab prune is usually preferred: it keeps chunks that are still referenced by your current working tree and removes only orphaned data.
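The difference between the two commands can be illustrated with a toy model. The functions below are hypothetical and operate on a plain dict of hash to chunk; how Crab actually determines which chunks the working tree references is not shown here.

```python
def clean(cache: dict) -> int:
    """Drop every cached chunk; returns how many were removed."""
    removed = len(cache)
    cache.clear()
    return removed


def prune(cache: dict, referenced: set) -> int:
    """Drop only chunks whose hash is not in `referenced`."""
    orphans = [digest for digest in cache if digest not in referenced]
    for digest in orphans:
        del cache[digest]
    return len(orphans)


cache = {"aa11": b"chunk-a", "bb22": b"chunk-b", "cc33": b"chunk-c"}
working_tree_refs = {"aa11", "cc33"}  # hashes the current tree still uses
print(prune(cache, working_tree_refs))  # 1
print(sorted(cache))                    # ['aa11', 'cc33']
```

After a prune, re-hydrating the current working tree still hits the cache; after a clean, every chunk has to be downloaded again.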
Pre-warming the Cache
If you know you'll need certain files later (going offline, preparing for a demo), pre-warm the cache:
crab fetch --include '*.safetensors'
This downloads chunks without hydrating files, so the data sits in the cache ready for instant hydration later.
When to Clean vs. Prune
| Situation | Action |
|---|---|
| Running low on disk space | crab prune (keeps useful data) |
| Switching to a different project | crab cache clean (fresh start) |
| Suspected cache corruption | crab cache clean (nuclear option) |
| Want to reclaim some space | crab prune |
Remote Cache Service
The local cache is per-machine — each developer and CI runner maintains their own. For teams, Crab offers an optional remote cache service (crab-cache-server) that shares cached objects across all machines in your organization.
When configured, the lookup order is: local cache → remote cache → cloud storage. The first fetch from any team member warms the remote cache, and subsequent fetches from anyone else are served at network-local speed instead of cloud latency.
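The lookup order can be sketched as a tiered read with backfill. This is a simplified model: `fetch_chunk` is a hypothetical function, the three stores are plain dicts, and having the client write the chunk into the remote cache on a miss is an assumption about how warming happens.

```python
def fetch_chunk(digest, local, remote, cloud):
    """Look up a chunk tier by tier and backfill the faster tiers on a miss.

    `local`, `remote`, and `cloud` are plain dicts standing in for the
    three stores; returns (data, name_of_the_tier_that_served_it).
    """
    if digest in local:
        return local[digest], "local"
    if digest in remote:
        local[digest] = remote[digest]  # warm the local cache
        return local[digest], "remote"
    data = cloud[digest]   # raises KeyError if the chunk exists nowhere
    remote[digest] = data  # assumed: the first fetch warms the remote cache
    local[digest] = data
    return data, "cloud"


cloud = {"aa11": b"chunk"}
remote = {}
alice_local, bob_local = {}, {}
print(fetch_chunk("aa11", alice_local, remote, cloud)[1])  # cloud
print(fetch_chunk("aa11", bob_local, remote, cloud)[1])    # remote
print(fetch_chunk("aa11", bob_local, remote, cloud)[1])    # local
```

Only the first request in the organization pays the cloud round trip; every later request is served from a closer tier.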
For deployment and configuration details, see the Cache Service guide.
CLI Reference
For complete command syntax, see the crab cache reference.