Your First Crab Push: A Step-by-Step Walkthrough
A complete hands-on walkthrough of pushing a large file with Crab — from install to verification. Follow along with copy-paste commands and see exactly what to expect at each step.
What We're Building
By the end of this tutorial, you'll push a 1 GB machine learning model file to S3 using nothing but Crab and standard git commands. Then you'll clone the repo on a fresh machine and verify the file comes back byte-for-byte identical.
No servers to set up. No LFS endpoints to configure. Just your files, your bucket, and git.
Let's do it.
Step 1: Install Crab
First, install the Crab CLI. This gives you the crab command and a git remote helper that works behind the scenes.
curl -fsSL https://crab.build/install.sh | sh
✓ Downloaded crab v0.9.2 (darwin-arm64)
✓ Installed to ~/.cargo/bin/crab
✓ Linked git-remote-crab helper
✓ Ready to go!
Verify the install:
crab --version
crab 0.9.2
That's all you need. Crab is a single binary with no runtime dependencies and no background services.
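Git locates a remote helper by looking for an executable named git-remote-<scheme>, so both crab and git-remote-crab need to be discoverable from your shell. The following is a minimal sketch of that check; the function name check_on_path is made up for illustration, and it only confirms the programs can be found, not that they work.

```sh
# Report whether a program can be found on PATH.
# Usage: check_on_path <program-name>
check_on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1 -> $(command -v "$1")"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# The two executables the installer output above mentions.
check_on_path crab || true
check_on_path git-remote-crab || true
```

If either one is reported missing, the install directory shown by the installer (~/.cargo/bin in the output above) is probably not on your PATH.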
Step 2: Configure Your Bucket
Crab stores your large files in a cloud bucket you own. For this tutorial, we'll use an S3 bucket. Make sure you have AWS credentials configured (via ~/.aws/credentials or environment variables).
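Before running crab init, it can help to confirm that credentials are actually present. This is a rough sketch, not part of Crab: the function name aws_creds_present is hypothetical, and it only checks the two standard AWS environment variables and the shared credentials file (honoring AWS_SHARED_CREDENTIALS_FILE if set). It does not verify that the credentials are valid or that they grant access to your bucket.

```sh
# Return 0 if AWS credentials appear to be configured, 1 otherwise.
# Presence check only: nothing here contacts AWS.
aws_creds_present() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    return 0
  fi
  [ -f "${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}" ]
}

if aws_creds_present; then
  echo "AWS credentials found"
else
  echo "No AWS credentials found; configure them before running crab init" >&2
fi
```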
Create a new git repo and point Crab at your bucket:
mkdir ml-project && cd ml-project
git init
crab init --remote s3://my-crab-bucket/ml-project
✓ Initialized crab in /Users/you/ml-project
✓ Remote: s3://my-crab-bucket/ml-project
✓ Added crab remote to git config
That's the entire setup. Crab wrote a small config file and registered itself as a git remote. Your bucket is now your file server.
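To confirm the remote was registered, you can ask git directly. The sketch below uses only standard git commands. It assumes the remote is named origin (the name used by the push later in this tutorial), and the helper name remote_url is made up for illustration.

```sh
# Print the URL git has stored for a named remote in a given repo.
# Usage: remote_url <repo-dir> <remote-name>
remote_url() {
  git -C "$1" config --get "remote.$2.url"
}

# Run from inside ml-project; prints a fallback message if no such remote exists.
remote_url . origin || echo "no remote named origin in this directory"
```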
Step 3: Add Your Large File
Let's say you have a 1 GB model checkpoint. Stage it with crab add:
crab add models/resnet50-v2.bin
Staging models/resnet50-v2.bin (1.02 GB)
├─ Chunking .............. 847 chunks
├─ Deduplicating ......... 0 existing, 847 new
└─ Staged ✓
Ready to commit.
crab add does the heavy lifting: it splits your file into content-aware chunks and stages them for upload. Nothing has left your machine yet.
Step 4: Commit and Push
Now use regular git commands — commit and push just like you normally would:
git add .
git commit -m "Add resnet50-v2 model checkpoint"
[main (root-commit) a3f7c21] Add resnet50-v2 model checkpoint
2 files changed, 4 insertions(+)
create mode 100644 .crab/config
create mode 100644 models/resnet50-v2.bin
git push origin main
Pushing to s3://my-crab-bucket/ml-project
├─ Uploading 847 chunks (1.02 GB) ████████████████████ 100%
├─ Writing shard index .... done
├─ Writing manifest ....... done
└─ Finalizing refs ........ done
To crab://my-crab-bucket/ml-project
* [new branch] main -> main
Your file is now in S3. The push uploaded 847 compressed chunks, wrote an index so Crab knows how to reassemble them, and updated the branch ref. All through a standard git push.
What Happened Behind the Scenes
Three things about what Crab did during that push are worth noting:
- Chunking is content-aware. Crab doesn't split at fixed byte offsets. It finds natural boundaries in your data, so small edits only affect nearby chunks, not the entire file.
- Deduplication happens before upload. If you push a slightly modified version of this model tomorrow, Crab will skip the 800+ chunks that didn't change and only upload the handful of new ones.
- Everything is compressed. Chunks are packed into compressed archives (called xorbs) before upload, so your actual bandwidth and storage usage is lower than the raw file size.
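The deduplication idea can be tried out with standard tools. The script below is a toy sketch, not Crab's implementation: it substitutes fixed-size chunks (via split) for Crab's content-aware chunker, uses SHA-256 as the chunk identity, and skips compression entirely. The function names, the 1024-byte chunk size, and the sample data are all made up for illustration.

```sh
#!/bin/sh
# Toy dedup demo: fixed-size chunks + SHA-256 stand in for a real chunker.

# Hash stdin with whichever SHA-256 tool is available.
hash_stdin() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# Print one hash per chunk of file $1 (chunk size $2 bytes), sorted.
chunk_hashes() {
  dir=$(mktemp -d)
  split -b "$2" "$1" "$dir/chunk."
  for c in "$dir"/chunk.*; do
    hash_stdin < "$c"
  done | LC_ALL=C sort
  rm -rf "$dir"
}

work=$(mktemp -d)

# Version 1: deterministic sample data (the numbers 1..2000, one per line).
awk 'BEGIN { for (i = 1; i <= 2000; i++) print i }' > "$work/v1"

# Version 2: the same file with 4 bytes overwritten in place near the start.
cp "$work/v1" "$work/v2"
printf 'ZZZZ' | dd of="$work/v2" bs=1 seek=100 conv=notrunc 2>/dev/null

chunk_hashes "$work/v1" 1024 > "$work/v1.hashes"
chunk_hashes "$work/v2" 1024 > "$work/v2.hashes"

total_chunks=$(wc -l < "$work/v2.hashes" | tr -d ' ')
# comm -13 keeps lines found only in the second file: chunks not stored yet.
new_chunks=$(LC_ALL=C comm -13 "$work/v1.hashes" "$work/v2.hashes" | wc -l | tr -d ' ')

echo "v2 has $total_chunks chunks; $new_chunks would need uploading"
```

With this sample data the script reports 9 chunks, of which 1 is new. Note that the edit here overwrites bytes in place. If it inserted bytes instead, every later fixed-size chunk would shift and change its hash, which is the problem content-aware boundaries are meant to avoid.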
Step 5: Verify It Worked
Let's prove the file made it safely. Open a new terminal (or imagine you're on a different machine) and clone the repo:
cd /tmp
crab clone crab://my-crab-bucket/ml-project ml-project-clone
Cloning into 'ml-project-clone'...
├─ Fetching refs ......... done
├─ Fetching objects ...... done
└─ Checkout .............. done
Repository cloned. Large files are dehydrated (pointer-only).
Run 'crab hydrate' to download file contents.
The clone is fast because it only downloads git metadata and lightweight pointers, not the full 1 GB file. To get the actual file contents, hydrate:
cd ml-project-clone
crab hydrate
Hydrating 1 file (1.02 GB)
├─ models/resnet50-v2.bin ████████████████████ 100%
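└─ Done ✓ (1 file, 1.02 GB, 12.4s)
Now verify the file is identical:
sha256sum models/resnet50-v2.bin
a7f3b2c1...d94e models/resnet50-v2.bin
Run the same command on the original file and compare the two hashes. (On macOS, where sha256sum may not be installed, shasum -a 256 produces the same hash.) If they match, the files are byte-for-byte identical: Crab reconstructed your file perfectly from its chunks.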
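Comparing hashes by eye is error-prone, so the check can be scripted. This is a small sketch using standard tools; the function names file_sha256 and same_content are made up for illustration, and the fallback to shasum is there for systems without sha256sum.

```sh
# Print the SHA-256 hash of a file, using whichever tool is available.
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# Return 0 if two files have the same SHA-256 hash, 1 otherwise.
# Usage: same_content <original-file> <cloned-file>
same_content() {
  [ "$(file_sha256 "$1")" = "$(file_sha256 "$2")" ]
}
```

For this walkthrough you would pass the path of the original models/resnet50-v2.bin in ml-project and the hydrated copy in /tmp/ml-project-clone, then print a message based on the exit status.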
What You Just Did
Let's recap. In about five minutes, you:
- Installed Crab (one command)
- Pointed it at an S3 bucket (one command)
- Staged a 1 GB file with smart chunking (crab add)
- Pushed it with standard git (git commit + git push)
- Cloned it elsewhere and verified it came back perfectly (crab clone + crab hydrate)
No server running. No LFS endpoint. No special hosting. Just your bucket and git.
Next Steps
Now that you've done your first push, here are some things to try:
- Modify and re-push: Edit the model file and push again. Watch how Crab only uploads the changed chunks (much faster the second time).
- Selective hydration: Use crab hydrate models/ to only download files in a specific directory.
- Check storage usage: Run crab status to see how much unique data is in your bucket.
- Add teammates: Anyone with bucket access and Crab installed can clone and contribute. No server setup needed.
For more details, check out the CLI documentation or learn how deduplication works under the hood.
Happy pushing!