Getting Started with Crab in 5 Minutes
A step-by-step guide to installing Crab, initializing a repository, pushing a large file to S3, and cloning it elsewhere — with copy-paste commands and expected output.
1. Install the Crab binary and verify the Git remote helper is available.
2. Initialize a repository against your cloud bucket.
3. Track and commit a large file using normal Git commands.
4. Push through Crab so large-file bytes land in object storage.
5. Clone elsewhere and hydrate the file to prove the workflow end to end.
What You'll Build
By the end of this tutorial you'll have a working Crab repository with a large file stored in your own cloud bucket. You'll push it from one machine and clone it on another — no servers, no configuration files, no LFS endpoints.
The whole thing takes about five minutes.
Prerequisites
You need two things before starting:
- Git installed (any recent version)
- A cloud storage bucket — an AWS S3 bucket, Google Cloud Storage bucket, or Azure Blob container that you can write to
Make sure your cloud credentials are configured in your environment. For AWS, that means either a ~/.aws/credentials file or the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
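Before moving on, it can help to confirm which credential source your shell will actually present. The sketch below is not part of Crab: `aws_creds_source` is a hypothetical helper that checks the two AWS locations mentioned above (honoring the standard `AWS_SHARED_CREDENTIALS_FILE` override) and prints which one it found. It only checks that credentials are present, not that they are valid or can write to your bucket.

```shell
# Hypothetical helper, not a Crab command.
# Prints "env", "file", or "none" depending on where AWS credentials were found.
aws_creds_source() {
    if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        # Both standard environment variables are set and non-empty.
        echo "env"
    elif [ -f "${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}" ]; then
        # The shared credentials file exists (default path or the override).
        echo "file"
    else
        echo "none"
        return 1
    fi
}
```

A typical use would be `aws_creds_source || echo "configure credentials first"`.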
Step 1: Install Crab
Run the installer:
curl -fsSL https://crab.build/install.sh | sh
Expected output:
Downloading crab v0.9.4 for darwin-arm64...
Installing to /usr/local/bin/crab
Creating symlink: git-remote-crab -> crab
Done! Run 'crab --version' to verify.
Verify the installation:
crab --version
crab 0.9.4
That single binary handles everything: the CLI commands and the git remote helper. No background services, no daemons.
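Git finds remote helpers by name, so it is worth checking that both `crab` and `git-remote-crab` resolve on your PATH. The snippet below is a minimal sketch: `missing_commands` is a hypothetical helper, and it assumes the installer placed both names on PATH, as the installer output above suggests.

```shell
# Hypothetical helper, not a Crab command.
# Prints the names of any commands that cannot be found on PATH,
# and returns non-zero if at least one is missing.
missing_commands() {
    missing=""
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
    done
    # Strip the leading space before printing the list.
    echo "${missing# }"
    [ -z "$missing" ]
}
```

Running `missing_commands git crab git-remote-crab` should print an empty line if everything is in place.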
Step 2: Initialize a Repository
Create a new git repository and tell Crab which bucket to use:
mkdir my-project && cd my-project
git init -b main
crab init crab://my-bucket/my-project
Expected output:
Initialized crab remote: crab://my-bucket/my-project
Remote 'origin' configured.
This adds a git remote pointing at your bucket. Under the hood, it's just a standard git remote URL that the git-remote-crab helper knows how to handle.
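For URL schemes that Git does not handle natively, Git derives the helper name from the scheme and runs `git-remote-<scheme>`. The mapping can be sketched as follows; `helper_for_url` is a hypothetical name used only for illustration.

```shell
# Hypothetical helper, not a Crab or Git command.
# Given a remote URL, prints the helper executable name implied by its scheme.
helper_for_url() {
    case "$1" in
        *://*) echo "git-remote-${1%%://*}" ;;   # keep only the text before "://"
        *) echo "(none: not a scheme:// URL)"; return 1 ;;
    esac
}

helper_for_url "crab://my-bucket/my-project"   # prints git-remote-crab
```

Inside the repository, `helper_for_url "$(git remote get-url origin)"` applies the same check to the remote you just configured.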
Step 3: Add a Large File
Let's create a sample large file (or use one you already have — a model checkpoint, dataset, video, anything):
# Create a 500 MB test file (bs=1m is the BSD/macOS spelling; GNU dd on Linux expects bs=1M)
dd if=/dev/urandom of=model.bin bs=1m count=500
Now track it with Crab and commit:
crab add model.bin
git add .
git commit -m "Add model checkpoint"
Expected output:
[crab] Chunking model.bin (500.0 MB)
[crab] 847 chunks, 492.3 MB unique data
[main (root-commit) a3f1c2d] Add model checkpoint
2 files changed, 1 insertion(+)
create mode 100644 .crabconfig
create mode 100644 model.bin
The crab add command splits your file into content-defined chunks and stages the unique data. Git sees a lightweight pointer file; the actual bytes live in Crab's staging area, ready to upload.
Step 4: Push to Cloud Storage
git push origin main
Expected output:
Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Writing objects: 100% (4/4), 312 bytes | 312.00 KiB/s, done.
[crab] Uploading 492.3 MB (847 chunks, 12 xorbs)
[crab] ████████████████████████████████ 100% (52.4 MB/s)
[crab] Push complete: crab://my-bucket/my-project
To crab://my-bucket/my-project
* [new branch] main -> main
Your file is now stored as deduplicated, compressed chunks in your S3 bucket. Git handled the refs and commit objects as usual, while Crab handled the large-file data.
Step 5: Clone on Another Machine
On a different machine (or a different directory to simulate it), clone the repository:
crab clone crab://my-bucket/my-project my-project-clone
cd my-project-clone
Expected output:
Cloning into 'my-project-clone'...
[crab] Fetching refs from crab://my-bucket/my-project
[crab] Downloading file data...
[crab] ████████████████████████████████ 100% (78.2 MB/s)
[crab] Hydrating 1 file (500.0 MB)
Done. Repository ready at my-project-clone/
Verify the file is intact:
ls -lh model.bin
-rw-r--r-- 1 user staff 500M Aug 25 10:32 model.bin
Your 500 MB file is back, byte-for-byte identical to the original.
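A listing only confirms the size. To check the content as well, you could compare a digest of the original file with a digest of the cloned one. This sketch assumes either `sha256sum` (common on Linux) or `shasum` (shipped with macOS) is installed; `sha256_of` is a hypothetical helper, not a Crab command.

```shell
# Hypothetical helper, not a Crab command.
# Prints the SHA-256 digest of a file using whichever common tool is installed.
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | cut -d' ' -f1
    else
        shasum -a 256 "$1" | cut -d' ' -f1
    fi
}
```

Run `sha256_of model.bin` in both the original directory and the clone; matching digests mean the two files have the same content.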
How It Works
Here's what happened across those five steps:
- Install placed a single binary on your machine. That binary acts as both the crab CLI and the git-remote-crab helper that git calls automatically.
- Init configured a git remote pointing at your cloud bucket. No server to provision, no SSH keys to exchange with a hosting provider.
- Add split your file into variable-size chunks using content-defined boundaries. This means if you later change part of the file, only the affected chunks get re-uploaded, not the whole thing.
- Push uploaded the unique chunks as compressed archives (called xorbs) directly to your bucket. Git handled the commit graph normally; Crab handled the large-file data.
- Clone downloaded the commit history and file data from the bucket, then reconstructed your file from its chunks, verified byte-for-byte with cryptographic hashes.
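To get a feel for why chunking limits re-uploads, here is a rough illustration built only from standard Unix tools. It is not Crab's algorithm: it uses fixed 1 KiB chunks and CRC checksums as stand-ins for content-defined chunks and cryptographic hashes, and `chunk_sums` and `changed_chunks` are hypothetical names. Fixed-size chunks only cope with in-place overwrites; content-defined boundaries are what keep later chunks stable when bytes are inserted or removed.

```shell
# Illustration only: fixed 1 KiB chunks stand in for Crab's content-defined chunks.

# Prints one CRC per 1 KiB chunk of the file given as $1.
chunk_sums() {
    dir=$(mktemp -d)
    split -b 1024 "$1" "$dir/chunk."
    for c in "$dir"/chunk.*; do
        cksum < "$c" | cut -d' ' -f1
    done
    rm -rf "$dir"
}

# Counts chunks of $2 whose checksum does not appear among the chunks of $1.
changed_chunks() {
    old=$(mktemp); new=$(mktemp)
    chunk_sums "$1" | LC_ALL=C sort -u > "$old"
    chunk_sums "$2" | LC_ALL=C sort -u > "$new"
    # comm -13 keeps only the lines that are unique to the second file.
    LC_ALL=C comm -13 "$old" "$new" | wc -l | tr -d ' '
    rm -f "$old" "$new"
}

# Demo: overwrite 8 bytes inside the third chunk of a copy, then compare.
demo=$(mktemp -d)
seq 1 2000 > "$demo/a.bin"
cp "$demo/a.bin" "$demo/b.bin"
printf 'XXXXXXXX' | dd of="$demo/b.bin" bs=1 seek=2048 conv=notrunc 2>/dev/null
changed_chunks "$demo/a.bin" "$demo/b.bin"   # one changed chunk expected
rm -rf "$demo"
```

Only the chunk containing the overwritten bytes gets a new checksum, which is the property that lets a chunk store skip everything it has already seen.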
What's Next
You now have a working Crab setup. Here are some things to try:
- Modify and push again: change part of model.bin and push. You'll see that only the changed chunks upload (deduplication in action).
- Add more files: crab add works on any file. Track datasets, videos, compiled assets, anything too large for vanilla git.
- Lazy checkout: for very large repos, use crab clone --lazy to download only pointer files initially, then hydrate individual files on demand with crab hydrate.
- Mount as a virtual filesystem: crab mount exposes your repo through FUSE, so files appear on disk but download transparently when accessed.
That's it. Five steps, five minutes, and your large files live in cloud storage with full git history, no servers required.