Your First Crab Push: A Step-by-Step Walkthrough

What We're Building

By the end of this tutorial, you'll push a 1 GB machine learning model file to S3 using nothing but Crab and standard git commands. Then you'll clone the repo on a fresh machine and verify the file comes back byte-for-byte identical.

No servers to set up. No LFS endpoints to configure. Just your files, your bucket, and git.

Let's do it.

Step 1: Install Crab

First, install the Crab CLI. This gives you the crab command and a git remote helper that works behind the scenes.

curl -fsSL https://crab.build/install.sh | sh

✓ Downloaded crab v0.9.2 (darwin-arm64)
✓ Installed to ~/.cargo/bin/crab
✓ Linked git-remote-crab helper
✓ Ready to go!

Verify the install:

crab --version

crab 0.9.2

That's all you need. Crab is a single binary — no runtime dependencies, no background services.
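Because crab:// URLs are handled by the git-remote-crab helper, git has to be able to find that executable. For a URL of the form <scheme>://..., git runs a helper named git-remote-<scheme>, looked up like any other git subcommand. If ~/.cargo/bin, where the installer put Crab, isn't on your PATH, git won't find the helper. The standard-library sketch below is a quick check; find_crab_binaries is a hypothetical helper name, not part of Crab.

```python
import shutil


def find_crab_binaries(names=("crab", "git-remote-crab")):
    """Hypothetical helper: map each executable name to its resolved path, or None."""
    # shutil.which searches the directories listed in PATH.
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_crab_binaries().items():
        print(f"{name}: {path or 'not found on PATH'}")
```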

Step 2: Configure Your Bucket

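Crab stores your large files in a cloud bucket you own. For this tutorial, we'll use an S3 bucket; substitute your own bucket name for my-crab-bucket throughout. Make sure you have AWS credentials configured (via ~/.aws/credentials or environment variables) that can read from and write to that bucket.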
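Before running crab init, you can confirm that one of the two credential sources mentioned above is actually in place. This standard-library sketch checks only those two: the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, and the shared credentials file (honoring AWS_PROFILE and AWS_SHARED_CREDENTIALS_FILE). It does not cover SSO, instance roles, or other credential providers, and aws_credentials_source is a hypothetical name.

```python
import configparser
import os
from pathlib import Path


def aws_credentials_source(env=None, home=None):
    """Hypothetical helper: report where AWS credentials were found, or None.

    Checks only environment variables and the shared credentials file.
    """
    env = os.environ if env is None else env
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    home = Path.home() if home is None else Path(home)
    default_file = home / ".aws" / "credentials"
    cred_file = Path(env.get("AWS_SHARED_CREDENTIALS_FILE", default_file)).expanduser()
    profile = env.get("AWS_PROFILE", "default")
    parser = configparser.ConfigParser()
    # read() silently skips missing files and returns the list of files it did read.
    if parser.read(cred_file) and parser.has_option(profile, "aws_access_key_id"):
        return f"{cred_file} [profile {profile}]"
    return None


if __name__ == "__main__":
    source = aws_credentials_source()
    print(f"credentials found via {source}" if source else "no credentials found")
```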

Create a new git repo and point Crab at your bucket:

mkdir ml-project && cd ml-project
git init
crab init --remote s3://my-crab-bucket/ml-project

✓ Initialized crab in /Users/you/ml-project
✓ Remote: s3://my-crab-bucket/ml-project
✓ Added crab remote to git config

That's the entire setup. Crab wrote a small config file (.crab/config, which gets committed in Step 4) and registered itself as a git remote. Your bucket is now your file server.
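The output says Crab added a remote to your git config but doesn't show its name or URL, and Step 4 pushes to origin, so it's worth a look. git config --get-regexp prints matching keys as "key value" lines; the sketch below runs it and groups the result by remote. list_git_remotes and parse_remote_config are hypothetical helper names, and the sketch makes no assumption about what Crab actually writes.

```python
import subprocess


def parse_remote_config(output):
    """Hypothetical helper: turn `git config --get-regexp` lines into {remote: {setting: value}}."""
    remotes = {}
    for line in output.splitlines():
        key, _, value = line.partition(" ")
        # Keys look like remote.<name>.<setting>; <name> itself may contain dots.
        _, rest = key.split(".", 1)
        if "." not in rest:
            continue  # e.g. remote.pushDefault, which isn't tied to one remote
        name, setting = rest.rsplit(".", 1)
        # Multi-valued settings (e.g. several fetch lines) keep only the last value here.
        remotes.setdefault(name, {})[setting] = value
    return remotes


def list_git_remotes(repo="."):
    """Hypothetical helper: read every remote.* key from the repo's git config."""
    result = subprocess.run(
        ["git", "-C", repo, "config", "--get-regexp", r"^remote\."],
        capture_output=True,
        text=True,
    )
    # git config exits with status 1 when no key matches: treat that as "no remotes".
    if result.returncode not in (0, 1):
        result.check_returncode()
    return parse_remote_config(result.stdout)
```

Run print(list_git_remotes()) from inside ml-project to see the remote name and URL that crab init registered.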

Step 3: Add Your Large File

Let's say you have a 1 GB model checkpoint at models/resnet50-v2.bin. Stage it with crab add:

crab add models/resnet50-v2.bin

Staging models/resnet50-v2.bin (1.02 GB)
  ├─ Chunking .............. 847 chunks
  ├─ Deduplicating ......... 0 existing, 847 new
  └─ Staged ✓

Ready to commit.

crab add does the heavy lifting: it splits your file into content-aware chunks, checks which of them Crab already has (none yet, since this is the first version), and stages them for upload. Nothing has left your machine yet.
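The tutorial doesn't say which chunking algorithm Crab uses, so here is a generic content-defined chunking sketch built on a Gear rolling hash, the hash family FastCDC uses. A boundary is declared wherever the top bits of the hash are zero, and because the hash only reflects the last 64 bytes, boundaries depend on local content rather than on absolute offsets. MIN_SIZE, MAX_SIZE, and BITS are illustrative values, not Crab's settings, and pure Python is far too slow for a real 1 GB file, so try it on a few hundred kilobytes.

```python
import hashlib
import os
import random

# Illustrative parameters only; Crab's real chunk sizes aren't documented here.
MIN_SIZE = 2 * 1024                  # never cut a chunk shorter than 2 KiB
MAX_SIZE = 64 * 1024                 # always cut by 64 KiB
BITS = 13                            # about 1 in 8192 positions qualifies as a boundary
MASK64 = (1 << 64) - 1
CUT_MASK = ((1 << BITS) - 1) << (64 - BITS)  # test the top BITS bits of the hash

# Gear table: one pseudo-random 64-bit value per byte value (fixed seed, repeatable).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk(data):
    """Split bytes at content-defined boundaries found with a Gear rolling hash."""
    chunks, start, n = [], 0, len(data)
    while start < n:
        end = min(start + MAX_SIZE, n)
        cut, h = end, 0
        for i in range(start, end):
            # Shifting left pushes old bytes out of the 64-bit value, so h
            # depends only on the last 64 bytes seen.
            h = ((h << 1) + GEAR[data[i]]) & MASK64
            if i + 1 - start >= MIN_SIZE and (h & CUT_MASK) == 0:
                cut = i + 1
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks


def chunk_ids(data):
    """SHA-256 of each chunk, a key a content-addressed store could use."""
    return [hashlib.sha256(c).hexdigest() for c in chunk(data)]


if __name__ == "__main__":
    original = os.urandom(256 * 1024)
    # Insert 10 bytes in the middle; a fixed-offset splitter would shift every later chunk.
    edited = original[:100_000] + b"0123456789" + original[100_000:]
    before, after = chunk_ids(original), chunk_ids(edited)
    known = set(before)
    changed = sum(1 for c in after if c not in known)
    print(f"{len(before)} chunks before, {len(after)} after, {changed} new")
```

Inserting 10 bytes typically leaves all but one or two chunks unchanged, which is what lets deduplication skip the rest on the next push.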

Step 4: Commit and Push

Now use regular git commands — commit and push just like you normally would:

git add .
git commit -m "Add resnet50-v2 model checkpoint"

[main (root-commit) a3f7c21] Add resnet50-v2 model checkpoint
 2 files changed, 4 insertions(+)
 create mode 100644 .crab/config
 create mode 100644 models/resnet50-v2.bin

git push origin main

Pushing to s3://my-crab-bucket/ml-project
  ├─ Uploading 847 chunks (1.02 GB) ████████████████████ 100%
  ├─ Writing shard index .... done
  ├─ Writing manifest ....... done
  └─ Finalizing refs ........ done

To crab://my-crab-bucket/ml-project
 * [new branch]      main -> main

Your file is now in S3. The push uploaded 847 compressed chunks, wrote a shard index and a manifest so Crab knows how to reassemble them, and finalized the branch ref, all through a standard git push.
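The tutorial doesn't describe the shard index or manifest formats, but the underlying idea of a content-addressed store can be sketched. In this sketch a dict stands in for the bucket, each chunk is stored under its SHA-256, and a manifest is simply the ordered list of hashes needed to rebuild the file. push and reassemble are hypothetical names, and the tiny byte strings are placeholders for real chunks.

```python
import hashlib


def push(chunks, store):
    """Hypothetical helper: upload chunks the store lacks; return (manifest, bytes uploaded).

    store is a dict standing in for the bucket: sha256 hex -> chunk bytes.
    """
    manifest, uploaded = [], 0
    for data in chunks:
        key = hashlib.sha256(data).hexdigest()
        if key not in store:          # dedup: skip chunks the bucket already has
            store[key] = data
            uploaded += len(data)
        manifest.append(key)
    return manifest, uploaded


def reassemble(manifest, store):
    """Hypothetical helper: concatenate chunks in manifest order, verifying each hash."""
    out = bytearray()
    for key in manifest:
        data = store[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError(f"corrupt chunk {key}")
        out += data
    return bytes(out)


if __name__ == "__main__":
    bucket = {}
    v1 = [b"header", b"layer-1 weights", b"layer-2 weights"]
    v2 = [b"header", b"layer-1 weights (tuned)", b"layer-2 weights"]
    m1, sent1 = push(v1, bucket)
    m2, sent2 = push(v2, bucket)   # only the changed middle chunk is new
    print(f"first push: {sent1} bytes, second push: {sent2} bytes, {len(bucket)} chunks stored")
    print(reassemble(m2, bucket) == b"".join(v2))
```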

What Happened Behind the Scenes

In short, that push uploaded the new chunks, wrote a shard index and a manifest describing how they fit back together, and then finalized the git refs.

Three things worth noting:

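Chunking is content-aware. Crab doesn't split at fixed byte offsets. It picks boundaries based on the content itself, so inserting or deleting a few bytes only changes the chunks around the edit instead of shifting every chunk after it.
Deduplication happens before upload. Chunks the bucket already has are skipped. If you push a slightly modified version of this model tomorrow, most of the 847 chunks stay the same and only the handful of new ones are uploaded. An edit that touches every weight, such as a full fine-tune, changes most chunks and gets little benefit.
Everything is compressed. Chunks are packed into compressed archives (called xorbs) before upload, so bandwidth and storage usage can be lower than the raw file size. How much lower depends on the data: bytes that look random, as trained weights often do, shrink very little.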
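The xorb format isn't described in this tutorial, so the sketch below only illustrates the general idea of a packed, compressed archive: each chunk is compressed with zlib (a stand-in; the tutorial doesn't name Crab's compressor) and appended to one blob, with an offset index so a single chunk can be read back without decompressing the rest. pack and unpack_chunk are hypothetical names. The usage compares repetitive data with random bytes to show why the savings depend on the data.

```python
import hashlib
import os
import zlib


def pack(chunks):
    """Hypothetical helper: compress each chunk into one blob plus an offset index.

    Returns (blob, index) where index maps sha256 hex -> (offset, length) in blob.
    """
    blob, index = bytearray(), {}
    for data in chunks:
        key = hashlib.sha256(data).hexdigest()
        if key in index:
            continue                      # identical chunk already packed
        packed = zlib.compress(data, 6)   # zlib is a stand-in compressor
        index[key] = (len(blob), len(packed))
        blob += packed
    return bytes(blob), index


def unpack_chunk(blob, index, key):
    """Hypothetical helper: decompress one chunk without touching the rest of the blob."""
    offset, length = index[key]
    return zlib.decompress(blob[offset:offset + length])


if __name__ == "__main__":
    repetitive = [b"weights " * 1000, b"biases " * 1000]
    random_like = [os.urandom(8000), os.urandom(7000)]
    for name, chunks in [("repetitive", repetitive), ("random", random_like)]:
        blob, _ = pack(chunks)
        raw = sum(len(c) for c in chunks)
        print(f"{name}: {raw} raw bytes -> {len(blob)} packed bytes")
```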

Step 5: Verify It Worked

Let's prove the file made it safely. Open a new terminal (or imagine you're on a different machine) and clone the repo:

cd /tmp
crab clone crab://my-crab-bucket/ml-project ml-project-clone

Cloning into 'ml-project-clone'...
  ├─ Fetching refs ......... done
  ├─ Fetching objects ...... done
  └─ Checkout .............. done

Repository cloned. Large files are dehydrated (pointer-only).
Run 'crab hydrate' to download file contents.

The clone is fast because it only downloads git metadata and lightweight pointers — not the full 1 GB file. To get the actual file contents, hydrate:

cd ml-project-clone
crab hydrate

Hydrating 1 file (1.02 GB)
  ├─ models/resnet50-v2.bin ████████████████████ 100%
  └─ Done ✓ (1 file, 1.02 GB, 12.4s)

Now verify the file is identical. If your system doesn't have sha256sum, shasum -a 256 computes the same SHA-256 digest:

sha256sum models/resnet50-v2.bin

a7f3b2c1...d94e  models/resnet50-v2.bin

Run the same command in your original ml-project directory and compare. If the two hashes match, the file is byte-for-byte identical: Crab reconstructed it exactly from its chunks.
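If both copies are on the same machine (for example /Users/you/ml-project and /tmp/ml-project-clone), a short Python script can do the comparison for you and behaves the same on macOS and Linux. It streams the file in 1 MiB blocks, so the 1 GB checkpoint is never fully loaded into memory, and prints the same hex digest that sha256sum shows. The function name and the two-argument command line are just for illustration.

```python
import hashlib
import sys


def sha256_file(path, block_size=1 << 20):
    """Hash a file in fixed-size blocks so a large checkpoint never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python compare.py ORIGINAL CLONE
    original, clone = sys.argv[1], sys.argv[2]
    a, b = sha256_file(original), sha256_file(clone)
    print(f"{a}  {original}")
    print(f"{b}  {clone}")
    sys.exit(0 if a == b else 1)   # non-zero exit status when the files differ
```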

What You Just Did

Let's recap. In about five minutes, you:

Installed Crab (one command)
Pointed it at an S3 bucket (one command)
Staged a 1 GB file with content-aware chunking (crab add)
Pushed it with standard git (git commit + git push)
Cloned it elsewhere and verified it came back perfectly (crab clone + crab hydrate)

No server running. No LFS endpoint. No special hosting. Just your bucket and git.

Next Steps

Now that you've done your first push, here are some things to try:

Modify and re-push — Edit the model file and push again. Watch how Crab uploads only the changed chunks; for a small edit, the second push is much faster.
Selective hydration — Use crab hydrate models/ to download only the files in a specific directory.
Check storage usage — Run crab status to see how much unique data is in your bucket.
Add teammates — Anyone with bucket access and Crab installed can clone and contribute. No server setup needed.
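The tutorial doesn't show what crab status prints, but the idea behind "unique data" can be sketched. If every pushed version of a file is a list of chunk hashes and sizes, the bytes you would store without deduplication are the sum over all versions, while the bytes actually needed are the sum over distinct hashes. storage_usage and its input format are hypothetical, and the example numbers are made up.

```python
def storage_usage(versions):
    """Hypothetical helper.

    versions: one list per pushed version, each a list of (chunk_hash, size_in_bytes).
    Returns (logical_bytes, unique_bytes).
    """
    logical = sum(size for chunks in versions for _, size in chunks)
    unique = {}
    for chunks in versions:
        for chunk_hash, size in chunks:
            unique[chunk_hash] = size   # same hash means same bytes, so same size
    return logical, sum(unique.values())


if __name__ == "__main__":
    # Made-up example: version 2 changes one of three chunks.
    v1 = [("h1", 4096), ("h2", 4096), ("h3", 4096)]
    v2 = [("h1", 4096), ("h2b", 4096), ("h3", 4096)]
    logical, unique = storage_usage([v1, v2])
    print(f"logical: {logical} bytes, unique: {unique} bytes")
```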

For more details, check out the CLI documentation or learn how deduplication works under the hood.

Happy pushing!
