Getting Started with Crab in 5 Minutes

The five-minute path

1. Install the Crab binary and verify the Git remote helper is available.
2. Initialize a repository against your cloud bucket.
3. Track and commit a large file using normal Git commands.
4. Push through Crab so large-file bytes land in object storage.
5. Clone elsewhere and hydrate the file to prove the workflow end to end.

What You'll Build

By the end of this tutorial you'll have a working Crab repository with a large file stored in your own cloud bucket. You'll push it from one machine and clone it on another — no servers, no configuration files, no LFS endpoints.

The whole thing takes about five minutes.

Prerequisites

You need two things before starting:

Git installed (any recent version)
A cloud storage bucket — an AWS S3 bucket, Google Cloud Storage bucket, or Azure Blob container that you can write to

Make sure your cloud credentials are configured in your environment. For AWS, that means either a ~/.aws/credentials file or the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
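Before moving on, it can help to confirm that the shell you will run Crab from can see AWS credentials at all. The sketch below is not part of Crab: aws_credential_source is a hypothetical helper name, and it only checks that the variables or the file exist, not that the keys are valid. It honors AWS_SHARED_CREDENTIALS_FILE, the standard AWS override for the credentials file location.

```shell
# Report which AWS credential source is visible to the current shell.
# Prints "env", "file", or "none"; returns non-zero when nothing is found.
# This checks presence only, not whether the credentials are valid.
aws_credential_source() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "env"
  elif [ -f "${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}" ]; then
    echo "file"
  else
    echo "none"
    return 1
  fi
}

aws_credential_source || echo "No AWS credentials visible in this shell" >&2
```

For Google Cloud Storage or Azure, the equivalent check depends on how those credentials are configured, which this tutorial does not cover.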

Step 1: Install Crab

Run the installer:

curl -fsSL https://crab.build/install.sh | sh

Expected output:

Downloading crab v0.9.4 for darwin-arm64...
Installing to /usr/local/bin/crab
Creating symlink: git-remote-crab -> crab
Done! Run 'crab --version' to verify.

Verify the installation:

crab --version

crab 0.9.4

That single binary handles everything — the CLI commands and the git remote helper. No background services, no daemons.
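Git finds a remote helper by looking for an executable named git-remote-<scheme>, where the scheme is the prefix of the remote URL, so a crab:// URL needs git-remote-crab to be reachable. As a quick sanity check, here is a minimal sketch that looks for it on PATH; have_remote_helper is a hypothetical helper name, not a Crab command.

```shell
# Succeed if an executable named git-remote-<scheme> is reachable on PATH.
# $1 = URL scheme, for example "crab"
have_remote_helper() {
  command -v "git-remote-$1" >/dev/null 2>&1
}

if have_remote_helper crab; then
  echo "git-remote-crab found at: $(command -v git-remote-crab)"
else
  echo "git-remote-crab is not on PATH" >&2
fi
```

If the helper is missing, check that the install directory shown in the installer output is on your PATH.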

Step 2: Initialize a Repository

Create a new git repository and tell Crab which bucket to use:

mkdir my-project && cd my-project
git init
crab init crab://my-bucket/my-project

Expected output:

Initialized crab remote: crab://my-bucket/my-project
Remote 'origin' configured.

This adds a git remote pointing at your bucket. Under the hood, it's just a standard git remote URL that the git-remote-crab helper knows how to handle.
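To see what was configured, you can ask Git for the remote URL with the standard git remote get-url command. The sketch below also extracts the URL scheme, which is what Git uses to pick the helper. url_scheme is a hypothetical helper name, and the output assumes the remote was set up as in the expected output above.

```shell
# Print the scheme of a URL (the part before "://"), or "none" if it has no scheme.
url_scheme() {
  case "$1" in
    *://*) echo "${1%%://*}" ;;
    *)     echo "none" ;;
  esac
}

# Inside a repository, show the URL and scheme configured for origin.
if url=$(git remote get-url origin 2>/dev/null); then
  echo "origin -> $url (scheme: $(url_scheme "$url"))"
fi
```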

Step 3: Add a Large File

Let's create a sample large file (or use one you already have — a model checkpoint, dataset, video, anything):

# Create a 500 MiB test file (block size given in bytes so it works with both GNU and BSD dd)
dd if=/dev/urandom of=model.bin bs=1048576 count=500

Now track it with Crab and commit:

crab add model.bin
git add .
git commit -m "Add model checkpoint"

Expected output:

[crab] Chunking model.bin (500.0 MB)
[crab] 847 chunks, 492.3 MB unique data
[main (root-commit) a3f1c2d] Add model checkpoint
 2 files changed, 1 insertion(+)
 create mode 100644 .crabconfig
 create mode 100644 model.bin

The crab add command splits your file into content-defined chunks and stages the unique data. Git sees a lightweight pointer file — the actual bytes live in Crab's staging area, ready to upload.
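One way to check the pointer claim yourself is to ask Git how large the blob it committed is. git cat-file -s is a standard Git command; git_blob_size is a hypothetical wrapper around it. This tutorial does not show the pointer format, so the sketch only reports a size: if Git holds a pointer as described, the number should be far smaller than the file on disk.

```shell
# Print the size in bytes of the blob Git has committed for a path at HEAD.
# $1 = path relative to the repository root
git_blob_size() {
  git cat-file -s "HEAD:$1"
}

# Run from the repository root after the commit in this step.
if size=$(git_blob_size model.bin 2>/dev/null); then
  echo "Git stores $size bytes for model.bin"
fi
```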

Step 4: Push to Cloud Storage

git push origin main

Expected output:

Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Writing objects: 100% (4/4), 312 bytes | 312.00 KiB/s, done.
[crab] Uploading 492.3 MB (847 chunks, 12 xorbs)
[crab] ████████████████████████████████ 100% (52.4 MB/s)
[crab] Push complete: crab://my-bucket/my-project
To crab://my-bucket/my-project
 * [new branch]      main -> main

Your file is now stored as deduplicated, compressed chunks in your bucket. Git handled the refs and commit objects as usual; Crab handled the large-file data.

Step 5: Clone on Another Machine

On a different machine (or a different directory to simulate it), clone the repository:

crab clone crab://my-bucket/my-project my-project-clone
cd my-project-clone

Expected output:

Cloning into 'my-project-clone'...
[crab] Fetching refs from crab://my-bucket/my-project
[crab] Downloading file data...
[crab] ████████████████████████████████ 100% (78.2 MB/s)
[crab] Hydrating 1 file (500.0 MB)
Done. Repository ready at my-project-clone/

Confirm the file was hydrated at its full size:

ls -lh model.bin

-rw-r--r--  1 user  staff   500M Aug 25 10:32 model.bin

Your 500 MB file is back, byte-for-byte identical to the original.
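The ls output confirms the size but not the contents. To check the bytes yourself, compare SHA-256 digests of the original and the cloned file. The sketch below uses sha256sum where available and falls back to shasum (the tool shipped with macOS); sha256_of and same_content are hypothetical helper names, and the paths in the usage comment assume both copies sit side by side on one machine.

```shell
# Print the SHA-256 hex digest of a file, using whichever tool is installed.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# Succeed if two files have identical SHA-256 digests.
same_content() {
  [ "$(sha256_of "$1")" = "$(sha256_of "$2")" ]
}

# Usage (paths are assumptions):
#   same_content my-project/model.bin my-project-clone/model.bin && echo "identical"
```

When the clone lives on a different machine, run sha256_of on each side and compare the two digests by eye.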

How It Works

Here's what happened across those five steps:

Install placed a single binary on your machine. That binary acts as both the crab CLI and the git-remote-crab helper that git calls automatically.
Init configured a git remote pointing at your cloud bucket. No server to provision, no SSH keys to exchange with a hosting provider.
Add split your file into variable-size chunks using content-defined boundaries. This means if you later change part of the file, only the affected chunks get re-uploaded — not the whole thing.
Push uploaded the unique chunks as compressed archives (called xorbs) directly to your bucket. Git handled the commit graph normally; Crab handled the large-file data.
Clone downloaded the commit history and file data from the bucket, then reconstructed your file from its chunks — verified byte-for-byte with cryptographic hashes.
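To make the deduplication idea concrete, the sketch below splits a file into chunks, hashes each one, and counts how many distinct hashes appear. It is a deliberate simplification: it cuts fixed-size chunks with split, not the content-defined boundaries Crab uses, and it says nothing about Crab's actual chunk format or hash function. sha256_of and chunk_stats are hypothetical helper names.

```shell
# Print the SHA-256 hex digest of a file, using whichever tool is installed.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# Print "<total> <unique>" chunk counts for a file.
# $1 = file, $2 = chunk size in bytes (default 1 MiB)
# Fixed-size chunks only; this is NOT content-defined chunking.
chunk_stats() {
  work=$(mktemp -d) || return 1
  split -a 4 -b "${2:-1048576}" "$1" "$work/c."
  total=0
  : > "$work/hashes"
  for chunk in "$work"/c.*; do
    [ -f "$chunk" ] || continue
    total=$((total + 1))
    sha256_of "$chunk" >> "$work/hashes"
  done
  # Identical chunks share a digest, so distinct digests = chunks worth storing.
  unique=$(sort -u "$work/hashes" | wc -l | tr -d ' ')
  rm -rf "$work"
  echo "$total $unique"
}
```

For a file made of four identical 1 KiB blocks, chunk_stats file 1024 prints 4 1. After one block is overwritten with different data it prints 4 2, so only one new chunk would need storing. Fixed-size chunking loses this property as soon as bytes are inserted, because every later boundary shifts; placing boundaries based on content is what avoids that.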

What's Next

You now have a working Crab setup. Here are some things to try:

Modify and push again — change part of model.bin and push. You'll see that only the changed chunks upload (deduplication in action).
Add more files — crab add works on any file. Track datasets, videos, compiled assets — anything too large for vanilla git.
Lazy checkout — for very large repos, use crab clone --lazy to download only pointer files initially, then hydrate individual files on demand with crab hydrate.
Mount as a virtual filesystem — crab mount exposes your repo through FUSE, so files appear on disk but download transparently when accessed.
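To try the first suggestion you need to change part of model.bin without rewriting all of it. One way, sketched here, is dd with conv=notrunc, which overwrites a block in place and leaves the rest of the file untouched. overwrite_block is a hypothetical helper name, and the block index in the usage comment is arbitrary.

```shell
# Overwrite one block of a file with random bytes, in place.
# $1 = file, $2 = block index (0-based), $3 = block size in bytes (default 1 MiB)
# conv=notrunc keeps dd from truncating the file after the written block.
overwrite_block() {
  dd if=/dev/urandom of="$1" bs="${3:-1048576}" count=1 seek="$2" conv=notrunc 2>/dev/null
}

# Usage: replace the 1 MiB block that starts 100 MiB into the file.
#   overwrite_block model.bin 100
```

After that, repeat the add, commit, and push commands from Steps 3 and 4 and compare the upload size with the first push.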

That's it. Five steps, five minutes, and your large files live in cloud storage with full Git history, with no servers required.
