What is Crab

Crab is a serverless git extension that stores large files in cloud object storage (S3, GCS, Azure). It works as a drop-in enhancement to git — no servers to run, no databases to manage, no LFS endpoints to configure. Just your bucket and a single binary.

The Problem

Git wasn't designed for large files. A 10 GB model file bloats your .git directory, slows clones to a crawl, and makes branch switching painful. Git LFS helps, but requires running a separate server or paying for hosted LFS storage.

How Crab Solves It

Crab sits between git and your cloud bucket. When you commit a large file, git only stores a tiny pointer blob (~128 bytes). The actual content is split into content-defined chunks, deduplicated, compressed, and uploaded to your own object storage.
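To make the commit path concrete, here is a minimal sketch of the "chunk, deduplicate, compress, upload" pipeline. It is an illustration under stated assumptions, not Crab's implementation: `bucket` is a plain dict standing in for your object store, the `chunks/<sha256>` key layout and the `store_file` name are hypothetical, and fixed 4-byte chunks replace the content-defined boundaries Crab actually uses, purely to keep the demo tiny.

```python
import hashlib
import zlib

# Hypothetical stand-in for an S3/GCS/Azure bucket: object key -> stored bytes.
bucket = {}

# Fixed 4-byte chunks keep the demo tiny; Crab itself uses content-defined boundaries.
CHUNK_SIZE = 4


def store_file(data: bytes) -> list[str]:
    """Chunk, deduplicate, compress and 'upload' data; return the ordered chunk digests."""
    manifest = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        key = f"chunks/{digest}"  # content-addressed key (hypothetical layout)
        if key not in bucket:
            # Deduplication: a chunk already in the bucket is never uploaded twice.
            bucket[key] = zlib.compress(chunk)
        manifest.append(digest)
    return manifest
```

Calling `store_file(b"AAAABBBBAAAA")` returns three digests but leaves only two objects in `bucket`, because the first and last chunks are identical. Git would then keep only a small pointer that identifies this content, never the bytes themselves.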

The result:

Instant clones — Git only downloads pointers. You hydrate files on demand.
Efficient storage — Content-defined chunking means editing 1% of a file only stores 1% new data.
Standard git UX — git clone, git push, git pull all work normally. Crab is transparent.
Your infrastructure — Data lives in your S3/GCS/Azure bucket. No vendor lock-in, no per-seat pricing on storage.

Quick Start: New Repository

# 1. Install Crab
curl -fsSL https://crab.build/install.sh | bash

# 2. Initialize a repository with your bucket
cd my-project
crab init crab://my-bucket/my-repo

# 3. Ship your first commit
crab ship -m "initial commit"

That's it. crab init auto-detects large files, configures tracking patterns, generates a .crab.toml project config, and validates your credentials. crab ship handles staging, committing, and pushing in one command.
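The document does not say how `crab init` decides which files count as large. As a minimal sketch, assuming a plain size threshold, detection could walk the working tree, skip `.git`, and turn every oversized file into a tracking pattern. The 10 MiB cutoff and the `suggest_patterns` name are hypothetical.

```python
import os

# Hypothetical cutoff for "large"; not a documented Crab default.
LARGE_FILE_THRESHOLD = 10 * 1024 * 1024


def suggest_patterns(root: str, threshold: int = LARGE_FILE_THRESHOLD) -> list[str]:
    """Return sorted tracking patterns for files at or above threshold bytes."""
    patterns = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune git's own object store so packfiles are never suggested.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or os.path.getsize(path) < threshold:
                continue
            ext = os.path.splitext(name)[1]
            # Prefer an extension pattern; fall back to the exact relative path.
            patterns.add(f"*{ext}" if ext else os.path.relpath(path, root))
    return sorted(patterns)
```

Generalizing to `*.ext` patterns means future files of the same type are tracked automatically, at the cost of also tracking small files that share an extension.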

Quick Start: Existing Repository (Collaborator)

When joining a repo that already uses Crab:

# Clone from GitHub (or any git host)
git clone git@github.com:team/project.git
cd project

# Re-apply Crab config from .crab.toml
crab init

# Files are dehydrated (pointers). Hydrate what you need:
crab hydrate "*.py" "*.rs"

Running crab init with no URL reads the .crab.toml that's already in the repo and sets up everything locally — filter driver, remotes, and hooks.

Quick Start: Global Setup (Optional)

For a "set it and forget it" experience across all repos:

crab install --global

After this, any repo with a .crab.toml works immediately on clone — no per-repo crab init needed. The global filter driver auto-configures from the project config.

Key Concepts

Pointer Blobs

Every Crab-managed file in git is a pointer — a small text stub containing the file's hash and metadata. Git stores, diffs, and transfers these pointers. The actual content lives in cloud storage.
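A minimal sketch of what such a stub might look like, assuming a line-oriented layout loosely modeled on Git LFS pointer files. The `version crab-pointer/1` header and the field names are hypothetical; this page does not document Crab's actual pointer format.

```python
import hashlib
from typing import Optional

# Hypothetical header, loosely modeled on Git LFS pointer files; not Crab's documented format.
POINTER_HEADER = "version crab-pointer/1"


def make_pointer(data: bytes) -> str:
    """Render the small text stub git would store in place of data."""
    oid = hashlib.sha256(data).hexdigest()
    return f"{POINTER_HEADER}\noid sha256:{oid}\nsize {len(data)}\n"


def parse_pointer(text: str) -> Optional[dict]:
    """Return the pointer's fields, or None if text is ordinary file content."""
    lines = text.splitlines()
    if not lines or lines[0] != POINTER_HEADER:
        return None
    fields = {}
    for line in lines[1:]:
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields
```

Because the stub holds only a fixed-length hash and a decimal size, it stays between roughly 100 and 125 bytes whatever the size of the file it stands for, which is why git can store, diff, and transfer it cheaply.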

Content-Defined Chunking

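Files are split at content-determined boundaries using a rolling hash, so a small edit to a large file changes only the chunks around it and produces little new data. Two versions of a 10 GB model that differ by 5% in localized regions share roughly 95% of their chunks; scattered edits share somewhat less, because each edit also rewrites the chunk it lands in.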
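The sketch below shows the idea with a gear-based rolling hash in the style of FastCDC. That is a swapped-in technique: the document only says "a rolling hash", and the parameters here (cuts averaging roughly 80 bytes) are toy values, whereas real chunkers use far larger chunks. A boundary is cut wherever the top bits of the hash are zero, so the decision depends only on nearby bytes, not on absolute offsets.

```python
import random

# Gear table: one pseudo-random 64-bit value per byte value, seeded so boundaries are reproducible.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, avg_bits: int = 6, min_size: int = 16, max_size: int = 512) -> list[bytes]:
    """Cut data wherever the top avg_bits bits of a rolling gear hash are all zero."""
    # Testing the high bits means each decision depends on roughly the last 64 bytes.
    boundary_mask = ((1 << avg_bits) - 1) << (64 - avg_bits)
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64  # older bytes shift out after 64 steps
        length = i - start + 1
        if (length >= min_size and (h & boundary_mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0  # each chunk starts with a fresh hash state
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Concatenating the chunks always reproduces the input. If you insert a few bytes into the middle of a buffer, only the chunks near the insertion change and the boundaries resynchronize shortly after it; with fixed-size chunks, every chunk after the insertion point would shift and stop matching.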

Hydration and Dehydration

Hydrate — Download chunks from cloud storage and reconstruct the original file.
Dehydrate — Replace a file with its pointer to free disk space.

You control which files are materialized on disk. Work on models? Hydrate just the model files. Done? Dehydrate them to reclaim space.
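Both operations can be sketched in a few lines, assuming a chunk store keyed by SHA-256 digest with zlib-compressed values and an ordered list of chunk digests per file. The `hydrate` and `dehydrate` functions and their signatures are hypothetical, not Crab's API. The two safety checks are the interesting part: hydration verifies the rebuilt content before it replaces the pointer, and dehydration refuses to discard data whose chunks are not all in storage.

```python
import hashlib
import zlib
from pathlib import Path


def hydrate(path: Path, manifest: list[str], expected_sha256: str, store: dict[str, bytes]) -> None:
    """Rebuild a file from its ordered chunk digests, verifying the result before writing."""
    data = b"".join(zlib.decompress(store[digest]) for digest in manifest)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError(f"content hash mismatch while hydrating {path}")
    # Write to a side file and rename, so a crash never leaves a half-written file.
    tmp = path.with_name(path.name + ".partial")
    tmp.write_bytes(data)
    tmp.replace(path)


def dehydrate(path: Path, pointer_text: str, manifest: list[str], store: dict[str, bytes]) -> int:
    """Swap a materialized file for its pointer stub and return the bytes freed."""
    missing = [digest for digest in manifest if digest not in store]
    if missing:
        # Never drop the only local copy of data that has not been uploaded.
        raise RuntimeError(f"{len(missing)} chunk(s) missing from storage; keeping {path}")
    before = path.stat().st_size
    path.write_text(pointer_text)
    return before - path.stat().st_size
```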

Lazy Checkout

After cloning, your working tree contains pointers rather than full content. You choose what to hydrate and when. Because a clone transfers only git history and pointer blobs, even a 500 GB repository clones in seconds.
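Choosing what to hydrate amounts to filtering the tracked paths. A minimal sketch, assuming patterns such as the `"*.py" "*.rs"` passed to `crab hydrate` match file names case-sensitively (the document does not specify whether Crab matches base names or full paths), and that you already know which files are materialized. The `plan_hydration` name and its inputs are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def plan_hydration(tree: dict[str, bool], patterns: list[str]) -> list[str]:
    """Return dehydrated paths whose file name matches any pattern, in sorted order.

    tree maps each tracked path to True if it is already hydrated on disk.
    """
    selected = []
    for path, hydrated in sorted(tree.items()):
        name = PurePosixPath(path).name
        if not hydrated and any(fnmatchcase(name, pattern) for pattern in patterns):
            selected.append(path)
    return selected
```

Skipping files that are already hydrated keeps repeated requests cheap: asking for the same patterns twice downloads nothing the second time.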

Project Configuration (`.crab.toml`)

A .crab.toml file in the repo root declares the remote URL, tracking patterns, and hydration policy. It's committed to git so collaborators inherit the configuration automatically. See Project Configuration for the full reference.

Where To Go Next

Start Here

Installation & Setup — Install the Crab binary
Creating a Repository — Initialize a new Crab repo
Cloning a Repository — Clone an existing repo
Tracking Files — Configure which files Crab manages
Project Configuration — .crab.toml reference
Mirror Mode — Use GitHub or another git host alongside Crab storage
Importing Existing Buckets — Import data from cloud storage

Tutorials and Guides

Working with Files — The ship → hydrate cycle
Sharing Repositories — Collaborate with your team
Migrating from Git LFS — Move from LFS to Crab
CI/CD Integration — Use Crab in automated pipelines

Core Concepts

Working with Files — How Crab manages files through chunking, pointers, and hydration
Local Cache — Caching, staging, garbage collection, and storage tiers
Virtual Filesystem — Mount Crab repositories for on-demand access
Authentication & Config — Cloud credentials and configuration

Reference

CLI Command Reference — Complete reference for all crab commands
