How Crab Plugs Into Your Git Workflow

Crab Doesn't Replace Git — It Extends It

You already know git push, git pull, and git add. Crab doesn't change any of those. It plugs into git's built-in extension system so large files get chunked, deduplicated, and stored in cloud storage automatically. You keep your existing workflow; Crab handles the heavy lifting in the background.

What You'll Learn

How Crab hooks into git without changing your commands
The two extension points that make transparent large-file handling possible
Why a single binary serving both roles avoids subtle version-skew bugs
Why the filter process stays fast even with thousands of files

Key Takeaways

Zero workflow changes. You keep using git push, git pull, git add — Crab handles the rest behind the scenes.
Two roles, one binary. A remote helper (transport) and a filter process (file transformation) ship as the same compiled binary.
No version skew. Both roles share one binary, one staging area, and one config — so they can't drift out of sync.
Fast at scale. The filter stays running for the entire git operation, so its startup cost is paid once whether you add 1 large file or 500.

How One Binary Serves Two Roles

Git was designed to be extended without forking. Crab plugs into two of those extension points:

Remote helper. When you git push or git pull, git spawns a helper program to talk to the remote. Crab's helper speaks to S3, GCS, or Azure instead of a git server.
Filter process. When you git add a large file, the filter swaps it for a tiny pointer. On checkout, it reconstructs the original file from cloud storage.

What's unusual about Crab is that both roles are the same compiled binary. The binary decides which mode to enter from how it was invoked. When git runs it under the name git-remote-crab, it acts as the remote helper, a Unix trick called argv[0] dispatch. When git runs it as crab filter-process, it acts as the filter. The same argv[0] idea powers BusyBox, where a single executable provides dozens of utilities.
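To make the dispatch idea concrete, here is a minimal sketch in Python. It is an illustration, not Crab's actual code: the function name select_mode and the returned mode names are assumptions. Only the git-remote-crab name and the filter-process subcommand come from this guide.

```python
import os


def select_mode(argv):
    """Pick a role from the invoked name, falling back to the subcommand."""
    invoked_as = os.path.basename(argv[0])
    if invoked_as == "git-remote-crab":
        # Git runs a remote helper as: git-remote-crab <remote-name> <url>
        return "remote-helper"
    if len(argv) > 1 and argv[1] == "filter-process":
        # Matches the `process = crab filter-process` line in the git config.
        return "filter-process"
    # Anything else is treated as an ordinary command-line call (hypothetical).
    return "cli"
```

Because only the basename of argv[0] is checked, a symlink named git-remote-crab that points at the crab binary is enough to select the remote-helper role.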

The single-binary design has practical consequences. There's no version skew between transport and filter — they always run identical code. They share the same staging area, configuration, and chunk cache, so chunks staged during git add are immediately available during git push. And installation is one command instead of three.

How Push and Fetch Work Behind the Scenes

When you run git push crab://bucket/repo, git doesn't know how to talk to S3. It derives a helper name from the URL scheme, so for crab:// it looks for a program called git-remote-crab on your PATH, spawns it, and communicates over a simple text protocol on stdin and stdout.

The conversation looks roughly like this:

Handshake. Git asks "what can you do?" and the helper replies with its capabilities.
List refs. Git asks "what branches exist on the remote?" The helper reads the ref manifest from cloud storage.
Push. Git sends the refs to update. The helper runs Crab's push pipeline: read staged chunks, pack them into compressed archives (called xorbs), upload to S3, update the manifest.
Report. The helper tells git whether each ref update succeeded or failed.

It's the same pattern that git-remote-https uses for GitHub — just pointed at object storage instead of a git server. From git's perspective, S3 is the remote.
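The conversation above can be sketched as a small command loop. This is a simplified illustration of git's line-based remote-helper protocol, not Crab's implementation: the serve function, the remote_refs dictionary (standing in for the ref manifest), and the push_ref callback (standing in for the push pipeline) are all assumptions. The sketch advertises only the push capability; a real helper would also advertise and handle fetch.

```python
def serve(commands, out, remote_refs, push_ref):
    """Answer git's remote-helper commands from `commands`, writing to `out`.

    remote_refs: dict of ref name -> commit hash (stand-in for the manifest).
    push_ref: callable(src, dst) returning None on success or an error string.
    """
    lines = iter(commands)
    for raw in lines:
        cmd = raw.rstrip("\n")
        if cmd == "capabilities":
            # Capability list, terminated by a blank line.
            out.write("push\n\n")
        elif cmd == "list" or cmd.startswith("list "):
            # One "<hash> <refname>" line per ref, then a blank line.
            for name, sha in sorted(remote_refs.items()):
                out.write(f"{sha} {name}\n")
            out.write("\n")
        elif cmd.startswith("push "):
            # Push commands arrive as a batch terminated by a blank line.
            batch = [cmd]
            for more in lines:
                more = more.rstrip("\n")
                if not more:
                    break
                batch.append(more)
            for item in batch:
                # A leading "+" on the source marks a forced update.
                src, dst = item[len("push "):].lstrip("+").split(":", 1)
                error = push_ref(src, dst)
                if error is None:
                    out.write(f"ok {dst}\n")
                else:
                    out.write(f"error {dst} {error}\n")
            out.write("\n")
        elif cmd == "":
            break  # a blank line outside a batch ends the session
```

In a real helper, commands would be sys.stdin and out would be sys.stdout, flushed after every reply so git is never left waiting.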

Zero workflow changes

Type git push, git pull, git clone — everything works as expected. Crab handles the heavy lifting behind the scenes. No new commands to learn for everyday work.

How Large Files Stay Transparent

The remote helper handles transport. The filter process handles content transformation — it's what makes large files appear normal in your working tree while only tiny pointers actually live in git's object database.

Two operations make this work:

Clean runs on git add. It chunks and deduplicates your 4 GB model file, stages the chunks locally, and gives git a small text pointer to commit.
Smudge runs on git checkout. It takes the pointer and reconstructs the original file from local cache or cloud storage.
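A toy version of the two operations shows how they mirror each other. Everything here is an assumption made for illustration: fixed-size chunks, in-memory dictionaries in place of the staging area, a manifest that maps each file hash to its chunk list, and BLAKE2b from Python's standard library standing in for BLAKE3. The pointer is also simplified to two fields.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow


def clean(data, chunks, manifests):
    """Replace file bytes with pointer text, staging each unique chunk once."""
    hashes = []
    for start in range(0, len(data), CHUNK_SIZE):
        piece = data[start:start + CHUNK_SIZE]
        digest = hashlib.blake2b(piece, digest_size=32).hexdigest()
        chunks.setdefault(digest, piece)  # a repeated chunk is stored once
        hashes.append(digest)
    oid = hashlib.blake2b(data, digest_size=32).hexdigest()
    manifests[oid] = hashes  # remember which chunks rebuild this file
    return f"oid {oid}\nsize {len(data)}\n"


def smudge(pointer, chunks, manifests):
    """Rebuild the original bytes from a pointer produced by clean()."""
    fields = dict(line.split(" ", 1) for line in pointer.splitlines())
    data = b"".join(chunks[h] for h in manifests[fields["oid"]])
    if len(data) != int(fields["size"]):
        raise ValueError("reconstructed size does not match pointer")
    return data
```

Cleaning b"abcdabcdabcdxy" stages only two chunks, because the three copies of abcd hash to the same key. Smudging the resulting pointer returns the original 14 bytes.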

The performance trick is git's long-running filter protocol. Older filter drivers spawn a fresh process per file — fine for ten files, catastrophic for a thousand. The long-running protocol spawns the filter once and streams every operation over a persistent pipe, so the expensive setup (opening the staging database, loading the chunk cache, warming the dedup bloom filter) only happens once.

In practice, this means the fixed setup cost of staging 500 large files is the same as for staging 1. Only the per-file chunking and hashing work grows with the amount of data. The dedup bloom filter, staging database handle, and chunk cache all live across the entire git add command.
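The persistent pipe works because every message is framed in git's pkt-line format: four hex digits giving the total length, then the payload, with 0000 as a flush packet that ends a message. The sketch below shows that framing only. The helper names pkt_line and read_packets are assumptions, and the handshake and capability exchange are left out.

```python
FLUSH = b"0000"  # flush packet: marks the end of a list or of file content


def pkt_line(payload: bytes) -> bytes:
    """Frame one payload: 4 hex digits of total length, then the payload."""
    if len(payload) > 65516:
        raise ValueError("payload too large for one packet")
    return b"%04x" % (len(payload) + 4) + payload


def read_packets(stream):
    """Yield payloads until a flush packet; the stream stays open afterwards."""
    while True:
        header = stream.read(4)
        if len(header) < 4:
            raise EOFError("stream closed in the middle of a message")
        length = int(header, 16)
        if length == 0:
            return  # flush packet: this message is complete
        yield stream.read(length - 4)
```

Since a flush packet ends a message without closing the pipe, the same process can handle the next file right away, which is what keeps the expensive setup a one-time cost.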

What the Pointer Looks Like

When the filter cleans a file, it produces a small text pointer that git actually commits:

version https://crab.io/spec/v1
oid blake3:a7f3b2c1d4e5f6789012345678901234567890123456789012345678901234ab
size 1073741824

The pointer stays under 200 bytes whether the original was 100 MB or 100 GB. The oid is a BLAKE3 hash that uniquely identifies the file's content; size records the original byte count (here 1073741824 bytes, or 1 GiB). On checkout, the smudge filter uses this pointer to reconstruct the exact original file from cloud storage (or local cache if it's already been downloaded).
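Reading a pointer back is a matter of splitting three key-value lines. The parser below is a sketch based only on the example shown above. The Pointer class, the parse_pointer function, and the validation rules are assumptions, not Crab's published format rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    version: str
    algorithm: str
    digest: str
    size: int


def parse_pointer(text: str) -> Pointer:
    """Parse the `key value` lines of a pointer into a Pointer."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value.strip()
    missing = {"version", "oid", "size"} - set(fields)
    if missing:
        raise ValueError(f"pointer is missing fields: {sorted(missing)}")
    algorithm, _, digest = fields["oid"].partition(":")
    if not digest:
        raise ValueError("oid must look like <algorithm>:<hex digest>")
    int(digest, 16)  # raises ValueError if the digest is not hexadecimal
    return Pointer(fields["version"], algorithm, digest, int(fields["size"]))
```

Feeding it the pointer above yields algorithm "blake3" and size 1073741824, while text that lacks any of the three fields is rejected with a ValueError.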

How Installation Connects Everything

When you run make install, three things happen:

The release binary builds.
It's installed to ~/.cargo/bin/crab.
A symlink is created: ~/.cargo/bin/git-remote-crab → crab.

Then your git config ties it together:

[filter "crab"]
    process = crab filter-process
    required = true

The required = true flag matters. Without it, git treats a filter failure as non-fatal and silently commits the raw file content. That is easy to miss until you accidentally push a 4 GB file directly into git's object database. With required, git aborts on filter failure and you get a clear error instead of a commit that contains unfiltered content.

Always use make install

Don't copy the binary manually or use cargo install. The Makefile keeps the binary and git-remote-crab symlink in sync. A stale symlink pointing at an old binary is the most common source of hard-to-debug failures.
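If you suspect a stale symlink, a quick check like the following can confirm it. This is a hypothetical diagnostic, not something Crab ships. The check_install function is an assumption, and the only facts taken from this guide are the two file names and their shared directory.

```python
from pathlib import Path


def check_install(bin_dir):
    """Return a list of problems with the crab / git-remote-crab pair."""
    bin_dir = Path(bin_dir)
    binary = bin_dir / "crab"
    link = bin_dir / "git-remote-crab"
    problems = []
    if not binary.is_file():
        problems.append(f"{binary} is missing")
    if not link.is_symlink():
        problems.append(f"{link} is missing or is not a symlink")
    elif link.resolve() != binary.resolve():
        # The symlink exists but leads somewhere other than the crab binary.
        problems.append(f"{link} points at {link.resolve()}, not {binary}")
    return problems
```

Calling check_install(Path.home() / ".cargo" / "bin") returns an empty list when the symlink resolves to the installed binary.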

The Complete Flow: Add, Commit, Push

Putting it all together, here's what happens end-to-end when you push a large file:

git add large-model.bin → the filter chunks the file, stages the chunks locally, and hands git a pointer blob.
git commit → git records a commit whose tree references the pointer blob (not the 4 GB file).
git push origin main → git spawns git-remote-crab.
The remote helper reads staged chunks, classifies them as new vs. already-uploaded, packs new ones into xorbs, uploads to S3, updates the manifest, and reports success.
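The classify-and-pack step of the push can be sketched as follows. The real xorb layout is not described in this guide, so the sketch assumes a xorb is just zlib-compressed, concatenated chunk bytes. The plan_push and _pack names, the staged and uploaded arguments, and the size limit are all illustrative assumptions.

```python
import zlib


def plan_push(staged, uploaded, max_xorb_bytes):
    """Group chunks the remote lacks into compressed archives.

    staged: dict of chunk hash -> bytes, as left behind by git add.
    uploaded: set of chunk hashes the remote already has.
    Returns a list of (chunk_hashes, compressed_bytes) pairs.
    """
    # Classify: only chunks the remote has never seen need uploading.
    new_hashes = [h for h in sorted(staged) if h not in uploaded]
    xorbs, batch, batch_size = [], [], 0
    for h in new_hashes:
        size = len(staged[h])
        # Start a new archive when this chunk would overflow the current one.
        if batch and batch_size + size > max_xorb_bytes:
            xorbs.append(_pack(batch, staged))
            batch, batch_size = [], 0
        batch.append(h)
        batch_size += size
    if batch:
        xorbs.append(_pack(batch, staged))
    return xorbs


def _pack(hashes, staged):
    """Concatenate the chunks in order and compress them as one archive."""
    payload = b"".join(staged[h] for h in hashes)
    return list(hashes), zlib.compress(payload)
```

When every staged chunk is already on the remote, the plan comes back empty and the push only needs to update the manifest.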

When a collaborator clones or pulls, the same machinery runs in reverse. Git spawns git-remote-crab to fetch refs and pack data. On checkout, the filter sees pointer blobs and reconstructs files from cloud storage or local cache.

Your team uses standard git commands, large files live efficiently in cloud storage, and nobody has to run a server.

What This Means for Your Workflow

The single-binary approach gives you three things that matter day-to-day.

No version skew. Multi-binary git extensions are notorious for subtle bugs when components drift out of sync after a partial upgrade. Crab's transport and filter always run the same code by construction.

Shared state. Both roles read the same staging area, configuration, and chunk cache. Chunks staged during git add are immediately available during git push — no redundant work, no second pass.

Simple installation. One make install and you're done. No package managers, no separate services, no daemon to keep running. Your repo URL just looks like crab://bucket/repo instead of https://github.com/....

The deeper point: Crab doesn't reinvent git. It uses git's own well-defined extension points to add capabilities git doesn't have on its own. Your existing tooling, scripts, IDE integrations, and CI jobs keep working — they're still talking to git the same way. Crab just makes large-file storage cheap, fast, and serverless underneath.
