What Is Crab? Git for Large Files, Without the Server
Crab lets you push and pull large files with plain git commands, storing everything in your own cloud bucket. No server to run, no LFS endpoint to maintain — just your files and your storage.
- Why large files make plain Git slow and expensive
- How Crab routes large-file bytes directly to your cloud bucket
- Where deduplication, pointers, and hydration fit in the workflow
- When Crab is a better fit than a hosted LFS server
The Short Version
You have large files — models, datasets, game assets, media — and git wasn't built for them. Crab fixes that. It's a small tool that plugs into git and routes your large files straight to cloud storage (S3, GCS, or Azure). No server in between. No infrastructure to manage. Just your files, your bucket, and the git commands you already know.
Who Is Crab For?
Crab is built for teams that work with large files every day but don't want to run (or pay for) dedicated file servers.
Machine learning engineers who iterate on multi-gigabyte model weights and training datasets. You push a new checkpoint, and Crab only uploads the parts that actually changed — not the entire file again.
Game developers managing binary assets like textures, meshes, and audio. Your artists commit assets alongside code, and everyone clones only what they need for their current task.
Data teams versioning large Parquet files, database dumps, or media libraries. You get full git history for your data without blowing up your storage bill.
If your workflow involves files bigger than what GitHub or GitLab comfortably handles, Crab is worth a look.
How It Works
The core idea is simple: your repository talks directly to cloud storage. There's no server sitting in between.
When you run git push, Crab intercepts the large files, splits them into chunks, deduplicates them against what's already in your bucket, and uploads only the new pieces. When you git clone or git pull, it reassembles your files from those chunks. Once a file is tracked with crab add, the rest of the process uses standard git commands, so there's no new workflow to learn.
What Makes Crab Different from Git LFS
If you've used Git LFS before, you know the drill: set up a server, configure endpoints, manage storage quotas, and hope the server stays up when your team needs it. Crab takes a fundamentally different approach.
No server to run. Git LFS requires a dedicated server (or a hosted service) to store and serve your files. Crab stores everything directly in your cloud bucket. There's nothing to deploy, scale, or keep online.
Smart deduplication. When you modify a large file, LFS uploads the entire new version. Crab splits files into content-aware chunks and only uploads the pieces that actually changed. Edit one layer of a 4 GB model? You might upload 50 MB instead of 4 GB.
You own the storage. Your files live in your own S3 bucket (or GCS, or Azure). You control access, retention, and cost. There's no third-party service holding your data or charging per-seat fees for storage access.
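To make the "smart deduplication" point concrete, here is a minimal Python sketch of content-defined chunking. It is not Crab's actual algorithm: the window size, minimum chunk size, bit mask, and the function names are all assumptions chosen to keep the demo tiny, and real tools typically use a rolling hash rather than rehashing the window at every position. The idea it illustrates is that boundaries depend on nearby bytes, so a small insertion only disturbs the chunks around it, while fixed-size chunks shift everywhere after the edit.

```python
import hashlib
import random

WINDOW = 16    # bytes of trailing context that decide a boundary (assumed)
MIN_SIZE = 16  # smallest allowed chunk; kept >= WINDOW (assumed)
MASK = 0x3F    # cut when the low 6 hash bits are zero, roughly 1 in 64 positions (assumed)


def content_chunks(data: bytes) -> list[bytes]:
    """Split data wherever a hash of the last WINDOW bytes matches a bit pattern."""
    chunks, start = [], 0
    for end in range(MIN_SIZE, len(data) + 1):
        if end - start < MIN_SIZE:
            continue  # current chunk is still too short to cut
        digest = hashlib.blake2b(data[end - WINDOW:end], digest_size=4).digest()
        if int.from_bytes(digest, "big") & MASK == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def fixed_chunks(data: bytes, size: int = 80) -> list[bytes]:
    """Baseline for comparison: cut every `size` bytes regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def new_chunks(old: list[bytes], new: list[bytes]) -> int:
    """Count chunks in `new` whose content was not already present in `old`."""
    seen = set(old)
    return sum(1 for c in new if c not in seen)


# Simulate a small edit: insert four bytes a quarter of the way into a file.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20_000))
edited = original[:5_000] + b"EDIT" + original[5_000:]

cdc_new = new_chunks(content_chunks(original), content_chunks(edited))
fixed_new = new_chunks(fixed_chunks(original), fixed_chunks(edited))
print(f"content-defined: {cdc_new} new chunks, fixed-size: {fixed_new} new chunks")
```

With content-defined boundaries, only the chunks near the insertion should be new. With fixed-size boundaries, every chunk after the insertion shifts and has to be uploaded again.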
Try It in 30 Seconds
Here's what using Crab looks like. If you've used git before, this will feel familiar:
# Install crab
curl -fsSL https://crab.build/install.sh | sh
# Initialize crab in an existing git repo
cd my-project
crab init --remote s3://my-bucket/my-project
# Add a large file and push
crab add models/checkpoint-v2.bin
git commit -m "Add latest model checkpoint"
git push origin main
That's it. Your 3 GB model file is now chunked, deduplicated, and stored in your S3 bucket. Your teammates can clone the repo and pull down exactly the files they need, with no special setup on their end beyond having Crab installed.
What Happens Under the Hood
You don't need to understand the internals to use Crab, but here's the quick version:
- Chunking — Crab splits your file at content-determined boundaries (not fixed sizes). This means small edits only affect nearby chunks.
- Deduplication — Before uploading, Crab checks which chunks already exist in your bucket. Duplicates are skipped entirely.
- Upload — Only genuinely new chunks are compressed and uploaded.
- Metadata — A lightweight manifest tracks which chunks make up each file version, so reconstruction is fast and exact.
The result: fast pushes, minimal bandwidth, and storage costs that scale with your actual unique data — not with how many times you've modified a file.
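The deduplication, upload, and metadata steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Crab's real storage format: the bucket is a plain dict standing in for S3, chunks are keyed by their SHA-256 hash, zlib is the compressor, and push, pull, and the manifest layout are hypothetical names invented for this example.

```python
import hashlib
import zlib


def push(chunks: list[bytes], bucket: dict[str, bytes]) -> tuple[list[str], int]:
    """Store unseen chunks in `bucket`; return (manifest, number of chunks uploaded)."""
    manifest, uploaded = [], 0
    for chunk in chunks:
        key = hashlib.sha256(chunk).hexdigest()  # content address (assumed scheme)
        if key not in bucket:                    # dedup check: skip chunks already stored
            bucket[key] = zlib.compress(chunk)   # compress, then "upload"
            uploaded += 1
        manifest.append(key)                     # manifest records chunk order for this version
    return manifest, uploaded


def pull(manifest: list[str], bucket: dict[str, bytes]) -> bytes:
    """Rebuild a file version by fetching its chunks in manifest order."""
    return b"".join(zlib.decompress(bucket[key]) for key in manifest)


# Two versions of a "model" that differ in one chunk only.
bucket: dict[str, bytes] = {}
v1 = [b"header" * 50, b"layer-1" * 50, b"layer-2" * 50]
v2 = [b"header" * 50, b"layer-1-tuned" * 50, b"layer-2" * 50]

manifest_v1, sent_v1 = push(v1, bucket)
manifest_v2, sent_v2 = push(v2, bucket)
print(f"v1 uploaded {sent_v1} chunks, v2 uploaded {sent_v2} chunk(s)")
```

Because each manifest is just an ordered list of chunk keys, both versions stay fully reconstructible even though the second push stored only the one chunk that changed.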
Getting Started
Crab works on macOS, Linux, and Windows. It supports S3, Google Cloud Storage, and Azure Blob Storage. You can start using it in under five minutes:
- Install Crab (one command)
- Point it at a bucket you own
- Use git normally — Crab handles the rest
Check out our Getting Started guide for a full walkthrough, or visit the CLI documentation for a detailed command reference.
Whether you're versioning ML models, game assets, or research datasets, Crab gives you the version control workflow you already know — without the server you never wanted to manage.