Crab vs Git LFS: What's Different and Why It Matters
Git LFS and Crab both solve the 'large files in git' problem, but they take fundamentally different approaches. Here's a fair comparison to help you choose.
Two Tools, One Problem
Git wasn't designed for large files. Push a 2 GB model or a 500 MB texture pack, and everything slows to a crawl — clones take forever, history bloats, and your teammates start complaining.
Both Git LFS and Crab solve this problem, but they take different architectural paths to get there. This post gives you a fair look at both so you can pick the right tool for your situation.
What You'll Learn
- How Git LFS works at a high level
- How Crab's serverless approach differs
- A side-by-side comparison on the dimensions that matter
- When each tool is the better choice
How Git LFS Works
Git LFS (Large File Storage) is the most widely adopted solution for large files in git. It replaces large files in your repository with small pointer files, then stores the actual file content on a separate LFS server.
When you push, LFS uploads the large file to its server via the Batch API. When you clone or pull, LFS downloads the files you need from that server. Your git history stays lean because it only contains pointers — not the full file data.
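To make the pointer idea concrete, here is a minimal Python sketch that builds the text of a pointer file for a blob, following the three-line layout described in the public Git LFS pointer spec. The function name `lfs_pointer` is made up for illustration; real pointers are written by the `git lfs` client, not by hand.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return pointer-file text for `content` (illustrative helper, not part of Git LFS)."""
    oid = hashlib.sha256(content).hexdigest()  # LFS identifies objects by SHA-256
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    # A multi-gigabyte file would still produce a pointer this small.
    print(lfs_pointer(b"pretend this is a 2 GB model"))
```

Git commits this small text file; the bytes it describes live on the LFS server under the same object ID.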
LFS is battle-tested, well-integrated with GitHub, GitLab, and Bitbucket, and familiar to most teams. It works well when you have a reliable LFS server and predictable storage needs.
How Crab Works Differently
Crab takes a serverless approach. There's no LFS server, no Batch API endpoint, and no separate infrastructure to manage. Your files go directly into cloud object storage — the same S3 bucket, GCS bucket, or Azure container that holds the rest of your repository data.
Crab acts as a git remote helper, which means it plugs into git's native push and fetch protocols. You use standard git commands (git push, git clone) and Crab handles the large file storage transparently.
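For readers unfamiliar with remote helpers: when git sees a URL such as `crab://...`, it looks for an executable named `git-remote-crab` and talks to it over stdin and stdout using a line-based protocol. The sketch below answers only the `capabilities` and `list` commands of that protocol. It is a toy under stated assumptions, not Crab's implementation: the function name `serve`, the advertised capabilities, and the in-memory `refs` dictionary are all made up for illustration.

```python
import io


def serve(commands, out, refs):
    """Answer a stream of remote-helper commands (toy sketch, two commands only)."""
    for line in commands:
        command = line.strip()
        if command == "":
            continue  # blank lines separate command batches
        if command == "capabilities":
            # One capability per line, terminated by a blank line.
            out.write("fetch\npush\n\n")
        elif command == "list" or command.startswith("list "):
            # One "<sha> <refname>" line per ref, terminated by a blank line.
            for name, sha in refs.items():
                out.write(f"{sha} {name}\n")
            out.write("\n")
        else:
            raise NotImplementedError(command)


if __name__ == "__main__":
    out = io.StringIO()
    serve(io.StringIO("capabilities\nlist\n"), out, {"refs/heads/main": "0" * 40})
    print(out.getvalue())
```

A real helper would go on to handle `fetch` and `push` by reading from and writing to the bucket, which is where the large-file handling would happen.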
Three key differences stand out:
- No server to run — files go straight to object storage. No middleware, no API server, no database.
- Content-defined chunking — Crab breaks files into variable-size chunks based on content boundaries, then deduplicates across your entire repository. Change one layer of a Docker image? Only the changed chunks get uploaded.
- Lazy checkout — clone pointer metadata first instead of downloading every large-file byte upfront. Files appear as lightweight pointers until you hydrate them or read through a FUSE mount.
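The chunking bullet above can be sketched in a few lines of Python. This is a generic gear-style rolling hash, not Crab's actual algorithm; the table seed, the mask, the size limits, and the names `chunk` and `store` are assumptions chosen for illustration. A boundary is declared wherever the low bits of the rolling hash are zero, so boundaries depend on nearby content rather than on byte offsets.

```python
import hashlib
import random

# Fixed table of pseudo-random 32-bit values, one per byte value (seeded so runs repeat).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data, mask=0x3FF, min_size=256, max_size=8192):
    """Split `data` at content-defined boundaries; chunk sizes vary with content."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF  # rolling hash over recent bytes
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def store(data, chunk_store):
    """Add the chunks of `data` to `chunk_store`; return (chunk ids, bytes newly stored)."""
    ids, uploaded = [], 0
    for piece in chunk(data):
        digest = hashlib.sha256(piece).hexdigest()
        if digest not in chunk_store:  # chunks already present are skipped
            chunk_store[digest] = piece
            uploaded += len(piece)
        ids.append(digest)
    return ids, uploaded
```

Storing a file, then storing a copy with one byte inserted in the middle, only adds the chunks around the edit; every chunk that ends before the edit hashes to an ID the store already holds.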
Side-by-Side Comparison
Here's how the two tools compare on the dimensions that matter most for day-to-day use:
| | Git LFS | Crab |
|---|---|---|
| Server requirement | Dedicated LFS server (self-hosted or provider-managed) | None — files go directly to object storage |
| Deduplication | Whole-file only: identical content is stored once, but every changed version is stored in full | Content-defined chunking with cross-file dedup |
| Cost model | LFS server hosting + storage + bandwidth | Object storage costs only (no server fees) |
| Setup complexity | Install LFS, configure server endpoint, set .gitattributes | Install Crab, point at a bucket — done |
| Clone speed | Downloads every LFS file in the checked-out commit by default | Lazy: downloads only what you access |
| Provider lock-in | Tied to LFS server provider (GitHub, GitLab, etc.) | Any S3-compatible storage, GCS, or Azure |
| Git compatibility | Standard git commands + git lfs commands | Standard git commands plus optional crab maintenance commands |
| Max file size | Varies by provider and plan (GitHub: 2 GB to 5 GB) | Bounded by object-storage and Crab upload limits, not a hosted LFS quota |
When to Use Git LFS
Git LFS is a solid choice when:
- Your team is already on GitHub/GitLab LFS and the built-in hosting meets your storage and bandwidth needs
- Files are modest in size (under a few GB) and you don't need deduplication
- You want minimal setup: if your git host provides LFS, the server side is already configured, so you only install the client and set your .gitattributes patterns
- Your workflow is simple — a few large files that don't change often
LFS is proven, widely supported, and requires no infrastructure decisions. For many teams, it's the right default.
When to Use Crab
Crab shines when:
- You're hitting LFS limits — bandwidth caps, storage quotas, or provider costs are adding up
- Files change frequently — model checkpoints, datasets, or build artifacts that update daily benefit from chunk-level deduplication
- You want to own your storage — bring your own S3 bucket with your own lifecycle policies, encryption, and access controls
- Repos are large — terabyte-scale repositories benefit from lazy checkout: clone pointer metadata first, then hydrate content on demand
- You don't want to run a server — no LFS endpoint to deploy, monitor, scale, or pay for
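The lazy-checkout point in the list above can be pictured with a small Python sketch: a placeholder object that knows a file's ID and size but downloads the bytes only on first read. The class name `LazyFile` and the `fetch` callable (a stand-in for an object-storage download) are hypothetical; this shows the idea, not Crab's code.

```python
class LazyFile:
    """Pointer-like placeholder that hydrates its content on first read."""

    def __init__(self, oid, size, fetch):
        self.oid = oid        # content identifier recorded in the pointer
        self.size = size      # known up front, without downloading anything
        self._fetch = fetch   # callable: oid -> bytes (e.g. a bucket download)
        self._data = None

    @property
    def hydrated(self):
        """True once the real bytes have been downloaded."""
        return self._data is not None

    def read(self):
        if self._data is None:  # first access triggers the only download
            self._data = self._fetch(self.oid)
        return self._data
```

A clone under this model creates one such placeholder per large file, so its cost scales with the number of files rather than their total size; files nobody opens are never downloaded.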
Migrating from Git LFS to Crab
If you're currently using Git LFS and want to try Crab, start with the compatibility layer rather than rewriting history on day one. Install Crab as the Git LFS transfer agent, point the repo at your bucket, and keep your existing .gitattributes patterns:
# Initialize Crab in an existing repository
crab init crab://my-bucket/my-repo
# Route Git LFS transfers through Crab
crab lfs install
# Optional: inspect history before any rewrite
crab lfs migrate info --pointers
Your .gitattributes patterns keep working. When you're ready for a coordinated history rewrite, use crab lfs migrate import or crab lfs migrate export on a backup branch, then verify before force-pushing.
The Bottom Line
Git LFS and Crab solve the same fundamental problem — keeping large files out of git history — but they make different tradeoffs. LFS adds a server layer and stores files whole. Crab removes the server entirely and adds deduplication.
Neither is universally better. LFS is simpler to start with if your git host provides it. Crab is more cost-effective and flexible at scale, especially when files change often or repositories grow large.
The good news: you don't have to decide forever. Crab's LFS compatibility means you can start with LFS today and migrate later without disrupting your team.