Crab is a serverless Git remote storage solution powered by the Xet protocol for chunk-level deduplication. Version huge binaries straight into your S3, GCS, or Azure bucket.
Serverless Git for Large Files
git remote helper
Three steps to version and store your large files — no servers, no proprietary lock-in.
01 Install & Init
One command to install, one to connect your bucket. Crab auto-detects large files and configures git tracking.
$ crab init s3://my-bucket/repo
A single command stages, commits, and pushes. Crab handles chunking, deduplication, and parallel upload.
$ crab ship . -m "add model v2"
Files are deduplicated and packed into xorbs in your own S3, GCS, or Azure bucket. No server needed.
$ crab://my-bucket/repo ✓
Core Features
One serverless toolchain across the CLI, desktop app, and cloud — covering large-file Git, deduplicated storage, AI agents, and ML pipelines.
See how Crab compares to other popular tools for managing large files, datasets, and ML assets alongside your code.
| Feature | Crab | Git LFS | DVC | HF Hub |
|---|---|---|---|---|
| Chunk-level deduplication | ||||
| Serverless (no infra) | ||||
| Standard Git CLI | Partial | |||
| Desktop App | ||||
| FUSE virtual filesystem | ||||
| ML pipeline engine | ||||
| Cloud-native storage | ||||
| Open source | Planned | Partial |
Interactive Demo
Two common workflows — starting fresh or cloning an existing repo. Full end-to-end from init to push.
Architecture
Files are split at natural content boundaries using Gearhash CDC. Duplicate chunks are identified across three tiers — minimizing storage costs even for large binary datasets.
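To make the idea of content-defined boundaries concrete, here is a minimal Python sketch of a Gear-style rolling hash chunker. It is an illustration only, not Crab's implementation: the gear table seed, the `bits`, `min_size`, and `max_size` parameters, and the `cdc_chunks` name are all assumptions chosen for the example.

```python
import random

# Hypothetical gear table: one pseudo-random 64-bit value per byte value.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data, bits=10, min_size=64, max_size=8192):
    """Split `data` (bytes) at content-defined boundaries.

    A boundary is declared when the top `bits` bits of the rolling hash
    are zero, so the average chunk is roughly 2**bits bytes past min_size.
    min_size should be at least 64 so that every tested hash value depends
    only on the previous 64 bytes of content, not on where the chunk began.
    """
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear update: shift left by one, add the table entry for this byte.
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        at_boundary = length >= min_size and (h >> (64 - bits)) == 0
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Because each cut depends on nearby content rather than on absolute offsets, inserting a few bytes at the start of a file shifts only the first chunk or two; the chunker then resynchronizes and the remaining chunks hash to the same values as before.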
“Chunk-level dedup turned our 1,212 GB checkpoint repo into a 10 GB push. We swapped Git LFS for Crab in an afternoon and never looked back — pulls are instant and the bucket bill dropped by an order of magnitude.”
Everything you need to know about Crab
Crab is a serverless Git remote storage solution powered by the Xet protocol. It acts as a Git remote helper and filter driver, enabling you to version any file — regardless of size — directly into your own cloud storage bucket (S3, GCS, or Azure Blob).
Git LFS stores files as whole objects on a dedicated server you have to run and scale. Crab uses content-defined chunking (CDC) to split files at natural byte boundaries and deduplicates at the chunk level. There's no server — data goes straight to your cloud bucket. Identical chunks are uploaded exactly once, even across different files and repos.
Crab supports Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage. Your data stays in your bucket under your control. We also support S3-compatible endpoints like MinIO and Cloudflare R2.
Crab uses Gearhash content-defined chunking (CDC) to split files at natural content boundaries. Each chunk is hashed and checked against a 3-tier deduplication index (session, shard, and database). Chunks that already exist on the remote are skipped — only truly new data is uploaded. This typically saves 50–90% on storage for versioned binary files.
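The tiered lookup described above can be sketched in a few lines of Python. This is a hedged illustration: the `TieredDedupIndex` class, the `plan_upload` helper, the use of in-memory sets for each tier, and SHA-256 as the chunk hash are all assumptions for the example, not a description of Crab's actual index or hash function.

```python
import hashlib


class TieredDedupIndex:
    """Hypothetical three-tier index, checked in order: session, shard, database."""

    def __init__(self, shard_hashes=(), database_hashes=()):
        self.session = set()                   # chunks seen during this push
        self.shard = set(shard_hashes)         # stand-in for a shard-level index
        self.database = set(database_hashes)   # stand-in for the full index

    def find(self, digest):
        """Return the name of the first tier that knows `digest`, or None."""
        if digest in self.session:
            return "session"
        if digest in self.shard:
            return "shard"
        if digest in self.database:
            return "database"
        return None


def plan_upload(chunks, index):
    """Return (chunks that must be uploaded, hit counts per tier)."""
    to_upload = []
    hits = {"session": 0, "shard": 0, "database": 0}
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        tier = index.find(digest)
        if tier is None:
            to_upload.append(chunk)   # genuinely new data
        else:
            hits[tier] += 1           # already stored somewhere, skip it
        index.session.add(digest)     # later repeats hit the first tier
    return to_upload, hits
```

For example, with one chunk already known to the shard tier and one to the database tier, pushing five chunks that include a repeat uploads only the two chunks no tier has seen.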
Absolutely. Crab works with standard Git commands — git clone, add, commit, push. You add a Crab remote to your existing repo and push. If you're migrating from Git LFS, Crab can detect LFS pointers and store those files alongside xorbs without re-uploading.
Install Crab with a single command: brew install crabbuild/tap/crab on macOS, or cargo install crab on any platform. Then run crab init in your repo to configure a cloud remote. Check out our docs for a complete quickstart guide.
Quickstart
Install the Crab CLI for your platform and initialize your first serverless remote bucket.
# Install using Homebrew
$ brew install crabbuild/tap/crab
==> Fetching crabbuild/tap/crab...
==> Installed crab v0.8.4
# Initialize Crab in your repository
$ crab init --bucket my-s3-bucket --region us-west-2
Initialized serverless Crab remote.
Install the Crab CLI in minutes, or grab the desktop app for a full GUI with previews, integrated terminal, and remote workspaces.
Built for scale
Numbers that hold up under big repositories
Crab is engineered for ML model weights, datasets, and game assets that break Git LFS. Every metric below comes from the shipping CLI.
SIMD-accelerated Gearhash CDC throughput on a single core.
Concurrent xorb transfers saturate cloud bandwidth on push.
Session, shard, and database index — chunks uploaded once.
Packed chunks balance Range GET cost against round trips.
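The trade-off in the last point, packing chunks so that one Range GET can fetch several of them, can be illustrated with a small Python sketch. The layout here (chunks concatenated back to back with an offset table) and the `pack_chunks` and `range_header` names are assumptions for the example; the real xorb format is not described on this page. Only the `bytes=start-end` syntax of the HTTP `Range` header is standard.

```python
def pack_chunks(chunks):
    """Concatenate chunks into one blob and record (offset, length) for each."""
    blob = bytearray()
    table = []
    for chunk in chunks:
        table.append((len(blob), len(chunk)))
        blob.extend(chunk)
    return bytes(blob), table


def range_header(table, first, last):
    """Build an HTTP Range header value covering chunks first..last (inclusive).

    HTTP byte ranges are inclusive on both ends, hence the -1.
    Assumes every chunk in the table is non-empty.
    """
    start = table[first][0]
    end = table[last][0] + table[last][1] - 1
    return f"bytes={start}-{end}"
```

Fetching adjacent chunks with one ranged request costs a single round trip, while a larger pack means more bytes per request when only a few chunks are needed; that is the balance the packed size has to strike.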