Creating a Repository
A Crab repository is a standard git repository with a cloud storage backend for large files. Creating one takes a single command — Crab handles all the wiring between git and your bucket.
Quick Start
mkdir my-project && cd my-project
crab init crab://my-bucket/my-project
crab add .
git commit -m "initial commit"
git push
That's it. A few commands take you from an empty directory to files in the cloud. It is even simpler with crab ship:
mkdir my-project && cd my-project
crab init crab://my-bucket/my-project
crab ship . -m "initial commit"
What You Need
Before creating a Crab repository, you need:
- A cloud storage bucket — S3, GCS, or Azure Blob Storage
- Credentials — Access to write to that bucket (AWS keys, GCP service account, or Azure credentials)
You do not need to run git init separately — crab init handles that automatically.
Initialize
cd my-project
crab init crab://my-bucket/my-project
This single command:
- Runs git init if no git repository exists
- Creates the .crab/ directory with remote configuration
- Registers the filter driver in .git/config
- Scans for large files and auto-tracks their extensions in .gitattributes
Example output
Initialized git repository in /home/user/my-project
Detected large files — tracking: *.safetensors, *.bin, *.parquet
Auto-tracked 3 extension(s) in .gitattributes
What Crab Creates
| Created | Purpose |
|---|---|
| .git/ | Git repository (created if missing) |
| .crab/remote | Stores the remote bucket URL |
| .crab/config.toml | Local configuration (concurrency, cache settings) |
| .crab/staging/ | Local staging area for chunks between add and push |
| .git/config changes | Registers the Crab filter driver |
| .gitattributes | Tracking rules for large file extensions (auto-generated) |
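To confirm that initialization produced the layout in the table, a small shell check can help. The sketch below is illustrative only: check_crab_layout is a hypothetical helper, not a Crab command, and it only tests that the paths named in the table exist. It leaves out .gitattributes, on the assumption that the file may be absent when no large files were detected.

```shell
# Hypothetical helper (not part of Crab): report which paths from the
# table above are missing under a project directory.
check_crab_layout() {
  dir=${1:-.}
  missing=0
  for d in .git .crab .crab/staging; do
    if [ ! -d "$dir/$d" ]; then
      echo "missing directory: $d"
      missing=1
    fi
  done
  for f in .crab/remote .crab/config.toml .git/config; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing file: $f"
      missing=1
    fi
  done
  # Exit status 0 means every expected path was found.
  return "$missing"
}
```

Calling check_crab_layout my-project prints one line per missing path and returns non-zero if anything is absent.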
Auto-Tracking
crab init automatically detects files that should be managed by Crab:
- Files larger than 1 MiB trigger tracking for their extension
- Well-known binary formats (.safetensors, .bin, .onnx, .parquet, .h5, etc.) are always tracked when found
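The two rules above can be approximated with standard tools, which is handy for previewing what auto-tracking is likely to pick up. This is a rough sketch, not Crab's actual scanner: candidate_patterns is a hypothetical helper, the hard-coded extension list covers only the formats named above (the real list is longer), and skipping .git internals is an assumption.

```shell
# Hypothetical preview of auto-tracking (not Crab's real scanner).
# Prints one *.ext pattern per extension that qualifies under either rule:
# the file is larger than 1 MiB (1048576 bytes), or it has a well-known extension.
candidate_patterns() {
  dir=${1:-.}
  find "$dir" -type f ! -path '*/.git/*' \( \
      -size +1048576c \
      -o -name '*.safetensors' -o -name '*.bin' -o -name '*.onnx' \
      -o -name '*.parquet' -o -name '*.h5' \
    \) |
    while IFS= read -r path; do
      name=${path##*/}                 # strip the directory part
      case $name in
        # Only names with a non-empty stem and extension yield a pattern.
        ?*.?*) printf '*.%s\n' "${name##*.}" ;;
      esac
    done | sort -u
}
```

Running candidate_patterns . in a project directory lists the patterns that would be candidates for .gitattributes under these assumptions.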
This means you rarely need to manually run crab track — the common case is handled automatically. If you need to add more patterns later, use:
crab track '*.custom-extension'
To skip auto-tracking (e.g., in CI or when you want full manual control):
crab init crab://my-bucket/my-project --no-auto-track
URL Format
Crab URLs follow the pattern crab://<bucket>/<repo-path>:
crab://my-bucket/my-project
crab://company-data/team-ml/experiment-42
crab://us-west-2-storage/repos/frontend-assets
The <repo-path> isolates this repository's data within the bucket. Multiple repositories can share a single bucket with different paths.
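As a rough illustration of how the two parts separate, the sketch below splits a Crab URL into bucket and repo path using POSIX parameter expansion. split_crab_url is a hypothetical helper written for this page. It says nothing about how Crab parses URLs internally.

```shell
# Hypothetical helper: split crab://<bucket>/<repo-path> into its two parts.
# Prints "<bucket> <repo-path>", or returns 1 if the URL does not fit the pattern.
split_crab_url() {
  url=$1
  case $url in
    crab://?*/?*) ;;              # scheme, non-empty bucket, non-empty path
    *) echo "invalid URL: $url" >&2; return 1 ;;
  esac
  rest=${url#crab://}             # e.g. my-bucket/my-project
  bucket=${rest%%/*}              # everything before the first slash
  repo_path=${rest#*/}            # everything after the first slash
  printf '%s %s\n' "$bucket" "$repo_path"
}
```

For crab://company-data/team-ml/experiment-42 this prints company-data team-ml/experiment-42: only the first slash separates the bucket from the path.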
Supported Cloud Providers
| Provider | URL Format | Example |
|---|---|---|
| AWS S3 | crab://<bucket>/<path> | crab://ml-data/models |
| Google Cloud Storage | crab://gs:<bucket>/<path> | crab://gs:ml-data/models |
| Azure Blob Storage | crab://az:<container>/<path> | crab://az:ml-data/models |
| S3-compatible (MinIO, R2, Ceph) | crab://<bucket>/<path> | crab://my-minio-bucket/data |
How Provider Detection Works
Crab identifies the storage backend from the prefix on the bucket name in the URL:
- No prefix → AWS S3 (or S3-compatible endpoint)
- gs: prefix → Google Cloud Storage
- az: prefix → Azure Blob Storage
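The prefix rule can be sketched as a simple pattern match on the part of the URL that follows crab://. provider_from_url is a hypothetical helper, and the provider names it prints (s3, gcs, azure) are chosen for illustration.

```shell
# Hypothetical helper: map the bucket prefix of a Crab URL to a provider name.
provider_from_url() {
  rest=${1#crab://}      # drop the scheme, leaving [prefix:]bucket/path
  case $rest in
    gs:*) echo gcs ;;
    az:*) echo azure ;;
    *)    echo s3 ;;     # no prefix: S3 or an S3-compatible endpoint
  esac
}
```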
When no prefix is present, Crab resolves the provider through a fallback chain:
- The storage_provider field in your local .crab/config.toml under [auth]
- The CRAB_STORAGE_PROVIDER environment variable
- If neither is set, defaults to S3
[auth]
storage_provider = "gcs" # "s3" | "gcs" | "azure" | "auto"
The environment variable accepts these values (case-insensitive):
| Value | Provider |
|---|---|
| s3 (default) | AWS S3 / S3-compatible |
| gcs, gs, google | Google Cloud Storage |
| azure, az, abs | Azure Blob Storage |
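The lookup order for unprefixed URLs, together with the case-insensitive aliases in the table, can be sketched as follows. Both functions are hypothetical helpers, not Crab internals. Two assumptions are baked in: the config value is passed in as an argument rather than parsed from TOML, and a config value of auto (or anything unrecognized) is treated as "not set" so that the lookup falls through to the environment variable.

```shell
# Hypothetical helpers sketching the fallback chain described above.

# Map an accepted value (any letter case) to a canonical provider name.
normalize_provider() {
  case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
    s3)            echo s3 ;;
    gcs|gs|google) echo gcs ;;
    azure|az|abs)  echo azure ;;
    *)             return 1 ;;   # empty, "auto", or unknown
  esac
}

# Resolve the provider for an unprefixed URL.
# $1 is the storage_provider value from .crab/config.toml ("" if unset).
resolve_provider() {
  config_value=$1
  normalize_provider "$config_value" && return 0                # 1. local config
  normalize_provider "${CRAB_STORAGE_PROVIDER:-}" && return 0   # 2. environment
  echo s3                                                       # 3. default
}
```

With CRAB_STORAGE_PROVIDER=Google exported and no config value, resolve_provider "" prints gcs. With a config value of azure, the environment variable is never consulted.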
S3-Compatible Endpoints
For S3-compatible stores like MinIO, Cloudflare R2, or Ceph, use the standard crab:// URL (no prefix) and set the custom endpoint via environment variable:
export AWS_ENDPOINT_URL=https://s3.example.com
crab init crab://my-bucket/my-project
The S3 builder picks up AWS_ENDPOINT_URL automatically, so no URL-level distinction is needed.
After Initialization
Once initialized, start adding files:
# Add your large files (auto-tracked extensions are already configured)
crab add .
# Commit and push
git commit -m "Initial commit with large files"
git push
Or use the one-shot shorthand:
crab ship . -m "Initial commit with large files"
For Collaborators
When a teammate has already set up a Crab repository, you have two options:
Option A: crab clone (recommended)
crab clone crab://my-bucket/my-project
This handles everything: git clone, filter driver, tracking rules, and optional hydration.
Option B: Global install + regular git clone
# One-time setup (works for all repos):
crab install --global
# Then clone normally:
git clone <url>
With the global filter driver installed, any repo that has .gitattributes with filter=crab rules will work automatically.
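To see whether a freshly cloned repository carries such rules, it can help to list the patterns whose attributes include filter=crab. The sketch below is a hypothetical helper, not a Crab command. It assumes one pattern per line with whitespace-separated attributes, which is the usual .gitattributes layout.

```shell
# Hypothetical helper: print the patterns in a .gitattributes file that are
# routed through the crab filter. Returns 1 if the file is missing or has none.
crab_tracked_patterns() {
  file=${1:-.gitattributes}
  [ -f "$file" ] || return 1
  # Skip comment lines; print the pattern (first field) of every line
  # that has filter=crab among its attributes.
  awk '$1 !~ /^#/ {
         for (i = 2; i <= NF; i++)
           if ($i == "filter=crab") { print $1; found = 1; break }
       }
       END { exit !found }' "$file"
}
```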
Re-initialization
Running crab init on an already-initialized repository is safe and idempotent. It refreshes the filter driver configuration and re-scans for new large file extensions without overwriting your existing config or staging data.
Troubleshooting
"invalid URL" error — Ensure the URL follows crab://<bucket>/<path>. The bucket name must not contain slashes.
"access denied" error — Check that your cloud credentials have read/write access to the bucket. Run crab auth status to verify.
Filter driver not registered — Run crab doctor for a full health check. If the filter is missing, crab init will re-register it.
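For a quick manual check of the filter driver, one option is to look for a [filter "crab"] section in the relevant git config files. The section name follows from the filter=crab attribute, since git resolves that attribute through filter.crab.* settings. has_crab_filter is a hypothetical helper, and which keys Crab writes inside the section is not assumed here.

```shell
# Hypothetical helper: succeed if any of the given git config files
# contains a [filter "crab"] section.
has_crab_filter() {
  for cfg in "$@"; do
    if [ -f "$cfg" ] && grep -q '^[[:space:]]*\[filter "crab"\]' "$cfg"; then
      echo "filter driver found in $cfg"
      return 0
    fi
  done
  echo "filter driver not found" >&2
  return 1
}
```

A typical call would be has_crab_filter .git/config "$HOME/.gitconfig", which covers both the per-repository and the global registration. Running git config --get-regexp '^filter\.crab\.' inside the repository gives a similar answer across all config scopes.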
Next Steps
- Track file patterns — Manually configure additional patterns
- Add and push files — Stage tracked files for push
- Clone a repository — Share with collaborators