Creating a Repository

A Crab repository is a standard git repository with a cloud storage backend for large files. Creating one takes a single command — Crab handles all the wiring between git and your bucket.

Quick Start

mkdir my-project && cd my-project
crab init crab://my-bucket/my-project
crab add .
git commit -m "initial commit"
git push

That's it: four commands take an empty directory to files in the cloud. crab ship shortens this further by replacing the add, commit, and push steps:

mkdir my-project && cd my-project
crab init crab://my-bucket/my-project
crab ship . -m "initial commit"

What You Need

Before creating a Crab repository, you need:

A cloud storage bucket — S3, GCS, or Azure Blob Storage
Credentials — Access to write to that bucket (AWS keys, GCP service account, or Azure credentials)

You do not need to run git init separately — crab init handles that automatically.

Initialize

cd my-project
crab init crab://my-bucket/my-project

This single command:

Runs git init if no git repository exists
Creates the .crab/ directory with remote configuration
Registers the filter driver in .git/config
Scans for large files and auto-tracks their extensions in .gitattributes

Example output

Initialized git repository in /home/user/my-project
Detected large files — tracking: *.safetensors, *.bin, *.parquet
Auto-tracked 3 extension(s) in .gitattributes

What Crab Creates

Created	Purpose
`.git/`	Git repository (created if missing)
`.crab/remote`	Stores the remote bucket URL
`.crab/config.toml`	Local configuration (concurrency, cache settings)
`.crab/staging/`	Local staging area for chunks between add and push
`.git/config` changes	Registers the Crab filter driver
`.gitattributes`	Tracking rules for large file extensions (auto-generated)
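
To confirm that initialization produced the paths in the table above, a small check like the following may help. This is a minimal sketch, not a Crab command: the function name check_crab_layout is hypothetical, and it assumes a default crab init (with auto-tracking enabled) creates every listed path.

```shell
#!/bin/sh
# Hypothetical post-init check (not part of Crab): report whether each path
# from the "What Crab Creates" table exists under the given directory.
check_crab_layout() (
  dir=${1:-.}
  status=0
  for entry in .git .crab/remote .crab/config.toml .crab/staging .gitattributes; do
    if [ -e "$dir/$entry" ]; then
      echo "ok       $entry"
    else
      echo "missing  $entry"
      status=1
    fi
  done
  # Non-zero exit status if anything was missing.
  exit "$status"
)
```

Run check_crab_layout . from the project root. Because the tracking rules use filter=crab, git reads the driver settings from the filter.crab section, so git config --get-regexp '^filter\.crab\.' should show whether the driver is registered.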

Auto-Tracking

crab init automatically detects files that should be managed by Crab:

Files larger than 1 MiB trigger tracking for their extension
Well-known binary formats (.safetensors, .bin, .onnx, .parquet, .h5, etc.) are always tracked when found

This means you rarely need to manually run crab track — the common case is handled automatically. If you need to add more patterns later, use:

crab track '*.custom-extension'

To skip auto-tracking (e.g., in CI or when you want full manual control):

crab init crab://my-bucket/my-project --no-auto-track
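
To preview which extensions the 1 MiB size rule would pick up before running crab init, the rule can be approximated with find. This is a rough sketch under stated assumptions: large_file_patterns is a hypothetical helper, it models only the size rule (not the list of well-known binary formats), and Crab's own scanner may differ in which directories it skips.

```shell
#!/bin/sh
# Hypothetical approximation of the size rule only; not Crab's scanner.
# Prints one "*.ext" pattern per extension that has a file larger than 1 MiB.
large_file_patterns() (
  dir=${1:-.}
  # -size +1048576c matches files strictly larger than 1 MiB (1,048,576 bytes).
  find "$dir" -type f -size +1048576c ! -path '*/.git/*' ! -path '*/.crab/*' |
    while IFS= read -r file; do
      name=${file##*/}
      case $name in
        # Only names with a non-empty stem and extension produce a pattern.
        ?*.?*) printf '*.%s\n' "${name##*.}" ;;
      esac
    done | sort -u
)
```

Calling large_file_patterns . inside a project directory prints the candidate patterns, which can then be compared with what crab init writes to .gitattributes or passed to crab track by hand.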

URL Format

Crab URLs follow the pattern crab://<bucket>/<repo-path>:

crab://my-bucket/my-project
crab://company-data/team-ml/experiment-42
crab://us-west-2-storage/repos/frontend-assets

The <repo-path> isolates this repository's data within the bucket. Multiple repositories can share a single bucket with different paths.
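
The URL pattern can be made concrete with a short parser. The sketch below is illustrative only: parse_crab_url is a hypothetical helper, not Crab's implementation, and it assumes that a URL without a repo path is rejected. It also recognizes the gs: and az: bucket prefixes covered in the provider table.

```shell
#!/bin/sh
# Hypothetical helper (not part of Crab): split a crab:// URL into
# provider prefix, bucket, and repo path using POSIX parameter expansion.
parse_crab_url() (
  url=$1
  case $url in
    crab://*) rest=${url#crab://} ;;
    *) echo "invalid URL: expected crab://<bucket>/<repo-path>" >&2; exit 1 ;;
  esac

  # The bucket is everything before the first slash; the rest is the repo path.
  case $rest in
    */*) bucket=${rest%%/*}; path=${rest#*/} ;;
    *) echo "invalid URL: missing repo path" >&2; exit 1 ;;
  esac

  # An optional "gs:" or "az:" prefix on the bucket names the provider.
  case $bucket in
    gs:*) prefix=gs; bucket=${bucket#gs:} ;;
    az:*) prefix=az; bucket=${bucket#az:} ;;
    *) prefix=none ;;
  esac

  if [ -z "$bucket" ] || [ -z "$path" ]; then
    echo "invalid URL: empty bucket or repo path" >&2
    exit 1
  fi

  printf '%s %s %s\n' "$prefix" "$bucket" "$path"
)

parse_crab_url crab://company-data/team-ml/experiment-42  # none company-data team-ml/experiment-42
parse_crab_url crab://gs:ml-data/models                   # gs ml-data models
```

Splitting at the first slash is what keeps slashes out of the bucket name while still allowing nested repo paths such as team-ml/experiment-42.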

Supported Cloud Providers

Provider	URL Format	Example
AWS S3	`crab://<bucket>/<path>`	`crab://ml-data/models`
Google Cloud Storage	`crab://gs:<bucket>/<path>`	`crab://gs:ml-data/models`
Azure Blob Storage	`crab://az:<container>/<path>`	`crab://az:ml-data/models`
S3-compatible (MinIO, R2, Ceph)	`crab://<bucket>/<path>`	`crab://my-minio-bucket/data`

How Provider Detection Works

Crab identifies the storage backend from the prefix on the bucket name in the URL:

No prefix → AWS S3 (or S3-compatible endpoint)
gs: prefix → Google Cloud Storage
az: prefix → Azure Blob Storage

When no prefix is present, S3 is only the default. Crab resolves the provider through a fallback chain, checked in this order:

The storage_provider field in your local .crab/config.toml under [auth]
The CRAB_STORAGE_PROVIDER environment variable
If neither is set, defaults to S3

.crab/config.toml

[auth]
storage_provider = "gcs"   # "s3" | "gcs" | "azure" | "auto"

The environment variable accepts these values (case-insensitive):

Value	Provider
`s3` (default)	AWS S3 / S3-compatible
`gcs`, `gs`, `google`	Google Cloud Storage
`azure`, `az`, `abs`	Azure Blob Storage
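
The lookup order can be modeled in a few lines of shell. This is a sketch of the behavior as described here, not Crab's code: resolve_provider and normalize_provider are hypothetical names, the config value is passed in as an argument rather than read from .crab/config.toml, and two points are assumptions: that "auto" or an unrecognized value falls through to the next step, and that the config field accepts the same aliases as the environment variable.

```shell
#!/bin/sh
# Hypothetical model of the lookup order described above; not Crab's code.

# Map an accepted value (case-insensitive) to a canonical provider name.
normalize_provider() (
  value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case $value in
    s3) echo s3 ;;
    gcs|gs|google) echo gcs ;;
    azure|az|abs) echo azure ;;
    *) exit 1 ;;   # empty, "auto", or unknown: nothing chosen at this step
  esac
)

# Usage: resolve_provider <url-prefix> <config-value> <env-value>
resolve_provider() (
  url_prefix=$1
  config_value=$2
  env_value=$3

  # An explicit gs: or az: prefix in the URL decides immediately.
  case $url_prefix in
    gs) echo gcs; exit 0 ;;
    az) echo azure; exit 0 ;;
  esac

  # Otherwise try storage_provider from the config, then the environment.
  for candidate in "$config_value" "$env_value"; do
    if provider=$(normalize_provider "$candidate"); then
      echo "$provider"
      exit 0
    fi
  done

  # Nothing set: default to S3.
  echo s3
)

resolve_provider gs "" ""          # prints: gcs
resolve_provider none "" "GOOGLE"  # prints: gcs
resolve_provider none "" ""        # prints: s3
```

In real use the third argument would come from "${CRAB_STORAGE_PROVIDER:-}".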

S3-Compatible Endpoints

For S3-compatible stores like MinIO, Cloudflare R2, or Ceph, use the standard crab:// URL (no prefix) and set the custom endpoint via environment variable:

export AWS_ENDPOINT_URL=https://s3.example.com
crab init crab://my-bucket/my-project

The S3 builder picks up AWS_ENDPOINT_URL automatically — no URL-level distinction is needed.

After Initialization

Once initialized, start adding files:

# Add your large files (auto-tracked extensions are already configured)
crab add .

# Commit and push
git commit -m "Initial commit with large files"
git push

Or use the one-shot shorthand:

crab ship . -m "Initial commit with large files"

For Collaborators

When a teammate has already set up a Crab repository, you have two options:

Option A: `crab clone` (recommended)

crab clone crab://my-bucket/my-project

This handles everything: git clone, filter driver, tracking rules, and optional hydration.

Option B: Global install + regular git clone

# One-time setup (works for all repos):
crab install --global

# Then clone normally:
git clone <url>

With the global filter driver installed, any repo that has .gitattributes with filter=crab rules will work automatically.
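
After a plain git clone, it can be useful to see which patterns in the repository actually carry a filter=crab rule. The helper below is a minimal sketch: crab_tracked_patterns is a hypothetical name, and it assumes each rule is a single line with the pattern first and filter=crab among its attributes.

```shell
#!/bin/sh
# Hypothetical helper (not a Crab command): list the patterns in a
# .gitattributes file that carry a filter=crab attribute.
crab_tracked_patterns() (
  attrs=${1:-.gitattributes}
  [ -f "$attrs" ] || exit 1
  # Field 1 is the pattern; the remaining fields are attributes.
  awk '$1 ~ /^#/ { next }
       { for (i = 2; i <= NF; i++) if ($i == "filter=crab") { print $1; break } }' "$attrs"
)
```

For a single file, git check-attr filter -- <path> reports the filter git will apply, which is a quick way to confirm that a given file is routed through the Crab driver.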

Re-initialization

Running crab init on an already-initialized repository is safe and idempotent. It refreshes the filter driver configuration and re-scans for new large file extensions without overwriting your existing config or staging data.

Troubleshooting

"invalid URL" error — Ensure the URL follows crab://<bucket>/<path>. The bucket name must not contain slashes.

"access denied" error — Check that your cloud credentials have read/write access to the bucket. Run crab auth status to verify.

Filter driver not registered — Run crab doctor for a full health check. If the filter is missing, crab init will re-register it.

Next Steps

Track file patterns — Manually configure additional patterns
Add and push files — Stage tracked files for push
Clone a repository — Share with collaborators

