Project Configuration (`.crab.toml`)

The .crab.toml file lives in your repository root and declares how Crab should behave for this project. It's committed to git so every collaborator inherits the same configuration.

Full Schema

# Required: cloud storage location for xorbs and manifests
[remote]
url = "crab://my-bucket/my-repo"

# Optional: file patterns to track with Crab
[track]
patterns = ["*.bin", "*.safetensors", "*.parquet", "datasets/**"]

# Optional: hydration behavior on clone/checkout
[hydrate]
default = "lazy"                    # "lazy" | "eager"
auto_patterns = ["*.py", "*.rs", "*.toml", "README*", "LICENSE*"]

# Optional: mirror mode (GitHub + Crab coexistence)
[mirror]
origin_remote = "origin"
crab_remote = "crab"

# Optional: credential hints
[auth]
provider = "aws"                    # "aws" | "gcp" | "azure"
profile = "my-profile"

Sections

`[remote]` (required)

The only required section. Specifies where Crab stores chunked file data.

[remote]
url = "crab://my-bucket/my-repo"

Supported URL schemes:

crab:// — AWS S3 (or S3-compatible)
gs:// — Google Cloud Storage
az:// — Azure Blob Storage
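
The scheme-to-backend mapping above can be sketched as a small lookup. This is an illustration only: `describe_remote` is a hypothetical name, and reading the first URL segment as a bucket and the remainder as a repository prefix is an assumption based on the `crab://my-bucket/my-repo` example.

```python
from urllib.parse import urlsplit

# Mapping taken from the list of supported URL schemes.
BACKENDS = {
    "crab": "AWS S3 (or S3-compatible)",
    "gs": "Google Cloud Storage",
    "az": "Azure Blob Storage",
}


def describe_remote(url: str) -> tuple[str, str, str]:
    """Return (backend, bucket, prefix) for a remote URL.

    Hypothetical helper; the bucket/prefix split is an assumption.
    """
    parts = urlsplit(url)
    if parts.scheme not in BACKENDS:
        raise ValueError(f"unsupported URL scheme: {parts.scheme!r}")
    return BACKENDS[parts.scheme], parts.netloc, parts.path.lstrip("/")
```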

`[track]` (optional)

Declares which file patterns Crab manages. These are synced to .gitattributes with the filter=crab attribute.

[track]
patterns = ["*.bin", "*.safetensors", "*.onnx", "datasets/**"]

If omitted, crab init auto-detects large files (>1 MiB) and well-known binary extensions.
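
To make the `.gitattributes` sync concrete, here is a minimal sketch of how tracked patterns could translate into attribute lines. The only fact taken from this page is the `filter=crab` attribute; the exact line format Crab writes, and whether it adds further attributes, is an assumption. `gitattributes_lines` is a hypothetical name.

```python
def gitattributes_lines(patterns: list[str]) -> list[str]:
    """Render one .gitattributes line per tracked pattern.

    Assumes the simplest possible form, "<pattern> filter=crab";
    Crab itself may emit additional attributes.
    """
    return [f"{pattern} filter=crab" for pattern in patterns]
```

For the patterns `["*.bin", "datasets/**"]` this yields the lines `*.bin filter=crab` and `datasets/** filter=crab`.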

`[hydrate]` (optional)

Controls which files Crab hydrates (replaces pointers with real file content) after a clone or checkout.

[hydrate]
default = "lazy"
auto_patterns = ["*.py", "*.rs", "*.toml", "README*"]

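| Field | Values | Default | Description |
| --- | --- | --- | --- |
| `default` | `"lazy"`, `"eager"` | `"lazy"` | `"eager"` hydrates all tracked files after clone; `"lazy"` leaves them as pointers |
| `auto_patterns` | Array of globs | `[]` | Always hydrate files matching these patterns, regardless of `default` |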

With default = "lazy", files remain as pointers until explicitly hydrated, except those matching auto_patterns, which are always hydrated. With default = "eager", all tracked files are hydrated immediately after clone.
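
The interaction between `default` and `auto_patterns` can be sketched as a single decision function. This is a simplified model, not Crab's implementation: `hydrate_on_clone` is a hypothetical name, and glob matching is approximated with Python's `fnmatchcase`, whose `*` also matches across `/`. Crab's real glob semantics may differ.

```python
from fnmatch import fnmatchcase


def hydrate_on_clone(path: str, default: str = "lazy", auto_patterns=()) -> bool:
    """Decide whether a file is hydrated right after clone or checkout.

    Hypothetical model of the documented rules:
    - "eager" hydrates everything;
    - "lazy" hydrates only files matching auto_patterns.
    """
    if default == "eager":
        return True
    # Under "lazy", auto_patterns act as an allow-list for immediate hydration.
    return any(fnmatchcase(path, pattern) for pattern in auto_patterns)
```

Under this model, `src/train.py` is hydrated with `auto_patterns = ["*.py"]` even when `default = "lazy"`, while `models/bert.safetensors` stays a pointer until hydrated explicitly.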

`[mirror]` (optional)

Enables mirror mode for GitHub/GitLab + Crab coexistence. See Mirror Mode for the full guide.

[mirror]
origin_remote = "origin"
crab_remote = "crab"

When present, crab init (re-apply mode) installs pre-push and post-checkout hooks automatically.

`[auth]` (optional)

Explicit credential hints. Rarely needed — Crab's credential discovery chain finds credentials automatically from environment variables, cloud SDK configs, and instance metadata.

[auth]
provider = "aws"
profile = "ml-team"

Use this when your team uses a non-default AWS profile or needs to override auto-detection.

Precedence

Configuration resolves in this order (highest priority first):

1. CLI flags: --pattern, --eager, --mirror, etc.
2. .crab.toml: project-level defaults
3. Built-in defaults: lazy hydration, auto-detection
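
One way to picture this layering is a chained lookup in which the first layer that defines a setting wins. The sketch below uses Python's `ChainMap`; the setting names (`hydrate_default`, `track_patterns`) and the function `resolve_settings` are hypothetical and chosen only for illustration.

```python
from collections import ChainMap

# Built-in defaults named on this page: lazy hydration and auto-detection.
# The key names are illustrative, not Crab's internal names.
BUILT_IN_DEFAULTS = {"hydrate_default": "lazy", "track_patterns": "auto-detect"}


def resolve_settings(cli_flags: dict, project_config: dict) -> ChainMap:
    """Layer settings so that earlier mappings take priority.

    ChainMap searches its mappings left to right, so CLI flags
    shadow .crab.toml values, which shadow the built-in defaults.
    """
    return ChainMap(cli_flags, project_config, BUILT_IN_DEFAULTS)
```

For example, passing `{"hydrate_default": "eager"}` as CLI flags overrides a `"lazy"` value from the project config, while any setting given in neither layer falls through to the built-in default.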

`.crab.toml` vs `.crab/config.toml`

| | `.crab.toml` | `.crab/config.toml` |
| --- | --- | --- |
| Location | Repo root | `.crab/` directory |
| Purpose | Project config (shared) | Internal state (local) |
| Committed to git | Yes | No |
| Edited by | Users, `crab init` | Crab automatically |

Think of .crab.toml as "what this repo needs" and .crab/config.toml as "what this machine has done."

Examples

ML Repository

[remote]
url = "crab://ml-artifacts/bert-finetune"

[track]
patterns = ["*.safetensors", "*.bin", "*.onnx", "*.pt", "datasets/**"]

[hydrate]
default = "lazy"
auto_patterns = ["*.py", "*.yaml", "requirements.txt", "README*"]

Collaborators clone instantly (pointers only), then hydrate the specific model checkpoint they need.

Monorepo with Large Assets

[remote]
url = "crab://company-assets/monorepo"

[track]
patterns = [
  "assets/**/*.psd",
  "assets/**/*.fbx",
  "assets/**/*.blend",
  "builds/**"
]

[hydrate]
default = "lazy"
auto_patterns = ["*.ts", "*.tsx", "*.json", "*.md", "*.css"]

Developers get code hydrated immediately. Designers hydrate asset files on demand.

Mirror Mode (GitHub + Crab)

[remote]
url = "crab://team-bucket/our-project"

[track]
patterns = ["*.bin", "*.parquet", "models/**"]

[mirror]
origin_remote = "origin"
crab_remote = "crab"

[hydrate]
default = "lazy"
auto_patterns = ["*.py", "*.rs", "*.toml"]

[auth]
provider = "aws"
profile = "ml-team"

Code goes to GitHub via origin. Large files go to S3 via crab. The team's PR workflow stays unchanged.

How It's Generated

crab init <url> creates .crab.toml automatically with:

[remote] from the provided URL
[track] from auto-detected large file patterns
[mirror] if --mirror flag was used

You can edit it manually afterward to add [hydrate] or [auth] sections.

Mirror Mode — GitHub + Crab coexistence
crab init — generates .crab.toml
crab track — updates tracked file patterns
