crab init

Initialize a new Crab repository.

Synopsis

crab init [OPTIONS] <URL>

Description

crab init sets up a git repository for Crab by configuring the remote helper, filter driver, and cloud storage backend. It creates the .crab/ directory, registers the crab filter in .git/config, and optionally auto-detects large files to track.

If no git repository exists in the current directory, crab init runs git init first, so there is no need to initialize git separately.

After writing the configuration, crab init scans the working tree for large files (above 1 MiB) and files with well-known binary extensions (.safetensors, .bin, .onnx, .parquet, etc.) and automatically adds tracking rules to .gitattributes. This eliminates the most common setup mistake of forgetting to run crab track before crab add.
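
To make the idea of a tracking rule concrete, here is a minimal sketch of how one rule per extension could be appended to .gitattributes. The helper name add_tracking_rule and the rule form `*.<ext> filter=crab` are assumptions for illustration (inferred from the filter.crab driver name), not the confirmed output of crab init.

```shell
# Hypothetical helper, not part of Crab: append one tracking rule per extension.
# ASSUMPTION: rules take the form "*.<ext> filter=crab".
add_tracking_rule() {
  rule="*.$1 filter=crab"          # $1 is an extension without the leading dot
  attrs="${2:-.gitattributes}"     # $2 is the attributes file to update
  touch "$attrs"
  # -x matches whole lines and -F disables regex, so "*" and "." are literal.
  grep -qxF -e "$rule" "$attrs" || printf '%s\n' "$rule" >> "$attrs"
}

demo_dir="$(mktemp -d)"            # scratch directory so no real repo is touched
add_tracking_rule safetensors "$demo_dir/.gitattributes"
add_tracking_rule bin "$demo_dir/.gitattributes"
add_tracking_rule bin "$demo_dir/.gitattributes"   # repeated call adds nothing
cat "$demo_dir/.gitattributes"
```

Checking for an existing line before appending keeps the file stable if the same extension is detected more than once.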

For conceptual background, see Creating a Repository.

Arguments

Argument	Required	Description
`<URL>`	Yes	Cloud storage URL (e.g. `crab://my-bucket/repo`, `s3://bucket/path`)

Options

Option	Default	Description
`--no-auto-track`	`false`	Skip auto-detection and tracking of large file extensions
`--log-level`	—	Set log verbosity (`error`, `warn`, `info`, `debug`, `trace`)

What It Does

Creates a git repository (if .git doesn't exist)
Creates .crab/config.toml with the remote URL
Registers the filter driver in .git/config
Scans the working tree for large files and well-known binary extensions, then adds tracking rules for them to .gitattributes (skipped with --no-auto-track)
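
The steps above leave files on disk that can be inspected afterwards. The following sketch reports which of them exist under a directory. The function names are hypothetical, and the check for a `[filter "crab"]` section assumes the driver is stored under that name in .git/config; the script only reads files and never runs crab.

```shell
# Hypothetical helper: print "<label>: present" or "<label>: missing".
report() {
  # $1 = label, $2 = test flag (-d or -f), $3 = path
  if [ "$2" "$3" ]; then echo "$1: present"; else echo "$1: missing"; fi
}

# Report which artifacts of an initialized repository exist under $1.
check_crab_init() {
  root="${1:-.}"
  report "git repository" -d "$root/.git"
  report "crab config" -f "$root/.crab/config.toml"
  # git stores filter drivers in a [filter "<name>"] section of .git/config.
  # ASSUMPTION: the section is named "crab".
  if grep -q '^[[:space:]]*\[filter "crab"\]' "$root/.git/config" 2>/dev/null; then
    echo "filter driver: present"
  else
    echo "filter driver: missing"
  fi
  report "gitattributes" -f "$root/.gitattributes"
}

check_crab_init .
```

With --no-auto-track, a missing .gitattributes is expected rather than a sign of failure.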

Examples

New project from scratch

mkdir my-ml-project && cd my-ml-project
crab init crab://my-bucket/ml-models
# ✓ Initialized git repository
# ✓ Created .crab/config.toml
# ✓ Registered filter.crab driver
# Detected large files — tracking: *.safetensors, *.bin

Existing git repo

cd my-existing-repo
crab init s3://team-bucket/my-repo
# ✓ Created .crab/config.toml
# ✓ Registered filter.crab driver
# Detected large files — tracking: *.parquet, *.h5

Skip auto-tracking

crab init crab://bucket/repo --no-auto-track
# Only creates config + registers filter, no .gitattributes changes

Auto-Tracked Extensions

When auto-tracking is enabled (the default), crab init tracks extensions that meet either criterion:

Size threshold: Any file above 1 MiB triggers tracking for its extension
Well-known binary formats: These extensions are always tracked when found, regardless of size:

Domain	Extensions
ML/AI	`.safetensors`, `.bin`, `.onnx`, `.pt`, `.pth`, `.h5`, `.hdf5`, `.pkl`
Data	`.parquet`, `.arrow`, `.feather`, `.npy`, `.npz`, `.zarr`
Media	`.fbx`, `.blend`, `.psd`, `.tiff`, `.exr`, `.dpx`, `.mov`, `.mp4`, `.wav`
Archives	`.tar`, `.gz`, `.zip`, `.zst`, `.lz4`
Databases	`.db`, `.sqlite`, `.sqlite3`
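
The two criteria can be approximated with standard tools to preview which extensions a scan might pick up. This is a sketch under assumptions, not Crab's actual implementation: the function name auto_track_candidates is hypothetical, KNOWN_EXTS holds only a subset of the table above, and files without an extension are ignored.

```shell
# Subset of the well-known extensions from the table above (illustrative only).
KNOWN_EXTS="safetensors bin onnx pt pth h5 parquet npy zip sqlite"

# Print the unique extensions under $1 that meet either criterion.
auto_track_candidates() {
  scan_dir="${1:-.}"
  {
    # Criterion 1: files strictly larger than 1 MiB (1048576 bytes).
    find "$scan_dir" -type f -size +1048576c ! -path '*/.git/*'
    # Criterion 2: files with a well-known extension, regardless of size.
    for ext in $KNOWN_EXTS; do
      find "$scan_dir" -type f -name "*.${ext}" ! -path '*/.git/*'
    done
  } | sed -n 's/.*\.\([^./]*\)$/\1/p' | sort -u   # keep the final extension only
}

auto_track_candidates .
```

A file of exactly 1 MiB does not satisfy the size criterion in this sketch, matching the "above 1 MiB" wording.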

URL Format

Crab URLs follow the pattern crab://<bucket>/<repo-path>. URLs with the s3:// scheme use the same <bucket>/<path> layout:

crab://my-bucket/my-project
crab://company-data/team-ml/experiment-42
s3://us-west-2-storage/repos/frontend-assets
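
As an illustration of the pattern, the sketch below splits a URL into scheme, bucket, and repository path using shell parameter expansion. The function name parse_storage_url is hypothetical, and the validation (only crab:// and s3://, non-empty bucket and path) is an assumption rather than Crab's documented behavior.

```shell
# Hypothetical helper: split <scheme>://<bucket>/<repo-path> into its parts.
parse_storage_url() {
  url="$1"
  case "$url" in
    crab://?*/?*|s3://?*/?*) ;;   # ASSUMPTION: only the two schemes shown above
    *) echo "unsupported or malformed URL: $url" >&2; return 1 ;;
  esac
  scheme="${url%%://*}"           # text before "://"
  rest="${url#*://}"              # text after "://"
  bucket="${rest%%/*}"            # up to the first "/"
  repo_path="${rest#*/}"          # everything after the first "/"
  printf 'scheme=%s bucket=%s path=%s\n' "$scheme" "$bucket" "$repo_path"
}

parse_storage_url crab://company-data/team-ml/experiment-42
# prints: scheme=crab bucket=company-data path=team-ml/experiment-42
```

Everything after the first slash is treated as the repository path, so nested paths such as team-ml/experiment-42 stay intact.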

See Also

crab clone: clone an existing repository
crab track: manually configure file patterns
crab install: install filter driver globally
crab ship: one-shot add + commit + push

