crab init
Initialize a new Crab repository.
Synopsis
crab init [OPTIONS] <URL>
Description
crab init sets up a git repository for Crab by configuring the remote helper,
filter driver, and cloud storage backend. It creates the .crab/ directory,
registers the crab filter in .git/config, and, unless --no-auto-track is given,
auto-detects large files to track.
If no git repository exists in the current directory, crab init runs git init
first, so there is no need to initialize git separately.
After writing the configuration, crab init scans the working tree for large
files (above 1 MiB) and files with well-known binary extensions (.safetensors,
.bin, .onnx, .parquet, etc.) and automatically adds tracking rules to
.gitattributes. This avoids the most common setup mistake: forgetting
to run crab track before crab add.
For conceptual background, see Creating a Repository.
Arguments
| Argument | Required | Description |
|---|---|---|
| <URL> | Yes | Cloud storage URL (e.g. crab://my-bucket/repo, s3://bucket/path) |
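Both example URLs share the shape <scheme>://<bucket>/<path>. As an illustration only, the sketch below splits a URL of that shape with POSIX parameter expansion. split_storage_url is a hypothetical helper, not part of Crab, and how Crab itself parses or validates the argument is not specified here.

```shell
# Hypothetical helper (not a Crab command): split a storage URL of the form
# <scheme>://<bucket>/<path> into its three parts.
split_storage_url() {
  url=$1
  # Only checks the overall shape; it does not validate bucket or path names.
  case $url in
    *://*/*) ;;
    *) echo "expected <scheme>://<bucket>/<path>, got: $url" >&2; return 1 ;;
  esac
  scheme=${url%%://*}   # text before the first "://"
  rest=${url#*://}      # text after the first "://"
  bucket=${rest%%/*}    # text up to the first "/"
  path=${rest#*/}       # everything after the first "/"
  printf '%s %s %s\n' "$scheme" "$bucket" "$path"
}
```

For example, split_storage_url crab://my-bucket/repo prints the scheme, bucket, and path separated by spaces, and a value without a scheme returns a non-zero status.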
Options
| Option | Default | Description |
|---|---|---|
| --no-auto-track | false | Skip auto-detection and tracking of large file extensions |
| --log-level | (none) | Set log verbosity (error, warn, info, debug, trace) |
What It Does
- Creates a git repository (if .git doesn't exist)
- Creates .crab/config.toml with the remote URL
- Registers the filter driver in .git/config
- Scans for large files and auto-tracks their extensions in .gitattributes
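To confirm these steps took effect, one option is a small check like the sketch below. check_crab_setup is a hypothetical helper, not a Crab command. It assumes it is run from the repository root and that the driver appears in .git/config under a [filter "crab"] section, which is how git stores a driver named filter.crab. The keys Crab writes inside that section are not documented here, so the sketch does not inspect them.

```shell
# Hypothetical check (not a Crab command): verify the artifacts that
# crab init is described as creating. Run from the repository root.
check_crab_setup() {
  status=0
  [ -e .git ] || { echo "missing: .git"; status=1; }
  [ -f .crab/config.toml ] || { echo "missing: .crab/config.toml"; status=1; }
  # git writes a driver named filter.crab as a [filter "crab"] section.
  grep -q '^\[filter "crab"]' .git/config 2>/dev/null \
    || { echo "missing: [filter \"crab\"] section in .git/config"; status=1; }
  if [ "$status" -eq 0 ]; then echo "ok"; fi
  return "$status"
}
```

Running git config --get-regexp '^filter\.crab\.' in the same directory lists whatever keys were registered for the driver.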
Examples
New project from scratch
mkdir my-ml-project && cd my-ml-project
crab init crab://my-bucket/ml-models
# ✓ Initialized git repository
# ✓ Created .crab/config.toml
# ✓ Registered filter.crab driver
# Detected large files — tracking: *.safetensors, *.bin
Existing git repo
cd my-existing-repo
crab init s3://team-bucket/my-repo
# ✓ Created .crab/config.toml
# ✓ Registered filter.crab driver
# Detected large files — tracking: *.parquet, *.h5
Skip auto-tracking
crab init crab://bucket/repo --no-auto-track
# Only creates config + registers filter, no .gitattributes changes
Auto-Tracked Extensions
When auto-tracking is enabled (the default), crab init tracks extensions that meet either criterion:
- Size threshold: Any file above 1 MiB triggers tracking for its extension
- Well-known binary formats: These extensions are always tracked when found, regardless of size:
| Domain | Extensions |
|---|---|
| ML/AI | .safetensors, .bin, .onnx, .pt, .pth, .h5, .hdf5, .pkl |
| Data | .parquet, .arrow, .feather, .npy, .npz, .zarr |
| Media | .fbx, .blend, .psd, .tiff, .exr, .dpx, .mov, .mp4, .wav |
| Archives | .tar, .gz, .zip, .zst, .lz4 |
| Databases | .db, .sqlite, .sqlite3 |
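To preview what the size threshold alone would pick up before running crab init, a rough approximation with standard tools looks like the following. preview_large_extensions is a hypothetical helper, not part of Crab. It assumes "above 1 MiB" means strictly more than 1048576 bytes and that .git and .crab are skipped, and it ignores the well-known extension list, so Crab's own scan may report more.

```shell
# Hypothetical preview (not a Crab command): list the extensions of files
# larger than 1 MiB (1048576 bytes) under a directory, as *.ext patterns.
preview_large_extensions() {
  dir=${1:-.}
  # Prune .git and .crab, then keep regular files over the threshold whose
  # base name contains a dot. Filenames containing newlines are not handled.
  find "$dir" \( -name .git -o -name .crab \) -prune -o \
       -type f -size +1048576c -name '*.*' -print |
    sed 's/.*\./*./' | sort -u
}
```

Calling preview_large_extensions with no argument scans the current directory and prints one pattern per line.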
URL Format
Crab URLs follow the pattern crab://<bucket>/<repo-path>. s3:// URLs of the same shape are also accepted:
crab://my-bucket/my-project
crab://company-data/team-ml/experiment-42
s3://us-west-2-storage/repos/frontend-assets
Related Commands
- crab clone: clone an existing repository
- crab track: manually configure file patterns
- crab install: install the filter driver globally
- crab ship: one-shot add + commit + push