Operations

Operate the cache service as a performance-critical dependency, not as the source of truth. The origin remains authoritative, but a cold or unavailable cache still slows developer and CI workflows.

Capacity Planning

Size the cache for the active working set: the unique data your team reuses during a normal 7- to 30-day window.

Starting points:

Team profile	Suggested cache size
Small team, one large repo	256-500 GiB
Mid-size team, multiple repos	1 TiB
Large team or active CI fleet	4-8 TiB

If hit rate is low and eviction is frequent, increase cache size before scaling out.
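As a rough illustration of working-set sizing, the sketch below sums the size of each unique object read within a window. The access-record shape `(object_id, size_bytes, timestamp)` and the function name `working_set_bytes` are assumptions made for this example. They are not part of the cache service.

```python
from datetime import datetime, timedelta

GIB = 1024 ** 3

def working_set_bytes(accesses, now, window_days=30):
    """Sum the sizes of unique objects accessed within the window.

    `accesses` is an iterable of (object_id, size_bytes, timestamp) tuples.
    This record shape is hypothetical.
    """
    cutoff = now - timedelta(days=window_days)
    sizes = {}
    for object_id, size_bytes, timestamp in accesses:
        if timestamp >= cutoff:
            # Count each object once, no matter how often it is reused.
            sizes[object_id] = size_bytes
    return sum(sizes.values())

now = datetime(2024, 1, 31)
accesses = [
    ("a", 200 * GIB, datetime(2024, 1, 30)),
    ("a", 200 * GIB, datetime(2024, 1, 29)),  # reuse, counted once
    ("b", 100 * GIB, datetime(2024, 1, 10)),
    ("c", 500 * GIB, datetime(2023, 11, 1)),  # outside the 30-day window
]
estimate = working_set_bytes(accesses, now)
print(f"30-day working set: {estimate / GIB:.0f} GiB")  # 300 GiB
```

Comparing the 7-day and 30-day estimates gives a sense of how much extra capacity a longer reuse window would need.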

Eviction

The service evicts old data when it approaches the configured size budget.

Defaults:

[eviction]
high_water_ratio = 0.95
low_water_ratio = 0.90

Lower both values if:

Push warming frequently arrives in bursts.
The cache disk has little free space beyond the configured budget.
You see storage-pressure errors.
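To see what the two ratios mean in bytes, the sketch below converts them into thresholds for a given size budget. It assumes a common watermark model: eviction starts once usage reaches the high-water mark and continues until usage falls to the low-water mark. The service's exact behavior may differ, and the function names are hypothetical.

```python
GIB = 1024 ** 3

def eviction_thresholds(budget_bytes, high_water_ratio=0.95, low_water_ratio=0.90):
    """Return (high_water_bytes, low_water_bytes) for a size budget."""
    if not 0 < low_water_ratio < high_water_ratio <= 1:
        raise ValueError("expected 0 < low_water_ratio < high_water_ratio <= 1")
    return round(budget_bytes * high_water_ratio), round(budget_bytes * low_water_ratio)

def bytes_to_evict(used_bytes, budget_bytes, high_water_ratio=0.95, low_water_ratio=0.90):
    """Bytes to free under the assumed model: nothing below the high-water
    mark, otherwise enough to bring usage back down to the low-water mark."""
    high, low = eviction_thresholds(budget_bytes, high_water_ratio, low_water_ratio)
    if used_bytes < high:
        return 0
    return used_bytes - low

budget = 1000 * GIB
high, low = eviction_thresholds(budget)
print(f"high water: {high / GIB:.0f} GiB, low water: {low / GIB:.0f} GiB")
print(f"evict at 960 GiB used: {bytes_to_evict(960 * GIB, budget) / GIB:.0f} GiB")
```

With the defaults, a 1000 GiB budget leaves 50 GiB of headroom above the high-water mark. Lowering both ratios widens that headroom, which is why it helps with bursty pushes and tight disks.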

Scaling

Prefer vertical scaling first:

Increase cache disk.
Increase network bandwidth.
Increase CPU if TLS or concurrent traffic is the bottleneck.

Use horizontal scaling when you have:

Multiple regions.
Separate business units or tenants.
Very large working sets that should not share one cache.

Run separate cache instances for isolation. Do not share one cache volume between instances.

Multi-Region Deployments

Deploy one cache per region and point clients to their nearest cache. Keep each cache close to the object-store replica it reads from.

If object-store replication is eventually consistent, expect a newly pushed object to miss in another region until that region's origin replica catches up.

Backup And Recovery

Do not back up the cache volume by default. The cache can be rebuilt from origin.

Recovery procedure:

Start a new cache service with an empty cache directory.
Keep clients pointed at the service.
Allow normal reads and pushes to warm the cache.
Watch hit rate and origin traffic until they return to normal.

Expect elevated origin reads and uploads while the cache warms.
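One way to decide when hit rate has "returned to normal" is to compare recent samples against the pre-incident baseline. The sketch below is a minimal example under stated assumptions: the per-interval `(hits, misses)` samples, the tolerance, and the number of consecutive intervals are all hypothetical, and how you obtain the counters depends on your metrics setup.

```python
def hit_rate(hits, misses):
    """Fraction of requests served from cache; 0.0 when there is no traffic."""
    total = hits + misses
    return hits / total if total else 0.0

def is_warm(samples, baseline_hit_rate, tolerance=0.05, consecutive=3):
    """Return True when the last `consecutive` samples are all within
    `tolerance` of the baseline hit rate.

    `samples` is a list of (hits, misses) per interval, oldest first.
    """
    if len(samples) < consecutive:
        return False
    recent = samples[-consecutive:]
    return all(hit_rate(h, m) >= baseline_hit_rate - tolerance for h, m in recent)

# Hit rate climbs as normal reads and pushes refill the cache.
samples = [(10, 90), (50, 50), (80, 20), (88, 12), (90, 10)]
print(is_warm(samples, baseline_hit_rate=0.90))  # False: 0.80 is still in the window
samples.append((91, 9))
print(is_warm(samples, baseline_hit_rate=0.90))  # True: 0.88, 0.90, 0.91
```

Requiring several consecutive intervals avoids declaring recovery on a single lucky sample.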

Upgrades

Recommended upgrade flow:

Remove the instance from readiness rotation.
Stop the service or roll the pod.
Start the new binary with the same config and cache volume.
Verify health, metrics, and admin stats.
Return the instance to service.

For Kubernetes with a single persistent volume, use the Recreate deployment strategy rather than a rolling update, so the old and new pods never share the cache volume.
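As a sketch of the Kubernetes case, the snippet below builds a Deployment manifest with `strategy.type` set to `Recreate` and prints it as JSON, which `kubectl apply -f` accepts. The deployment name, image, claim name, and mount path are placeholders chosen for this example. Substitute the values from your own deployment.

```python
import json

def recreate_deployment(name, image, pvc_name, mount_path):
    """Build a single-replica Deployment that stops the old pod before
    starting the new one, so only one pod mounts the cache volume."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            # Recreate terminates existing pods before creating new ones.
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "volumeMounts": [{"name": "cache", "mountPath": mount_path}],
                    }],
                    "volumes": [{
                        "name": "cache",
                        "persistentVolumeClaim": {"claimName": pvc_name},
                    }],
                },
            },
        },
    }

# All four arguments are hypothetical placeholders.
manifest = recreate_deployment(
    name="cache-service",
    image="registry.example.com/cache-service:new-version",
    pvc_name="cache-data",
    mount_path="/var/lib/cache",
)
print(json.dumps(manifest, indent=2))
```

The trade-off is a short window with no running pod during each upgrade, which is acceptable here because clients can fall back to origin.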

Operational Checklist

Weekly:

Review hit rate.
Review cache utilization.
Review origin miss volume.
Review auth failures.

Monthly:

Revisit cache size against active repo growth.
Rotate PSK or tokens according to your security policy.
Review policy rules.
Confirm CI and developer clients still point at the expected cache.
