Operations
Operate the cache service as a performance-critical dependency, not as the source of truth. Origin remains authoritative, but a cold or unavailable cache can still slow down developers and CI jobs.
Capacity Planning
Size the cache for the active working set: the unique data your team reuses during a normal 7- to 30-day window.
Starting points:
| Team profile | Suggested cache size |
|---|---|
| Small team, one large repo | 256-500 GiB |
| Mid-size team, multiple repos | 1 TiB |
| Large team or active CI fleet | 4-8 TiB |
If hit rate is low and eviction is frequent, increase cache size before scaling out.
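As a rough way to turn the working-set definition into a number, the sketch below sums the size of every distinct object touched within the window and adds headroom. The access-log shape `(timestamp, object_id, size_bytes)`, the function names, and the 25% headroom are assumptions made for illustration; they are not part of the cache service.

```python
from datetime import datetime, timedelta


def working_set_bytes(accesses, now, window_days=30):
    """Sum the size of each distinct object accessed within the window.

    `accesses` is an iterable of (timestamp, object_id, size_bytes) tuples,
    a hypothetical log format used only for this sketch.
    """
    cutoff = now - timedelta(days=window_days)
    sizes = {}
    for ts, object_id, size in accesses:
        if ts >= cutoff:
            # Count each object once, however often it is reused.
            sizes[object_id] = size
    return sum(sizes.values())


def suggested_cache_bytes(working_set, headroom=0.25):
    """Add headroom (an assumed 25%) on top of the measured working set."""
    return int(working_set * (1 + headroom))
```

Running the estimate with both a 7-day and a 30-day window gives a lower and an upper bound to compare against the table above.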
Eviction
The service evicts old data when it approaches the configured size budget.
Defaults:
[eviction]
high_water_ratio = 0.95
low_water_ratio = 0.90

Lower both values if:
- Push warming frequently arrives in bursts.
- The cache disk has little free space beyond the configured budget.
- You see storage-pressure errors.
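To see why lower ratios leave more room for bursts, the sketch below models the two settings as conventional high and low watermarks: eviction starts once usage reaches the high ratio and continues until usage falls to the low ratio. That interpretation is an assumption; the function is illustrative and is not the service's actual eviction code.

```python
def bytes_to_evict(used_bytes, budget_bytes,
                   high_water_ratio=0.95, low_water_ratio=0.90):
    """Return how many bytes a watermark-style evictor would free.

    Assumed semantics: nothing is evicted below the high watermark; once it
    is reached, data is evicted until usage is back at the low watermark.
    """
    high = budget_bytes * high_water_ratio
    low = budget_bytes * low_water_ratio
    if used_bytes < high:
        return 0
    return int(used_bytes - low)
```

Under this model, a 1 TiB budget with the default ratios leaves about 5% of the budget (roughly 51 GiB) to absorb a burst before eviction starts. Lowering both ratios widens that margin.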
Scaling
Prefer vertical scaling first:
- Increase cache disk.
- Increase network bandwidth.
- Increase CPU if TLS or concurrent traffic is the bottleneck.
Use horizontal scaling when you have:
- Multiple regions.
- Separate business units or tenants.
- Very large working sets that should not share one cache.
Run separate cache instances for isolation. Do not share one cache volume between instances.
Multi-Region Deployments
Deploy one cache per region and point clients to their nearest cache. Keep each cache close to the object-store replica it reads from.
If object-store replication is eventual, expect a newly pushed object to miss in another region until the origin replica catches up.
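One way for a client or CI job to tolerate that replication lag is to retry a miss with backoff before treating it as a failure. The sketch below assumes a hypothetical `fetch(key)` callable that returns bytes on success and `None` while the object is not yet visible in the region. The retry counts and delays are placeholders, not recommended values.

```python
import time


def fetch_with_replication_retry(fetch, key, attempts=5,
                                 initial_delay_s=1.0, sleep=time.sleep):
    """Call fetch(key) until it returns data or the attempts run out.

    `fetch` is a hypothetical callable: bytes on success, None when neither
    the regional cache nor its origin replica has the object yet.
    """
    delay = initial_delay_s
    for attempt in range(attempts):
        data = fetch(key)
        if data is not None:
            return data
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2  # exponential backoff while the replica catches up
    return None
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.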
Backup And Recovery
Do not back up the cache volume by default. The cache can be rebuilt from origin.
Recovery procedure:
- Start a new cache service with an empty cache directory.
- Keep clients pointed at the service.
- Allow normal reads and pushes to warm the cache.
- Watch hit rate and origin traffic until they return to normal.
Expect elevated origin reads and uploads while the cache warms.
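"Return to normal" can be made concrete by comparing recent hit rates with the rate observed before the rebuild. The sketch below assumes you can sample hit and miss counts per interval from your metrics; the function names, tolerance, and number of consecutive intervals are illustrative choices.

```python
def hit_rate(hits, misses):
    """Fraction of requests served from cache; 0.0 when there was no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


def is_warm(samples, baseline, tolerance=0.05, consecutive=3):
    """Report whether the cache has recovered to its pre-rebuild hit rate.

    `samples` is a list of (hits, misses) per interval, oldest first.
    The cache counts as warm when the last `consecutive` intervals are all
    within `tolerance` of `baseline`.
    """
    recent = samples[-consecutive:]
    if len(recent) < consecutive:
        return False
    return all(hit_rate(h, m) >= baseline - tolerance for h, m in recent)
```

Requiring several consecutive intervals avoids declaring recovery after one unusually good sample.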
Upgrades
Recommended upgrade flow:
- Remove the instance from readiness rotation.
- Stop the service or roll the pod.
- Start the new binary with the same config and cache volume.
- Verify health, metrics, and admin stats.
- Return the instance to service.
For Kubernetes with a single persistent volume, use a recreate strategy rather than a rolling update.
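A recreate strategy stops the old pod before starting the new one, so the two never need the volume at the same time. The sketch below builds a minimal Deployment manifest with `spec.strategy.type` set to `Recreate` and prints it as JSON, which `kubectl apply -f -` accepts. The deployment name, image, claim name, and mount path are placeholders, not values defined by the cache service.

```python
import json


def cache_deployment(name="cache",
                     image="example.invalid/cache:1.2.3",  # placeholder image
                     claim="cache-data",                   # placeholder PVC name
                     mount_path="/var/cache/service"):     # placeholder path
    """Build a single-replica Deployment that uses the Recreate strategy."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            # Recreate terminates the old pod before the new one starts.
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "volumeMounts": [
                            {"name": "data", "mountPath": mount_path},
                        ],
                    }],
                    "volumes": [{
                        "name": "data",
                        "persistentVolumeClaim": {"claimName": claim},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(cache_deployment(), indent=2))
```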
Operational Checklist
Weekly:
- Review hit rate.
- Review cache utilization.
- Review origin miss volume.
- Review auth failures.
Monthly:
- Revisit cache size against active repo growth.
- Rotate the PSK or tokens according to your security policy.
- Review policy rules.
- Confirm CI and developer clients still point at the expected cache.