Cache Service
Monitoring
The cache service exposes health endpoints, Prometheus metrics, and structured logs. Monitor it as an acceleration layer: cache failures matter, but origin availability remains the correctness boundary.
Health Endpoints
| Endpoint | Use |
|---|---|
| `/v1/health/live` | Liveness. Returns `ok` while the process is running. |
| `/v1/health` | Readiness. Returns `ok` when the service can reach origin. |
| `/health/live` | Compatibility alias for `/v1/health/live`. |
| `/health` | Compatibility alias for `/v1/health`. |
Use the liveness endpoint to decide when to restart the process. Use the readiness endpoint to decide whether the instance stays in load-balancer rotation.
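A minimal sketch of how an external probe could map the two endpoints to actions, using only the Python standard library. It assumes a healthy endpoint answers HTTP 200 with the body `ok`; the function names `endpoint_ok` and `probe_action` and the action strings are hypothetical and not part of the service.

```python
import urllib.error
import urllib.request


def endpoint_ok(url: str, timeout: float = 2.0) -> bool:
    """True if `url` answers HTTP 200 with the body "ok" (assumed response format)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace").strip()
            return resp.status == 200 and body == "ok"
    except (urllib.error.URLError, OSError):
        # Non-2xx status (HTTPError), connection refused, DNS failure, or timeout.
        return False


def probe_action(live_ok: bool, ready_ok: bool) -> str:
    """Hypothetical mapping from probe results to orchestration actions."""
    if not live_ok:
        return "restart"               # process is wedged or gone
    if not ready_ok:
        return "remove-from-rotation"  # process is up but cannot reach origin
    return "in-rotation"
```

For example, `probe_action(endpoint_ok(base + "/v1/health/live"), endpoint_ok(base + "/v1/health"))`, where `base` is the service's base URL. Keeping the two checks separate means an origin outage takes instances out of rotation without restarting processes that are otherwise healthy.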
Prometheus
Scrape:

```yaml
scrape_configs:
  - job_name: crab-cache
    static_configs:
      - targets: ["crab-cache.example.com:8443"]
    metrics_path: /v1/metrics
```

Important metrics:
| Metric | Meaning |
|---|---|
| `cache_hit_total` | Cache hits, by object type. |
| `cache_miss_total` | Cache misses, by object type. |
| `cache_bytes_served` | Bytes served to clients, split by hit and miss. |
| `cache_bytes_stored` | Current cache size, in bytes. |
| `origin_fetch_total` | Cache misses that required an origin read. |
| `origin_fetch_bytes` | Bytes fetched from origin. |
| `push_warming_total` | Successful push-warming writes. |
| `dedup_query_total` | Number of dedup queries. |
| `dedup_chunks_known` | Chunks reported as already known. |
| `dedup_chunks_unknown` | Chunks reported as unknown. |
| `cache_eviction_total` | Evicted objects, by type. |
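Several of these metrics are split by label (object type, or hit versus miss). A minimal sketch, assuming `/v1/metrics` serves the standard Prometheus text exposition format, that totals one metric per label value from a scraped body. The `type` label in the sample input is a hypothetical placeholder, since this page does not name the object-type label, and the parser ignores escaping edge cases that a real client library handles.

```python
import re
from collections import defaultdict
from typing import Dict

# One sample line: metric name, optional {labels}, value (a trailing timestamp is ignored).
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: name="value".
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def sum_by_label(exposition: str, metric: str, label: str) -> Dict[str, float]:
    """Sum every sample of `metric`, grouped by the value of `label`."""
    totals: Dict[str, float] = defaultdict(float)
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        m = _SAMPLE.match(line)
        if not m or m.group(1) != metric:
            continue
        labels = dict(_LABEL.findall(m.group(2) or ""))
        totals[labels.get(label, "")] += float(m.group(3))
    return dict(totals)
```

Applied to a body containing `cache_bytes_served{hit="true"} 1000` and `cache_bytes_served{hit="false"} 250`, `sum_by_label(body, "cache_bytes_served", "hit")` gives the cached and origin byte totals separately.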
Useful Dashboard Panels
Track:
- Cache hit rate.
- Bytes served from cache versus origin.
- Cache utilization versus configured budget.
- Origin fetch latency.
- Push warming rate.
- Dedup known/unknown ratio.
- 4xx and 5xx response rate.
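The hit-rate and dedup-ratio panels are both ratios of counter increases over a window: hits / (hits + misses), and known / (known + unknown). A minimal sketch of that arithmetic for two scrapes, treating a counter that decreases as having restarted from zero, similar to how Prometheus's `rate()` handles counter resets. The function names are hypothetical.

```python
from typing import Optional


def counter_increase(prev: float, curr: float) -> float:
    """Increase of a monotonic counter between two scrapes.

    A drop means the counter reset (for example, a process restart), so the
    post-reset value is taken as the increase since the reset.
    """
    return curr - prev if curr >= prev else curr


def ratio(part: float, other: float) -> Optional[float]:
    """part / (part + other), or None when there was no traffic in the window."""
    total = part + other
    return None if total == 0 else part / total
```

For example, hits going from 100 to 190 and misses from 50 to 60 give `ratio(90, 10)`, a hit rate of 0.9. Returning `None` for an idle window avoids reporting a 0% hit rate when the cache simply saw no requests.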
Example Queries
Cache hit rate:

```promql
sum(rate(cache_hit_total[5m])) /
(sum(rate(cache_hit_total[5m])) + sum(rate(cache_miss_total[5m])))
```

Bytes served from cache, as a per-second rate averaged over the last hour:

```promql
sum(rate(cache_bytes_served{hit="true"}[1h]))
```

Dedup ratio (fraction of queried chunks already known):

```promql
sum(rate(dedup_chunks_known[5m])) /
(sum(rate(dedup_chunks_known[5m])) + sum(rate(dedup_chunks_unknown[5m])))
```

Alerts
Recommended alerts:
| Alert | Condition |
|---|---|
| Cache down | Prometheus cannot scrape the service. |
| Origin unreachable | Readiness fails for several minutes. |
| Hit rate low | Hit rate remains low after the cache should be warm. |
| Cache near full | Cache usage exceeds the planned high-water point. |
| Origin latency high | Origin miss path becomes slow. |
| Auth failures spike | 401 or 403 rate increases unexpectedly. |
Do not page on a low hit rate immediately after a new deployment or after replacing the cache volume. The cache needs time to warm.
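The warm-up rule amounts to gating the "Hit rate low" alert on time since the cache last started cold, and on the low hit rate persisting, much like the `for` duration of a Prometheus alerting rule. A minimal sketch of that logic; the class name and the threshold, warm-up, and hold values are hypothetical and need to be chosen for your own traffic.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LowHitRateAlert:
    """Hypothetical gate for the "Hit rate low" alert."""

    threshold: float       # page when the hit rate is below this (hypothetical value)
    warmup_seconds: float  # grace period after a deploy or cache-volume replacement
    hold_seconds: float    # low hit rate must persist this long before paging
    _low_since: Optional[float] = field(default=None, init=False, repr=False)

    def should_page(self, now: float, hit_rate: Optional[float], cold_start: float) -> bool:
        """`cold_start` is the time of the last deploy or cache-volume replacement."""
        if now - cold_start < self.warmup_seconds:
            self._low_since = None  # still warming: never page
            return False
        if hit_rate is None or hit_rate >= self.threshold:
            self._low_since = None  # no traffic, or hit rate healthy: reset the timer
            return False
        if self._low_since is None:
            self._low_since = now   # first low evaluation after warm-up
        return now - self._low_since >= self.hold_seconds
```

With a one-hour warm-up and a ten-minute hold, a cold cache never pages during its first hour, and a single low evaluation afterwards only starts the timer.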
Logs
Use JSON logs in production:

```toml
[logging]
format = "json"
level = "info"
```

Use logs to answer:
- Are clients reaching the service?
- Are requests hitting cache or falling through to origin?
- Are push-warming requests arriving?
- Are auth failures caused by missing credentials or policy denial?
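A minimal sketch of answering the hit/miss and auth questions above from JSON log lines. The field names `cache` and `status` are hypothetical, since this page does not document the log schema, and the auth split assumes the service follows standard HTTP semantics: 401 for missing or invalid credentials, 403 for a policy denial.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def summarize_logs(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Tally cache outcomes and auth failures from JSON log lines.

    The "cache" and "status" field names are hypothetical placeholders;
    map them to the fields the service actually emits.
    """
    cache_outcomes: Counter = Counter()
    auth_failures: Counter = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(record, dict):
            continue
        if "cache" in record:
            cache_outcomes[record["cache"]] += 1  # e.g. "hit" or "miss"
        status = record.get("status")
        if status == 401:
            # Unauthenticated: credentials missing or invalid.
            auth_failures["missing_credentials"] += 1
        elif status == 403:
            # Authenticated but refused: policy denial.
            auth_failures["policy_denied"] += 1
    return cache_outcomes, auth_failures
```

A steady stream of 401s usually points at client configuration, while 403s point at the access policy, so the two counts are worth watching separately.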