communications/postmortems/2025-07-17.md
Changes intended to improve the availability of the underlying Knative infrastructure inadvertently disabled support for persistent volume claims (PVCs). As a result, builds failed until PVC support was re-enabled.
All times in Pacific Time (PT)
To reduce deployment times and improve performance, our runners use PVCs to mount Filestore volumes. When changes intended to improve availability were made to the underlying Knative infrastructure, the new configuration inadvertently disabled PVC support for Knative services. When a build job completes, it automatically deploys a new runner with the updated build. Because the runner configuration includes a read/write PVC but PVC support was disabled, Knative failed to create the new runner service, which in turn caused the build job to fail.
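The conflict can be illustrated with a pre-deployment check that compares a service's volumes against the Knative feature flags. This is a minimal sketch, assuming the PVC switches are Knative Serving's `kubernetes.podspec-persistent-volume-claim` and `kubernetes.podspec-persistent-volume-write` flags in the `config-features` ConfigMap (this postmortem does not name the exact setting that changed). The `find_pvc_conflicts` helper, the `build-cache` volume, and the `filestore-cache` claim are hypothetical, and the check works on plain dicts rather than live cluster objects.

```python
from typing import Any

# Assumption: the PVC switches are these Knative Serving feature flags from the
# `config-features` ConfigMap; the postmortem does not name the exact setting.
PVC_FLAG = "kubernetes.podspec-persistent-volume-claim"
PVC_WRITE_FLAG = "kubernetes.podspec-persistent-volume-write"


def find_pvc_conflicts(features: dict[str, str], service: dict[str, Any]) -> list[str]:
    """Hypothetical helper: describe each PVC volume the feature flags would reject."""
    pod_spec = service.get("spec", {}).get("template", {}).get("spec", {})
    problems: list[str] = []
    for volume in pod_spec.get("volumes", []):
        claim = volume.get("persistentVolumeClaim")
        if claim is None:
            continue  # not a PVC-backed volume, so the flags do not apply
        name = volume.get("name", "<unnamed>")
        if features.get(PVC_FLAG) != "enabled":
            problems.append(f"{name}: PVC support is disabled")
        elif not claim.get("readOnly", False) and features.get(PVC_WRITE_FLAG) != "enabled":
            # readOnly defaults to false, i.e. a read/write claim
            problems.append(f"{name}: read/write PVC but write support is disabled")
    return problems


# Hypothetical runner service with a read/write Filestore-backed PVC.
runner = {
    "spec": {"template": {"spec": {"volumes": [
        {"name": "build-cache", "persistentVolumeClaim": {"claimName": "filestore-cache"}},
    ]}}}
}

print(find_pvc_conflicts({}, runner))
# ['build-cache: PVC support is disabled']
print(find_pvc_conflicts({PVC_FLAG: "enabled", PVC_WRITE_FLAG: "enabled"}, runner))
# []
```

Running a check like this before the build job deploys the new runner would surface the misconfiguration as an explicit error instead of a failed service creation.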
The Knative configuration was updated to re-enable PVC support.
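For reference, a fix of this kind could look like the following. This is a minimal sketch, assuming the relevant switches are the Knative Serving `config-features` flags for PVCs (not confirmed by this postmortem); `pvc_reenable_patch` is a hypothetical helper that builds a JSON merge patch setting both flags to `enabled`. The write flag is included because the runners mount their PVC read/write.

```python
import json

# Assumption: the PVC switches are these two Knative Serving feature flags in the
# `config-features` ConfigMap; the postmortem does not name the exact setting.
PVC_FLAGS = (
    "kubernetes.podspec-persistent-volume-claim",
    "kubernetes.podspec-persistent-volume-write",  # required for read/write mounts
)


def pvc_reenable_patch() -> dict:
    """Hypothetical helper: build a JSON merge patch enabling every PVC-related flag."""
    return {"data": {flag: "enabled" for flag in PVC_FLAGS}}


if __name__ == "__main__":
    # The printed JSON can be passed to:
    #   kubectl patch configmap config-features -n knative-serving --type merge -p '<json>'
    print(json.dumps(pvc_reenable_patch()))
```

If Knative is installed through the Knative Operator, the operator reconciles this ConfigMap, so the same keys would instead belong under `spec.config.features` of the `KnativeServing` resource.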
Improvements to logs, metrics, and alerting.
Implementation of incident management tooling, including on-call rotations and paging.