vertical-pod-autoscaler/enhancements/8818-in-place-only/README.md
AEP-4016 introduced the InPlaceOrRecreate update mode which attempts in-place updates first but falls back to pod eviction if the in-place update fails. However, for certain workloads, any disruption is unacceptable, and users would prefer to retry in-place updates indefinitely rather than evict and recreate pods.
This proposal introduces a new update mode that only attempts in-place updates and retries on failure without ever falling back to eviction.
There are several use cases where pod disruption should be avoided at all costs. In these scenarios, users would prefer that VPA keep retrying in-place updates indefinitely rather than ever evict and recreate their pods.
Add a new supported value of UpdateMode: InPlace
This mode will:
- attempt only in-place updates, unlike `InPlaceOrRecreate`
- never add pods to `podsForEviction` if an in-place update fails
- track infeasible attempts via `spec.resources` (see Resize Status Handling for details)

Add `UpdateModeInPlace` to the VPA types:
```go
// In pkg/apis/autoscaling.k8s.io/v1/types.go
const (
	// ... existing modes ...

	// UpdateModeInPlace means that VPA will only attempt to update pods in-place
	// and will never evict them. If an in-place update fails, VPA will rely on
	// kubelet's automatic retry mechanism.
	UpdateModeInPlace UpdateMode = "InPlace"
)
```
VPA must track infeasible resize attempts to prevent infinite retry loops. This is necessary because infeasibility can be detected at different points depending on the Kubernetes version:
| Kubernetes Version | When Infeasibility Is Detected | spec.resources After Attempt | How VPA Learns |
|---|---|---|---|
| < 1.36 (or later if KEP slips) | After patch succeeds, kubelet reports status | Updated to attempted value | Resize status = Infeasible |
| >= 1.36 (targeted, not guaranteed) | At patch time, API server rejects | Unchanged (old value) | Patch error response |
Note: The Kubernetes sig-node team is targeting admission-time feasibility checks for 1.36 (kubernetes/kubernetes#136043), but this timeline is not guaranteed and may slip to a later release. VPA implements version-agnostic detection that works correctly regardless of which version introduces this change.
To handle both cases uniformly, VPA maintains a map of infeasible attempts:
```go
type updater struct {
	// ... existing fields ...

	// infeasibleAttempts maps a pod UID to the last resource values
	// that were determined to be infeasible. This prevents retrying the same
	// infeasible values repeatedly.
	infeasibleAttempts map[types.UID]*vpa_types.RecommendedPodResources
	infeasibleMu       sync.RWMutex
}
```
Using the pod UID as the key ensures that entries are uniquely identified even if pods with the same name are recreated.
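As a small illustration of why the map is keyed by UID rather than namespace/name (the pod names and UIDs below are invented), a recreated pod with the same name receives a fresh UID and is therefore not blocked by a stale entry:

```go
package main

import "fmt"

func main() {
	// When a pod is recreated under the same name it receives a new UID,
	// so a stale infeasible entry for the old instance does not suppress
	// updates to the new one. Values here are simplified stand-ins for
	// VPA's real types.UID and RecommendedPodResources.
	infeasibleAttempts := map[string]string{
		"uid-aaa": "cpu=4,mem=16Gi", // entry for the old instance of default/web-0
	}
	recreatedPodUID := "uid-bbb" // the recreated default/web-0

	_, blocked := infeasibleAttempts[recreatedPodUID]
	fmt.Println(blocked) // false: the new pod is unaffected by the stale entry
}
```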
At the beginning of each updater cycle, VPA cleans up entries from the infeasibleAttempts map for pods that no longer exist. This prevents memory leaks from accumulating stale entries:
```go
// CleanupInfeasibleAttempts removes entries from infeasibleAttempts for pods that no longer exist.
// This should be called at the beginning of each updater cycle with the list of all live pods.
func (u *updater) CleanupInfeasibleAttempts(livePods []*apiv1.Pod) {
	u.infeasibleMu.Lock()
	defer u.infeasibleMu.Unlock()

	// Build a set of existing pod UIDs.
	seenPods := sets.New[types.UID]()
	for _, pod := range livePods {
		seenPods.Insert(pod.UID)
	}

	// Remove entries for pods that no longer exist.
	for podUID := range u.infeasibleAttempts {
		if !seenPods.Has(podUID) {
			delete(u.infeasibleAttempts, podUID)
			klog.V(4).InfoS("Cleaned up infeasible attempt for non-existent pod", "podUID", podUID)
		}
	}
}
```
The `infeasibleAttempts` map is stored in-memory within the updater component. This has the following implications:

- If the updater restarts, the `infeasibleAttempts` data is lost. This means VPA may re-attempt values it had previously determined to be infeasible, re-learning their infeasibility through the normal detection paths.
Since VPA cannot know at runtime which Kubernetes version is running (and the admission-time check may ship in 1.36 or later), VPA implements dual detection paths that handle both scenarios simultaneously:
VPA attempts patch → Patch succeeds → spec.resources updated → Kubelet evaluates → Status = Infeasible → VPA stores spec.resources
In this path:
- `spec.resources` is updated to the attempted values
- VPA observes the `Infeasible` status during reconciliation
- VPA stores `spec.resources` (which contains the infeasible values)

VPA attempts patch → API server rejects with infeasibility error → VPA stores attempted recommendation
In this path:
- The API server returns a `StatusCause` with `Type: NodeCapacity` to indicate the specific reason
- `spec.resources` remains unchanged (the patch was rejected)
- VPA detects the `NodeCapacity` cause and stores the recommendation it attempted to apply

These paths are mutually exclusive for any given update attempt.
VPA implements both paths unconditionally, so it works correctly on any Kubernetes version without needing version detection.
The reconciliation loop follows this sequence of steps for each pod:
VPA retrieves the current recommendation for the pod from the VPA object, then checks whether there is a stored infeasible attempt for this pod in the `infeasibleAttempts` map; if the stored values match the current recommendation, the pod is skipped until the recommendation changes.
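A minimal sketch of this retry-gating check, using simplified stand-in types (`UID` and `Recommendation` strings instead of VPA's real `types.UID` and `RecommendedPodResources` structs):

```go
package main

import "fmt"

// UID and Recommendation are simplified stand-ins; the real updater compares
// full RecommendedPodResources structs keyed by types.UID.
type UID string
type Recommendation string

type updater struct {
	infeasibleAttempts map[UID]Recommendation
}

// shouldSkip reports whether the current recommendation matches a stored
// infeasible attempt; if it does, retrying would fail again, so the pod is
// skipped until the recommendation changes.
func (u *updater) shouldSkip(podUID UID, rec Recommendation) bool {
	stored, ok := u.infeasibleAttempts[podUID]
	return ok && stored == rec
}

func main() {
	u := &updater{infeasibleAttempts: map[UID]Recommendation{}}

	// A previous attempt was determined infeasible; its values were stored.
	u.infeasibleAttempts["pod-1"] = "cpu=4,mem=16Gi"

	fmt.Println(u.shouldSkip("pod-1", "cpu=4,mem=16Gi")) // true: same values, skip
	fmt.Println(u.shouldSkip("pod-1", "cpu=2,mem=8Gi"))  // false: recommendation changed, retry
}
```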
If the pod is currently undergoing an in-place resize (i.e., spec.resources differs from status.resources), check the resize status:
| Status | Action | Rationale |
|---|---|---|
| `InProgress` | Wait, take no action | Resize is actively being applied by kubelet |
| `Deferred` | Wait, take no action | Kubelet has accepted the resize but is waiting for the right conditions |
| `Infeasible` | Store `spec.resources` as infeasible, skip pod | Pre-admission-check path: kubelet determined the resize cannot be accommodated. `spec.resources` contains the attempted values. |
| `Error` | Take no action | Kubelet will automatically retry |
VPA attempts to patch the pod's spec.resources with the current recommendation:
| Outcome | Action | Rationale |
|---|---|---|
| Patch succeeds | Clear any stored infeasible attempt | Update is now in progress; previous infeasibility no longer relevant |
| Patch fails with error containing NodeCapacity cause | Store attempted recommendation as infeasible, skip pod | Post-admission-check path: API server rejected because resources exceed node capacity |
| Patch fails with transient error | Do nothing | Will retry in next reconciliation loop |
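The three outcomes above can be sketched as follows. This is illustrative only: `patchResult` and its fields are hypothetical stand-ins for the real patch call and the status causes carried by a Kubernetes API rejection:

```go
package main

import "fmt"

// patchResult is a hypothetical stand-in for the outcome of patching
// spec.resources: success, a rejection carrying status causes, or a
// transient error (neither succeeded nor carrying a NodeCapacity cause).
type patchResult struct {
	succeeded bool
	causes    []string // e.g. "NodeCapacity" on admission-time rejection
}

// handlePatchOutcome updates the infeasible-attempts map per the decision
// table: clear on success, store on NodeCapacity rejection, do nothing on
// transient errors.
func handlePatchOutcome(res patchResult, podUID, attempted string, attempts map[string]string) string {
	if res.succeeded {
		delete(attempts, podUID) // previous infeasibility is no longer relevant
		return "in-progress"
	}
	for _, c := range res.causes {
		if c == "NodeCapacity" {
			attempts[podUID] = attempted // post-admission-check path: store and skip
			return "skip"
		}
	}
	return "retry-next-loop" // transient error: retry in the next reconciliation loop
}

func main() {
	attempts := map[string]string{}
	fmt.Println(handlePatchOutcome(patchResult{causes: []string{"NodeCapacity"}}, "pod-1", "cpu=4", attempts)) // skip
	fmt.Println(handlePatchOutcome(patchResult{succeeded: true}, "pod-1", "cpu=2", attempts))                  // in-progress
	fmt.Println(len(attempts))                                                                                 // 0
}
```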
Periodically, VPA removes entries from the infeasibleAttempts map for pods that no longer exist. This prevents memory leaks from accumulating stale entries. This cleanup behavior is targeted for beta.
Key Difference from InPlaceOrRecreate: In InPlace mode, the Deferred, Infeasible, and InProgress statuses all result in waiting; VPA never falls back to eviction. In contrast, InPlaceOrRecreate mode may fall back to eviction after a timeout. This ensures that InPlace mode pods are never evicted, regardless of how long they remain in a non-updatable state.
Modify `CanInPlaceUpdate` to accommodate the new update mode:
```go
// CanInPlaceUpdate checks if pod can be safely updated
func (ip *PodsInPlaceRestrictionImpl) CanInPlaceUpdate(pod *apiv1.Pod, updateMode vpa_types.UpdateMode) utils.InPlaceDecision {
	// Feature gate checks based on update mode
	switch updateMode {
	case vpa_types.UpdateModeInPlaceOrRecreate:
		if !features.Enabled(features.InPlaceOrRecreate) {
			return utils.InPlaceEvict
		}
	case vpa_types.UpdateModeInPlace:
		if !features.Enabled(features.InPlace) {
			return utils.InPlaceDeferred
		}
	case vpa_types.UpdateModeAuto:
		// Auto mode is deprecated but still supports in-place updates
		// when the feature gate is enabled
		if !features.Enabled(features.InPlaceOrRecreate) {
			return utils.InPlaceEvict
		}
	default:
		// UpdateModeOff, UpdateModeInitial, UpdateModeRecreate, etc.
		return utils.InPlaceEvict
	}

	cr, present := ip.podToReplicaCreatorMap[getPodID(pod)]
	if !present {
		klog.V(4).InfoS("Can't in-place update pod, but not falling back to eviction. Waiting for next loop", "pod", klog.KObj(pod))
		return utils.InPlaceDeferred
	}
	if pod.Status.Phase == apiv1.PodPending {
		return utils.InPlaceDeferred
	}
	singleGroupStats, present := ip.creatorToSingleGroupStatsMap[cr]
	if !present {
		klog.V(4).InfoS("Can't in-place update pod, but not falling back to eviction. Waiting for next loop", "pod", klog.KObj(pod))
		return utils.InPlaceDeferred
	}

	if isInPlaceUpdating(pod) {
		// For InPlace mode: wait for all non-terminal statuses, never evict.
		// Infeasible attempts are tracked and only retried when recommendation changes.
		if updateMode == vpa_types.UpdateModeInPlace {
			resizeStatus := getResizeStatus(pod)
			switch resizeStatus {
			case utils.ResizeStatusInfeasible:
				// Infeasible means node can't accommodate the resize.
				// Store spec.resources and wait for recommendation to change before retrying.
				klog.V(4).InfoS("In-place update infeasible, will retry", "pod", klog.KObj(pod))
				return utils.InPlaceInfeasible
			case utils.ResizeStatusDeferred:
				// Deferred means kubelet is waiting to apply the resize.
				// Do nothing, wait for kubelet to proceed.
				klog.V(4).InfoS("In-place update deferred by kubelet, waiting", "pod", klog.KObj(pod))
				return utils.InPlaceDeferred
			case utils.ResizeStatusInProgress:
				// Resize is actively being applied, wait for completion.
				klog.V(4).InfoS("In-place update in progress, waiting for completion", "pod", klog.KObj(pod))
				return utils.InPlaceDeferred
			case utils.ResizeStatusError:
				// Error during resize, retry
				klog.V(4).InfoS("In-place update error, will retry", "pod", klog.KObj(pod))
				return utils.InPlaceInfeasible
			default:
				klog.V(4).InfoS("In-place update status unknown, waiting", "pod", klog.KObj(pod), "status", resizeStatus)
				return utils.InPlaceDeferred
			}
		}
		// For InPlaceOrRecreate/Auto modes, check timeout and potentially evict
		canEvict := CanEvictInPlacingPod(pod, singleGroupStats, ip.lastInPlaceAttemptTimeMap, ip.clock)
		if canEvict {
			return utils.InPlaceEvict
		}
		return utils.InPlaceDeferred
	}

	if ip.inPlaceSkipDisruptionBudget && utils.IsNonDisruptiveResize(pod) {
		klog.V(4).InfoS("in-place-skip-disruption-budget enabled, skipping disruption budget check for in-place update")
		return utils.InPlaceApproved
	}
	if ip.inPlaceSkipDisruptionBudget {
		klog.V(4).InfoS("in-place-skip-disruption-budget enabled, but pod has RestartContainer resize policy", "pod", klog.KObj(pod))
	}
	if singleGroupStats.isPodDisruptable() {
		return utils.InPlaceApproved
	}

	klog.V(4).InfoS("Can't in-place update pod, but not falling back to eviction. Waiting for next loop", "pod", klog.KObj(pod))
	return utils.InPlaceDeferred
}
```
- If the `InPlace` feature gate is disabled and a VPA is configured with `UpdateMode: InPlace`, the updater will skip processing that VPA entirely (not fall back to eviction).
- By contrast, a VPA configured with `InPlaceOrRecreate` with its feature gate disabled will fall back to eviction mode.

This design ensures that InPlace mode truly guarantees no evictions, even in misconfiguration scenarios.
While InPlace mode prevents pod eviction and eliminates the disruption associated with pod recreation, it is still subject to the behavior of Kubernetes' InPlacePodVerticalScaling feature.
When a memory limit is decreased in-place, there is a small but non-zero risk of OOMKill if the container's current memory usage exceeds the new lower limit at the moment the resize is applied.
This is an inherent limitation of in-place resource updates documented in KEP-1287, not a VPA-specific behavior.
This risk may be unacceptable for workloads with strict SLO requirements where even brief disruptions (including OOMKills) cannot be tolerated.
For workloads where even unintended OOMKills are unacceptable, users should implement one or more additional mitigation strategies.
The following test scenarios will be added to e2e tests. The InPlace mode will be tested in the following scenarios:
- Basic in-place update scenarios (mirroring the existing coverage for `InPlaceOrRecreate`)

Infeasible Attempt Tracking Tests:

- When the resize status is `Infeasible`, verify that `spec.resources` is stored and the pod is skipped until the recommendation changes
- When a patch is rejected with a `NodeCapacity` cause, verify that the attempted recommendation is stored and the pod is skipped until the recommendation changes

On upgrade to VPA 1.6.0 (tentative release version), users can opt into the new InPlace mode by enabling the alpha feature gate (which defaults to disabled), passing `--feature-gates=InPlace=true` to the updater and admission-controller components, and setting their VPA UpdateMode to InPlace.
Existing VPAs will continue to work as before.
On downgrade of VPA from 1.6.0 (tentative release version), nothing will change: VPAs will continue to work as before, unless the user had enabled the feature gate, in which case the downgrade could break any VPA that uses InPlace.
Disabling the InPlace feature gate will cause the following to happen:

- The updater will skip VPAs with `UpdateMode: InPlace` configured
- Those workloads will no longer be actuated in InPlace mode (no in-place updates or evictions will be attempted)

Enabling the InPlace feature gate will cause the following to happen:

- The updater will resume processing VPAs with `UpdateMode: InPlace` configured, attempting in-place updates for them

InPlace is being built assuming that it will be running on a Kubernetes version of at least 1.34 with the beta version of KEP-1287: In-Place Update of Pod Resources enabled.
Should these conditions not be true, the VPA shall not be able to scale your workload at all.
Kubernetes 1.36+ Considerations: If the targeted admission-time checks land in Kubernetes 1.36, infeasible resize requests will be rejected at the API server level rather than being accepted and later marked as Infeasible by the kubelet. The InPlace mode handles both scenarios through unified infeasible attempt tracking:
- When kubelet reports the resize status as `Infeasible`, VPA stores `spec.resources` (which reflects the attempted values)
- When the API server rejects the patch with a `NodeCapacity` cause, VPA stores the attempted recommendation

In both cases, VPA only retries when the recommendation changes from the stored values. This ensures consistent behavior across Kubernetes versions without requiring user configuration.