AEP-4016: Support for in place updates in VPA

Summary

VPA applies its recommendations with a mutating webhook during pod creation. It can also evict pods, expecting that the recommendation will be applied when the pod is recreated. Today, this process is potentially disruptive, as any change in recommendations requires a pod to be recreated.

We can instead reduce the amount of disruption by leveraging the in-place update feature, which has been alpha since Kubernetes 1.27 and is graduating to beta in 1.33.

This proposal enables only the core uses of in-place updates in VPA, with the intention of providing the foundational pieces. Further advanced uses of in-place updates in VPA (like applying different recommendations during pod initialization or providing more frequent, smaller updates) will be introduced as separate enhancement proposals.

A Note On Disruptions

It is important to note that VPA cannot guarantee NO disruptions. This is because the underlying container runtime is responsible for actuating the resize operation and there are no guarantees provided (see this thread for more information). However, in practice if the underlying container runtime supports it, we expect these disruptions to be minimal and that MOST of the time the updates will be done in-place.

This proposal therefore focuses on reducing disruptions while still harnessing the benefits of VPA.

Goals

  • Allow VPA to actuate with reduced disruption.
  • Allow VPA to actuate in situations where actuation by eviction is not desirable.

Non-Goals

  • Allow VPA to actuate more frequently.
  • Allow VPA to operate with NO disruptions, see the note above.
  • Improved handling of injected sidecars
    • A separate AEP will improve VPA's handling of injected sidecars.
  • Partial updates applied to some containers of a pod, some changes skipped (request in recommendation bounds).

Proposal

Add a new supported value of UpdateMode:

  • InPlaceOrRecreate

The name InPlaceOrRecreate makes it explicit to the user that the existing pod may be replaced.

For the initial release of in-place updates with VPA, in-place updates will only be available using the InPlaceOrRecreate mode. In the future, once the SIG feels that the feature is mature enough, this behavior will become the default behavior for the Auto mode. See the Auto mode documentation.

Context

The in-place update of pod resources KEP has been available as alpha since 1.27 and is graduating to beta in 1.33. The feature allows changing container resources while the container is running. It adds two key features:

  • A /resize subresource that can be used to mutate the Pod.Spec.Containers[i].Resources field.
  • A ResizePolicy field on Container. This field allows the user to specify the behavior when modifying a resource value. Currently it has two modes:
    • PreferNoRestart (default), which indicates to the container runtime that it should try to resize the container without restarting. However, it does not guarantee that a restart will not happen.
    • RestartContainer, which indicates that any mutation to the resource requires a restart (for example, this is important for Java apps using the -XmxN flag, which are unable to resize memory without restarting).

Note that resize operations will NOT change the pod's quality of service (QoS) class.

Note that in the initial Beta version of in-place updates, memory limit downscaling is forbidden for pods with resizePolicy: PreferNoRestart. This means that when VPA attempts to apply such a patch, the patch will fail and VPA will need to fall back to a regular eviction (see below).
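
To make the /resize subresource described above more concrete, the following is a minimal sketch of building a JSON Patch (RFC 6902) body that sets one resource request on one container. The names patchOp and resizePatch are hypothetical and chosen for illustration; this AEP does not prescribe how the VPA updater builds or sends its patches, and submitting the body to the resize subresource would additionally need a Kubernetes client.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is a single RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value string `json:"value"`
}

// resizePatch (hypothetical helper) builds a JSON Patch that sets one
// resource request on the container at containerIndex. The "add"
// operation creates the member or replaces it if it already exists.
func resizePatch(containerIndex int, resource, quantity string) ([]byte, error) {
	ops := []patchOp{{
		Op:    "add",
		Path:  fmt.Sprintf("/spec/containers/%d/resources/requests/%s", containerIndex, resource),
		Value: quantity,
	}}
	return json.Marshal(ops)
}
```

Calling resizePatch(0, "cpu", "500m") yields a one-element patch targeting /spec/containers/0/resources/requests/cpu. A real resize would typically carry one operation per changed request or limit.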

Design Details

Prior to this AEP, only the VPA admission controller was responsible for changing the pod spec.

The VPA updater is responsible for evicting pods so that the admission controller can change them during admission.

In the newly added InPlaceOrRecreate mode, the VPA Updater will attempt to execute in-place updates FIRST. If it is unable to process an in-place update in time, it will evict the pod to force a change.

This will effectively match the current behavior in Auto except that resizes will first be attempted in-place.

In the future, this logic may be improved to:

  • Provide more frequent resizes.
  • Make changes that are only attempted using in-place resizes and wouldn't ultimately result in an eviction on failure.
  • In the case of failure, make smaller updates to circumvent a node that does not have enough headroom to accept the full resize but could accommodate a smaller one.

We classify two types of updates in the context of this new mode:

  1. Updates on pod admission
  2. In-place updates

Applying Updates During Pod Admission

For VPAs using the new InPlaceOrRecreate mode, the VPA Admission Controller will apply updates to starting pods just as it does for VPAs in Initial, Auto, and Recreate modes.

In-Place Updates

In the InPlaceOrRecreate mode, including for updates that require a container restart, the VPA updater will attempt to apply updates in place. It will update pods under the same conditions that would trigger an update in Recreate mode. That is, it will apply an in-place update if:

  • Any container has a request below the corresponding LowerBound or
  • Any container has a request above the corresponding UpperBound or
  • The difference between the sum of the pod's requests and the sum of the recommendation Targets is more than 10% and the pod has been running undisrupted for at least 12h.
    • NOTE: A successful update counts as disruption here (and prevents further disruptive updates to the pod for 12h).
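
The three trigger conditions above can be sketched as a single predicate. This is an illustration only, not the updater's actual code: containerState and needsUpdate are hypothetical names, each container is reduced to a single resource, and the 10% difference is assumed to be measured relative to the sum of current requests.

```go
package main

import (
	"math"
	"time"
)

// containerState (hypothetical) pairs one container's current request with
// the VPA recommendation for a single resource, in consistent units.
type containerState struct {
	Request    float64
	LowerBound float64
	Target     float64
	UpperBound float64
}

const (
	significantChange = 0.10           // "more than 10%" from the text
	minUndisrupted    = 12 * time.Hour // "at least 12h" from the text
)

// needsUpdate reports whether any of the trigger conditions holds.
func needsUpdate(containers []containerState, undisrupted time.Duration) bool {
	var sumRequests, sumTargets float64
	for _, c := range containers {
		// Conditions 1 and 2: a request outside the recommended range.
		if c.Request < c.LowerBound || c.Request > c.UpperBound {
			return true
		}
		sumRequests += c.Request
		sumTargets += c.Target
	}
	if sumRequests == 0 {
		// Assumption of this sketch: avoid dividing by zero and treat
		// a pod without requests as not significantly changed.
		return false
	}
	// Condition 3: significant change on a long-lived pod.
	relativeDiff := math.Abs(sumTargets-sumRequests) / sumRequests
	return relativeDiff > significantChange && undisrupted >= minUndisrupted
}
```

Because a successful update counts as a disruption, a caller would reset the undisrupted duration after each applied update, which keeps the third condition from firing again for another 12h.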

(NEW!) In addition, VPA will attempt an in-place update in some cases where we NORMALLY would not be able to perform an eviction, including:

  • If CanEvict is false.
  • If any of the EvictionRequirements on the VPA are not true.

If the in-place resize operation fails in one of these cases, VPA still proceeds down the normal eviction path, where the eviction is blocked by the same conditions that would have prevented it in the first place.

The VPA updater will evict a pod to actuate a recommendation if it attempted to apply the recommendation in place and failed. This will happen even if we attempted the in-place resize for conditions that normally would not lead to an eviction. This is safe because the eviction would be prevented anyway.
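
The "in place first, then evict if allowed" flow can be summarized as a small decision function. This is a sketch under simplifying assumptions: podState, action, and nextAction are hypothetical names, and the eviction guards are collapsed into two booleans.

```go
package main

// action (hypothetical) is what the updater does next for a pod.
type action string

const (
	actionNone    action = "none"
	actionInPlace action = "in-place"
	actionEvict   action = "evict"
)

// podState (hypothetical) captures the inputs to the decision.
type podState struct {
	NeedsUpdate              bool // an update trigger condition holds
	InPlaceFailed            bool // an in-place attempt was made and deemed failed
	CanEvict                 bool
	EvictionRequirementsTrue bool // all EvictionRequirements on the VPA are true
}

// nextAction tries in place first and falls back to eviction only when
// eviction is permitted; otherwise the pod is left untouched.
func nextAction(p podState) action {
	if !p.NeedsUpdate {
		return actionNone
	}
	if !p.InPlaceFailed {
		// In-place is attempted even when eviction would be blocked.
		return actionInPlace
	}
	if p.CanEvict && p.EvictionRequirementsTrue {
		return actionEvict
	}
	// The eviction path is entered but blocked by the usual guards.
	return actionNone
}
```

Note that the eviction guards are consulted only after an in-place attempt has failed, which is what lets the new mode act on pods that Recreate would never touch.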

VPA updater will consider that the update failed if:

  • The pod has condition PodResizePending with reason Infeasible or
  • The pod has condition PodResizePending with reason Deferred and:
    • In the initial alpha implementation: more than 5 minutes elapsed since the update or
    • Eventually in the alpha stage: more than --in-place-deferred-resize-timeout elapsed since the update or
  • The pod has condition PodResizeInProgress and:
    • In the initial alpha implementation: more than 1 hour elapsed since the update or
    • Eventually in the alpha stage: more than --in-place-resize-timeout elapsed since the update or
  • Patch attempt returns an error.
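
The failure rules above translate fairly directly into code. The sketch below assumes the pod's resize state has already been extracted into a small struct; resizeStatus, timeouts, and resizeFailed are hypothetical names, and the default durations mirror the initial alpha values (5 minutes and 1 hour) that the two timeout flags are meant to replace.

```go
package main

import "time"

// resizeStatus (hypothetical) is a flattened view of a pod's resize state.
type resizeStatus struct {
	Condition   string // "PodResizePending", "PodResizeInProgress", or ""
	Reason      string // "Infeasible" or "Deferred" when pending
	PatchFailed bool   // the patch attempt itself returned an error
}

// timeouts holds the two configurable limits from the text.
type timeouts struct {
	Deferred   time.Duration
	InProgress time.Duration
}

// initialTimeouts mirrors the hard-coded initial alpha values.
var initialTimeouts = timeouts{Deferred: 5 * time.Minute, InProgress: time.Hour}

// resizeFailed reports whether the updater should treat the in-place
// update as failed, given the time elapsed since the update was issued.
func resizeFailed(s resizeStatus, sinceUpdate time.Duration, t timeouts) bool {
	if s.PatchFailed {
		return true
	}
	switch s.Condition {
	case "PodResizePending":
		switch s.Reason {
		case "Infeasible":
			return true
		case "Deferred":
			return sinceUpdate > t.Deferred
		}
	case "PodResizeInProgress":
		return sinceUpdate > t.InProgress
	}
	return false
}
```

Passing the timeouts as a parameter rather than reading constants keeps the same check usable once the limits come from --in-place-deferred-resize-timeout and --in-place-resize-timeout.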

Note that in the initial version of In-Place updates, memory limit downscaling will always fail the patch operation. This means VPA will need to evict the pod normally for this change to happen.

A note on ResizePolicy.

VPA does not care and should not care about a container's ResizePolicy setting. In the new mode, it will simply issue the /resize request and let the underlying machinery apply the resize operation in a way that complies with the user's specification.

Partial Updates

A non-goal for this initial implementation of in-place updates is to support "partial container updates," which refers to sending resize requests only to the container(s) that require them.

This feature is not in the initial AEP scope due to the way the VPA has worked in the past. Note that on the API level, in-place resizes work by submitting patches to the resize subresource for individual containers. Updates through the Recreate updateMode are actuated by Pod evictions, and VPA has not accounted for individual containers when deciding when to evict. Supporting partial updates requires a refactor that is not immediately obvious to implement.

On the flip side, updating all containers of a pod during an in-place resize is not ideal. This can potentially cause unnecessary disruption when downsizing containers with limits or updating containers with .spec.containers[].resizePolicy[].restartPolicy: RestartContainer.

Moving forward, support for in-place partial updates will be considered as a feature request or future enhancement.

Comparison of UpdateModes

Today, VPA updater considers the following conditions when deciding if it should apply an update:

  • CanEvict:
    • Pod is Pending or
    • There are enough running pods in the controller.
  • Quick OOM:
    • Any container in the pod had a quick OOM (by default less than 10 minutes after the container started) and
    • There is a difference between the sum of recommendations and the sum of current requests over containers (see [defaultPriorityProcessor.GetUpdatePriority]).
  • Long-lived pod - started enough time ago (by default 12h)
  • Significant change - the sum of requests over containers and the sum of recommendations differ enough (by default 10%).
  • Outside recommended range:
    • At least one container has at least one resource request lower than the lower bound of the corresponding recommendation or
    • At least one container has at least one resource request higher than the upper bound of the corresponding recommendation.
  • NEW Disruption-free update - doesn't change any resources for which the relevant container specifies RestartPolicy: RestartContainer.

Auto / Recreate evicts pod if:

  • CanEvict returns true for the pod, and it meets at least one of the following conditions:
    • Quick OOM,
    • Outside recommended range,
    • Long-lived pod with significant change.
    • EvictionRequirements are all true.

InPlaceOrRecreate will attempt to apply an update in place if it meets at least one of the following conditions:

  • Quick OOM,
  • Outside recommended range,
  • Long-lived pod with significant change.

Test Plan

The following test scenarios will be added to e2e tests. The InPlaceOrRecreate mode will be tested in the following scenarios:

  • Admission controller applies recommendation to pod controlled by VPA.
  • In-place update applied to all containers of a pod.
  • In-place update will fail. Pod should be evicted and the recommendation applied.
  • In-place update will fail but CanEvict is false, pod should not be evicted.
  • In-place update will fail but EvictionRequirements are false, pod should not be evicted.

Upgrade / Downgrade Strategy

Upgrade

On upgrade of the VPA to 1.4.0 (tentative release version), nothing will change; VPAs will continue to work as before.

To use the new InPlaceOrRecreate mode, users enable the alpha feature gate (disabled by default) by passing --feature-gates=InPlaceOrRecreate=true to the updater and admission-controller components, and then set the UpdateMode of their VPA to InPlaceOrRecreate.

Downgrade

On downgrade of VPA from 1.4.0 (tentative release version), nothing will change. VPAs will continue to work as before, unless the user had enabled the feature gate, in which case a downgrade could break any VPA that uses InPlaceOrRecreate.

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?

  • Feature gate name: InPlaceOrRecreate
  • Components depending on the feature gate:
    • admission-controller
    • updater

Disabling of feature gate InPlaceOrRecreate will cause the following to happen:

  • admission-controller to reject new VPA objects being created with InPlaceOrRecreate configured
    • A descriptive error message should be returned to the user, letting them know that they are using a feature-gated feature
  • updater to fall back to Recreate, should it encounter a VPA configured with InPlaceOrRecreate

Enabling of feature gate InPlaceOrRecreate will cause the following to happen:

  • admission-controller to accept new VPA objects being created with InPlaceOrRecreate configured
  • updater will attempt to perform an in-place adjustment for VPAs configured with InPlaceOrRecreate
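
The gate-dependent behavior of the two components can be sketched as follows. This is only an illustration: validateUpdateMode and effectiveUpdaterMode are hypothetical names, the feature gates are modeled as a plain map, and the error text is an example of a "descriptive error message", not the actual wording.

```go
package main

import "fmt"

const inPlaceOrRecreate = "InPlaceOrRecreate" // update mode and gate share this name

// validateUpdateMode (hypothetical) mirrors the admission-controller rule:
// reject VPA objects that use the new mode while the gate is disabled.
func validateUpdateMode(mode string, gates map[string]bool) error {
	if mode == inPlaceOrRecreate && !gates[inPlaceOrRecreate] {
		return fmt.Errorf("updateMode %q requires the %s feature gate to be enabled",
			mode, inPlaceOrRecreate)
	}
	return nil
}

// effectiveUpdaterMode (hypothetical) mirrors the updater rule: fall back
// to Recreate when a VPA uses the new mode but the gate is disabled.
func effectiveUpdaterMode(mode string, gates map[string]bool) string {
	if mode == inPlaceOrRecreate && !gates[inPlaceOrRecreate] {
		return "Recreate"
	}
	return mode
}
```

The updater-side fallback matters for VPA objects that were created while the gate was enabled and still exist after it has been turned off.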

Kubernetes version compatibility

InPlaceOrRecreate is being built assuming that it will be running on a Kubernetes version of at least 1.33 with the beta version of KEP-1287: In-Place Update of Pod Resources enabled. Should these conditions not be true, the VPA shall fall back to Recreate and emit a log message saying that it did so.

Details still to consider

Careful with memory scale down

Downsizing memory may have to be done slowly to prevent OOMs if the application starts to allocate rapidly. More research is needed on how to scale memory down safely.

Implementation History

  • 2023-05-10: initial version
  • 2025-02-19: Updates to align with latest changes to KEP-1287.
  • 2025-03-06: Scope changes to "partial updates" feature
  • 2025-03-08: Add "Upgrade / Downgrade Strategy" and "Kubernetes version compatibility" sections
  • 2025-03-27: Add flags to control the in-place resize timeouts