vertical-pod-autoscaler/enhancements/4831-control-eviction-behavior/README.md
VPA provides different UpdateModes, describing if/how the VPA applies new recommendations to the Pods. They range from "recommendation only, don't make any changes" (Off), over "apply new recommendations when pods are re-scheduled, but don't evict" (Initial) to "evict a Pod to apply a new recommendation" (Auto). These existing UpdateModes work in the same way for scaling up and scaling down and are the same for all resources controlled by the VPA. We propose adding a new functionality which allows to control the conditions in which Pods can be evicted to apply a new recommendation based on the changes (are we scaling up or scaling down?) for each resource individually.
For some workloads, each eviction introduces disruptions for users. Examples include anything running as a singleton (e.g. etcd) or workload that needs a lot of time to process data on startup (e.g. prometheus). Therefore, operators may want to limit the amount of evictions. We can do this by introducing a new tradeoff: Overspend on resources to increase uptime. It is possible to turn off eviction for downscaling entirely or have it managed by an external component to allow downscaling only during certain time windows, days, or based on other criteria.
Add a new field EvictionRequirements to PodUpdatePolicy of type []*EvictionRequirement. A single EvictionRequirement defines a condition which must be true to allow eviction for the corresponding Pod. When multiple EvictionRequirements are specified for a Pod, all of them must evaluate to true to allow eviction. For Pods with multiple Containers, an EvictionRequirement evaluates to true when at least one Container that is under VPA control fulfills the EvictionRequirement.
A single EvictionRequirement specifies Resources and a ChangeRequirement comparing the new recommendation (Target) with the existing requests on a Pod (Requests). Possible values for Resources are [CPU] and [Memory] or both [CPU,Memory]. If Resources: [CPU, Memory], the condition must be true for either of the two resources to allow for eviction. Possible values for ChangeRequirement are TargetHigherThanRequests and TargetLowerThanRequests.
Add validation to prevent users from adding EvictionRequirements which can never evaluate to true:
EvictionRequirement for a single resource is foundResources: [CPU, Memory] is specified on one EvictionRequirement together with Resources: [CPU] or Resources: [Memory] on another EvictionRequirementEvictionRequirements evaluate to trueEvictionRequirementsWe currently achieve a similar functionality by running VPA in spec.updatePolicy.updateMode: "Off" and having a different controller inspect the recommendations and apply them based on certain criteria like scaling direction and time.
With this approach you either end up re-building half of the VPA (updater/webhook), or use a different mechanism to apply the recommendations, such as modifying the requests in the Pod owning Object – which has its own drawbacks.