| author | k8s-ci-robot <k8s-ci-robot@users.noreply.github.com> | 2018-01-03 03:34:19 -0800 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2018-01-03 03:34:19 -0800 |
| commit | cb3151376cdfef627bc020bc6eeff9f7b8279f24 (patch) | |
| tree | 9efeb0d006ab84d68906a5bdb243bdf76e40f183 | |
| parent | 109954c6661ad4302e5b1112851fb7410ad4f88a (diff) | |
| parent | 5a6183f3c0d234bf1946429d2ae51d7d8b59cf9d (diff) | |
Merge pull request #338 from kgrygiel/master
Vertical Pod Autoscaler - design proposal.
| -rw-r--r-- | contributors/design-proposals/images/vpa-architecture.png | bin | 0 -> 310129 bytes | |||
| -rw-r--r-- | contributors/design-proposals/vertical-pod-autoscaler.md | 730 |
2 files changed, 730 insertions, 0 deletions
diff --git a/contributors/design-proposals/images/vpa-architecture.png b/contributors/design-proposals/images/vpa-architecture.png
Binary files differ
new file mode 100644
index 00000000..c8af3073
--- /dev/null
+++ b/contributors/design-proposals/images/vpa-architecture.png
diff --git a/contributors/design-proposals/vertical-pod-autoscaler.md b/contributors/design-proposals/vertical-pod-autoscaler.md
new file mode 100644
index 00000000..b4b08704
--- /dev/null
+++ b/contributors/design-proposals/vertical-pod-autoscaler.md
@@ -0,0 +1,730 @@
+Vertical Pod Autoscaler
+=======================
+**Authors:** kgrygiel, mwielgus
+**Contributors:** DirectXMan12, fgrzadkowski, jszczepkowski, smarterclayton
+
+Vertical Pod Autoscaler
+([#10782](https://github.com/kubernetes/kubernetes/issues/10782)),
+later referred to as VPA (also known as "rightsizing" or "autopilot"), is an
+infrastructure service that automatically sets resource requirements of Pods
+and dynamically adjusts them at runtime, based on analysis of historical
+resource utilization, amount of resources available in the cluster and real-time
+events, such as OOMs.
+
+- [Introduction](#introduction)
+    - [Background](#background)
+    - [Purpose](#purpose)
+    - [Related features](#related-features)
+- [Requirements](#requirements)
+    - [Functional](#functional)
+    - [Availability](#availability)
+    - [Extensibility](#extensibility)
+- [Design](#design)
+    - [Overview](#overview)
+    - [Architecture overview](#architecture-overview)
+    - [API](#api)
+    - [Admission Controller](#admission-controller)
+    - [Recommender](#recommender)
+    - [Updater](#updater)
+    - [Recommendation model](#recommendation-model)
+    - [History Storage](#history-storage)
+    - [Open questions](#open-questions)
+- [Future work](#future-work)
+    - [Pods that require VPA to start](#pods-that-require-vpa-to-start)
+    - [Combining vertical and horizontal scaling](#combining-vertical-and-horizontal-scaling)
+    - [Batch workloads](#batch-workloads)
+- [Alternatives considered](#alternatives-considered)
+    - [Pods point at VPA](#pods-point-at-vpa)
+    - [VPA points at Deployment](#vpa-points-at-deployment)
+    - [Actuation using the Deployment update mechanism](#actuation-using-the-deployment-update-mechanism)
+
+------------
+Introduction
+------------
+
+### Background ###
+* [Compute resources](https://kubernetes.io/docs/user-guide/compute-resources/)
+* [Resource QoS](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/resource-qos.md)
+* [Admission Controllers](https://kubernetes.io/docs/admin/admission-controllers/)
+* [External Admission Webhooks](https://kubernetes.io/docs/admin/extensible-admission-controllers/#external-admission-webhooks)
+
+### Purpose ###
+Vertical scaling has two objectives:
+
+1. Reducing the maintenance cost, by automating configuration of resource
+requirements.
+
+2. Improving utilization of cluster resources, while minimizing the risk of containers running out of memory or getting CPU starved.
+
+### Related features ###
+#### Horizontal Pod Autoscaler ####
+["Horizontal Pod Autoscaler"](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)
+(often abbreviated to HPA) is an infrastructure service that dynamically adjusts
+the number of Pods in a replication controller based on real-time analysis of CPU
+utilization or other, user-specified signals.
+Usually the user will choose horizontal scaling for stateless workloads and
+vertical scaling for stateful ones. In some cases both solutions could be combined
+([see more](#combining-vertical-and-horizontal-scaling)).
+
+#### Cluster Autoscaler ####
+["Cluster Autoscaler"](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
+is a tool that automatically adjusts the size of the Kubernetes cluster based on
+the overall cluster utilization.
+Cluster Autoscaler and Pod Autoscalers (vertical or horizontal) are
+complementary features. Combined together they provide a fully automatic scaling
+solution.
+
+#### Initial resources ####
+["Initial Resources"](https://github.com/kgrygiel/community/blob/master/contributors/design-proposals/initial-resources.md)
+is a very preliminary, proof-of-concept feature providing initial request based
+on historical utilization. It is designed to only kick in on Pod creation.
+VPA is intended to supersede this feature.
+
+#### In-place updates ####
+In-place Pod updates ([#5774](https://github.com/kubernetes/kubernetes/issues/5774))
+is a planned feature to allow changing resources (request/limit) of existing
+containers without killing them, assuming sufficient free resources are available
+on the node.
+Vertical Pod Autoscaler will greatly benefit from this ability; however, it is
+not considered a blocker for the MVP.
+
+#### Resource estimation ####
+Resource estimation is another planned feature, meant to improve node resource
+utilization by temporarily reclaiming unused resources of running containers.
+It is different from Vertical Autoscaling in that it operates on a shorter
+timeframe (using only local, short-term history), re-offers resources at a
+lower quality, and does not provide initial resource predictions.
+VPA and resource estimation are complementary. Details will follow once
+Resource Estimation is designed.
+
+------------
+Requirements
+------------
+
+### Functional ###
+
+1. VPA is capable of setting container resources (CPU & memory request/limit) at
+   Pod submission time.
+
+2. VPA is capable of adjusting container resources of existing Pods, in
+   particular reacting to CPU starvation and container OOM events.
+
+3. When VPA restarts Pods, it respects the disruption budget.
+
+4. It is possible for the user to configure VPA with fixed constraints on
+   resources, specifically: min & max request.
+
+5. VPA is compatible with Pod controllers, at least with Deployments.
+   In particular:
+    * Updates of resources do not interfere/conflict with spec updates.
+    * It is possible to do a rolling update of the VPA policy (e.g. min resources)
+      on an existing Deployment.
+
+6. It is possible to create Pod(s) that start following the VPA policy
+   immediately. In particular such Pods must not be scheduled until VPA policy
+   is applied.
+
+7. Disabling VPA is easy and fast ("panic button"), without disrupting existing
+   Pods.
+
+### Availability ###
+1. Downtime of heavy-weight components (database/recommender) must not block
+   recreating existing Pods. Components on critical path for Pod creation
+   (admission controller) are designed to be highly available.
+
+### Extensibility ###
+1. VPA is capable of performing in-place updates once they become available.
+
+------
+Design
+------
+
+### Overview ###
+(see further sections for details and justification)
+
+1. We introduce a new type of **API resource**:
+   `VerticalPodAutoscaler`.
+   It consists of a **label selector** to match Pods,
+   the **resources policy** (controls how VPA computes the resources), the
+   **update policy** (controls how changes are applied to Pods) and the
+   recommended Pod resources (an output field).
+
+2. **VPA Recommender** is a new component which **consumes utilization signals
+   and OOM events** for all Pods in the cluster from the
+   [Metrics Server](https://github.com/kubernetes-incubator/metrics-server).
+
+3. VPA Recommender **watches all Pods**, keeps calculating fresh recommended
+   resources for them and **stores the recommendations in the VPA objects**.
+
+4. Additionally the Recommender **exposes a synchronous API** that takes a Pod
+   description and returns recommended resources.
+
+5. All Pod creation requests go through the VPA **Admission Controller**.
+   If the Pod is matched by any VerticalPodAutoscaler object, the admission
+   controller **overrides resources** of containers in the Pod with the
+   recommendation provided by the VPA Recommender. If the Recommender is not
+   available, it falls back to the recommendation cached in the VPA object.
+
+6. **VPA Updater** is a component responsible for **real-time updates** of Pods.
+   If a Pod uses VPA in `"Auto"` mode, the Updater can decide to update it with
+   recommended resources.
+   In MVP this is realized by just evicting the Pod in order to have it
+   recreated with new resources. This approach requires the Pod to belong to a
+   Replica Set (or some other owner capable of recreating it).
+   In future the Updater will take advantage of in-place updates, which would
+   most likely lift this constraint.
+   Because restarting/rescheduling Pods is disruptive to the service, it must be
+   rare.
+
+7. VPA only controls the resource **request** of containers. It sets the limit
+   to infinity. The request is calculated based on analysis of the current and
+   previous runs (see [Recommendation model](#recommendation-model) below).
+
+8. **History Storage** is a component that consumes utilization signals and OOMs
+   (same data as the Recommender) from the API Server and stores it persistently.
+   It is used by the Recommender to **initialize its state on startup**.
+   It can be backed by an arbitrary database. The first implementation will use
+   [Prometheus](https://github.com/kubernetes/charts/tree/master/stable/prometheus),
+   at least for the resource utilization part.
+
+### Architecture overview ###
+![VPA Architecture Diagram](images/vpa-architecture.png)
+
+### API ###
+We introduce a new type of API object `VerticalPodAutoscaler`, which
+consists of the Target, that is a [label selector](https://kubernetes.io/docs/api-reference/v1.5/#labelselector-unversioned)
+for matching Pods and two policy sections: the update policy and the resources
+policy.
+Additionally it holds the most recent recommendation computed by VPA.
+
+#### VPA API object overview ####
+```go
+// VerticalPodAutoscalerSpec is the specification of the behavior of the autoscaler.
+type VerticalPodAutoscalerSpec struct {
+  // A label query that determines the set of pods controlled by the Autoscaler.
+  // More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
+  Selector *metav1.LabelSelector
+
+  // Describes the rules on how changes are applied to the pods.
+  // +optional
+  UpdatePolicy PodUpdatePolicy
+
+  // Controls how the autoscaler computes recommended resources.
+  // +optional
+  ResourcePolicy PodResourcePolicy
+}
+
+// VerticalPodAutoscalerStatus describes the runtime state of the autoscaler.
+type VerticalPodAutoscalerStatus struct {
+  // The time when the status was last refreshed.
+  LastUpdateTime metav1.Time
+  // The most recently computed amount of resources recommended by the
+  // autoscaler for the controlled pods.
+  // +optional
+  Recommendation RecommendedPodResources
+  // A free-form human readable message describing the status of the autoscaler.
+  StatusMessage string
+}
+```
+
+The complete API definition is included [below](#complete-vpa-api-object-definition).
+
+#### Label Selector ####
+The label selector determines which Pods will be scaled according to the given
+VPA policy. The Recommender will aggregate signals for all Pods matched by a
+given VPA, so it is important that the user set labels to group similarly
+behaving Pods under one VPA.
+
+It is yet to be determined how to resolve conflicts, i.e. when the Pod is
+matched by more than one VPA (this is not a VPA-specific problem though).
+
+#### Update Policy ####
+The update policy controls how VPA applies changes. In MVP it consists of a
+single field `mode` that enables the feature.
+
+```json
+"updatePolicy": {
+  "mode": ""
+}
+```
+
+Mode can be set to one of the following:
+
+1. `"Initial"`: VPA only assigns resources on Pod creation and does not
+   change them during lifetime of the Pod.
+2. `"Auto"` (default): VPA assigns resources on Pod creation and
+   additionally can update them during lifetime of the Pod, including evicting /
+   rescheduling the Pod.
+3. `"Off"`: VPA never changes Pod resources. The recommender still sets the
+   recommended resources in the VPA object. This can be used for a "dry run".
+
+To disable VPA updates the user can do any of the following: (1) change the
+updatePolicy to `"Off"` or (2) delete the VPA or (3) change the Pod labels to no
+longer match the VPA selector.
+
+Note: disabling VPA prevents it from doing further changes, but does not revert
+resources of the running Pods, until they are updated.
+For example, when running a Deployment, the user would need to perform an update
+to revert Pods to the originally specified resources.
+
+#### Resource Policy ####
+The resources policy controls how VPA computes the recommended resources.
+In MVP it consists of (optional) lower and upper bounds on the request of each
+container.
+The resources policy could later be extended with additional knobs to let the
+user tune the recommendation algorithm to their specific use-case.
+
+#### Recommendation ####
+The VPA resource has an output-only field keeping a recent recommendation,
+filled by the Recommender. This field can be used to obtain a recent
+recommendation even during a temporary unavailability of the Recommender.
+The recommendation consists of the recommended target amount of resources as
+well as a range (min..max), which can be used by the Updater to make decisions
+on when to update the pod.
+In the case of a resource crunch the Updater may decide to squeeze pod resources
+towards the recommended minimum.
+The width of the (min..max) range also reflects the confidence of a
+recommendation. For example, for a workload with a very spiky usage it is much
+harder to determine the optimal balance between performance and resource
+utilization, compared to a workload with stable usage.
+
+#### Complete VPA API object definition ####
+
+```go
+// VerticalPodAutoscaler is the configuration for a vertical pod
+// autoscaler, which automatically manages pod resources based on historical and
+// real time resource utilization.
+type VerticalPodAutoscaler struct {
+  metav1.TypeMeta
+  // Standard object metadata.
+  // More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#metadata
+  // +optional
+  metav1.ObjectMeta
+
+  // Specification of the behavior of the autoscaler.
+  // More info: https://git.k8s.io/community/contributors/devel/api-conventions.md#spec-and-status.
+  // +optional
+  Spec VerticalPodAutoscalerSpec
+
+  // Current information about the autoscaler.
+  // +optional
+  Status VerticalPodAutoscalerStatus
+}
+
+// VerticalPodAutoscalerSpec is the specification of the behavior of the autoscaler.
+type VerticalPodAutoscalerSpec struct {
+  // A label query that determines the set of pods controlled by the Autoscaler.
+  // More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors
+  Selector *metav1.LabelSelector
+
+  // Describes the rules on how changes are applied to the pods.
+  // +optional
+  UpdatePolicy PodUpdatePolicy
+
+  // Controls how the autoscaler computes recommended resources.
+  // +optional
+  ResourcePolicy PodResourcePolicy
+}
+
+// VerticalPodAutoscalerStatus describes the runtime state of the autoscaler.
+type VerticalPodAutoscalerStatus struct {
+  // The time when the status was last refreshed.
+  LastUpdateTime metav1.Time
+  // The most recently computed amount of resources recommended by the
+  // autoscaler for the controlled pods.
+  // +optional
+  Recommendation RecommendedPodResources
+  // A free-form human readable message describing the status of the autoscaler.
+  StatusMessage string
+}
+
+// UpdateMode controls when autoscaler applies changes to the pod resources.
+type UpdateMode string
+const (
+  // UpdateModeOff means that autoscaler never changes Pod resources.
+  // The recommender still sets the recommended resources in the
+  // VerticalPodAutoscaler object. This can be used for a "dry run".
+  UpdateModeOff UpdateMode = "Off"
+  // UpdateModeInitial means that autoscaler only assigns resources on pod
+  // creation and does not change them during the lifetime of the pod.
+  UpdateModeInitial UpdateMode = "Initial"
+  // UpdateModeAuto means that autoscaler assigns resources on pod creation
+  // and additionally can update them during the lifetime of the pod,
+  // including evicting / rescheduling the pod.
+  UpdateModeAuto UpdateMode = "Auto"
+)
+
+// PodUpdatePolicy describes the rules on how changes are applied to the pods.
+type PodUpdatePolicy struct {
+  // Controls when autoscaler applies changes to the pod resources.
+  // +optional
+  UpdateMode UpdateMode
+}
+
+const (
+  // DefaultContainerResourcePolicy can be passed as
+  // ContainerResourcePolicy.Name to specify the default policy.
+  DefaultContainerResourcePolicy = "*"
+)
+// ContainerResourcePolicy controls how autoscaler computes the recommended
+// resources for a specific container.
+type ContainerResourcePolicy struct {
+  // Name of the container or DefaultContainerResourcePolicy, in which
+  // case the policy is used by the containers that don't have their own
+  // policy specified.
+  Name string
+  // Whether autoscaler is enabled for the container. Defaults to "On".
+  // +optional
+  Mode ContainerScalingMode
+  // Specifies the minimal amount of resources that will be recommended
+  // for the container.
+  // +optional
+  MinAllowed api.ResourceRequirements
+  // Specifies the maximum amount of resources that will be recommended
+  // for the container.
+  // +optional
+  MaxAllowed api.ResourceRequirements
+}
+
+// PodResourcePolicy controls how autoscaler computes the recommended resources
+// for containers belonging to the pod.
+type PodResourcePolicy struct {
+  // Per-container resource policies.
+  ContainerPolicies []ContainerResourcePolicy
+}
+
+// ContainerScalingMode controls whether autoscaler is enabled for a specific
+// container.
+type ContainerScalingMode string
+const (
+  // ContainerScalingModeOn means autoscaling is enabled for a container.
+  ContainerScalingModeOn ContainerScalingMode = "On"
+  // ContainerScalingModeOff means autoscaling is disabled for a container.
+  ContainerScalingModeOff ContainerScalingMode = "Off"
+)
+
+// RecommendedPodResources is the recommendation of resources computed by
+// autoscaler.
+type RecommendedPodResources struct {
+  // Resources recommended by the autoscaler for each container.
+  ContainerRecommendations []RecommendedContainerResources
+}
+
+// RecommendedContainerResources is the recommendation of resources computed by
+// autoscaler for a specific container. Respects the container resource policy
+// if present in the spec.
+type RecommendedContainerResources struct {
+  // Name of the container.
+  Name string
+  // Recommended amount of resources.
+  Target api.ResourceRequirements
+  // Minimum recommended amount of resources.
+  // Running the application with less resources is likely to have
+  // significant impact on performance/availability.
+  // +optional
+  MinRecommended api.ResourceRequirements
+  // Maximum recommended amount of resources.
+  // Any resources allocated beyond this value are likely wasted.
+  // +optional
+  MaxRecommended api.ResourceRequirements
+}
+```
+
+### Admission Controller ###
+
+VPA Admission Controller intercepts Pod creation requests. If the Pod is matched
+by a VPA config with mode not set to `"Off"`, the controller rewrites the request
+by applying recommended resources to the Pod spec. Otherwise it leaves the Pod
+spec unchanged.
+
+The controller gets the recommended resources by fetching
+/recommendedPodResources from the Recommender. If the call times out or fails,
+the controller falls back to the recommendation cached in the VPA object.
+If this is also not available, the controller lets the request pass through
+with the originally specified resources.
+
+Note: in future it will be possible to (optionally) enforce using VPA by marking
+the Pod as "requiring VPA". This will disallow scheduling the Pod before a
+corresponding VPA config is created. The Admission Controller will reject such
+Pods if it finds no matching VPA config. This ability will be convenient for the
+user who wants to create the VPA config together with submitting the Pod.
+
+The VPA Admission Controller will be implemented as an
+[External Admission Hook](https://kubernetes.io/docs/admin/extensible-admission-controllers/#external-admission-webhooks).
+Note however that this depends on the proposed feature to allow
+[mutating webhook admission controllers](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/api-machinery/admission_control_extension.md#future-work).
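The fallback order the Admission Controller section describes (live Recommender call, then the recommendation cached in the VPA object, then the Pod's original resources) can be sketched roughly as follows. This is only an illustration and not part of the proposal: the `Resources` type, `RecommendFunc`, `ErrUnavailable` and `resolveResources` are hypothetical names, and the sketch assumes the Pod has already been matched to a single VPA.

```go
package main

import "errors"

// Resources is a simplified, hypothetical stand-in for the per-container
// resources that the proposal models as api.ResourceRequirements.
type Resources struct {
	CPUMillis   int64
	MemoryBytes int64
}

// RecommendFunc models the synchronous call to the Recommender.
// It returns an error when the call fails or times out.
type RecommendFunc func() (*Resources, error)

// ErrUnavailable is a sample error a RecommendFunc may return.
var ErrUnavailable = errors.New("recommender unavailable")

// resolveResources picks the resources to write into the Pod spec:
//  1. the live recommendation from the Recommender,
//  2. otherwise the recommendation cached in the VPA object (nil if absent),
//  3. otherwise the resources originally specified in the Pod.
// With mode "Off" the Pod is left unchanged. The returned string names the
// source that was used.
func resolveResources(mode string, query RecommendFunc, cached *Resources, original Resources) (Resources, string) {
	if mode == "Off" {
		return original, "original"
	}
	if rec, err := query(); err == nil && rec != nil {
		return *rec, "recommender"
	}
	if cached != nil {
		return *cached, "cached"
	}
	return original, "original"
}
```

Returning the source alongside the resources is a design choice of this sketch; it would make it easy to log or count how often the controller had to fall back.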
+
+### Recommender ###
+Recommender is the main component of the VPA. It is responsible for
+computing recommended resources. On startup the recommender fetches
+historical resource utilization of all Pods (regardless of whether
+they use VPA) together with the history of Pod OOM events from the
+History Storage. It aggregates this data and keeps it in memory.
+
+During normal operation the recommender consumes real time updates of
+resource utilization and new events via the Metrics API from
+the [Metrics Server](https://github.com/kubernetes-incubator/metrics-server).
+Additionally it watches all Pods and all VPA objects in the
+cluster. For every Pod that is matched by some VPA selector the
+Recommender computes the recommended resources and sets the
+recommendation on the VPA object.
+
+It is important to realize that one VPA object has one recommendation.
+The user is expected to use one VPA to control Pods with similar
+resource usage patterns, typically a group of replicas or shards of
+a single workload.
+
+The Recommender acts as an
+[extension-apiserver](https://kubernetes.io/docs/concepts/api-extension/apiserver-aggregation/),
+exposing a synchronous method that takes a Pod Spec and the Pod metadata
+and returns recommended resources.
+
+#### Recommender API ####
+
+```POST /recommendationQuery```
+
+Request body:
+```go
+// RecommendationQuery obtains resource recommendation for a pod.
+type RecommendationQuery struct {
+  metav1.TypeMeta
+  // +optional
+  metav1.ObjectMeta
+
+  // Spec is filled in by the caller to request a recommendation.
+  Spec RecommendationQuerySpec
+
+  // Status is filled in by the server with the recommended pod resources.
+  // +optional
+  Status RecommendationQueryStatus
+}
+
+// RecommendationQuerySpec is a request of recommendation for a pod.
+type RecommendationQuerySpec struct {
+  // Pod for which to compute the recommendation. Does not need to exist.
+  Pod core.Pod
+}
+
+// RecommendationQueryStatus is a response to the recommendation request.
+type RecommendationQueryStatus struct {
+  // Recommendation holds recommended resources for the pod.
+  // +optional
+  Recommendation autoscaler.RecommendedPodResources
+  // Error indicates that the recommendation was not available. Either
+  // Recommendation or Error must be present.
+  // +optional
+  Error string
+}
+```
+
+Notice that this API method may be called for an existing Pod, as well as for a
+yet-to-be-created Pod.
+
+### Updater ###
+VPA Updater is a component responsible for applying recommended resources to
+existing Pods.
+It monitors all VPA objects and Pods in the cluster, periodically fetching
+recommendations for the Pods that are controlled by VPA by calling the
+Recommender API.
+When recommended resources significantly diverge from actually configured
+resources, the Updater may decide to update a Pod.
+In MVP (until in-place updates of Pod resources are available)
+this means evicting Pods in order to have them recreated with the recommended
+resources.
+
+The Updater relies on other mechanisms (such as Replica Set) to recreate a
+deleted Pod. However it does not verify whether such a mechanism is actually
+configured for the Pod. Such checks could be implemented in the CLI and warn
+the user when the VPA would match Pods that are not automatically restarted.
+
+While terminating Pods is disruptive and generally undesired, it is sometimes
+justified in order to (1) avoid CPU starvation, (2) reduce the risk of correlated
+OOMs across multiple Pods at random time or (3) save resources over long periods
+of time.
+
+Apart from its own policy on how often a Pod can be evicted, the Updater also
+respects the Pod disruption budget, by using Eviction API to evict Pods.
+
+The Updater only touches pods that point to a VPA with updatePolicy.mode set
+to `"Auto"`.
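One possible reading of "significantly diverge" is that the currently configured request has left the recommended (min..max) range. The proposal only says that this range "can be used by the Updater", so the rule below is an assumption for illustration, and `Recommendation` and `shouldUpdate` are hypothetical names. The sketch looks at a single resource of a single container.

```go
package main

// Recommendation is a hypothetical, single-resource view of the
// MinRecommended / Target / MaxRecommended fields from the VPA status.
type Recommendation struct {
	Min    float64
	Target float64
	Max    float64
}

// shouldUpdate reports whether the Updater would consider evicting a Pod
// under the assumed rule: only Pods in "Auto" mode are touched, and only
// when the current request falls outside the recommended range.
func shouldUpdate(mode string, currentRequest float64, rec Recommendation) bool {
	if mode != "Auto" {
		return false
	}
	return currentRequest < rec.Min || currentRequest > rec.Max
}
```

A real Updater would additionally apply its own eviction rate policy and go through the Eviction API so that the Pod disruption budget is respected; neither is modelled here.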
+
+The Updater will also need to understand how to adjust the recommendation before
+applying it to a Pod, based on the current state of the cluster (e.g. quota,
+space available on nodes or other scheduling constraints).
+Otherwise it may deschedule a Pod permanently. This mechanism is not yet
+designed.
+
+### Recommendation model ###
+
+VPA controls the request (memory and CPU) of containers. In MVP it always sets
+the limit to infinity. It is not yet clear whether there is a use-case for VPA
+setting the limit.
+
+The request is calculated based on analysis of the current and previous runs of
+the container and other containers with similar properties (name, image,
+command, args).
+The recommendation model (MVP) assumes that the memory and CPU consumption are
+independent random variables with distribution equal to the one observed in the
+last N days (recommended value is N=8 to capture weekly peaks).
+A more advanced model in future could attempt to detect trends, periodicity and
+other time-related patterns.
+
+For CPU the objective is to **keep the fraction of time when the container usage
+exceeds a high percentage (e.g. 95%) of request below a certain threshold**
+(e.g. 1% of time).
+In this model the "CPU usage" is defined as mean usage measured over a short
+interval. The shorter the measurement interval, the better the quality of
+recommendations for spiky, latency sensitive workloads. Minimum reasonable
+resolution is 1/min, recommended is 1/sec.
+
+For memory the objective is to **keep the probability of the container usage
+exceeding the request in a specific time window below a certain threshold**
+(e.g. below 1% in 24h). The window must be long (≥ 24h) to ensure that evictions
+caused by OOM do not visibly affect (a) availability of serving applications
+(b) progress of batch computations (a more advanced model could allow user to
+specify SLO to control this).
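As a rough illustration of the CPU objective, one could take a high percentile of the observed usage samples and divide it by the target utilization: with the example numbers, the 99th percentile divided by 0.95 gives a request that usage exceeds 95% of for about 1% of the samples. The sketch below assumes nearest-rank percentiles over a flat slice of equally weighted samples; the proposal does not prescribe this estimator, and `percentile` and `cpuRequest` are hypothetical helpers.

```go
package main

import (
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100) of
// samples. It returns 0 for an empty slice and does not modify its input.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// cpuRequest returns a request such that roughly (100-p)% of the samples
// exceed utilization*request. For the proposal's example values use p = 99
// and utilization = 0.95.
func cpuRequest(samples []float64, p, utilization float64) float64 {
	return percentile(samples, p) / utilization
}
```

With one-second samples kept for N=8 days this slice would hold roughly 700k values per container, so a production Recommender would more plausibly keep a histogram or another compact summary; the flat slice is used here only to keep the arithmetic visible.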
+
+#### Handling OOMs ####
+When a container is evicted due to exceeding available memory, its actual memory
+requirements are not known (the amount consumed obviously gives the lower
+bound). This is modelled by translating OOM events to artificial memory usage
+samples by applying a "safety margin" multiplier to the last observed usage.
+
+### History Storage ###
+VPA defines data access API for providers of historical events and resource
+utilization. Initially we will use Prometheus as the reference implementation of
+this API, at least for the resource utilization part. The historical events
+could be backed by another solution, e.g.
+[Infrastore](https://github.com/kubernetes/kubernetes/issues/44095).
+Users will be able to plug their own implementations.
+
+History Storage is populated with real time updates of resource utilization and
+events, similarly to the Recommender. The storage keeps at least 8 days of data.
+This data is only used to initialize the Recommender on startup.
+
+### Open questions ###
+1. How to resolve conflicts if multiple VPA objects match a Pod.
+
+2. How to adjust the recommendation before applying it to a specific pod,
+   based on the current state of the cluster (e.g. quota, space available on
+   nodes or other scheduling constraints).
+
+-----------
+Future work
+-----------
+
+### Pods that require VPA to start ###
+In the current proposal the Pod will be scheduled with originally configured
+resources if no matching VPA config is present at the Pod admission time.
+This may be undesired behavior. In particular the user may want to create the
+VPA config together with submitting the Pod, which leads to a race condition:
+the outcome depends on which resource (VPA or the Pod) is processed first.
+
+In order to address this problem we propose to allow marking Pods with a special
+annotation ("requires VPA") that prevents the Admission Controller from allowing
+the Pod if a corresponding VPA is not available.
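A minimal sketch of the annotation check proposed above might look like this. The proposal does not name the annotation, so the key `example.com/requires-vpa` and the `admit` helper are purely hypothetical; `matchedByVPA` stands for the result of evaluating the VPA label selectors against the Pod.

```go
package main

import "errors"

// requiresVPAAnnotation is a hypothetical annotation key; the proposal only
// describes the marker as "requires VPA" without naming it.
const requiresVPAAnnotation = "example.com/requires-vpa"

// errNoMatchingVPA is returned when a Pod demands a VPA but none matches.
var errNoMatchingVPA = errors.New("pod requires a VPA but no matching VerticalPodAutoscaler was found")

// admit rejects a Pod that is marked as requiring VPA when no VPA object
// matches it, and admits every other Pod.
func admit(annotations map[string]string, matchedByVPA bool) error {
	if annotations[requiresVPAAnnotation] == "true" && !matchedByVPA {
		return errNoMatchingVPA
	}
	return nil
}
```

Rejecting at admission time means the client sees the failure and can retry once the VPA object exists, which is how this sketch sidesteps the race described above.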
+
+An alternative would be to introduce a VPA Initializer serving the same purpose.
+
+### Combining vertical and horizontal scaling ###
+In principle it may be possible to use both vertical and horizontal scaling for
+a single workload (group of Pods), as long as the two mechanisms operate on
+different resources.
+The right approach is to let the
+[Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/)
+scale the group based on the _bottleneck_ resource. The Vertical Pod Autoscaler
+could then control other resources. Examples:
+
+1. A CPU-bound workload can be scaled horizontally based on the CPU utilization
+while using vertical scaling to adjust memory.
+
+2. An IO-bound workload can be scaled horizontally based on the IO throughput
+while using vertical scaling to adjust both memory and CPU.
+
+However this is a more advanced form of autoscaling and it is not well supported
+by the MVP version of Vertical Pod Autoscaler. The difficulty comes from the
+fact that changing the number of instances affects not only the utilization of
+the bottleneck resource (which is the principle of horizontal scaling) but
+potentially also non-bottleneck resources that are controlled by VPA.
+The VPA model will have to be extended to take the size of the group into account
+when aggregating the historical resource utilization and when producing a
+recommendation, in order to allow combining it with HPA.
+
+### Batch workloads ###
+Batch workloads have different CPU requirements than latency sensitive
+workloads. Instead of latency they care about throughput, which means VPA should
+base the CPU requirements on average CPU consumption rather than high
+percentiles of CPU distribution.
+
+TODO: describe the recommendation model for the batch workloads and how VPA will
+distinguish between batch and serving. A possible approach is to look at
+`PodSpec.restartPolicy`.
+An alternative would be to let the user specify the latency requirements of the
+workload in the `PodResourcePolicy`.
+
+-----------------------
+Alternatives considered
+-----------------------
+
+### Pods point at VPA ###
+*REJECTED BECAUSE IT REQUIRES MODIFYING THE POD SPEC*
+
+#### proposal: ####
+Instead of VPA using label selectors, Pod Spec is extended with an optional
+field `verticalPodAutoscalerPolicy`,
+a [reference](https://kubernetes.io/docs/api-reference/v1/definitions/#_v1_localobjectreference)
+to the VPA config.
+
+#### pros: ####
+* Consistency is enforced at the API level:
+    * At most one VPA can point to a given Pod.
+    * It is always clear at admission stage whether the Pod should use
+      VPA or not. No race conditions.
+* It is cheap to find the VPA for a given Pod.
+
+#### cons: ####
+* Requires changing the core part of the API (Pod Spec).
+
+### VPA points at Deployment ###
+
+#### proposal: ####
+VPA has a reference to Deployment object. Doesn’t use label selector to match
+Pods.
+
+#### pros: ####
+* More consistent with HPA.
+
+#### cons: ####
+* Extending VPA support from Deployment to other abstractions that manage Pods
+  requires additional work. VPA must be aware of all such abstractions.
+* It is not possible to do a rolling update of the VPA config.
+  For example setting `max_memory` in the VPA config will apply to the whole
+  Deployment immediately.
+* VPA can’t be shared between deployments.
+
+### Actuation using the Deployment update mechanism ###
+
+In this solution the Deployment itself is responsible for actuating VPA
+decisions.
+
+#### Actuation by update of spec ####
+In this variant changes of resources are applied similarly to normal changes of
+the spec, i.e. using the Deployment rolling update mechanism.
+
+**pros:** existing clean API (and implementation), one common update policy
+(e.g. max surge, max unavailable).
+
+**cons:** conflicting with user (config) update - update of resources and spec
+are tied together (they are executed at the same rate), problem with rollbacks,
+problem with pause. Not clear how to handle in-place updates? (this problem has
+to be solved regardless of VPA though).
+
+#### Dedicated method for resource update ####
+In this variant Deployment still uses the rolling update mechanism for updating
+resources, but update of resources is treated in a special way, so that it can
+be performed in parallel with config update.
+
+**pros:** handles concurrent resources and spec updates, solves resource updates
+without VPA, more consistent with HPA, all update logic lives in one place (less
+error-prone).
+
+**cons:** specific to Deployment, high complexity (multiple replica set created
+underneath - exposed to the user, can be confusing and error-prone).
