
☂️ Gardener Horizontal & Vertical Pod Autoscaler, a.k.a. HVPA (v2) #30

Open · 13 of 25 tasks
amshuman-kr opened this issue Sep 16, 2019 · 0 comments
Labels
kind/enhancement (Enhancement, improvement, extension) · kind/epic (Large multi-story topic) · kind/roadmap (Roadmap BLI) · lifecycle/rotten (Nobody worked on this for 12 months (final aging stage))
amshuman-kr commented Sep 16, 2019

Feature (What you would like to be added):
Summarise the roadmap for HVPA with links to the corresponding issues.

Motivation (Why is this needed?):
A central place to collect the roadmap as well as the progress.

Approach/Hint to implement the solution (optional):

General Principles
  • The goal of HVPA is to re-use the upstream components HPA and VPA as much as possible for scaling components horizontally and vertically respectively.
    • HPA for recommendation and scaling horizontally.
    • VPA for recommendation for scaling vertically.
  • Where there are gaps in using HPA and VPA simultaneously to scale a given component, introduce functionality to fill those gaps.
    • HPA and VPA are recommended to be mixed only when HPA is used for custom/external metrics, as mentioned here. But in some scenarios it might make sense to use both even for CPU and memory (e.g. kube-apiserver or ingress).
    • VPA updates the pods directly (via webhooks), whereas HPA scales the upstream targetRefs. The VPA approach duplicates/overrides any update mechanism (such as rolling updates) that the upstream targetRefs might have implemented.
  • Where functionality is missing in either HPA or VPA, introduce it to provide more flexibility during horizontal and vertical scaling, especially for components that experience disruption during scaling.
    • Weight-based scaling horizontally and vertically simultaneously.
    • Support for configurable (at the HVPA resource level) threshold levels to trigger VPA (and possibly HPA) updates, to minimise unnecessary scaling of components, especially if scaling is disruptive.
  • Support for a configurable (at the HVPA resource level) stabilisation window in all four directions (up/down/out/in) to stabilise scaling of components, especially if scaling is disruptive.
  • Support for a configurable maintenance window (at the HVPA resource level) for scaling (especially scaling in/down) for components that do not scale smoothly (mainly etcd, but to a lesser extent kube-apiserver as well, for WATCH requests). This could be an alternative or a complement to the stabilisation window mentioned above.
  • Support for flexible update policy for all four scaling directions (Off/Auto/ScaleUp). ScaleUp would only apply scale up and not scale down (vertically or horizontally). This is again from the perspective of components which experience disruption while scaling (mainly, etcd, but to a lesser extent kube-apiserver as well for WATCH requests). For such components, a ScaleUp update policy will ensure that the component can scale up (with some disruption) automatically to meet the workload requirement but not scale down to avoid unnecessary disruption. This would mean over-provisioning for workloads that experience a short upsurge.
  • Alerts when some percentage threshold of the maxAllowed is reached for requests of any container in the targetRef.
  • Support for custom resources as targetRefs.
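To make the principles above concrete, a minimal sketch of what an HVPA resource embedding HPA and VPA templates, weight-based scaling, update policies, and a maintenance window could look like. The field names below are illustrative assumptions for discussion, not the finalised v2 API:

```yaml
# Hypothetical HVPA resource sketch -- field names are assumptions,
# not the actual HVPA v2 spec (which the tasks below are meant to define).
apiVersion: autoscaling.k8s.io/v1alpha1
kind: Hvpa
metadata:
  name: kube-apiserver
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kube-apiserver
  hpa:
    deploy: true              # Off could currently be modelled by not deploying HPA
    updatePolicy:
      updateMode: "Auto"      # Off | Auto | ScaleUp
    template:
      spec:
        minReplicas: 1
        maxReplicas: 4
  vpa:
    deploy: true
    updatePolicy:
      updateMode: "ScaleUp"   # scale up automatically, never scale down
    template:
      spec:
        resourcePolicy:
          containerPolicies:
          - containerName: kube-apiserver
            maxAllowed:       # alert when requests approach this threshold
              cpu: "8"
              memory: 25G
  weightBasedScalingIntervals: # VPA weight between 0 and 100 per replica range
  - vpaWeight: 100
    startReplicaCount: 1
    lastReplicaCount: 4
  maintenanceTimeWindow:       # window for disruptive scale in/down (e.g. etcd)
    begin: "220000+0000"
    end: "230000+0000"
```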
Tasks
  • HVPA custom resource to include templates for HPA and VPA.
  • Controller logic to deploy and reconcile HPA and VPA resources based on the templates in the HVPA spec.
  • Controller logic to adopt pre-existing HPA and VPA resources if they match the selectors in the HVPA spec.
  • Auto update policy for HPA updates. HPA takes care of both recommendation and updates for horizontal scaling. This implementation of Auto update policy is temporary pending Evaluate options for controlling HPA-based scaling #7.
  • Off, Auto and ScaleUp update policy for VPA updates. VPA is used only for recommendation and not for updates. Fixed with HVPA now supports UpdateMode "off" for HPA and VPA #19.
  • Off update policy for HPA updates. Implemented by not deploying/deleting the HPA resource. This implementation of the Off update policy is temporary pending Evaluate options for controlling HPA-based scaling #7.
  • Weight-based scaling for VPA updates with any value between 0 and 100 for VPA weight. Fixed with HVPA now supports UpdateMode "off" for HPA and VPA #19.
  • Weight-based scaling for HPA updates with values 0 or 100 for HPA weight.
  • Update proposal/documentation to be in sync with feature and behaviour changes. This is an on-going task.
  • Consolidate and keep up to date FAQs/Recommended Actions documents as a first point of reference to operators/admin. This is an on-going task.
  • Release HVPA implemented so far in different landscapes to gain experience. Prio 1.
  • Enable Auto update policy (i.e. enable scale down) for kube-apiserver to reduce cost implications. The ScaleUp update policy would continue for etcd for the time being because scale down could be disruptive. Prio 1.
  • If an OOMKill or CPU overload happens, override stabilisation window as well as HPA weight to apply the weighted VPA recommendation. Prio 1.
  • Auto scale limits to be in sync with scaling of requests. Prio 1.
  • Unit tests and Integration tests (using Test Machinery). Prio 2.
  • Alerts when some percentage threshold of the maxAllowed is reached for requests of any container in the targetRef. Prio 2.
  • Scale down during a maintenance window. This would be used for components that experience disruption during scaling. Prio 2.
  • Implement and use the Scale subresource in the HVPA CRD to fully control HPA updates and use HPA only for recommendation. Pending Evaluate options for controlling HPA-based scaling #7. Prio 3.
    • Implement ScaleUp update policy for HPA updates. Prio 3.
    • Change the Off update policy implementation for HPA to deploy/reconcile the HPA resource even in Off mode. Retain the recommendations but block the updates. Prio 3.
    • Weight-based scaling for HPA updates with any value between 0 and 100 as the weight for HPA. Prio 3.
  • Submit and drive adoption of KEP for Resources subresource (per container) along the lines of the Scale subresource. This can then be used to implement the support for custom resources as targetRef. Prio 4.
  • Recovery/ramp-up of overloaded/crashing targetRef. Prio 5.
  • Pro-actively throttle/ramp-down soon-to-be overloaded targetRef to avoid crash. Prio 5.
  • Support for custom resources as targetRef. If the KEP for Resources subresources is not yet accepted, then this could be implemented using annotations to supply the desired metadata. Prio 6.
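One way to read the weight-based scaling tasks above: a VPA weight between 0 and 100 applies only that fraction of the gap between the current request and the VPA recommendation. A small sketch of this interpretation (the function name and signature are illustrative, not taken from the hvpa-controller code):

```python
def blend_vpa_recommendation(current: float, recommended: float,
                             vpa_weight: int) -> float:
    """Apply a fraction of the VPA recommendation, per the VPA weight.

    vpa_weight is a percentage between 0 and 100: 0 leaves the current
    request untouched (pure horizontal scaling), 100 applies the full
    VPA recommendation (pure vertical scaling).
    """
    if not 0 <= vpa_weight <= 100:
        raise ValueError("vpa_weight must be between 0 and 100")
    return current + (vpa_weight / 100) * (recommended - current)

# With current=1 CPU, recommended=3 CPUs and a VPA weight of 50,
# only half the recommended increase is applied vertically; the
# remaining load would be absorbed by HPA scaling out.
```

Under this reading, the interim restriction of HPA weights to 0 or 100 simply means HPA is either fully responsible for a replica range or not at all, until full weight support lands with #7.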
@amshuman-kr amshuman-kr added kind/epic Large multi-story topic kind/enhancement Enhancement, improvement, extension labels Sep 16, 2019
@ghost ghost added lifecycle/stale Nobody worked on this for 6 months (will further age) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Dec 1, 2019
@ghost ghost added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Apr 6, 2020
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Jun 6, 2020
@amshuman-kr amshuman-kr added roadmap/team-internal and removed lifecycle/rotten Nobody worked on this for 12 months (final aging stage) labels Oct 23, 2020
@vlerenc vlerenc changed the title [Feature] HVPA Roadmap ☂️ Gardener Horizontal & Vertical Pod Autoscaler, a.k.a. HVPA (v2) Nov 5, 2020
@vlerenc vlerenc added this to the 2020-Q4 milestone Nov 10, 2020
@vlerenc vlerenc modified the milestones: 2020-Q4, 2021-Q1 Mar 5, 2021
@amshuman-kr amshuman-kr modified the milestones: 2021-Q1, 2021-Q3 Jun 9, 2021
@shreyas-s-rao shreyas-s-rao modified the milestones: 2021-Q3, 2022-Q1 Oct 6, 2021
@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Apr 5, 2022
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Oct 2, 2022