Flag for exclusive usage of template infos (with warnings) #3609

Closed

Conversation

bpineau (Contributor) commented Oct 13, 2020

This is a second attempt at #1021 (it might also provide an alternate solution for #2892 and #3608).

Per the #1021 discussion, a flag might be acceptable if it defaults to false and its description spells out the limitations (not all cloud providers support templates) and the risks of using it (loss of accuracy, scale-ups of unusable nodes, or pods left pending).

Some use cases include:

* Balance Similar when upscaling from zero (though I believe #3608 is a better approach)
* Edited/updated ASG/MIG taints and labels
* Updated instance types

Per the previous discussion, the latter two cases could be covered in the long term by custom node-shape discovery Processors (open to discussing that option too).
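To make the proposal concrete, below is a minimal sketch of what such an opt-in flag could look like. The flag name, help text, and warning wording are hypothetical illustrations, not this PR's actual code.

```go
package main

import (
	"flag"
	"log"
)

// Hypothetical flag name and wording; the PR's actual flag may differ.
var scaleUpFromTemplatesOnly = flag.Bool(
	"scale-up-from-template-only", false,
	"If true, always build upcoming-node shapes from the cloud provider's "+
		"node-group templates instead of sampling real nodes. Not supported "+
		"by all cloud providers; may lose accuracy and cause scale-ups of "+
		"unusable nodes or leave pods pending.")

func main() {
	flag.Parse()
	if *scaleUpFromTemplatesOnly {
		// Surface the risk loudly, as suggested in the #1021 discussion.
		log.Println("WARNING: --scale-up-from-template-only is enabled; " +
			"node shapes come from templates only, which can be inaccurate")
	}
}
```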

bpineau (Contributor, Author) commented Oct 13, 2020

cc @losipiuk and @MaciekPytel, as you took part in the previous attempt's discussion.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 13, 2020
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 13, 2020
Review threads on cluster-autoscaler/core/utils/utils.go (resolved)
@bpineau bpineau force-pushed the ScaleUpTemplateFromCloudProvider branch from 91e126a to 358f5d0 Compare October 15, 2020 06:05
fejta-bot commented

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 13, 2021
@bpineau bpineau force-pushed the ScaleUpTemplateFromCloudProvider branch from 358f5d0 to 35e2bf6 Compare January 18, 2021 15:34
k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bpineau
To complete the pull request process, please assign feiskyer after the PR has been reviewed.
You can assign the PR to them by writing /assign @feiskyer in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mwielgus (Contributor) commented

Do we still need this PR? If yes, please resolve conflicts.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 25, 2021
@bpineau bpineau force-pushed the ScaleUpTemplateFromCloudProvider branch from 35e2bf6 to 8a4f7c9 Compare January 25, 2021 11:23
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 25, 2021
bpineau (Contributor, Author) commented Jan 25, 2021

@mwielgus yes, rebased

MaciekPytel (Contributor) commented

I wonder what the relationship is between this and #3761? It seems you'd want to use one or the other. If so, maybe we can consider packing this into another NodeInfo processor; I think the implementation would be pretty much a cut-and-paste of the change in this PR.
I know this one doesn't add as much complexity as #3761, but core/ is already one huge plate of code spaghetti, and I'm trying to limit changes to it as much as possible to avoid bloating it even further.
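To make that suggestion concrete, here is a rough sketch of what folding this PR's behavior into a NodeInfo processor could look like. The types and the `buildFromTemplate` helper are simplified stand-ins invented for illustration; cluster-autoscaler's real processor interfaces live under `cluster-autoscaler/processors/` and have different signatures.

```go
package main

import "fmt"

// Simplified stand-ins for cluster-autoscaler's real types.
type NodeInfo struct{ FromTemplate bool }
type NodeGroup struct{ ID string }

// TemplateOnlyNodeInfoProcessor replaces the sampled real-node infos with
// template-built ones for every node group, which is roughly what moving
// this PR's logic into a processor would amount to.
type TemplateOnlyNodeInfoProcessor struct{}

func (p *TemplateOnlyNodeInfoProcessor) Process(
	groups []NodeGroup, infos map[string]*NodeInfo) map[string]*NodeInfo {
	for _, g := range groups {
		infos[g.ID] = buildFromTemplate(g)
	}
	return infos
}

// buildFromTemplate is hypothetical; in cluster-autoscaler it would call the
// node group's TemplateNodeInfo() on the cloud provider side.
func buildFromTemplate(g NodeGroup) *NodeInfo {
	return &NodeInfo{FromTemplate: true}
}

func main() {
	p := &TemplateOnlyNodeInfoProcessor{}
	infos := p.Process([]NodeGroup{{ID: "asg-1"}}, map[string]*NodeInfo{})
	fmt.Println(infos["asg-1"].FromTemplate) // prints: true
}
```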

bpineau (Contributor, Author) commented Jan 25, 2021

Yes, using #3671 should progressively replace this feature for us, once we find alternatives to the habit of updating instance types in place. That habit came as a side effect of this change: the initial motivation was to fix balancing plus scale-from-zero (i.e. #1021's original motivation), with in-place updates as a secondary effect. I can definitely look at making this a NodeInfo processor.

mwielgus (Contributor) commented

Ok, then let's use #3671 and do it as a NodeInfo processor/function. Closing this PR while waiting for the new one, but feel free to reopen if you'd prefer to refactor this one.

@mwielgus mwielgus closed this Feb 12, 2021
bpineau added a commit to DataDog/autoscaler that referenced this pull request Apr 8, 2021
This is a third attempt at kubernetes#1021 (it might also provide an alternate solution for kubernetes#2892, kubernetes#3608 and kubernetes#3609).
Some use cases include: Balance Similar when upscaling from zero, edited/updated ASG/MIG taints and labels, and updated instance types.

Per the kubernetes#1021 discussion, a flag might be acceptable if it defaults to false and its description spells out the limitations (not all cloud providers support templates) and the risks of using it (loss of accuracy, scale-ups of unusable nodes, or pods left pending).

Per the kubernetes#3609 discussion, using a NodeInfo processor is preferred.

`GetNodeInfosForGroups()` fate (and the opportunity to split NodeInfoProcessor into NodeInfoProcessor and NodeInfoDecoratorProcessor) is left open for discussion, as I'd like to collect feedback on refactoring plans here.
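As a purely speculative reading of that last point, the split could look something like the sketch below; all names and shapes here are guesses, since the commit message deliberately leaves the design open.

```go
package main

import "fmt"

// NodeInfo is a stand-in; these are not cluster-autoscaler's actual types.
type NodeInfo struct{ Labels map[string]string }

// NodeInfoProcessor would build the initial node-group-ID -> NodeInfo map.
type NodeInfoProcessor interface {
	Process() map[string]*NodeInfo
}

// NodeInfoDecoratorProcessor would only mutate an already-built map, so
// several small decorators (templates, taints, labels) could be chained
// independently instead of each reimplementing the whole build step.
type NodeInfoDecoratorProcessor interface {
	Decorate(map[string]*NodeInfo) map[string]*NodeInfo
}

// labelDecorator stamps one label on every NodeInfo in the map.
type labelDecorator struct{ key, value string }

func (d labelDecorator) Decorate(m map[string]*NodeInfo) map[string]*NodeInfo {
	for _, ni := range m {
		ni.Labels[d.key] = d.value
	}
	return m
}

func main() {
	infos := map[string]*NodeInfo{"mig-a": {Labels: map[string]string{}}}
	infos = labelDecorator{"topology.kubernetes.io/zone", "us-east-1a"}.Decorate(infos)
	fmt.Println(infos["mig-a"].Labels) // map[topology.kubernetes.io/zone:us-east-1a]
}
```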
bpineau added a commit to DataDog/autoscaler that referenced this pull request Apr 8, 2021 (same commit message as above)
bpineau added a commit to DataDog/autoscaler that referenced this pull request Apr 8, 2021 (same commit message as above)
bpineau added a commit to DataDog/autoscaler that referenced this pull request Apr 9, 2021 (same commit message as above)
bpineau (Contributor, Author) commented Apr 16, 2021

Re-did it as a NodeInfo processor/function in PR #4000; would you mind having a look?

bpineau added a commit to DataDog/autoscaler that referenced this pull request Apr 19, 2021 (same commit message as above)