
AKS cluster agent pool with fix for autoscaling and count. #4543

Closed
wants to merge 5 commits

Conversation

@sschne (Contributor) commented Oct 7, 2019

This PR is an extension of #4046

Master is merged back into the feature branch.

This PR improves the handling of the count attribute when autoscaling is enabled (fixes #4075).
When autoscaling is enabled and the count attribute differs from the actual node count of the currently scaled VMSS, Azure fails to update the agent pool profile and reports:

Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="Cannot manually scale AgentPool '' of an AKS cluster with cluster autoscaler enabled. Please disable cluster autoscaler in the cluster to manually scale."

To circumvent this, we need to make the count attribute unavailable to set from the user's perspective. Therefore, the default value of count is unset, and a check is included so that the user is unable to set count when an autoscaling config is present. The correct count is fetched via the Get Agent Pool request when the resource is updated. If the resource is new, a sane default value of MinCount is sent with the create request.
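
For context, here is a minimal sketch of a configuration that ran into this, written against the agent_pool_profile schema of the time; the names and values are illustrative assumptions:

```hcl
resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_kubernetes_cluster" "example" {
  name                = "example-aks"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  dns_prefix          = "exampleaks"

  agent_pool_profile {
    name                = "default"
    type                = "VirtualMachineScaleSets"
    vm_size             = "Standard_DS2_v2"
    enable_auto_scaling = true
    min_count           = 1
    max_count           = 5

    # The cluster autoscaler changes the real node count, so this value
    # drifts; the next apply then fails with the InvalidParameter error
    # quoted above.
    count = 1
  }

  service_principal {
    client_id     = var.client_id
    client_secret = var.client_secret
  }
}
```

With this PR, count is simply omitted whenever enable_auto_scaling is set, and the provider fills it in from the Get Agent Pool response (or from MinCount on create).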

@ghost added the size/XL label Oct 7, 2019
@tombuildsstuff (Contributor)

hi @pather87

Thanks for this PR - apologies for the delay reviewing this!

I've spent some time over the last couple of weeks trying to work out how to consolidate this PR, #4256, #4676, #4046 and #4472, since they're all attempting to solve the same problem (a breaking behavioural change in the AKS API, where non-default node pools now have to be managed via a separate API) in different ways.

After spending some time investigating/experimenting with this, I believe the best approach going forward is to introduce a replacement default_node_pool block, which is limited to a single element, and then deprecate the existing agent_pool_profiles block, which can then be removed in 2.0.

This allows existing users to continue to use the agent_pool_profiles field if they need to, and to migrate across to the default_node_pool object on their own timeline. In addition, this allows Azure regions which haven't rolled these changes out (e.g. China/Germany/Government) to continue to use the existing functionality if necessary. The default_node_pool block can then become Required in 2.0 (at which point the existing agent_pool_profiles block will be removed). At the same time we can handle the other breaking behavioural change mentioned in #4465 by switching the default Node Pool type to VirtualMachineScaleSets.
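
To make that concrete, a configuration using the replacement block might look roughly like the sketch below; #4898 defines the final schema, so the exact attribute names here are assumptions:

```hcl
resource "azurerm_kubernetes_cluster" "example" {
  # ... name, location, resource_group_name, dns_prefix etc. as before

  # Replaces the deprecated agent_pool_profile block; limited to one element.
  default_node_pool {
    name                = "default"
    vm_size             = "Standard_DS2_v2"
    type                = "VirtualMachineScaleSets" # the new default type, per #4465
    enable_auto_scaling = true
    min_count           = 1
    max_count           = 5
  }
}
```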

Whilst this isn't ideal, since users will need to migrate at some point, it seems preferable from a UX perspective to manage these as separate resources rather than inline (which also allows users to order node pool creation/destruction if necessary).
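
For illustration, managing an additional node pool as its own resource (along the lines of what #4046 proposes) might look like the sketch below; the resource and attribute names are assumptions until that PR lands:

```hcl
resource "azurerm_kubernetes_cluster_node_pool" "workers" {
  name                  = "workers"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.example.id
  vm_size               = "Standard_DS2_v2"
  enable_auto_scaling   = true
  min_count             = 1
  max_count             = 10
}
```

Ordering between pools can then be expressed with ordinary resource dependencies (e.g. depends_on) rather than by position inside a single inline block.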

As such, whilst we'd like to thank you for this contribution (and apologise again for the delay reviewing it!), ultimately we're going to take a different direction here, and so I'm going to close this PR in favour of #4898, which introduces the new default_node_pool block mentioned above. Once #4898 is merged we should be able to push/rebase #4046, which introduces the new Node Pool resource; at which point we can then rebase #4472, collectively allowing us to add support for all of this functionality.

Thanks!

@ghost commented Mar 29, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost locked and limited the conversation to collaborators Mar 29, 2020
Successfully merging this pull request may close these issues.

Autoscaling enabled AKS Cluster leads to error on terraform apply, though no changes on AKS planned