Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cluster-autoscaler][clusterapi] Remove internal types in favor of unstructured #3312

Merged
merged 2 commits into from
Aug 19, 2020

Conversation

detiber
Copy link
Member

@detiber detiber commented Jul 10, 2020

Currently we define internal types for use with Cluster API and are serializing/deserializing to/from these internal types as we interact with Cluster API resources. I believe this causes confusion since the internal types do not necessarily represent the external Cluster API types that we are interacting with (that may span multiple versions). It also makes adding support for fields that may or may not exist for a subset of Cluster API versions more difficult to maintain (since we are attempting to abstract these differences away as part of the serialization/deserialization process).

This PR removes the use of internal type representations all together in favor of Unstructured directly.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 10, 2020
@detiber
Copy link
Member Author

detiber commented Jul 10, 2020

/assign @elmiko @JoelSpeed

@elmiko
Copy link
Contributor

elmiko commented Jul 10, 2020

Jason, this looks cool. i'm still reviewing and testing, but i'm amazed at how much we can remove switching to the unstructured. nicely done!

Copy link
Contributor

@elmiko elmiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was trying to get this change built so i could test on a local cluster but i keep this hitting this:

$ make clean build
rm -f cluster-autoscaler
CGO_ENABLED=0 GO111MODULE=off GOOS=linux go build --ldflags "-s"  ./...
# k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/component-base/cli/flag
vendor/k8s.io/component-base/cli/flag/ciphersuites_flag.go:51:51: undefined: tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
vendor/k8s.io/component-base/cli/flag/ciphersuites_flag.go:52:51: undefined: tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
make: *** [Makefile:25: build] Error 2

i tried to clean extra files, and also looked at rebuilding the vendor dir, any advice?

@@ -142,22 +146,32 @@ func BuildClusterAPI(opts config.AutoscalingOptions, do cloudprovider.NodeGroupD
}

// Grab a dynamic interface that we can create informers from
dc, err := dynamic.NewForConfig(externalConfig)
managementClient, err := dynamic.NewForConfig(externalConfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought we were separating this work into the follow up commit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this instance I just renamed the variable to be more descriptive of it's use, there is no separation between kubeconfigs that are used by the clients with this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, cool. i figured that was the case but wanted to be sure.

@elmiko
Copy link
Contributor

elmiko commented Jul 13, 2020

just a follow up, i am seeing this build failure on master as well. so either i have a big problem or the repo does.

edit: i created #3315 just in case

@detiber
Copy link
Member Author

detiber commented Jul 13, 2020

i was trying to get this change built so i could test on a local cluster but i keep this hitting this:

$ make clean build
rm -f cluster-autoscaler
CGO_ENABLED=0 GO111MODULE=off GOOS=linux go build --ldflags "-s"  ./...
# k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/component-base/cli/flag
vendor/k8s.io/component-base/cli/flag/ciphersuites_flag.go:51:51: undefined: tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
vendor/k8s.io/component-base/cli/flag/ciphersuites_flag.go:52:51: undefined: tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
make: *** [Makefile:25: build] Error 2

i tried to clean extra files, and also looked at rebuilding the vendor dir, any advice?

Interesting, I'm not seeing any build errors locally, have you tried using the build-in-docker make target? I'm not sure if it's directly related, but I've seen similar cipher suite mismatch errors when trying to build k/k using the wrong version of Go.

@elmiko
Copy link
Contributor

elmiko commented Jul 13, 2020

Interesting, I'm not seeing any build errors locally, have you tried using the build-in-docker make target? I'm not sure if it's directly related, but I've seen similar cipher suite mismatch errors when trying to build k/k using the wrong version of Go.

i haven't tried the build in docker yet, it was only a few commits ago that things broke. i was just trying to rebase your stuff on top of the old commit. probably easier to just build in docker lol

@elmiko
Copy link
Contributor

elmiko commented Jul 13, 2020

ok, got my build working.

i tested this against a CAPD cluster(joined), i tried out scaling up and down with a workload and deleting the workload, also tested adding a new MachineDeployment and scaling that.

looks good so far, i didn't hit any errors with operation of the autoscaler.

/area provider/cluster-api
/lgtm

@k8s-ci-robot k8s-ci-robot added area/provider/cluster-api Issues or PRs related to Cluster API provider lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Jul 13, 2020
Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found a couple of comments that might need an update, otherwise LGTM

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 17, 2020
@JoelSpeed
Copy link
Contributor

Looks good, thanks!

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 17, 2020
@enxebre
Copy link
Member

enxebre commented Jul 17, 2020

thanks a lot @detiber! This looks great. I'm planning to have a deeper look at this on Monday.
Since this is a big refactor and kube 1.19 should be code frozen by now and given the unfortunate lack of e2e testing we have atm for this provider I'd suggest we hold the merge until the ca main branch opens for 1.20 and proceed with any backport if we need so. Does it make sense to you?

@detiber
Copy link
Member Author

detiber commented Jul 17, 2020

@enxebre I would like to avoid holding off until 1.20, only because this helps enable the other functionality for running against workload clusters that exist in a separate management controller vs only against self-managed clusters. That said, I will defer to the larger community to make that decision.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2020
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jul 20, 2020
Remove internal types for Cluster API and replace with unstructured access
@detiber
Copy link
Member Author

detiber commented Aug 5, 2020

@enxebre now that the 1.19 release has been cut, any objections to proceeding with this?

@enxebre
Copy link
Member

enxebre commented Aug 6, 2020

@MaciekPytel is the main branch targeting 1.20 now?
thanks @detiber!
/approve

PTAL @JoelSpeed @elmiko

@JoelSpeed
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 6, 2020
@detiber
Copy link
Member Author

detiber commented Aug 6, 2020

@MaciekPytel is the main branch targeting 1.20 now?

I'm not Maciek, but yes the main branch is targeting 1.20, the cluster-autoscaler-release-1.19 branch has already be cut along with the 1.19.0 release.

@detiber
Copy link
Member Author

detiber commented Aug 6, 2020

@MaciekPytel this should be good to go now, need cluster-autoscaler approval for the vendor changes

/assign @MaciekPytel

@elmiko
Copy link
Contributor

elmiko commented Aug 6, 2020

thanks Jason!

@ncdc
Copy link
Member

ncdc commented Aug 17, 2020

👋 hi, are there any updates on getting this reviewed for approval? Thanks!

@elmiko
Copy link
Contributor

elmiko commented Aug 17, 2020

i think we are good on this from the CAPI side, just need an autoscaler owner to give approval. cc @mwielgus @MaciekPytel

@ncdc
Copy link
Member

ncdc commented Aug 18, 2020

@elmiko it doesn't look like we need root OWNERS approval, just cluster-autoscaler. Do you think we should ask someone in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/OWNERS?

@elmiko
Copy link
Contributor

elmiko commented Aug 18, 2020

@elmiko it doesn't look like we need root OWNERS approval, just cluster-autoscaler. Do you think we should ask someone in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/OWNERS?

++, thanks for pointing that out Andy =)

cc @aleksandra-malinowska @feiskyer @vivekbagade

@feiskyer
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre, feiskyer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 19, 2020
@k8s-ci-robot k8s-ci-robot merged commit 5159dae into kubernetes:master Aug 19, 2020
benmoss pushed a commit to benmoss/autoscaler that referenced this pull request Sep 28, 2020
[cluster-autoscaler][clusterapi] Remove internal types in favor of unstructured
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022
This ensured that access to replicas during scale down operations were never stale by accessing the API server kubernetes#3104.
This honoured that behaviour while moving to unstructured client kubernetes#3312.
This regressed that behaviour while trying to reduce the API server load kubernetes#4443.
This put back the never stale replicas behaviour at the cost of loading back the API server kubernetes#4634.

This PR tries to satisfy both non stale replicas during scale down and prevent the API server from being overloaded. To achieve that it lets targetSize which is called on every autoscaling cluster state loop from come from cache.

Also note that the scale down implementation has changed https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022
This ensured that access to replicas during scale down operations were never stale by accessing the API server kubernetes#3104.
This honoured that behaviour while moving to unstructured client kubernetes#3312.
This regressed that behaviour while trying to reduce the API server load kubernetes#4443.
This put back the never stale replicas behaviour at the cost of loading back the API server kubernetes#4634.

Currently on e.g a 48 minutes cluster it does 1.4k get request to the scale subresource.
This PR tries to satisfy both non stale replicas during scale down and prevent the API server from being overloaded. To achieve that it lets targetSize which is called on every autoscaling cluster state loop from come from cache.

Also note that the scale down implementation has changed https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.
enxebre added a commit to enxebre/autoscaler that referenced this pull request Jul 13, 2022
This ensured that access to replicas during scale down operations were never stale by accessing the API server kubernetes#3104.
This honoured that behaviour while moving to unstructured client kubernetes#3312.
This regressed that behaviour while trying to reduce the API server load kubernetes#4443.
This put back the never stale replicas behaviour at the cost of loading back the API server kubernetes#4634.

Currently on e.g a 48 minutes cluster it does 1.4k get request to the scale subresource.
This PR tries to satisfy both non stale replicas during scale down and prevent the API server from being overloaded. To achieve that it lets targetSize which is called on every autoscaling cluster state loop from come from cache.

Also note that the scale down implementation has changed https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.
navinjoy pushed a commit to navinjoy/autoscaler that referenced this pull request Oct 26, 2022
This ensured that access to replicas during scale down operations were never stale by accessing the API server kubernetes#3104.
This honoured that behaviour while moving to unstructured client kubernetes#3312.
This regressed that behaviour while trying to reduce the API server load kubernetes#4443.
This put back the never stale replicas behaviour at the cost of loading back the API server kubernetes#4634.

Currently on e.g a 48 minutes cluster it does 1.4k get request to the scale subresource.
This PR tries to satisfy both non stale replicas during scale down and prevent the API server from being overloaded. To achieve that it lets targetSize which is called on every autoscaling cluster state loop from come from cache.

Also note that the scale down implementation has changed https://github.com/kubernetes/autoscaler/commits/master/cluster-autoscaler/core/scaledown.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/cluster-api Issues or PRs related to Cluster API provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants