
Merge pull request #3570 from towca/jtuznik/scale-down-after-delete-fix #3597

Conversation

ryaneorth

Remove ScaleDownNodeDeleted status since we no longer delete nodes synchronously

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 9, 2020
@ryaneorth
Author

Cherry picking changes from #3570 into 1.19

@ryaneorth
Author

/assign @vivekbagade

@MaciekPytel
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 12, 2020
@MaciekPytel
Contributor

It seems the test failure is in clusterapi and it's completely unrelated to the actual change. The same seems to be the case for #3598.

Tagging Cluster API provider owners for help. @frobware @enxebre @elmiko @hardikdr @detiber @ncdc

@ncdc
Member

ncdc commented Oct 12, 2020

@benmoss

@elmiko
Contributor

elmiko commented Oct 12, 2020

@MaciekPytel thanks for highlighting that, i've taken a quick look at the test runs and it seems like we might have some sort of race condition there. i'll keep poking around.

@ryaneorth
Author

@MaciekPytel - can we merge this and #3598 since the failing tests are unrelated to the change?

@elmiko
Contributor

elmiko commented Oct 13, 2020

i'm not quite sure how yet, but this change (and #3598) is responsible for breaking the capi tests. granted, this could be an error in the capi implementation, but when i run the capi unit tests from master i see no issues:

$ stress ./clusterapi.test -test.run=TestControllerNodeGroups -test.cpu=10                                                                                     
24 runs so far, 0 failures                                                                                                                                     
48 runs so far, 0 failures                                                                                                                                     
74 runs so far, 0 failures                                                                                                                                     
100 runs so far, 0 failures                                                                                                                                    
124 runs so far, 0 failures                                                                                                                                    
152 runs so far, 0 failures                                                                                                                                    
176 runs so far, 0 failures                                                                                                                                    
200 runs so far, 0 failures                                                                                                                                    
228 runs so far, 0 failures                                                                                                                                    
252 runs so far, 0 failures                                                                                                                                    
280 runs so far, 0 failures                                                                                                                                    
304 runs so far, 0 failures                                                                                                                                    
332 runs so far, 0 failures                                                                                                                                    
358 runs so far, 0 failures                                                                                                                                    
384 runs so far, 0 failures                                                                                                                                    
408 runs so far, 0 failures                                                                                                                                    

but when i switch to this branch, i immediately get errors:

$ stress ./clusterapi.test -test.run=TestControllerNodeGroups -test.cpu=10      

/tmp/go-stress-20201013T160931-966456074
I1013 16:09:31.702318 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:31.703114 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
--- FAIL: TestControllerNodeGroups (0.11s)
    clusterapi_controller_test.go:1023: expected 0, got 6
I1013 16:09:31.813386 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:31.813404 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
I1013 16:09:31.917005 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:31.917031 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
I1013 16:09:32.020654 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:32.020686 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
I1013 16:09:32.122008 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:32.122030 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
I1013 16:09:32.222599 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:32.222622 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
I1013 16:09:32.326754 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:32.326833 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
FAIL


ERROR: exit status 1
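(Background for readers unfamiliar with the tool: `stress`, from golang.org/x/tools/cmd/stress, repeatedly runs a pre-compiled test binary, built with `go test -c`, and tallies cumulative runs and failures. A minimal shell loop approximating the same idea; `run_test` is a stand-in that always passes, where the real invocation would be `./clusterapi.test -test.run=TestControllerNodeGroups`:)

```shell
# Sketch of what `stress` does: run a test command repeatedly and
# report cumulative runs and failures. `run_test` is a hypothetical
# stand-in for the real clusterapi test binary.
run_test() { true; }

runs=0
failures=0
while [ "$runs" -lt 20 ]; do
  if ! run_test >/dev/null 2>&1; then
    failures=$((failures + 1))
  fi
  runs=$((runs + 1))
done
echo "$runs runs so far, $failures failures"
```

The real tool additionally saves each failing run's output to a file under /tmp, which is where the `/tmp/go-stress-...` path in the log above comes from.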

@elmiko
Contributor

elmiko commented Oct 13, 2020

just to make sure i didn't miss anything, i also ran the tests against the head of the release-1.19 branch

$ stress ./clusterapi.test -test.run=TestControllerNodeGroups -test.cpu=10
24 runs so far, 0 failures
52 runs so far, 0 failures
76 runs so far, 0 failures
104 runs so far, 0 failures
132 runs so far, 0 failures
160 runs so far, 0 failures
185 runs so far, 0 failures
212 runs so far, 0 failures
240 runs so far, 0 failures
264 runs so far, 0 failures
292 runs so far, 0 failures
316 runs so far, 0 failures
344 runs so far, 0 failures
368 runs so far, 0 failures
395 runs so far, 0 failures
420 runs so far, 0 failures

@ryaneorth
Author

Thanks @elmiko! I also don't see any reason why this change would cause these tests to fail.

@elmiko
Contributor

elmiko commented Oct 14, 2020

@ryaneorth i'm not sure exactly why it's causing the failure; i can probably do a little more digging tomorrow or perhaps on tuesday.

@ryaneorth
Author

thanks! I'll also try to dig in a bit

@elmiko
Contributor

elmiko commented Oct 14, 2020

ok, figured this out. i thought it sounded familiar in the beginning but it didn't occur to me until i noticed the backports.

we need to cherry pick #3441 as well to ensure that those tests work in 1.19. i'm not sure about the failures in 1.18, but it might be needed there as well since we backported the unstructured changes to the cluster-api provider.

edit:

the good news is that this is just a failure of the unit tests and not an indication of deeper failure in the actual provider code.

@ryaneorth
Author

Great find @elmiko ! I've submitted two new PRs for backporting #3441 to 1.18 and 1.19:

#3612
#3613
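(For context, the cherry-pick mechanics behind backports like these can be sketched in a throwaway repo. Branch and commit names below are illustrative only, not the real kubernetes/autoscaler history; `git cherry-pick -x` records the source commit hash in the new commit's message.)

```shell
# Illustrative backport workflow: a fix lands on the default branch,
# then is applied onto a release branch with `git cherry-pick -x`.
# All names here are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git branch release-1.19            # hypothetical release branch
echo "test fix" > fix.txt
git add fix.txt
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "Fix flaky unit tests"
fix_sha=$(git rev-parse HEAD)
git checkout -q release-1.19
git -c user.name=demo -c user.email=demo@example.com \
    cherry-pick -x "$fix_sha"
git log --oneline -1
```

In practice each backport is opened as its own PR against the release branch, which is what the two PRs listed above do.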

@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 14, 2020
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Oct 14, 2020
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Oct 14, 2020
@ryaneorth
Author

CLA signed

@mwielgus
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 15, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MaciekPytel, mwielgus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 814935b into kubernetes:cluster-autoscaler-release-1.19 Oct 15, 2020