
Merge pull request #3570 from towca/jtuznik/scale-down-after-delete-fix #3597

Conversation

ryaneorth

Remove ScaleDownNodeDeleted status since we no longer delete nodes synchronously

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 9, 2020
@ryaneorth
Author

Cherry picking changes from #3570 into 1.19

@ryaneorth
Author

/assign @vivekbagade

@MaciekPytel
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Oct 12, 2020
@MaciekPytel
Contributor

It seems the test failure is in clusterapi and it's completely unrelated to the actual change. The same seems to be the case for #3598.

Tagging Cluster API provider owners for help. @frobware @enxebre @elmiko @hardikdr @detiber @ncdc

@ncdc
Member

ncdc commented Oct 12, 2020

@benmoss

@elmiko
Contributor

elmiko commented Oct 12, 2020

@MaciekPytel thanks for highlighting that, i've taken a quick look at the test runs and it seems like we might have some sort of race condition there. i'll keep poking around.

@ryaneorth
Author

@MaciekPytel - can we merge this and #3598 since the failing tests are unrelated to the change?

@elmiko
Contributor

elmiko commented Oct 13, 2020

i'm not quite sure how yet, but this change (and #3598) is responsible for breaking the capi tests. granted, this could be an error in the capi implementation, but when i run the capi unit tests from master i see no issues:

$ stress ./clusterapi.test -test.run=TestControllerNodeGroups -test.cpu=10                                                                                     
24 runs so far, 0 failures                                                                                                                                     
48 runs so far, 0 failures                                                                                                                                     
74 runs so far, 0 failures                                                                                                                                     
100 runs so far, 0 failures                                                                                                                                    
124 runs so far, 0 failures                                                                                                                                    
152 runs so far, 0 failures                                                                                                                                    
176 runs so far, 0 failures                                                                                                                                    
200 runs so far, 0 failures                                                                                                                                    
228 runs so far, 0 failures                                                                                                                                    
252 runs so far, 0 failures                                                                                                                                    
280 runs so far, 0 failures                                                                                                                                    
304 runs so far, 0 failures                                                                                                                                    
332 runs so far, 0 failures                                                                                                                                    
358 runs so far, 0 failures                                                                                                                                    
384 runs so far, 0 failures                                                                                                                                    
408 runs so far, 0 failures                                                                                                                                    

but when i switch to this branch, i immediately get errors:

$ stress ./clusterapi.test -test.run=TestControllerNodeGroups -test.cpu=10      

/tmp/go-stress-20201013T160931-966456074
I1013 16:09:31.702318 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:31.703114 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
--- FAIL: TestControllerNodeGroups (0.11s)
    clusterapi_controller_test.go:1023: expected 0, got 6
I1013 16:09:31.813386 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:31.813404 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
I1013 16:09:31.917005 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:31.917031 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
I1013 16:09:32.020654 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:32.020686 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
I1013 16:09:32.122008 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:32.122030 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
I1013 16:09:32.222599 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:32.222622 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
I1013 16:09:32.326754 2330419 clusterapi_controller.go:334] Using version "v1alpha3" for API group "cluster.x-k8s.io"
I1013 16:09:32.326833 2330419 clusterapi_controller.go:411] Resource "machinedeployments" available
FAIL


ERROR: exit status 1
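(Background for readers unfamiliar with the tool: `stress`, from golang.org/x/tools/cmd/stress, repeatedly runs a pre-compiled test binary, built with `go test -c`, and tallies cumulative runs and failures. A minimal shell loop approximating the same idea; `run_test` is a stand-in that always passes, where the real invocation would be `./clusterapi.test -test.run=TestControllerNodeGroups`:)

```shell
# Sketch of what `stress` does: run a test command repeatedly and
# report cumulative runs and failures. `run_test` is a hypothetical
# stand-in for the real clusterapi test binary.
run_test() { true; }

runs=0
failures=0
while [ "$runs" -lt 20 ]; do
  if ! run_test >/dev/null 2>&1; then
    failures=$((failures + 1))
  fi
  runs=$((runs + 1))
done
echo "$runs runs so far, $failures failures"
```

The real tool additionally saves each failing run's output to a file under /tmp, which is where the `/tmp/go-stress-...` path in the log above comes from.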

@elmiko
Contributor

elmiko commented Oct 13, 2020

just to make sure i didn't miss anything, i also ran the tests against the head of the release-1.19 branch

$ stress ./clusterapi.test -test.run=TestControllerNodeGroups -test.cpu=10
24 runs so far, 0 failures
52 runs so far, 0 failures
76 runs so far, 0 failures
104 runs so far, 0 failures
132 runs so far, 0 failures
160 runs so far, 0 failures
185 runs so far, 0 failures
212 runs so far, 0 failures
240 runs so far, 0 failures
264 runs so far, 0 failures
292 runs so far, 0 failures
316 runs so far, 0 failures
344 runs so far, 0 failures
368 runs so far, 0 failures
395 runs so far, 0 failures
420 runs so far, 0 failures

@ryaneorth
Author

Thanks @elmiko! I also don't see any reason why this change would cause these tests to fail.

@elmiko
Contributor

elmiko commented Oct 14, 2020

@ryaneorth i'm not sure exactly why it's causing the failure; i can probably do a little more digging tomorrow or perhaps on tuesday.

@ryaneorth
Author

thanks! I'll also try to dig in a bit

@elmiko
Contributor

elmiko commented Oct 14, 2020

ok, figured this out. i thought it sounded familiar in the beginning but it didn't occur to me until i noticed the backports.

we need to cherry pick #3441 as well to ensure that those tests work in 1.19. i'm not sure about the failures in 1.18, but it might be needed there as well since we backported the unstructured changes to the cluster-api provider.

edit:

the good news is that this is just a failure of the unit tests and not an indication of deeper failure in the actual provider code.

@ryaneorth
Author

Great find @elmiko ! I've submitted two new PRs for backporting #3441 to 1.18 and 1.19:

#3612
#3613
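(For context, the cherry-pick mechanics behind backports like these can be sketched in a throwaway repo. Branch and commit names below are illustrative only, not the real kubernetes/autoscaler history; `git cherry-pick -x` records the source commit hash in the new commit's message.)

```shell
# Illustrative backport workflow: a fix lands on the default branch,
# then is applied onto a release branch with `git cherry-pick -x`.
# All names here are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git branch release-1.19            # hypothetical release branch
echo "test fix" > fix.txt
git add fix.txt
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "Fix flaky unit tests"
fix_sha=$(git rev-parse HEAD)
git checkout -q release-1.19
git -c user.name=demo -c user.email=demo@example.com \
    cherry-pick -x "$fix_sha"
git log --oneline -1
```

In practice each backport is opened as its own PR against the release branch, which is what the two PRs listed above do.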

@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 14, 2020
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
  • If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
  • Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: [email protected]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Oct 14, 2020
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Oct 14, 2020
@ryaneorth
Author

CLA signed

@mwielgus
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 15, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MaciekPytel, mwielgus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 814935b into kubernetes:cluster-autoscaler-release-1.19 Oct 15, 2020