Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UPSTREAM: <carry>: openshift: Implement scale from zero #137

Conversation

elmiko
Copy link

@elmiko elmiko commented Mar 12, 2020

This allows a Machine{Set,Deployment} to scale up/down from 0,
providing the following annotations are set:

apiVersion: v1
items:
- apiVersion: machine.openshift.io/v1beta1
  kind: MachineSet
  metadata:
    annotations:
      machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "0"
      machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "6"
      machine.openshift.io/vCPU: "2"
      machine.openshift.io/memoryMb: 8G
      machine.openshift.io/GPU: "1"
      machine.openshift.io/maxPods: "100"

please note that machine.openshift.io/maxPods is not required for scale to/from zero operations.

this PR is a continuation of the work started in #110

Counterpart PRs:
openshift/cluster-api-provider-aws#301
openshift/cluster-api-provider-azure#112
Design details:
openshift/enhancements#186

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 12, 2020
@enxebre
Copy link
Member

enxebre commented Mar 12, 2020

Counterpart PRs:
openshift/cluster-api-provider-aws#301
openshift/cluster-api-provider-azure#112
Design details:
openshift/enhancements#186

Can we update the description to match the annotations in the links above and include this links as well?

@elmiko elmiko force-pushed the openshift/scale-from-0-using-annotations branch from 65be1d7 to e572aba Compare March 16, 2020 20:48
@elmiko
Copy link
Author

elmiko commented Mar 16, 2020

updated to incorporate the comments here, and squashed my additions into a single commit.

@elmiko elmiko force-pushed the openshift/scale-from-0-using-annotations branch from e572aba to 8dbc526 Compare March 17, 2020 14:23
@enxebre
Copy link
Member

enxebre commented Mar 18, 2020

This is looking great @elmiko. We need to rebase as #137 got in.
In parallel can we put a PR against https://github.com/openshift/cluster-api-actuator-pkg/blob/master/pkg/autoscaler/autoscaler.go#L209 to include and validate scaling from zero scenario?

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 18, 2020
@elmiko elmiko force-pushed the openshift/scale-from-0-using-annotations branch from 8dbc526 to 9c4af16 Compare March 18, 2020 13:27
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 18, 2020
@elmiko
Copy link
Author

elmiko commented Mar 18, 2020

rebased, still working on the outstanding comments

@enxebre
Copy link
Member

enxebre commented Mar 23, 2020

can we conflate both commits into one? the split is meaningless since the second one is introducing changes over the first one but we never had support for the first one.

@elmiko
Copy link
Author

elmiko commented Mar 23, 2020

This is looking great @elmiko. We need to rebase as #137 got in.

sounds good, i will rebase again and squash the 2 commits into one.

In parallel can we put a PR against https://github.com/openshift/cluster-api-actuator-pkg/blob/master/pkg/autoscaler/autoscaler.go#L209 to include and validate scaling from zero scenario?

i am still working through my understanding of the test framework, but i will give it my best =)

@elmiko elmiko force-pushed the openshift/scale-from-0-using-annotations branch 5 times, most recently from 9bcacb3 to 17014e1 Compare March 31, 2020 22:31
@elmiko
Copy link
Author

elmiko commented Mar 31, 2020

@JoelSpeed ptal, i think i've covered all the fields we are manually copying through the converters, even the timestamps!

after looking deeper, we are not actually copying more from the underlying machines, etc... so i feel good that this test should be covering everything we have added.

i have squashed my extra commit, and i just need to clean up the last few comments about the varying unit types for the memory/cpu stuff.

@elmiko elmiko force-pushed the openshift/scale-from-0-using-annotations branch from 17014e1 to c06ed43 Compare March 31, 2020 22:46
@elmiko
Copy link
Author

elmiko commented Mar 31, 2020

one more comment for tonight, i cleaned up the tests a little more (to make the converters stuff look similar to the others).

@JoelSpeed i also added a few more unit type tests, i'm curious how many of these should we add? i'm avoiding creating some matrix of every possible combo... ;)

@JoelSpeed
Copy link

I've reviewed this and am happy for this to merge on the 2 provisos:

  • We follow up on the cloud providers machineset controllers to set the memory units properly
  • We follow up to do fuzzing and improve the conversion stuff

@JoelSpeed i also added a few more unit type tests, i'm curious how many of these should we add? i'm avoiding creating some matrix of every possible combo... ;)

I think what you've got is good for now, though we aren't currently testing all of the code in that file, we aren't testing the structured to unstructured converters at the moment so that's why my second proviso is there, this is the kind of stuff that is easily gonna trip someone up as explained offline yesterday

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 1, 2020
@enxebre
Copy link
Member

enxebre commented Apr 1, 2020

related to @JoelSpeed comment above kubernetes#3011
/approve
/retest
/hold
Please un hold once it's verified this passes the validation here openshift/cluster-api-actuator-pkg#140

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 1, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 1, 2020
@elmiko
Copy link
Author

elmiko commented Apr 1, 2020

Please un hold once it's verified this passes the validation here openshift/cluster-api-actuator-pkg#140

ack, i will spend some time with those tests and then post results here when it's working.

@elmiko
Copy link
Author

elmiko commented Apr 1, 2020

i am fairly confident that this code is passing the end-to-end tests from cluster-api-actuator-pkg#140.

i have run the "from/to zero" tests and appear to be working as expected:

$ hack/ci-integration.sh -v -focus="zero"  
...
[3] • [SLOW TEST:398.530 seconds]
[3] [Feature:Machines] Autoscaler should
[3] /home/mike/workspace/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/autoscaler/autoscaler.go:210
[3]   It scales from/to zero
[3]   /home/mike/workspace/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/autoscaler/autoscaler.go:453
[3] ------------------------------
[3] 
[3] Ran 1 of 5 Specs in 398.534 seconds
[3] SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 4 Skipped
[3] PASS

Ginkgo ran 1 suite in 6m43.904044806s
Test Suite Passed

i am following up with the full suite but i am seeing non-related issues.

@elmiko
Copy link
Author

elmiko commented Apr 1, 2020

/retest

This allows a Machine{Set,Deployment} to scale up/down from 0,
providing the following annotations are set:

```yaml
apiVersion: v1
items:
- apiVersion: machine.openshift.io/v1beta1
  kind: MachineSet
  metadata:
    annotations:
      machine.openshift.io/cluster-api-autoscaler-node-group-min-size: "0"
      machine.openshift.io/cluster-api-autoscaler-node-group-max-size: "6"
      machine.openshift.io/vCPU: "2"
      machine.openshift.io/memoryMb: 8G
      machine.openshift.io/GPU: "1"
      machine.openshift.io/maxPods: "100"
```

Note that `machine.openshift.io/GPU` and `machine.openshift.io/maxPods`
are optional.
@elmiko elmiko force-pushed the openshift/scale-from-0-using-annotations branch from c06ed43 to fe61822 Compare April 1, 2020 18:46
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Apr 1, 2020
@enxebre
Copy link
Member

enxebre commented Apr 2, 2020

/retest

@elmiko
Copy link
Author

elmiko commented Apr 2, 2020

as per our conversation on slack, i am going to remove the hold given that the proposed tests are passing.

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 2, 2020
@elmiko
Copy link
Author

elmiko commented Apr 2, 2020

/retest

@elmiko
Copy link
Author

elmiko commented Apr 2, 2020

/test e2e-azure-operator

Copy link

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 3, 2020
@openshift-merge-robot openshift-merge-robot merged commit ec12030 into openshift:master Apr 3, 2020
@elmiko elmiko deleted the openshift/scale-from-0-using-annotations branch April 7, 2020 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants