Investigate VMSS (Virtual Machine Scale Sets) #113

Closed
justaugustus opened this issue Mar 4, 2019 · 33 comments
Labels
kind/feature: Categorizes issue or PR as related to a new feature.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@justaugustus
Contributor

/kind feature
/assign
/milestone next

@k8s-ci-robot added the kind/feature label Mar 4, 2019
@k8s-ci-robot added this to the next milestone Mar 4, 2019
@justaugustus
Contributor Author

/priority important-longterm

@k8s-ci-robot added the priority/important-longterm label Mar 6, 2019
@alexeldeib
Contributor

alexeldeib commented Apr 18, 2019

I'm also interested in this, though I'm not sure how high a priority it should be. VMSS makes a lot of sense for implementing a higher-level MachineSet or MachineDeployment, but it's not strictly necessary for the Machine spec.

Off the top of my head, one immediate question is whether this would be user-toggleable or not. If so, something in the MachineSet spec needs to communicate that. If not, one thing to keep in mind is that the providerID format will suddenly be slightly different.
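To make the providerID concern concrete, here is a minimal Go sketch of code that has to tell the two formats apart. The path layouts in the comment are assumptions about how a standalone VM and a VMSS instance are typically identified on Azure; they are not confirmed anywhere in this thread, and `classifyProviderID` is a hypothetical helper, not an existing API.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed providerID layouts (illustrative only):
//   standalone VM: azure:///subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<name>
//   VMSS instance: azure:///subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachineScaleSets/<set>/virtualMachines/<instanceID>

// classifyProviderID is a hypothetical helper that returns "vmss", "vm",
// or "unknown" for a given providerID string.
func classifyProviderID(providerID string) string {
	const prefix = "azure://"
	if !strings.HasPrefix(providerID, prefix) {
		return "unknown"
	}
	path := strings.ToLower(strings.TrimPrefix(providerID, prefix))
	switch {
	// Check the scale set segment first: a VMSS instance path also
	// contains a /virtualMachines/ segment.
	case strings.Contains(path, "/virtualmachinescalesets/"):
		return "vmss"
	case strings.Contains(path, "/virtualmachines/"):
		return "vm"
	default:
		return "unknown"
	}
}

func main() {
	vm := "azure:///subscriptions/sub-id/resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/node-0"
	vmss := "azure:///subscriptions/sub-id/resourceGroups/rg/providers/Microsoft.Compute/virtualMachineScaleSets/pool-0/virtualMachines/3"
	fmt.Println(classifyProviderID(vm))
	fmt.Println(classifyProviderID(vmss))
}
```

Anything that matches nodes to machines by parsing the providerID would need a branch like this if VMSS were switched on without a user-visible toggle.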

@alexeldeib
Contributor

@justaugustus any thoughts?

@feiskyer
Member

@justaugustus @alexeldeib any blocking issues on VMSS support? We'd like to prioritize this.

@alexeldeib
Contributor

@feiskyer Not as far as I'm aware. This seems like an easy win to me.

@justaugustus
Contributor Author

It's a win for everyone, but I wouldn't necessarily call it an easy one.
What we need to consider here is the fact that there is no native paradigm for VMSS in Cluster API.

As the VMSS and virtual machine implementations for Azure are different, I would request that this be something toggleable for users.

Juan-Lee expressed interest in working on this, so assigning to him.
/assign @juan-lee

Also tagging @detiber, as I believe we were talking during KubeCon about the right way to make this happen.

@justaugustus
Contributor Author

/unassign

@detiber
Member

detiber commented Jun 6, 2019

There are two potential approaches:

  • Add an (optional) extension point for MachineSets that could be leveraged to directly interact with VMSS (or other cloud-based scaling mechanisms).
  • Add a new type similar to MachineSet with an extension point for interacting with VMSS (or other cloud-based scaling mechanisms)

The second option would probably be the less controversial one in the community, since there is some concern that overloading MachineSets could lead to user confusion.
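As a rough illustration of how the two approaches differ at the API level, here is a minimal Go sketch. Every type and field name in it (`ObjectRef`, `ScalingGroupRef`, `ScalingSetSpec`, the `AzureScaleSet` kind) is hypothetical and simplified; none of them come from Cluster API or from a proposal in this thread.

```go
package main

import "fmt"

// ObjectRef is a simplified, hypothetical reference to a provider-specific
// object such as a VMSS-backed resource.
type ObjectRef struct {
	Kind string
	Name string
}

// MachineSetSpec sketches option 1: the existing type gains an optional
// extension point. ScalingGroupRef is a hypothetical field name.
type MachineSetSpec struct {
	Replicas        int32
	ScalingGroupRef *ObjectRef // nil keeps the per-Machine behavior
}

// DelegatesScaling reports whether scaling is handed to a cloud scaling group.
func (s MachineSetSpec) DelegatesScaling() bool {
	return s.ScalingGroupRef != nil
}

// ScalingSetSpec sketches option 2: a separate type that always delegates,
// so MachineSet keeps a single meaning. The type name is hypothetical.
type ScalingSetSpec struct {
	Replicas        int32
	ScalingGroupRef ObjectRef
}

func main() {
	ref := ObjectRef{Kind: "AzureScaleSet", Name: "pool-0"}
	plain := MachineSetSpec{Replicas: 3}
	extended := MachineSetSpec{Replicas: 3, ScalingGroupRef: &ref}
	separate := ScalingSetSpec{Replicas: 3, ScalingGroupRef: ref}

	fmt.Println(plain.DelegatesScaling(), extended.DelegatesScaling(), separate.ScalingGroupRef.Name)
}
```

In the first shape, one kind has two behaviors depending on whether the pointer is set, which is where the confusion concern comes from. In the second, the reference is required, so the type itself tells the user which behavior they get.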

The proper place to start would be putting together a design proposal using this template and bringing it up for discussion at the weekly Cluster API meeting.

@alexeldeib
Contributor

@detiber if you're planning to bring this up in cluster-api weekly, would you mind pushing it to the meeting on the 19th rather than the 12th? I was hoping to attend the kubebuilder monthly meeting which conflicts with this. I've been traveling and mostly have tried to keep up async with meetings, docs, and slack, but I'd like to attend in person if the group will discuss ASG/VMSS/MIG proposals.

If that's too much to ask, I'd be happy to collaborate on a proposal in advance of a meeting where I won't be present. Or if you end up hashing something out with @juan-lee, I'd love it if that were a public doc from Day 1 (at the very least for viewing, I can understand it may be difficult to crowdsource everything from scratch with too many cooks in the kitchen).

I agree it should be an extension/toggle at MachineSet level (which is partially why I said this is an 'easy win' -- I'm struggling to imagine something much different which fits into the model of cluster API; details can be hashed out).

For the use cases I'm interested in, using cluster API without this functionality is basically a non-starter (partially due to connection with HA). I'd like to see this move forward.

@justaugustus
Contributor Author

@detiber -- thanks for the feedback!!

@alexeldeib -- just to be clear, this is assigned to @juan-lee, not Jason (I just tagged him in for feedback). Once a public proposal is put together, we can look at getting it on the calendar with the other Cluster API folks.

@detiber
Member

detiber commented Jun 6, 2019

I won't have time for a while to work on a proposal for this, but I'm happy to review/provide feedback. I also wasn't planning on adding it explicitly as an agenda item to the Cluster API meeting until a proposal exists.

@juan-lee
Contributor

juan-lee commented Jun 6, 2019

@justaugustus @detiber @alexeldeib

Here's my plan:

@CecileRobertMichon and I will work on a design and prototype based on the current state of cluster-api. This exercise should lead us to concrete requirements that will inform a proposal to upstream cluster-api. It's entirely possible that we won't encounter any blockers and our feedback is mostly around usability. Please let me know if there are any objections to this plan.

@justaugustus
Contributor Author

@juan-lee -- sounds good! Please link the Google Doc here once it's in motion.

@puja108
Member

puja108 commented Jun 11, 2019

We are already using VMSS, but from talking to Azure folks, it seems not everyone is convinced yet that you should use them. At least it seems that the Azure people in this thread would support it, which makes me happy.

In general, we would welcome a Cluster API mode that allows for higher-level provider abstractions over machines (VMSS/ASG/MIG), as there are some good arguments for using them, e.g. AWS is adding lots of features to ASGs that make manual management of reserved vs. spot vs. normal instances obsolete (/cc @teemow). The approaches suggested by @detiber should be applicable to any of those.

@juan-lee
Contributor

@puja108 can you provide some additional context around Azure folks not being convinced of using VMSS? Are we talking about just for k8s or broader?

@puja108
Member

puja108 commented Jun 17, 2019

@juan-lee I think it was a broader, general hesitancy to use VMSS, as they are still relatively new and there wasn't much experience with them. Sadly, I have no deeper insights there, but as I said, there are also some Azure folks who would support using VMSS, so I'm optimistic.

@CecileRobertMichon
Contributor

Here is a first iteration of a proposed design for implementing VMSS given the current state:
https://docs.google.com/document/d/1nbOqCIC0-ezdMXubZIV6EQrzD0QYPrpcdCBB4oSjWeQ/edit?usp=sharing

@detiber @justaugustus PTAL

@CecileRobertMichon
Contributor

@detiber @justaugustus I don't see any comments from you on the doc, have you had a chance to review it? Should I share it with a wider audience?

@justaugustus
Contributor Author

@CecileRobertMichon @juan-lee -- Will review this week!
Email sent to SIG CL and Azure for comments: https://groups.google.com/d/topic/kubernetes-sig-cluster-lifecycle/hhTm8u4rSMU/discussion

cc: @ritazh

@ncdc
Contributor

ncdc commented Jul 10, 2019

I wrote a comment in the proposal document, but I'd really like to see support for scaling sets/groups (Azure VMSS, AWS ASG, etc) in Cluster API itself, and not implemented entirely independently per provider. This could be added to MachineSet, or via a new type similar to MachineSet but specifically for taking advantage of provider-specific scaling functionality.
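One way to picture scaling-group support living in Cluster API itself is a provider-neutral contract that each provider implements, with the shared logic written once against it. The Go sketch below is only an illustration under that assumption: `ScalingGroup`, `fakeGroup`, and `reconcileReplicas` are hypothetical names, and the in-memory fake stands in for a real VMSS or ASG client.

```go
package main

import (
	"errors"
	"fmt"
)

// ScalingGroup is a hypothetical provider-neutral contract that a VMSS,
// ASG, or MIG backend could implement.
type ScalingGroup interface {
	Capacity() (int, error)
	SetCapacity(n int) error
}

// fakeGroup is an in-memory stand-in for a real cloud client.
type fakeGroup struct{ capacity int }

func (g *fakeGroup) Capacity() (int, error) { return g.capacity, nil }

func (g *fakeGroup) SetCapacity(n int) error {
	if n < 0 {
		return errors.New("capacity must be non-negative")
	}
	g.capacity = n
	return nil
}

// reconcileReplicas drives the group to the desired size and reports
// whether it had to change anything. It knows nothing about the provider.
func reconcileReplicas(g ScalingGroup, desired int) (bool, error) {
	current, err := g.Capacity()
	if err != nil {
		return false, err
	}
	if current == desired {
		return false, nil
	}
	if err := g.SetCapacity(desired); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	g := &fakeGroup{capacity: 1}
	changed, err := reconcileReplicas(g, 3)
	fmt.Println(changed, err, g.capacity)
}
```

With this split, the reconcile loop is shared and only the small interface needs a per-provider implementation.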

@vincepri
Member

+1 to what @ncdc mentioned, having support in CAPI (especially now that we're working on v1alpha2) would be a major win for the project.

If you can join, I'd love it if one of you could bring up the discussion in today's Cluster API community meeting https://docs.google.com/document/d/1Ys-DOR5UsgbMEeciuG0HOgDQc8kZsaWIWJeKJ1-UfbY/edit

@CecileRobertMichon
Contributor

@vincepri sounds great. Unfortunately I'm not able to make it today but @juan-lee and I will both attend next week's meeting. Hopefully that will also give time for others to share their perspective on the doc before we have a discussion.

@vincepri
Member

Sounds great, thanks!

@justaugustus
Contributor Author

/assign
@juan-lee and I will be working on the MachinePool + AzureMachinePool (VMSS) proposal(s).

@CecileRobertMichon
Contributor

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jan 20, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Feb 19, 2020
@awesomenix
Contributor

/remove-lifecycle stale

@awesomenix
Contributor

/remove-lifecycle rotten

@k8s-ci-robot removed the lifecycle/rotten label Mar 7, 2020
@devigned
Contributor

/assign

@devigned
Contributor

/active

@CecileRobertMichon
Contributor

/close

@devigned can you please open a new issue to track implementing MachinePools?

@k8s-ci-robot
Contributor

@CecileRobertMichon: Closing this issue.

In response to this:

/close

@devigned can you please open a new issue to track implementing MachinePools?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
