Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented cloud provider initialization #208

Merged
merged 2 commits into from
Jan 29, 2021

Conversation

ellistarn
Copy link
Contributor

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

docs/aws/README.md Outdated Show resolved Hide resolved
docs/examples/provisioner/provisioner.yaml Outdated Show resolved Hide resolved
)

const (
DefaultLaunchTemplateName = "KarpenterDefaultLaunchTemplate"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone has more than 1 cluster in the same region, do we want to use the launch template for both the clusters?
We should make it cluster specific.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. This code probably needs to be factored as part of the reconciliation loop (lazily), because you could imagine wanting one of these for each resource. I'm gonna make a TODO for this.

pkg/cloudprovider/aws/capacity.go Outdated Show resolved Hide resolved
pkg/cloudprovider/aws/capacity.go Outdated Show resolved Hide resolved
docs/aws/README.md Outdated Show resolved Hide resolved
@ellistarn ellistarn changed the title WIP for Prateek Implemented cloud provider discovery work Jan 28, 2021
@ellistarn ellistarn changed the title Implemented cloud provider discovery work Implemented cloud provider initialization Jan 28, 2021
"ec2:DescribeSubnets",
"eks:DescribeCluster",
"iam:GetRole",
"iam:CreateRole",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a scary permission. I think it would be better to accept a pre-created role ARN as a parameter somewhere, and don't create the Instance Role ourselves.

(Probably similar with some of the other permissions too, but this one jumps out immediately.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah i was thinking about this. Here's how I was thinking about it:
For demo purposes, we can grant this permission and bootstrap the resources ourselves. For the production version, users create the resources themselves. The code already supports this if you want to remove these permissions and BYO Role or LaunchTemplate. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, if this is for demo purposes we should move it to karpenter-aws-demo
Here we should keep the minimum permissions we need, if we think we should never be creating profile/roles ourselves (if they don't exist), we should document and let users create them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you here. Let's add this to the list of design considerations. In the short term, I have some changes here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, 'demo' features have a tendency to leak out and become the 'default' install. This one in particular seems like it's worth some effort to think about how we avoid it - even if the install instructions turn out to be two commands instead of one.

return createInstanceRoleOutput.InstanceProfile
}

func instanceRoleOrDie(IAMAPI iamiface.IAMAPI, name string) *iam.Role {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we change name -> roleName

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a big fan of brief, context specific names. I learned this habit from Kubernetes (i.e. every resource name is just "name") and I quite like it. There are cases where you do need to specify if it's not clear from context, but I feel it's reasonably clear here. I you feel strongly that it's ambiguous, I'll happily change it as you ask.

*cluster.CertificateAuthority.Data,
*cluster.Endpoint,
)))),
ImageId: aws.String("ami-0532808ed453f9ca3"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not work in every region, can we add a TODO here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -43,7 +43,7 @@ func (c *Controller) Owns() []controllers.Object {
}

func (c *Controller) Interval() time.Duration {
return 10 * time.Second
return 60 * time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we changing it to 1 minute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to avoid overrun until your changes are in.

- nodes
- pods
verbs:
- create
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add create to nodes and pods resources earlier in the file on line # 60-61

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had originally not wanted to allow create pods, but the more I think about it, we probably want to do this for sharding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait nevermind, we'd want create deployment, or something. Leaving as is.

@ellistarn ellistarn merged commit 24807c7 into aws:provisioner-work Jan 29, 2021
@ellistarn ellistarn deleted the provisioner branch January 29, 2021 18:00
ellistarn added a commit to ellistarn/karpenter-provider-aws that referenced this pull request Feb 4, 2021
* Implemented Cloud Provider initialization and discovery logic.

* Implemented cloud provider initialization
ellistarn added a commit that referenced this pull request Feb 5, 2021
* Provisioner api (#198)

* First commit for provisioner spec

* remove comented code

* Added a simple controller skeleton (#199)

* Adding capacity API client (#202)

* Adding fleet API client

* Add capacity provisioner and refine the interface definitions

* Provisioner skeleton code (#203)

* Wired up controller to main and trivial test (#205)

* More tweaks to test wiring (#207)

* Add end to end functionality for pending pods (#209)

* add end to end functionality

* simplify functions after PR feedback

* simplify create fleet method

* fix comments, remove new line char and remove private create method

* Implemented cloud provider initialization (#208)

* Implemented Cloud Provider initialization and discovery logic.

* Implemented cloud provider initialization

* Cloudprovider improvements: Lazy Initialization, IAM Separation, and Topology Support (#213)

* Create node objects and bind pods to nodes (#215)

* Create node objects and bind pods to nodes

* fix error check

* fix error check

* Reoriented Capacity CloudProvider around BinPacking (#214)

* Setup packing interface and refactored instance provider for launching constrained instances (#216)

* Pausing on packing

* Setup packing interface and refactored instance provider for launching constrained instances

Co-authored-by: Prateek Gogia <[email protected]>
ellistarn added a commit that referenced this pull request Feb 10, 2021
* Switches presubmits to track main. (#139)

* Switches presubmits to track main.

* Address https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/

* Added WORKING_GROUP (#140)

* Readme edits + organization docs (#142)

* readme and admin docs

* faq edits

* doc edits

* edits and organization

* readme edits

* readme edit

* Remove Queue reference from NodeGroup overprovisioning example (#145)

* correct a typo in README.md (#143)

* readme typo (#146)

* Capitalise TERMS.md link (#147)

* Update OWNERS (#150)

* Update README.md (#149)

* Fix installation to target `main` branch instead of `master` (#153)

* Cleaned up FAQs, Working Group Invite, and other typos (#148)

* Update OWNERS (#155)

add tabern

* Switches to Allocatable from Capacity and exposes more human readable (#154)

values in capacity reservation status.

* Update FAQs.md (#156)

* Update DESIGN.md (#158)

Fixed typos

* Update README.md (#160)

* Update README.md

* Update README.md

* Update README.md

* Initial github banner image (#162)

* Initial github banner image

* Small image fix and add to readme

* Remove repo name from top of readme

* Smaller readme banner. Add repo preview

* HA Bottoms out at 1 DesiredReplica for Utilization and Value metrics (#161)

* Update README.md (#163)

* changed consumer context from customer to user (#168)

* changed consumer context from customer to user (#167)

* Reorganized OWNERS (#169)

* grammar & syntax updates in docs/DESIGN.md (#170)

* Add unit test for scalable node group controller (#157)

* Add unit test for scalable node group controller

* Add some more tests

* Add error checks for SetReplicas call

* Fix Unit tests

* rename fakeError

* update tests

* rename a few things

* rename fake error object

* Fixed pointers comparison

* Simplify tests a little

* Updated release image to use public ecr (#171)

* Improved release automation. Cut pre-release for v0.1.1 (#174)

* Changed desiredReplicas to be a pointer to allow for displaying scale to (#172)

* Changed desiredReplicas to be a pointer to allow for displaying scale to
zero.

* Updated tests

* Created a minimal Helm Chart (#175)

* Implemented a minimal helm chart with release automation using Github
Pages

* Removed stray files

* Updated chart

* Updated install instructions (#176)

* Updated install instructions

* Fixed a typo

* Improved calendar invite for Working Group (#177)

* Add working group meeting notes (#180)

* Add working group meeting notes (#182)

* Create NOTICE (#178)

* Fix errors in hack/quick-install.sh (#185)

Fix issues:

1/ Command helm upgrade --install karpenter charts/karpenter fails.

2/ Uninstall: Current order of operation leaves deployment: karpenter and statefulset: prometheus-karpenter-monitor remaining:

* Fixed a few typos in developer guide (#187)

The developer guide hadn't been updated after some of the make commands changed name. Also corrected an environment variable name.

* Add notes from working group meeting (#192)

* ScheduledCapacity API (#194)

* fixed a small typo in metricsproducer (#195)

* Add notes from working group meeting (#196)

* Implemented Scheduled Capacity Metrics Producer (#200)

* fixing merge problems

* Implemented Scheduled Capacity Metrics Producer

* Design Doc for ScheduledCapacity (#197)

* Delete karpenter.ics (#204)

Deleted in ics invite favor of google calendar.

* enable build on provisioner branch too (#206)

* Validations for MetricsProducers (#211)

* MetricsProducer Schedule Validation

* cleaned up validation efforts for non-schedules

* Added Validation checks in all resource reconcile loops

* Add working group notes (#217)

* add working group notes

* add some more notes to WG notes

* Rebase provisioner-work into main. (#219)

* Provisioner api (#198)

* First commit for provisioner spec

* remove comented code

* Added a simple controller skeleton (#199)

* Adding capacity API client (#202)

* Adding fleet API client

* Add capacity provisioner and refine the interface definitions

* Provisioner skeleton code (#203)

* Wired up controller to main and trivial test (#205)

* More tweaks to test wiring (#207)

* Add end to end functionality for pending pods (#209)

* add end to end functionality

* simplify functions after PR feedback

* simplify create fleet method

* fix comments, remove new line char and remove private create method

* Implemented cloud provider initialization (#208)

* Implemented Cloud Provider initialization and discovery logic.

* Implemented cloud provider initialization

* Cloudprovider improvements: Lazy Initialization, IAM Separation, and Topology Support (#213)

* Create node objects and bind pods to nodes (#215)

* Create node objects and bind pods to nodes

* fix error check

* fix error check

* Reoriented Capacity CloudProvider around BinPacking (#214)

* Setup packing interface and refactored instance provider for launching constrained instances (#216)

* Pausing on packing

* Setup packing interface and refactored instance provider for launching constrained instances

Co-authored-by: Prateek Gogia <[email protected]>

* Defaults for Metrics Producers (#222)

* Refactor cloud provider, add packing package (#221)

* refactor cloud provider, add packing package

* Refactored allocator and improve logging (#218)

* Skeleton code for reallocator (#235)

* Generated v0.1.2 release

Co-authored-by: Nate Taber <[email protected]>
Co-authored-by: Guy Templeton <[email protected]>
Co-authored-by: dherman <[email protected]>
Co-authored-by: Guy Templeton <[email protected]>
Co-authored-by: Tom Kerkhove <[email protected]>
Co-authored-by: alethatarn <[email protected]>
Co-authored-by: Justin Garrison <[email protected]>
Co-authored-by: Cameron Senese <[email protected]>
Co-authored-by: Prateek Gogia <[email protected]>
Co-authored-by: Prateek Gogia <[email protected]>
Co-authored-by: Henri Yandell <[email protected]>
Co-authored-by: Jacob Gabrielson <[email protected]>
Co-authored-by: Nick Tran <[email protected]>
gfcroft pushed a commit to gfcroft/karpenter-provider-aws that referenced this pull request Nov 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants