-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make all GCP clusters support the instance types 4, 16, and 64 CPU highmem nodes #3319
Make all GCP clusters support the instance types 4, 16, and 64 CPU highmem nodes #3319
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@consideRatio, you're totally right. I did went with using more generic names like: Happy to change them back to generic names, including the template. It did feel awkward to call that one |
Notes form sync chat.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wieee thank you for working this @GeorgianaElena!!
There were some style changes to not adopt trailing commas, and I looked into this to conclude the autoformatting by terraform fmt
doesn't help with enforcing that. Since its not enforced by autoformatting and requires manual thought, we shouldn't bother thinking about it further I think!
Let's go for a merge!
Out of scope for this PR, but related: I got an idea on how we can help nudge a transition of things over time btw! Next to each thing we want to change in each terraform/eksctl file - we inline a comment like "FIXME: Update this to ... when given the chance". Like that we provide a quite easy to resolve fixme note when something else is done anyhow by someone for example doing k8s upgrade maintenance. |
Thank you @consideRatio! I've just added a commit to add the suggested fixme comments and the trailing commas and will now start terraform plan + apply the chnages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wiee nice!
UpdateI have ran terraform plan & apply for all but three clusters. In the top comment I pasted the The ones without a check I did not ran 1. leapLeap appears to have 2. qclIt seems it wishes to make some updates to the existing nodepools. Don't undestand why? 3. pangeo-hubsAt the end of the plan output, it says:
|
@GeorgianaElena for pangeo hubs, you'll have to log into the gcloud cli using your Columbia email I think |
Thanks @sgibson91. I did that, but I think the issuse is that my columbia account doesn't have permissions to access the sops decryption key which is stored in the UpdateYes, granting my columbia account kms encryptor/decryptor permissions in the two-eye-two-see project fixed the issue for Remaining clusters with terraform apply issues: leap and qcl |
LEAP:
I think node auto-provisioning has been enabled as part of Yuvi trialing things in #3287, and that relies on adjusting the GKE managed cluster autoscaler which isn't running inside the k8s cluster as it does on EKS. QCL:
What kind of updates? Are they related to k8s node versions? I then suspect its a remnant of a k8s cluster upgrade, where node pools wasn't updated as part of the k8s api-server being upgraded. |
OMG amazing summary of your actions in the PR description @GeorgianaElena, looking now! Hmmm, so node pools are being downgraded... |
@GeorgianaElena I think its fine that we update This cluster was created without pinned k8s versions for the nodes and hasn't been upgraded to align all nodes version since the pinning was introduced in the terraform config. |
@GeorgianaElena I went for the QCL node pools changes while they remained inactive - QCL complete! |
@GeorgianaElena I'm quite confident that #3287 was causing the LEAP issues. I suggest we simply merge this as it is for now though as I don't think its in scope for this PR to resolve it. |
Amazing! Thank you @consideRatio <3
Thanks @consideRatio! Then I will merge this now since almost everything was terraform applied. |
Follow-up to #3304
Fixes #3256
TODO
Run
terraform plan & apply for
:terraform plan -var-file=projects/callysto.tfvars
terraform plan -var-file=projects/callysto.tfvars
terraform plan -var-file=projects/cloudbank.tfvars
terraform plan -var-file=projects/hhmi.tfvars
terraform plan -var-file=projects/linked-earth.tfvars
terraform plan -var-file=projects/m2lines.tfvars
terraform plan -var-file=projects/pilot-hubs.tfvars
terraform plan -var-file=projects/leap.tfvars
terraform plan -var-file=projects/qcl.tfvars
terraform plan -var-file=projects/pangeo-hubs.tfvars