Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Improve managed node group bootstrap revisited #1577

Merged
merged 1 commit into from
Nov 2, 2021
Merged

feat: Improve managed node group bootstrap revisited #1577

merged 1 commit into from
Nov 2, 2021

Conversation

stevehipwell
Copy link
Contributor

@stevehipwell stevehipwell commented Sep 9, 2021

PR o'clock

Description

This PR could replace #1433.

The managed node group bootstrap pattern has been modified to set the /etc/eks/bootstap.sh environment variables to control the bootstrap process. To do this a new bootstrap_env variable has been added which sets env variables to be sourced into /etc/eks/bootstrap.sh from a new /etc/profile.d/eks-bootstrap.sh file. For power users pre_userdata can append additional values or dynamic functionality to this file.

Checklist

@ArchiFleKs
Copy link
Contributor

How does this work with custom AMI without merging userdata and calling boostrap manually, are this labels still set on the launch template user data:

/etc/eks/bootstrap.sh mycluster --kubelet-extra-args '--node-labels=eks.amazonaws.com/nodegroup-image=ami-072fbef2a16895c26,eks.amazonaws.com/capacityType=ON_DEMAND,eks.amazonaws.com/sourceLaunchTemplateVersion=1,eks.amazonaws.com/nodegroup=my-cluster-b-20210713121718836600000009,role=worker,eks.amazonaws.com/sourceLaunchTemplateId=lt-0034c7256eaa78b73' --b64-cluster-ca $B64_CLUSTER_CA --apiserver-endpoint $API_SERVER_URL --dns-cluster-ip $K8S_CLUSTER_DNS_IP

Which are the automatically merged bootstrap script when not using custom ami

@stevehipwell
Copy link
Contributor Author

@ArchiFleKs this doesn't change the current behaviour, so I'm not quite sure what the question is? If an AMI is provided then we call /etc/eks/bootstrap.sh mycluster at the end of our user data, if it's not our user data is merged with the default userdata which makes this call.

@stevehipwell
Copy link
Contributor Author

@ArchiFleKs I think I see your point, let me make a change.

@stevehipwell
Copy link
Contributor Author

@ArchiFleKs this looks like a defect in the current release for custom AMIs? I can see there is also a defect in my PR as bootstrap_env.KUBELET_EXTRA_ARGS will loose out to the --kubelet-extra-args input so I'll fix this.

@ArchiFleKs
Copy link
Contributor

Yes you are right the issue comes from another commit. I'm not sure it is a defect I have never used a custom AMI I'm just wondering if this label are also set when using custom AMI or if we loose them because of having to call the bootstrap command manually.

@stevehipwell
Copy link
Contributor Author

@ArchiFleKs based on the code and AWS docs I suspect you would need to replicate the bootstrap args or set the env variables for a custom AMI to work as expected.

@stevehipwell
Copy link
Contributor Author

@ArchiFleKs could you take a look at the changes to fix the behaviour when no ami_id is passed in? I think that a separate PR should be opened to set the default bootstrap arguments when ami_id is set.

@stevehipwell
Copy link
Contributor Author

stevehipwell commented Sep 9, 2021

@daroga0002 is this the type of generic you were looking for? The only pain point with a generic solution like this is that the default values will be overridden as a single value due to TF limitations.

@ArchiFleKs
Copy link
Contributor

@ArchiFleKs could you take a look at the changes to fix the behaviour when no ami_id is passed in? I think that a separate PR should be opened to set the default bootstrap arguments when ami_id is set.

I think with the if statement It should be ok with standard AMI

@ArchiFleKs
Copy link
Contributor

How would someone wanting to use containerd and max pod with VPC CNI > 1.9 would do it with this PR compare to the older one ?

@stevehipwell
Copy link
Contributor Author

How would someone wanting to use containerd and max pod with VPC CNI > 1.9 would do it with this PR compare to the older one ?

@ArchiFleKs the following config would do this assuming that you wanted 110 pods per node.

node_groups = {
  my_group = {
    bootstrap_env = {
      CONTAINER_RUNTIME = "containerd"
      USE_MAX_PODS      = false
    }
    kubelet_extra_args = "--max-pods=110"
  }
}

@daroga0002
Copy link
Contributor

@stevehipwell just to let you know I seeing your good job 🎉 and have your PRs on list to review (and test).

I will try to find a time soon for this

@stevehipwell
Copy link
Contributor Author

@daroga0002 would you like me to close #1433 so as to not confuse matters? I think this one is preferred by the maintainers?

@daroga0002
Copy link
Contributor

@daroga0002 would you like me to close #1433 so as to not confuse matters? I think this one is preferred by the maintainers?

lets keep this open as I will took look on both

@ArchiFleKs
Copy link
Contributor

ArchiFleKs commented Sep 18, 2021

How would someone wanting to use containerd and max pod with VPC CNI > 1.9 would do it with this PR compare to the older one ?

@ArchiFleKs the following config would do this assuming that you wanted 110 pods per node.

node_groups = {

  my_group = {

    bootstrap_env = {

      CONTAINER_RUNTIME = "containerd"

      USE_MAX_PODS      = false

    }

    kubelet_extra_args = "--max-pods=110"

  }

}

Just tested this, working as expected, I think this PR is better than the old one because this allow to use pretty much any environments variable available in eks bootstrap script without having to do another PR here.

True it is less obvious to the users but I think we can give example instead.

@jaimehrubiks
Copy link
Contributor

I like it. Just need to make sure that it satisfies the needs for people using custom AMI, but for the rest, it looks good

@stevehipwell
Copy link
Contributor Author

@jaimehrubiks I'd take #1580 first and then add these changes on top of that; so custom AMIs are supported correctly.

@FabioAntunes
Copy link

Hey folks quick question, without this PR is there any other way to enable containerd in our managed nodes?

@stevehipwell
Copy link
Contributor Author

@FabioAntunes you can implement the code in this PR directly into your userdata to achieve the same behaviour, this is what I'm doing until the PR is merged.

@stevehipwell
Copy link
Contributor Author

@daroga0002 @antonbabenko @ArchiFleKs this has been rebased to work with the changes in #1580.

@stevehipwell
Copy link
Contributor Author

@daroga0002 this has been rebased on your data source fix, could we look at getting this in the next release?

@stevehipwell
Copy link
Contributor Author

@daroga0002 I've rebased again with the MNG AMI fix.

@thpham
Copy link

thpham commented Oct 18, 2021

Thank you for the PR, I used it on my fork and I confirm it works as expected !

@stevehipwell
Copy link
Contributor Author

@daroga0002 how are you doing for availability? Could you take a look at this PR?

@stevehipwell
Copy link
Contributor Author

@daroga0002 @antonbabenko it'd be good to get this merged as I'd like to open a PR on top of this for Bottlerocket now it's supported.

https://aws.amazon.com/about-aws/whats-new/2021/10/amazon-eks-nodes-groups-bottlerocket/

Copy link
Contributor

@daroga0002 daroga0002 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevehipwell thank you for contribution 🎉

I have reviewed and tested this and works fine.

Diff between new and old approach in userdata:

13c13,15
< export ADDITIONAL_KUBELET_EXTRA_ARGS=""
---
> export CONTAINER_RUNTIME="containerd"
> export USE_MAX_PODS="false"
> export ADDITIONAL_KUBELET_EXTRA_ARGS="--max-pods=110"
32c34

+ node label on launch template which I omitted in diff.

@daroga0002
Copy link
Contributor

@antonbabenko lets merge this to master

@antonbabenko antonbabenko merged commit 949bebe into terraform-aws-modules:master Nov 2, 2021
@antonbabenko
Copy link
Member

Here we go! 🎉

v17.23.0 has been just released.

@stevehipwell stevehipwell deleted the managed-node-group-bootstrap-env branch November 2, 2021 10:23
@stevehipwell
Copy link
Contributor Author

Thank you @daroga0002 & @antonbabenko, I'll open a PR shortly to support Bottlerocket in MNGs. Is there any news on a refactored pattern for un-managed launch template nodes?

@daroga0002
Copy link
Contributor

Is there any news on a refactored pattern for un-managed launch template nodes?

I dont think so, in general there was quite good PR:
#858
but now it is outdated and probably require some effort to fix and port new features into that.

Additional consideration is how to make it more consistent with node_groups submodule as some code probably can be shared (like launch template).

@stevehipwell
Copy link
Contributor Author

@daroga0002 it might make more sense to add a new input and create a sibling module to the MNG to create non-MNGs without needing a breaking change. The launch template code could also then be extracted into a shared module, although that might want to wait until both are stable.

@daroga0002
Copy link
Contributor

daroga0002 commented Nov 2, 2021

it might make more sense to add a new input and create a sibling module to the MNG to create non-MNGs without needing a breaking change.

maybe, but from other side it will be nightmare to maintain it.

I think we need here finally release some major breaking change and fixing a lot of legacy design issues

@stevehipwell
Copy link
Contributor Author

maybe, but from other side it will be nightmare to maintain it.

I'd propose doing this first and then doing the big breaking changes refactor to remove legacy issues and side effects. It might even line it up with the future Terraform auto state mapping (if the removal of count isn't already supported) so it's only a breaking change if you're using the legacy patterns and side effects. I'd be keen to be a part of this work.

@jurabek
Copy link

jurabek commented Nov 10, 2021

Hi, @ArchiFleKs is it possible to set CONTAINER_RUNTIME=containerd somehow when you use worker_groups instead node_groups? I tried to set bootstrap_env to worker_groups itself

bootstrap_env = {
      CONTAINER_RUNTIME = "containerd"
}

I really appreciate any suggestion here,
Thanks

@ArchiFleKs
Copy link
Contributor

Hi, @ArchiFleKs is it possible to set CONTAINER_RUNTIME=containerd somehow when you use worker_groups instead node_groups? I tried to set bootstrap_env to worker_groups itself

bootstrap_env = {
      CONTAINER_RUNTIME = "containerd"
}

I really appreciate any suggestion here, Thanks

Unsure I'm not using work_groups anymore, maybe @stevehipwell can give you a better answer but I don't think you can based on this

@stevehipwell
Copy link
Contributor Author

stevehipwell commented Nov 10, 2021

@jurabek bootstrap_env is currently only supported in MNGs, but I am planning on opening a PR to add it to worker groups. In the meantime you can use the pre_userdata to set the variables directly (this is safe as only MNGs merge userdata).

pre_userdata = <<-EOT
  export CONTAINER_RUNTIME="containerd"
EOT

@jurabek
Copy link

jurabek commented Nov 10, 2021

@stevehipwell that worked for me, thank you very much.

spr-mweber3 pushed a commit to spring-media/terraform-aws-eks that referenced this pull request Dec 1, 2021
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants