-
-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Fix custom AMI bootstrap #1580
feat: Fix custom AMI bootstrap #1580
Conversation
Looks good. I’m off until Tuesday I don’t have the time to test it. My bad but the “role” was a custom label added by me and not something added by AWS. I think it can be removed |
Thanks @ArchiFleKs, I did wonder about the role... |
I think
|
@ArchiFleKs I did wonder when they started adding the CSI labels, we add them ourselves until the CSI driver starts using the correct topology labels. The other labels you list would be impossible to inject as they're added to the calculated launch template that's created after the MNG is created. @andreyBar your PR only supported custom AMIs built from the EKS optimised image or that have added a /etc/eks/bootstrtap.sh script to mimic the behaviour of one. The call to /etc/eks/bootstrtap.sh wouldn't work correctly without providing custom user data and any labels set would be ignored. This PR corrects this behaviour so if you just want to use a fixed AMI you can do so without needing to understand how /etc/eks/bootstrtap.sh works. |
@antonbabenko could you take a look at this PR? |
Sorry, I don't have the capacity to take a look at this one for real (run tests, think about edge cases, compatibility, etc). Maybe @daroga0002 can put this in his queue for reviews? |
Yup, I have it on TODO list |
@stevehipwell sorry for delay but I was sick last days, I will find some time during weekend for this but could you rebase this PR to make it easier for testing. |
@daroga0002 I hope you're feeling better now? I've rebased. |
@stevehipwell thank you for your contribution 🎉 , I reviewed PR and I think we can merge it as this should solve multiple issues with custom AMI managed groups. |
@antonbabenko lets merge this and wait with releasing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a minor comment about shell script. The rest looks good to me.
# Allow user supplied pre userdata code | ||
sed -i '/^KUBELET_EXTRA_ARGS=/a KUBELET_EXTRA_ARGS+=" ${kubelet_extra_args}"' /etc/eks/bootstrap.sh | ||
%{endif ~} | ||
%{if length(pre_userdata) > 0 ~} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to have this if/endif
block but just include ${pre_userdata}
always as it was before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would work without the if block, but it's tidier to use it so the resulting script has consistent white space. It also makes it clear in the template that it's an optional variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this if
. It will be tidy without it. I will merge it right away.
# Set variables | ||
API_SERVER_URL=${cluster_endpoint} | ||
B64_CLUSTER_CA=${cluster_ca} | ||
K8S_CLUSTER_DNS_IP=172.20.0.10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found one issue as this should be variable
Line 28 in 253f927
service_ipv4_cidr = var.cluster_service_ipv4_cidr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@daroga0002 I think you're right. I'm currently on vacation, so can't make this change until I'm back next week. If you can modify my PR that's fine. Otherwise setting the optimised variable to false would let this to be set by providing the whole user data, I could then update the other PR to make this customisable as it covers this type of behaviour?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np, we can wait, have a great vacations 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@daroga0002 this should be ready to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevehipwell are you sure you added change as I still see
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@daroga002 sorry, manic day in the office. I only rebased, I'll add the changes first thing in the morning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@daroga0002 I've updated the userdata template to be simpler and so that all custom AMIs use the same initial logic for setting the variables (this is designed ready to support #1577). After checking the bootstrap.sh code I've removed the manual setting of K8S_CLUSTER_DNS_IP
as bootstrap.sh will handle this correctly; #1577 will enable this to be fully customisable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antonbabenko it looks good, as per me we can merge it. This will be not breaking or impacting change as it will silently update launch template and update autoscaling group (user will require to roll instances to use new features)
If we will merge this then I think we can make a release (there is also #1584 which was tested and can be included into release)
# Allow user supplied pre userdata code | ||
sed -i '/^KUBELET_EXTRA_ARGS=/a KUBELET_EXTRA_ARGS+=" ${kubelet_extra_args}"' /etc/eks/bootstrap.sh | ||
%{endif ~} | ||
%{if length(pre_userdata) > 0 ~} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this if
. It will be tidy without it. I will merge it right away.
@antonbabenko I've removed the |
Thanks @stevehipwell for the contribution! I think it will be released at the beginning of next week. Waiting for the last PR by @daroga0002 before the release. |
Hi all, This change is now breaking my automatic construction of enviroments : This is due to the data block that got introduced (without a dependency) in the node_groups module in local.tf. |
@JordyWTC what do you mean about the data resource being added without a dependency? Also I assume that you're running this off a branch as this has not been released yet? |
Hi @stevehipwell , In the locals.tf there's a data block added on aws_eks_cluster, but if the eks cluster still needs to be created, than you will get the error that the resource cannot be found. So i actually expect that the data block does not call var.cluster_name but module.aws_eks_cluster.id |
@JordyWTC that's not how I read the behaviour. The |
Hi @stevehipwell , yes you are right i missed the cluster_name being derrived from the locals.tf.
But it does have the name from the cluster stated(which i parse via module), which is strange as there is no cluster created yet and thus the locals variable should show an empty value.
i am not giving the create_eks value as this is by default true. |
@JordyWTC I'm not sure how you can be seeing the behaviour you are, not that I'm saying you're not. The module includes a custom depends on hack which should stop the MNGs being created before the cluster is ready. Do you have any more details such as your TF version etc? |
Hi @stevehipwell, I indeed agree with you that i should not be able to see this behavior. I just did another test and indeed it happens that the var.cluster_name in the nodegroup_modules is getting its value from what i am sending via terragrunt -> terraform. Running terragrunt apply from the eks part.
Terragrunt code in eks folder :
The terraform module terragrunt is calling is than doing as described also in earlier comment :
|
@daroga0002 does this look like anything you've seen before? @JordyWTC do you get the same behaviour if you call Terraform directly? |
@JordyWTC This PR was merged after (The last commit on |
I suspect there is some magic in user side, as even in eks module he is not using node groups either worker groups. |
@endrec, you are right the ref 17.20.0 i have put in today to make it work, sorry for the unclearance. The problem happens when i am doing if via ref master @stevehipwell @daroga0002 from terraform it self it indeed is working, so this seems like a terragrunt problem / the way we are using terragrunt in this specific scenario, so i am not sure if other terragrunt user will experience the same issue. Might it be a fix that instead of using var.customer_name in the data block that module.aws_eks_cluster.id will be used? |
@JordyWTC that is exactly how it's currently working. |
@stevehipwell i did a reforge on terragrunt(latest version) to immediately having the source from this github and i am running in the same problem. That means that everyone who is using terragrunt and sourcing this module will run into the same problem.
I have no idea why terragrunt in this case is already trying to access the datablock while there is a dependency in the node_groups.tf, i will raise this at terragrunt as well. For me the workaround now is just to call in the ref from an earlier version. |
@JordyWTC the implementation of |
Thanks @stevehipwell , i will raise it with TG. |
@JordyWTC , I happen to use terragrunt in my setup, so for a quick sanity check I updated the module source to
|
Thanks for info @endrec, ill be diving further in my code :) |
I found a issue with missing conditional, I opened #1632 and working on fix |
… is EKS optimized and has a custom ipv4 CIDR The PR (terraform-aws-modules#1580) is passing the "apiserver-endpoint" and "b64-cluster-ca", which causes the SERVICE_IPV4_CIDR empty (https://github.com/awslabs/amazon-eks-ami/blob/v20211206/files/bootstrap.sh#L366). Because of that, the script fallbacks always to 10.100.0.10 or 172.20.0.10. Defining the ipv4 cidr ensures that the bootstrap script configures the DNS server correctly on the kubelet service, allowing pods to resolve DNS names.
…raform-aws-modules#1717) The PR (terraform-aws-modules#1580) is passing the "apiserver-endpoint" and "b64-cluster-ca", which causes the SERVICE_IPV4_CIDR empty (https://github.com/awslabs/amazon-eks-ami/blob/v20211206/files/bootstrap.sh#L366). Because of that, the script fallbacks always to 10.100.0.10 or 172.20.0.10. Defining the ipv4 cidr ensures that the bootstrap script configures the DNS server correctly on the kubelet service, allowing pods to resolve DNS names.
I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
PR o'clock
Description
This PR fixes the issues introduced in PR #1473 which would have made it impossible to start a fully customised image without requiring a
bootstrap.sh
script.Checklist