[RFE] useInstanceMetadataHostname not available in Terraform rancher2_cluster resource #41059

Closed
timvandruenen opened this issue Apr 4, 2023 · 18 comments · May be fixed by rancher/terraform-provider-rancher2#1271
Labels
area/terraform kind/enhancement Issues that improve or augment existing functionality status/stale

Comments

@timvandruenen

timvandruenen commented Apr 4, 2023

Is your feature request related to a problem? Please describe.
Since Rancher v2.6.11, RKE clusters in AWS must in some cases set a newly introduced option named useInstanceMetadataHostname under rancherKubernetesEngineConfig.cloudProvider.awsCloudProvider in the Rancher cluster config.

Upgrading from Rancher v2.6.10 to v2.6.11 will break your clusters if you don't; we learned that the hard way. All nodes in the cluster lost most TCP connectivity with each other and from the AWS load balancer. For now, we've manually set useInstanceMetadataHostname to true in our cluster config via the Rancher UI.
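To make the nesting of the option described above concrete, here is a minimal Go sketch of the cluster-config fragment with the flag set. It assumes only the key names quoted in this issue (rancherKubernetesEngineConfig, cloudProvider, awsCloudProvider, useInstanceMetadataHostname); the struct types and the clusterFragment helper are hypothetical and are not Rancher's actual API types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types mirroring only the config path quoted in this issue.
// They are NOT Rancher's real API types.
type AWSCloudProvider struct {
	UseInstanceMetadataHostname bool `json:"useInstanceMetadataHostname"`
}

type CloudProvider struct {
	AWSCloudProvider *AWSCloudProvider `json:"awsCloudProvider,omitempty"`
}

type RKEConfig struct {
	CloudProvider CloudProvider `json:"cloudProvider"`
}

type ClusterSpec struct {
	RancherKubernetesEngineConfig RKEConfig `json:"rancherKubernetesEngineConfig"`
}

// clusterFragment (hypothetical helper) serializes the nested fragment
// with useInstanceMetadataHostname set to the given value.
func clusterFragment(useMetadataHostname bool) ([]byte, error) {
	spec := ClusterSpec{
		RancherKubernetesEngineConfig: RKEConfig{
			CloudProvider: CloudProvider{
				AWSCloudProvider: &AWSCloudProvider{UseInstanceMetadataHostname: useMetadataHostname},
			},
		},
	}
	return json.Marshal(spec)
}

func main() {
	out, err := clusterFragment(true)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Because the boolean has no `omitempty`, the serialized fragment always carries an explicit value, including `false`, which matches the default value the issue reports appearing after the upgrade.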

I can't seem to find anything related to this in the rancher2_cluster resource in Terraform, making it impossible to create new clusters or update existing clusters via Terraform in our case.

Related reading and issues:
https://github.com/rancher/rancher/releases/tag/v2.6.11 (see last item under Rancher Behavioural Changes)
#22416
#37634

Note that the issues and the Rancher v2.6.11 release notes mentioned above call the new option useInstanceHostnameMetadata, whereas in our Rancher cluster config it appeared as useInstanceMetadataHostname, with the default value false, after upgrading from Rancher v2.6.10 to v2.6.11. So even if we had followed the release notes and changed the value of useInstanceHostnameMetadata from false to true, it wouldn't have helped, because the documented name is wrong.

I also can't find anything in the Rancher documentation about useInstanceHostnameMetadata or useInstanceMetadataHostname, which makes it hard to find out which one to actually use, short of breaking your clusters and manually changing the newly added boolean to true afterwards. I checked https://github.com/rancher/rke1-docs/blob/release/v2.7.2/docs/config-options/cloud-providers/aws/aws.md but couldn't find anything related.

Describe the solution you'd like

  1. Add useInstanceMetadataHostname to rancher2_cluster cloud_provider resource in Terraform: https://registry.terraform.io/providers/rancher/rancher2/latest/docs/resources/cluster#cloud_provider.
  2. Add information about this option to the Rancher docs: https://rke.docs.rancher.com/config-options/cloud-providers/aws.
  3. Make it much clearer that Rancher RKE1 clusters running on EC2 with the AWS Cloud Provider enabled need to be very careful when upgrading to Rancher >= v2.6.11.

Describe alternatives you've considered
Manually adding --hostname-override to each node, but that doesn't seem to work for the AWS cloud provider either: https://rke.docs.rancher.com/config-options/nodes#overriding-the-hostname

There is an exception for the AWS cloud provider, where the hostname_override field will be explicitly ignored.

Additional context
We should probably have read and tested more carefully, but it's a nasty change, and pushing such a config by default without documenting it properly really hurts. I imagine we're not the only ones affected by this change, so maybe this should be made clearer, or Rancher could even show a warning before users touch their cluster config in the Rancher UI/Terraform on Rancher >= v2.6.11?

I'm not even sure what the actual upgrade path would be in this case. I assume (not tested yet) that you can't add the config to a cluster before Rancher v2.6.11 because it didn't exist yet? So should you first update Rancher to v2.6.11, then quickly add useInstanceHostnameMetadata: true to all your cluster configs before Rancher starts pushing the default value (false) to them?

@github-actions
Contributor

github-actions bot commented Jun 4, 2023

This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.

@timvandruenen
Author

Still relevant. In some circumstances it is still impossible to update or bootstrap an RKE cluster with Terraform, because useInstanceMetadataHostname can't be modified.

@iTaybb

iTaybb commented Jun 8, 2023

That's a real issue. I've hacked my way around it by monkey-patching the Terraform provider and compiling it myself.

@iTaybb

iTaybb commented Aug 8, 2023

Still relevant.

@timvandruenen
Author

@Oats87 any idea if and when this could be picked up?
Can we help with anything? In the end, the only thing that needs to be added is an option named something like useInstanceMetadataHostname that ends up as useInstanceMetadataHostname in the Rancher cluster YAML.

@iTaybb

iTaybb commented Oct 14, 2023

Still relevant.

@timvandruenen
Author

Still relevant.

@iTaybb

iTaybb commented Feb 14, 2024 via email

@chfrank-cgn

@iTaybb

iTaybb commented May 8, 2024

Still relevant.

@iTaybb

iTaybb commented Jul 8, 2024

Still relevant

@iTaybb

iTaybb commented Aug 20, 2024

Still relevant

4 participants