Configure EC2 instance metadata options #4037
Conversation
Welcome @muraee!
Hi @muraee. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
@Skarlso do I need to include the new API field in …
Force-pushed from dd65992 to b04af99.
Nope, there is a bunch of stuff regarding conversions. I'll include something tomorrow; I'm exhausted for today. Try looking up the git history of the conversion.go file and the *_conversion.go files in the v1 folder. That will give you some hints as to how to proceed with them.
I fixed the conversion issues, I hope this is the correct way.
Nope, sadly, it's not. :D I'll try to point you to the right thing today. Sorry, I'm just so frigging busy.
@Skarlso could you please take a look if you have time this week? Thanks :)
We have a code freeze atm to fix the testing infrastructure. But I'll try to tackle this this week.
@@ -43,6 +43,14 @@ func Convert_v1beta2_NetworkStatus_To_v1beta1_NetworkStatus(in *v1beta2.NetworkStatus, out *NetworkStatus, s conversion.Scope) error {
	return autoConvert_v1beta2_NetworkStatus_To_v1beta1_NetworkStatus(in, out, s)
}

func Convert_v1beta2_AWSMachineSpec_To_v1beta1_AWSMachineSpec(in *v1beta2.AWSMachineSpec, out *AWSMachineSpec, s conversion.Scope) error {
So what I believe is missing, per the conversion issue @Skarlso mentioned, is that the conversion currently discards the new fields when converting down to v1beta1, so converting down to v1beta1 and back to v1beta2 results in data loss.
To fix this, the additional fields that exist on v1beta2 but not on v1beta1 are encoded as a value in an annotation, and extracted back out when converting from v1beta1 to v1beta2.
Here's an example from one of my earlier PRs where @Skarlso helped me with the same sort of conversion logic: https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/3757/files#r989941779. Also, "sigs.k8s.io/controller-runtime/pkg/conversion" contains a number of helper methods related to conversion.
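Roughly, the round-trip pattern looks like this. A minimal sketch, assuming the MarshalData/UnmarshalData helpers from cluster-api's sigs.k8s.io/cluster-api/util/conversion package and a simplified method shape; this is illustrative, not this PR's exact code:

// Down-conversion: stash the fields v1beta1 cannot represent in an annotation.
func (dst *AWSMachine) ConvertFrom(src *v1beta2.AWSMachine) error {
	if err := Convert_v1beta2_AWSMachine_To_v1beta1_AWSMachine(src, dst, nil); err != nil {
		return err
	}
	// MarshalData serializes src into an annotation on dst.
	return utilconversion.MarshalData(src, dst)
}

// Up-conversion: restore the stashed fields, if the annotation is present.
func (src *AWSMachine) ConvertTo(dst *v1beta2.AWSMachine) error {
	if err := Convert_v1beta1_AWSMachine_To_v1beta2_AWSMachine(src, dst, nil); err != nil {
		return err
	}
	restored := &v1beta2.AWSMachine{}
	if ok, err := utilconversion.UnmarshalData(src, restored); err != nil || !ok {
		return err
	}
	// Copy back the v1beta2-only field(s), e.g. the new metadata options.
	dst.Spec.InstanceMetadataOptions = restored.Spec.InstanceMetadataOptions
	return nil
}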
Thank you Cameron. Sorry, I have my hands full with trying to fix this frigging test infra. :D
thanks for the hint @cnmcavoy
Force-pushed from 983e7b8 to 6804a87.
@Skarlso all tests passed, could I have a review?
Okay, this is looking better now. I'll give it a better look later today.
/assign
Overall LGTM. Is there a way we could verify this in one of the E2E tests?
It would also be good if we added documentation regarding this being enabled by default.
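For what it's worth, one way such an assertion could look in Go, using the AWS SDK for Go v1 and Gomega; the client and instance-ID plumbing here are assumed from the test harness, not taken from this PR:

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
	. "github.com/onsi/gomega"
)

// assertIMDSv2Required is a hypothetical helper: it describes the instance
// and checks that the metadata options match what the AWSMachine requested.
func assertIMDSv2Required(ec2Client *ec2.EC2, instanceID string) {
	out, err := ec2Client.DescribeInstances(&ec2.DescribeInstancesInput{
		InstanceIds: []*string{aws.String(instanceID)},
	})
	Expect(err).NotTo(HaveOccurred())
	mdo := out.Reservations[0].Instances[0].MetadataOptions
	Expect(aws.StringValue(mdo.HttpTokens)).To(Equal("required"))  // IMDSv2 enforced
	Expect(aws.StringValue(mdo.HttpEndpoint)).To(Equal("enabled")) // endpoint reachable
}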
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected; please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/test pull-cluster-api-provider-aws-e2e
@muraee there is an error in the new assertion added; could you please check the e2e failure?
/test pull-cluster-api-provider-aws-e2e
Force-pushed from bf6c360 to e734f9e.
/test pull-cluster-api-provider-aws-e2e
Force-pushed from e734f9e to a1e18c6.
/test pull-cluster-api-provider-aws-e2e
Force-pushed from a1e18c6 to 2b5062f.
/test pull-cluster-api-provider-aws-e2e
@muraee: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@Ankitasw e2e passed when the options were specified explicitly. Seems the …
@muraee could you please squash your commits before we do the merge?
// ModifyInstanceMetadataOptions modifies the metadata options of the given EC2 instance.
func (s *Service) ModifyInstanceMetadataOptions(instanceID string, options *infrav1.InstanceMetadataOptions) error {
	input := &ec2.ModifyInstanceMetadataOptionsInput{
		HttpEndpoint: aws.String(string(options.HTTPEndpoint)),
When creating a cluster with the latest code, I seem to be hitting a regression: exceptions related to the options here are raised continuously. Pasting the trace for your awareness...
E0308 07:31:06.070432 17 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 499 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x26b8c20, 0x440c890})
/Users/yikew/Projects/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0xe0
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
/Users/yikew/Projects/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:110 +0xf0
panic({0x26b8c20, 0x440c890})
/usr/local/go/src/runtime/panic.go:890 +0x264
sigs.k8s.io/cluster-api-provider-aws/v2/pkg/cloud/services/ec2.(*Service).ModifyInstanceMetadataOptions(0x400134be60, {0x4001053fc8, 0x13}, 0x0)
/Users/yikew/Projects/src/github.com/kubernetes-sigs/cluster-api-provider-aws/pkg/cloud/services/ec2/instances.go:971 +0x34
sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).ensureInstanceMetadataOptions(0x40005e06c0, {0x2ffb9b0, 0x400134be60}, 0x4000f734a0, 0x40011222c0)
/Users/yikew/Projects/src/github.com/kubernetes-sigs/cluster-api-provider-aws/controllers/awsmachine_controller.go:1121 +0xf0
sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).reconcileOperationalState(0x40005e06c0, {0x2ffb9b0, 0x400134be60}, 0x40015731a0, 0x4000f734a0)
/Users/yikew/Projects/src/github.com/kubernetes-sigs/cluster-api-provider-aws/controllers/awsmachine_controller.go:616 +0x28c
sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).reconcileNormal(0x40005e06c0, {0x40005b5090, 0x40010988e8}, 0x40015731a0, {0x2ffb7f0, 0x400020c310}, {0x3003088, 0x400020c310}, {0x3001be0, 0x400020c310}, ...)
/Users/yikew/Projects/src/github.com/kubernetes-sigs/cluster-api-provider-aws/controllers/awsmachine_controller.go:588 +0x1624
sigs.k8s.io/cluster-api-provider-aws/v2/controllers.(*AWSMachineReconciler).Reconcile(0x40005e06c0, {0x2fee338, 0x4000b39680}, {{{0x40016b7a20, 0x7}, {0x4001030840, 0x16}}})
/Users/yikew/Projects/src/github.com/kubernetes-sigs/cluster-api-provider-aws/controllers/awsmachine_controller.go:224 +0x93c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x4000330c80, {0x2fee338, 0x4000b39680}, {{{0x40016b7a20, 0x7}, {0x4001030840, 0x16}}})
/Users/yikew/Projects/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:121 +0xec
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0x4000330c80, {0x2fee338, 0x4000b39680}, {0x27acba0, 0x40000f5f20})
/Users/yikew/Projects/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:320 +0x2f8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0x4000330c80, {0x2fee290, 0x40001a4300})
/Users/yikew/Projects/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:273 +0x2b8
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
/Users/yikew/Projects/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234 +0xa8
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
/Users/yikew/Projects/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:230 +0x400
E0308 07:31:06.070538 17 controller.go:326] "Reconciler error" err="panic: runtime error: invalid memory address or nil pointer dereference [recovered]" controller="awsmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSMachine" AWSMachine="default/c6-control-plane-bswb4" namespace="default" name="c6-control-plane-bswb4" reconcileID=18c05f9f-f837-4366-ab61-e6ce9c225db8
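For what it's worth, the call at instances.go:971 shows options arriving as nil (the 0x0 argument), so dereferencing options.HTTPEndpoint panics. A minimal defensive sketch of a guard, assuming the surrounding Service type from this file; whether to no-op or return an error here is a design choice, and not necessarily how this was eventually fixed:

func (s *Service) ModifyInstanceMetadataOptions(instanceID string, options *infrav1.InstanceMetadataOptions) error {
	if options == nil {
		// Nothing was requested; leave the instance's IMDS settings alone
		// instead of dereferencing a nil pointer.
		return nil
	}
	input := &ec2.ModifyInstanceMetadataOptionsInput{
		InstanceId:   aws.String(instanceID),
		HttpEndpoint: aws.String(string(options.HTTPEndpoint)),
		HttpTokens:   aws.String(string(options.HTTPTokens)),
	}
	_, err := s.EC2Client.ModifyInstanceMetadataOptions(input)
	return err
}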
Weird that this was not caught in the E2E tests. @muraee could you please take a look?
It won't fail the cluster creation; it only raises exceptions continuously in the CAPA logs.
It's not getting set by the defaults; I will raise another PR to fix this. I might have missed the author's comment regarding this.
One more thing: we noticed another issue with the AWSMachine webhooks, so that could be the reason the defaults are not getting set. I will wait for that issue to be fixed first.
Thanks @Ankitasw, yes, it seems the defaults are not getting set for some reason, although I tried that before and it was working.
I believe I hit this exact issue in a separate change. Assuming this is the case, using the defaulting webhook is not enough. This is roughly similar to the code I think is missing: https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/4112/files#diff-0b559bbd149f0e6d54d789235423f66fa8906fdc6ee9c99b9e85db912912011eR192
SetDefaults_AWSMachineSpec needs to be invoked from the AWSMachine controller reconciliation loop. The call site is also very important; see the comment:
// The defaulting must happen before `NewMachineScope` is called since otherwise we keep detecting
// differences that result in patch operations.
... do defaulting
Specifically, what I suspect is happening is that existing AWSMachines created before your change never receive the defaults, since they already exist. So we need to also set the defaults in the controller, at least until the resource is upgraded to another major apiVersion; then we can stop defaulting in the controller.
e2e tests would have trouble catching this kind of issue, because you can only see it if you create a cluster on the previous version and then upgrade to this change. I guess AWSMachine controller reconciliation isn't tested in an upgrade scenario in the e2e?
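As a concrete illustration, here is a fragment of what that could look like inside the AWSMachine reconciler. This is a sketch following the linked PR's approach; the plumbing around the machine scope is assumed, not this PR's code:

awsMachine := &infrav1.AWSMachine{}
if err := r.Client.Get(ctx, req.NamespacedName, awsMachine); err != nil {
	return ctrl.Result{}, client.IgnoreNotFound(err)
}

// The defaulting must happen before `NewMachineScope` is called since otherwise
// we keep detecting differences that result in patch operations. This also covers
// AWSMachines created before the new field (and its webhook defaulting) existed.
infrav1.SetDefaults_AWSMachineSpec(&awsMachine.Spec)

machineScope, err := scope.NewMachineScope(scope.MachineScopeParams{
	// ... existing scope params ...
})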
#4147 should fix it
@muraee I didn't see where the bastion machine is configured to use IMDSv2 as the default; from my testing, it is still IMDSv1. Could you help double-check? Do we need it to be IMDSv2 as well? If yes, I'll create an issue.
@wyike right, the bastion is not configured to use IMDSv2 in this change, since it is created/managed by the awscluster_controller.
What type of PR is this?
/kind feature
What this PR does / why we need it:
Allows configuration of EC2 instance metadata options to add IMDSv2 support.
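For illustration, enforcing IMDSv2 on a machine would look roughly like this; a sketch where the field and constant names reflect my reading of the v1beta2 API added here and may not match exactly:

spec := infrav1.AWSMachineSpec{
	InstanceType: "t3.large",
	InstanceMetadataOptions: &infrav1.InstanceMetadataOptions{
		HTTPEndpoint:            infrav1.InstanceMetadataEndpointStateEnabled, // keep the IMDS endpoint reachable
		HTTPTokens:              infrav1.HTTPTokensStateRequired,              // require session tokens, i.e. IMDSv2
		HTTPPutResponseHopLimit: 2,                                            // allow one extra hop for containerized workloads
	},
}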
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):
Fixes #3744
Special notes for your reviewer:
Checklist:
Release Note: