✨feat(awsmachinepool): custom lifecyclehooks for machinepools #4875

sebltm · 2024-03-18T09:14:26Z

What type of PR is this?
/kind feature

What this PR does / why we need it:

This PR extends the v1beta2 definitions of AWSMachinePool and AWSManagedMachinePool with a new field, lifecycleHooks, which is a list of entries with the following fields:

name: <the name of the lifecycle hook>
notificationTargetARN: <ARN of resource where to send the lifecycle event; optional>
roleARN: <ARN of role to be used when sending notifications; optional>
lifecycleTransition: <autoscaling:EC2_INSTANCE_LAUNCHING or autoscaling:EC2_INSTANCE_TERMINATING>
heartbeatTimeout: <duration of the heartbeat timeout; optional>
defaultResult: <CONTINUE/ABANDON; optional>
notificationMetadata: <some metadata to add to the notification; optional>

The matching webhooks are updated to validate lifecycle hooks as they are added to the custom resource.
The matching reconcilers are updated to reconcile those lifecycle hooks: if a lifecycle hook is declared in the custom resource but missing in the cloud, it is created; if a lifecycle hook exists in the cloud but is not declared in the custom resource, it is removed.
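
The reconcile rule above amounts to a set difference keyed by hook name. The following is a minimal sketch of that idea, not the code in this PR: `LifecycleHook` and `diffHooks` are hypothetical names, the struct is trimmed to three fields, and the "update" bucket is an assumption based on the later commits about drifted settings. A real reconciler would call the AWS Auto Scaling API instead of returning name lists.

```go
package main

import (
	"fmt"
	"sort"
)

// LifecycleHook is a hypothetical, trimmed-down stand-in for the hook type
// added by this PR; only fields needed for the comparison are kept.
type LifecycleHook struct {
	Name                string
	LifecycleTransition string
	DefaultResult       string
}

// diffHooks compares the hooks declared in the custom resource (desired)
// with the hooks found on the ASG (actual), keyed by name. It returns the
// names to create, update, and remove, each sorted for stable output.
func diffHooks(desired, actual []LifecycleHook) (create, update, remove []string) {
	actualByName := make(map[string]LifecycleHook, len(actual))
	for _, h := range actual {
		actualByName[h.Name] = h
	}

	desiredNames := make(map[string]struct{}, len(desired))
	for _, d := range desired {
		desiredNames[d.Name] = struct{}{}
		a, found := actualByName[d.Name]
		switch {
		case !found:
			// Declared in the custom resource but missing in the cloud.
			create = append(create, d.Name)
		case a != d:
			// Present on both sides but with different settings.
			update = append(update, d.Name)
		}
	}

	for _, a := range actual {
		if _, declared := desiredNames[a.Name]; !declared {
			// Present in the cloud but not declared in the custom resource.
			remove = append(remove, a.Name)
		}
	}

	sort.Strings(create)
	sort.Strings(update)
	sort.Strings(remove)
	return create, update, remove
}

func main() {
	desired := []LifecycleHook{
		{Name: "drain", LifecycleTransition: "autoscaling:EC2_INSTANCE_TERMINATING", DefaultResult: "CONTINUE"},
		{Name: "warmup", LifecycleTransition: "autoscaling:EC2_INSTANCE_LAUNCHING", DefaultResult: "ABANDON"},
	}
	actual := []LifecycleHook{
		{Name: "warmup", LifecycleTransition: "autoscaling:EC2_INSTANCE_LAUNCHING", DefaultResult: "CONTINUE"},
		{Name: "legacy", LifecycleTransition: "autoscaling:EC2_INSTANCE_TERMINATING", DefaultResult: "CONTINUE"},
	}
	create, update, remove := diffHooks(desired, actual)
	fmt.Println("create:", create)
	fmt.Println("update:", update)
	fmt.Println("remove:", remove)
}
```

Keying by name keeps the comparison independent of list order in the custom resource.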

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #4020

AWS supports lifecycle hooks that run before or after certain actions on an ASG. For example, before scaling in (removing) a node, the ASG can publish an event to an SQS queue, which can then be consumed by the node-termination-handler to ensure the node's proper removal from Kubernetes (it will cordon and drain the node, then wait for a period of time for applications to be removed before allowing the Auto Scaling group to terminate the instance).

This allows Kubernetes or other components to be aware of the node's lifecycle and take appropriate action.

Special notes for your reviewer:

Checklist:

Release note:

Adding support for custom Lifecycle Hooks in AWSMachinePools for external hooks (e.g. support for the aws-node-termination-handler with SQS)

k8s-ci-robot · 2024-03-18T09:14:35Z

Welcome @sebltm!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-aws 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-aws has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2024-03-18T09:14:36Z

Hi @sebltm. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

AndiDog · 2024-07-03T11:09:33Z

I have two requests before getting to the review:

Neither the title nor the PR description describes the change. Lifecycle hooks and reacting to node shutdown are great, but what is this PR doing and achieving? Also, the release note entry in the PR template must be filled in.
You're moving lots of code. Please revert those changes as much as possible so the PR becomes reviewable. Refactoring and file renames can be done separately.

AndiDog · 2024-07-03T11:10:04Z

/assign

sebltm · 2024-07-04T19:21:13Z

@AndiDog sorry I hadn't cleaned up the PR, I didn't know if it would get some traction :)
I've updated the PR, updated the description. Let me know if it looks good, I'll write some docs and add release notes

sebltm · 2024-07-13T16:11:43Z

@AndiDog let me know if this looks good or if there's anything else I should take a look at :)

AndiDog · 2024-07-15T15:56:32Z

The PR is definitely reviewable now. I don't have much experience with lifecycle hooks and aws-node-termination-handler (is that your actual use case?). Maybe MachinePool machines (#4527) give us a good way to detect node shutdown and have CAPI/CAPA take care of it? In other words: I'm not fully confident reviewing this with my current knowledge, but maybe others have a better idea. Please feel free to ping or discuss in Slack (#cluster-api-aws) so we can find someone to check this feature request.

AndiDog · 2024-07-15T15:56:43Z

/ok-to-test

sebltm · 2024-11-24T15:03:20Z

Sorry I've been away, thank you @AndiDog for picking this one up

fiunchinho · 2024-11-25T14:11:40Z

exp/api/v1beta2/awsmachinepool_webhook.go

@@ -133,7 +133,6 @@ func (r *AWSMachinePool) validateAdditionalSecurityGroups() field.ErrorList {
 	}
 	return allErrs
 }
-


For consistency's sake, I'd keep this

Done, that empty line was a mistake from merging

fiunchinho · 2024-11-25T14:17:08Z

exp/api/v1beta2/awsmachinepool_webhook_test.go

+		{
+			name: "Should fail if either roleARN or notifcationARN is set but not both",
+			pool: &AWSMachinePool{
+				Spec: AWSMachinePoolSpec{
+					AWSLifecycleHooks: []AWSLifecycleHook{
+						{
+							RoleARN: aws.String("role-arn"),
+						},
+					},
+				},
+			},
+			wantErr: true,
+		},


I'd add a case for only setting roleARN, and another one only setting notificationARN

fiunchinho · 2024-11-25T14:27:48Z

exp/controllers/awsmachinepool_controller.go

@@ -298,6 +298,11 @@ func (r *AWSMachinePoolReconciler) reconcileNormal(ctx context.Context, machineP
 		return nil
 	}

+	if err := r.reconcileLifecycleHooks(machinePoolScope, asgsvc); err != nil {


We have a ctx variable in this function, which we don't pass here, but later down the stack we end up creating a context.TODO(). May be worth passing the context that we already have. What do you think?

@fiunchinho it looks like most of the other methods on ASGInterface and EC2Interface use the same pattern (they get called from places that have a context, and they themselves create their own context.TODO context).
We could start breaking the pattern here, but it'd be a bit of a divergence from the rest of the code.

Then, for consistency's sake, it'd be better to follow the same pattern for now. It could be addressed in a different PR later on.

I think creating a context.TODO might have been a mistake when usage of *WithContext AWS SDK functions was introduced. Context should always be specified where possible in order to support timeouts, for instance. Some other interface functions are correctly taking such an argument already.

AndiDog · 2024-11-25T18:55:38Z

I noticed that CreateASG didn't handle the hooks. Likely, it's best if both are created atomically, so I added this as another commit.
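
As a rough sketch of what creating both together can mean: the AWS CreateAutoScalingGroup request accepts a list of lifecycle hook specifications, so the hooks can travel in the same request as the group. The types below are local, hypothetical stand-ins loosely modelled on that request shape, not the AWS SDK types or the code added in the commit.

```go
package main

import "fmt"

// Hook is a hypothetical stand-in for a hook in the machine pool spec.
type Hook struct {
	Name                string
	LifecycleTransition string
	DefaultResult       *string // optional; nil means "let AWS pick its default"
}

// HookSpec and CreateGroupInput are local stand-ins loosely modelled on the
// CreateAutoScalingGroup request, which carries hook specifications inline.
type HookSpec struct {
	LifecycleHookName   string
	LifecycleTransition string
	DefaultResult       *string
}

type CreateGroupInput struct {
	AutoScalingGroupName           string
	LifecycleHookSpecificationList []HookSpec
}

// buildCreateInput copies the declared hooks into the create request so the
// group and its hooks are submitted in one call.
func buildCreateInput(groupName string, hooks []Hook) CreateGroupInput {
	in := CreateGroupInput{AutoScalingGroupName: groupName}
	for _, h := range hooks {
		in.LifecycleHookSpecificationList = append(in.LifecycleHookSpecificationList, HookSpec{
			LifecycleHookName:   h.Name,
			LifecycleTransition: h.LifecycleTransition,
			DefaultResult:       h.DefaultResult,
		})
	}
	return in
}

func main() {
	result := "CONTINUE"
	in := buildCreateInput("example-pool", []Hook{
		{Name: "drain", LifecycleTransition: "autoscaling:EC2_INSTANCE_TERMINATING", DefaultResult: &result},
	})
	fmt.Printf("group=%s hooks=%d first=%s\n",
		in.AutoScalingGroupName,
		len(in.LifecycleHookSpecificationList),
		in.LifecycleHookSpecificationList[0].LifecycleHookName)
}
```

Submitting the hooks with the group avoids a window in which instances launch before a launch hook exists.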

AndiDog · 2024-11-26T08:07:59Z

@sebltm I'll try to continue here to bring it through review

AndiDog · 2024-11-27T13:30:40Z

/test pull-cluster-api-provider-aws-e2e

Giving it a try, but E2E might be problematic right now.

fiunchinho

/lgtm

k8s-ci-robot · 2024-11-28T14:19:51Z

New changes are detected. LGTM label has been removed.

k8s-ci-robot · 2024-11-28T16:02:24Z

@sebltm: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-aws-build-docker-release-2-6 | `2421ec3` | link | true | `/test pull-cluster-api-provider-aws-build-docker-release-2-6` |
| pull-cluster-api-provider-aws-build-release-2-6 | `2421ec3` | link | true | `/test pull-cluster-api-provider-aws-build-release-2-6` |
| pull-cluster-api-provider-aws-e2e | `c35bbc3` | link | false | `/test pull-cluster-api-provider-aws-e2e` |
| pull-cluster-api-provider-aws-test | `b95757c` | link | true | `/test pull-cluster-api-provider-aws-test` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

k8s-ci-robot added the do-not-merge/work-in-progress and do-not-merge/release-note-label-needed labels Mar 18, 2024

k8s-ci-robot requested review from AndiDog and fiunchinho March 18, 2024 09:14

k8s-ci-robot added the cncf-cla: yes and needs-priority labels Mar 18, 2024

k8s-ci-robot added the needs-ok-to-test and size/XXL labels Mar 18, 2024

sebltm changed the title ~~feat(awsmachinepool): add the ability to add lifecycle hooks~~ ✨feat(awsmachinepool): add the ability to add lifecycle hooks Mar 18, 2024

sebltm force-pushed the lifecycle-hooks branch from 687948e to 0ec9303 Compare April 16, 2024 11:41

sebltm marked this pull request as ready for review April 16, 2024 11:41

k8s-ci-robot removed the do-not-merge/work-in-progress label Apr 16, 2024

k8s-ci-robot requested a review from luthermonson April 16, 2024 11:41

sebltm changed the title ~~✨feat(awsmachinepool): add the ability to add lifecycle hooks~~ ✨feat(awsmachinepool): custom lifecyclehooks for machinepools May 10, 2024

k8s-ci-robot assigned AndiDog Jul 3, 2024

sebltm force-pushed the lifecycle-hooks branch from 0ec9303 to fb7d6af Compare July 4, 2024 19:11

k8s-ci-robot added the size/XL label and removed the size/XXL label Jul 4, 2024

k8s-ci-robot added the release-note label and removed the do-not-merge/release-note-label-needed label Jul 13, 2024

k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label Jul 15, 2024

k8s-ci-robot added the size/XL label and removed the size/XXL label Nov 24, 2024

fiunchinho reviewed Nov 25, 2024

Create lifecycle hooks together with ASG

275b3d6

k8s-ci-robot added the size/XXL label and removed the size/XL label Nov 25, 2024

AndiDog added 3 commits November 26, 2024 17:56

Pass down context

171ea12

Add webhook tests for all lifecycle hook fields and ensure the test covers what we think it does

cd21af9

Undo whitespace-only change (merge mistake)

8238c4b

AndiDog force-pushed the lifecycle-hooks branch from 2a0abcd to 8238c4b Compare November 26, 2024 17:55

AndiDog added 2 commits November 27, 2024 13:18

Minor test EXPECT() improvement

5b05e7f

Fix hooks spec logic

c35bbc3

AndiDog added a commit to giantswarm/cluster-api-provider-aws that referenced this pull request Nov 27, 2024

feat: custom lifecyclehooks for machinepools

c5e02c2

Backported from kubernetes-sigs#4875 Co-authored-by: Andreas Sommer <[email protected]>

AndiDog mentioned this pull request Nov 27, 2024

✨ feat: custom lifecyclehooks for machinepools giantswarm/cluster-api-provider-aws#613

Merged

fiunchinho approved these changes Nov 27, 2024

k8s-ci-robot assigned fiunchinho Nov 27, 2024

k8s-ci-robot added the lgtm label Nov 27, 2024

AndiDog added a commit to giantswarm/cluster-api-provider-aws that referenced this pull request Nov 28, 2024

feat: custom lifecyclehooks for machinepools

4babb7b

Backported from kubernetes-sigs#4875 Co-authored-by: Andreas Sommer <[email protected]>

Fix lifecycle hook needs-update check for nil fields which get set to defaults by AWS

66ad8a8

k8s-ci-robot removed the lgtm label Nov 28, 2024

Always update DefaultResult/HeartbeatTimeout settings if they drifted

b95757c

AndiDog added a commit to giantswarm/cluster-api-provider-aws that referenced this pull request Dec 2, 2024

feat: custom lifecyclehooks for machinepools (#613)

09d9674

Backported from kubernetes-sigs#4875 Co-authored-by: Sebastien Michel <[email protected]>
