✨feat(awsmachinepool): custom lifecyclehooks for machinepools #4875
base: main
Conversation
Welcome @sebltm!
Hi @sebltm. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I have two requests before getting to the review:
/assign
@AndiDog sorry I hadn't cleaned up the PR, I didn't know if it would get some traction :)
@AndiDog let me know if this looks good or if there's anything else I should take a look at :)
The PR is definitely reviewable now. I'm not very experienced with lifecycle hooks and aws-node-termination-handler (is that your actual use case?). Maybe MachinePool machines (#4527) give us a good way to detect node shutdown and have CAPI/CAPA take care of it? In other words: I'm not fully confident reviewing here with my knowledge, but maybe others have a better clue, so please feel free to ping or discuss in Slack.
/ok-to-test |
Sorry I've been away, thank you @AndiDog for picking this one up.
@@ -133,7 +133,6 @@ func (r *AWSMachinePool) validateAdditionalSecurityGroups() field.ErrorList {
	}
	return allErrs
}
For consistency's sake, I'd keep this.
Done, that empty line was a mistake from merging
{
	name: "Should fail if either roleARN or notifcationARN is set but not both",
	pool: &AWSMachinePool{
		Spec: AWSMachinePoolSpec{
			AWSLifecycleHooks: []AWSLifecycleHook{
				{
					RoleARN: aws.String("role-arn"),
				},
			},
		},
	},
	wantErr: true,
},
I'd add a case for only setting roleARN, and another one only setting notificationARN.
done
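For reference, here is a sketch of what the two suggested cases could look like, extending the test table quoted above. The NotificationTargetARN field name and its pointer type are assumptions based on the review discussion, not confirmed by the diff:

```go
// Hypothetical additional cases for the validation test table above.
// NotificationTargetARN is an assumed field name; adjust to the actual API type.
{
	name: "Should fail if only roleARN is set",
	pool: &AWSMachinePool{
		Spec: AWSMachinePoolSpec{
			AWSLifecycleHooks: []AWSLifecycleHook{
				{
					RoleARN: aws.String("role-arn"),
				},
			},
		},
	},
	wantErr: true,
},
{
	name: "Should fail if only notificationTargetARN is set",
	pool: &AWSMachinePool{
		Spec: AWSMachinePoolSpec{
			AWSLifecycleHooks: []AWSLifecycleHook{
				{
					NotificationTargetARN: aws.String("sqs-arn"),
				},
			},
		},
	},
	wantErr: true,
},
```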
@@ -298,6 +298,11 @@ func (r *AWSMachinePoolReconciler) reconcileNormal(ctx context.Context, machineP
		return nil
	}

	if err := r.reconcileLifecycleHooks(machinePoolScope, asgsvc); err != nil {
We have a ctx variable in this function, which we don't pass here, but later down the stack we end up creating a context.TODO(). May be worth passing the context that we already have. What do you think?
@fiunchinho it looks like most of the other interfaces for ASGInterface and EC2Interface use the same pattern (they get called from places that have a context, and they themselves create their own context.TODO()).
We could start breaking the pattern here, but it'd be a bit of a divergence from the rest of the code.
Then for consistency's sake, it'd be better to follow the same pattern for now. It could be addressed in a different PR later on.
I think creating a context.TODO() might have been a mistake when usage of the *WithContext AWS SDK functions was introduced. Context should always be specified where possible in order to support timeouts, for instance. Some other interface functions are correctly taking such an argument already.
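As a rough illustration of the change being discussed (not the provider's actual code), the call site could propagate the reconciler's existing context instead of letting the service layer create a context.TODO(); the reconcileLifecycleHooks signature and error wrapping below are hypothetical:

```go
// Sketch: pass the existing ctx down so *WithContext SDK calls can honor
// cancellation and timeouts, rather than creating context.TODO() lower in the stack.
if err := r.reconcileLifecycleHooks(ctx, machinePoolScope, asgsvc); err != nil {
	return errors.Wrap(err, "failed to reconcile lifecycle hooks")
}
```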
I noticed that
@sebltm I'll try to continue here to bring it through review.
Force-pushed from 2a0abcd to 8238c4b.
Backported from kubernetes-sigs#4875 Co-authored-by: Andreas Sommer <[email protected]>
/test pull-cluster-api-provider-aws-e2e
Giving it a try, but E2E might be problematic right now.
/lgtm
Backported from kubernetes-sigs#4875 Co-authored-by: Andreas Sommer <[email protected]>
New changes are detected. LGTM label has been removed.
@sebltm: The following tests failed; say /retest to rerun all failed tests, or /retest-required to rerun all mandatory failed tests.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Backported from kubernetes-sigs#4875 Co-authored-by: Sebastien Michel <[email protected]>
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR extends the v1beta2 definitions of AWSMachinePool and AWSManagedMachinePool with a new field, lifecycleHooks, which is a list of AWSLifecycleHook entries. The matching webhooks are updated to validate the lifecycle hooks as they are added to the Custom Resource.
The matching reconcilers are updated to reconcile those lifecycle hooks: if a lifecycle hook is present in the Custom Resource but not in the cloud, it is created; if a lifecycle hook is present in the cloud but not declared in the Custom Resource, it is removed.
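A minimal sketch of that create/delete reconciliation, assuming hypothetical service methods (DescribeLifecycleHooks, CreateLifecycleHook, DeleteLifecycleHook) and the AWSLifecycleHook shape sketched above; the provider's real ASG service API may differ:

```go
// asgHookAPI is a hypothetical interface for ASG lifecycle hook operations.
type asgHookAPI interface {
	DescribeLifecycleHooks(ctx context.Context, asgName string) ([]AWSLifecycleHook, error)
	CreateLifecycleHook(ctx context.Context, asgName string, hook AWSLifecycleHook) error
	DeleteLifecycleHook(ctx context.Context, asgName, hookName string) error
}

// reconcileLifecycleHooks creates hooks declared in the spec but missing in AWS,
// and deletes hooks present in AWS but no longer declared in the spec.
func reconcileLifecycleHooks(ctx context.Context, svc asgHookAPI, asgName string, want []AWSLifecycleHook) error {
	have, err := svc.DescribeLifecycleHooks(ctx, asgName)
	if err != nil {
		return err
	}
	haveByName := make(map[string]bool, len(have))
	for _, h := range have {
		haveByName[h.Name] = true
	}
	wantByName := make(map[string]bool, len(want))
	for _, h := range want {
		wantByName[h.Name] = true
		if !haveByName[h.Name] {
			if err := svc.CreateLifecycleHook(ctx, asgName, h); err != nil {
				return err
			}
		}
	}
	for _, h := range have {
		if !wantByName[h.Name] {
			if err := svc.DeleteLifecycleHook(ctx, asgName, h.Name); err != nil {
				return err
			}
		}
	}
	return nil
}
```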
Which issue(s) this PR fixes (optional, in fixes #<issue_number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #4020
AWS supports lifecycle hooks that run before or after certain actions on an ASG. For example, before scaling in (removing) a node, the ASG can publish an event to an SQS queue, which can then be consumed by the node-termination-handler to ensure the node's proper removal from Kubernetes (it will cordon and drain the node, then wait for a period of time for applications to be removed before allowing the Auto Scaling group to terminate the instance).
This allows Kubernetes or other components to be aware of the node's lifecycle and take appropriate actions.
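To make that flow concrete, here is a minimal sketch of registering such a termination hook directly with the AWS SDK (aws-sdk-go v1); the ASG name, queue ARN, and role ARN are placeholders, and this is not the provider's implementation:

```go
package main

import (
	"context"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/autoscaling"
)

func main() {
	sess := session.Must(session.NewSession())
	client := autoscaling.New(sess)

	// Notify an SQS queue before the ASG terminates an instance, giving
	// aws-node-termination-handler time to cordon and drain the node.
	_, err := client.PutLifecycleHookWithContext(context.Background(), &autoscaling.PutLifecycleHookInput{
		AutoScalingGroupName:  aws.String("my-machinepool-asg"),                                  // placeholder
		LifecycleHookName:     aws.String("pre-terminate"),                                       // placeholder
		LifecycleTransition:   aws.String("autoscaling:EC2_INSTANCE_TERMINATING"),
		NotificationTargetARN: aws.String("arn:aws:sqs:eu-west-1:123456789012:node-termination"), // placeholder
		RoleARN:               aws.String("arn:aws:iam::123456789012:role/asg-lifecycle-publisher"), // placeholder
		HeartbeatTimeout:      aws.Int64(300),
		DefaultResult:         aws.String("CONTINUE"),
	})
	if err != nil {
		panic(err)
	}
}
```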
Special notes for your reviewer:
Checklist:
Release note: