✨ Add Node managed labels support #7173
Conversation
I will take a look at this ASAP, but I'm getting to the idea that one way to get this moving is to start acting on Machine -> Node propagation, which already seems to have consensus, and then tackle the MD changes in a separate PR.
@enxebre thanks for keeping this effort moving; as mentioned in the office hours meeting, I'm planning to invest more time helping on this topic.
Does this cover taints as well?
Updated to scope the PR to propagating labels from Machines to Nodes. cc @fabriziopandini
/test pull-cluster-api-e2e-main
@enxebre thanks for reducing the scope of this PR.
I think the only point still to be addressed is the label prefixes that are reconciled only once.
The proposal states:
One-time Apply of Standard Kubernetes Labels
The machine controller will apply the standard kubernetes labels, if specified, to the Node immediately after the Node enters the Ready state (ProviderID is set). We will enforce this one-time application by making the labels immutable via a validating webhook.
WRT the first part, I think we should reverse the order (first set the labels if they are not already there, then set the provider ID) so the operation is re-entrant.
WRT making those labels immutable via webhooks, I'm personally leaning toward not implementing it, because I feel we should leave admins the freedom to manage what we do not reconcile, but this is up for discussion.
I was also thinking that we could eventually change CABPK so the labels are applied by kubeadm on node creation (and recommend that all the bootstrap providers do the same), but this could also be a follow-up.
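For illustration, a minimal sketch of the re-entrant ordering suggested above, with hypothetical names (this is not the actual controller code): the label application is idempotent, so it can run on every reconcile and before the provider ID is set.

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// reconcileNodeLabels applies the desired labels to the Node if they are not
// already present. The operation is idempotent, so it can safely run on every
// reconcile and before the provider ID is recorded on the Machine.
func reconcileNodeLabels(ctx context.Context, c client.Client, node *corev1.Node, desired map[string]string) error {
	if node.Labels == nil {
		node.Labels = map[string]string{}
	}
	changed := false
	for k, v := range desired {
		if node.Labels[k] != v {
			node.Labels[k] = v
			changed = true
		}
	}
	if !changed {
		return nil
	}
	return c.Update(ctx, node)
}
```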
So, as the PR is right now, labels are applied along with annotations as soon as the matching Node is found. Does this sound good? If so, I'll update the proposal.
I think I agree.
As it is right now, all managed labels would be continuously reconciled from Machines to Nodes. I think that's OK, particularly from a UX point of view. If the above makes sense, I'll update the proposal to reflect it.
The reason that was introduced in the proposal is just that it's a bit awkward (once in-place propagation is implemented) if you change one of those labels and they are never rolled out at all, except if you trigger the rollout e.g. with rolloutAfter, which is not implemented as of today (not sure if there are any alternatives to trigger that going forward). But making the labels immutable via a webhook is not really a solution for that either. So it probably comes down to documenting this and getting rolloutAfter implemented in the fullness of time.
I'm not sure I follow. Are we talking about continuously reconciling node.cluster.x-k8s.io/* + node-role.kubernetes.io/* + node-restriction.kubernetes.io/*? If yes, how do we resolve this issue surfaced in the proposal?
Or is the idea that for now it's okay, as it's effectively only done on create because we don't have in-place propagation, and when we implement in-place propagation we will adjust the Machine => Node sync accordingly?
I was suggesting that we deviate from the current proposal a bit and reconcile them all at any time in the lifecycle of the Machine. If you don't want CAPI to take authoritative ownership of these prefixes, then just don't set the labels.
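As a hedged sketch of what reconciling only the managed prefixes could look like (the prefix list mirrors the ones discussed above; the helper name and the simplified matching are assumptions):

```go
package example

import "strings"

// managedLabelPrefixes lists the label domains the Machine controller would
// treat as authoritative (simplified: subdomain matching, e.g. of
// node-restriction.kubernetes.io, is not handled here).
var managedLabelPrefixes = []string{
	"node.cluster.x-k8s.io/",
	"node-role.kubernetes.io/",
	"node-restriction.kubernetes.io/",
}

// getManagedLabels filters a Machine's labels down to the managed prefixes,
// producing the set that is continuously synced to the Node.
func getManagedLabels(machineLabels map[string]string) map[string]string {
	managed := map[string]string{}
	for key, value := range machineLabels {
		for _, prefix := range managedLabelPrefixes {
			if strings.HasPrefix(key, prefix) {
				managed[key] = value
				break
			}
		}
	}
	return managed
}
```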
Just took a quick look
Updated; tests are currently failing because of kubernetes/client-go#992.
That's fun. I would assume the client-go issue won't be resolved quickly enough, so we would be significantly blocked on this feature if we waited for it. Given that, I would suggest we look for the best way to "fix up" the tests in a pragmatic way. Just a first suggestion: what if we add a flag to the reconciler, "disableNodeLabelSync", that we only set for our tests (+ corresponding godoc with a link to the client-go issue, of course)? (Otherwise lgtm, +/- golangci-lint.)
Makes sense, updated. I actually think of this as a feature where we give users more control over when to update labels in a node pool by explicitly signalling it.
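A rough sketch of how such a test-only switch could be wired into the reconciler, with illustrative names and a stubbed-out body (not the PR's actual code):

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
)

// Reconciler is a trimmed-down illustration; the real Machine reconciler has
// many more fields.
type Reconciler struct {
	// disableNodeLabelSync skips the Machine -> Node label sync. It exists
	// only so tests that rely on the fake client (which cannot exercise
	// SSA-based code, see kubernetes/client-go#992) can still cover the rest
	// of the reconcile logic; it should never be set outside of tests.
	disableNodeLabelSync bool
}

func (r *Reconciler) reconcileNode(ctx context.Context, node *corev1.Node, managedLabels map[string]string) error {
	if !r.disableNodeLabelSync {
		// The real controller would sync managedLabels onto the Node here,
		// e.g. via server-side apply.
	}
	// ... remaining node reconciliation ...
	return nil
}
```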
@enxebre looks like the linter has some findings
Thanks! Fixed. Given the current PR implementation via SSA, I think we might want to reconsider extending the functionality to enable arbitrary labels as API input. I think it should just work with SSA, and it would enable easier adoption for use cases where arbitrary labels dictate workload topology distribution/affinity via GitOps or so. That could be an additive follow-up if we wanted to, though. Thoughts?
I'm not sure we would want to take all Machine labels and then propagate them through to the Node. Right now it's relatively obvious which ones are also going to the Nodes. If we want to consider propagating arbitrary labels, I would prefer doing it via a separate "node labels" field of some sort. I'm open to discussing this in a follow-up issue, but I would keep the scope of the current PR to what we decided in the proposal.
Thx!
LGTM label has been added. Git tree hash: f824a33f165ff6e31bae16ac2f7bc05da6587ec0
Right, I was referring to reconsidering a dedicated field for label sync in Machines, given that with a field as input + SSA we cover the current PR use case but also enable any other. Yeah, let's leave that discussion aside from this PR 👍🏾.
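For reference, a hedged sketch of how an SSA-based label sync could look with controller-runtime; the function and the field owner name are assumptions, not the PR's actual code. Applying a partial Node object that carries only the managed labels makes the controller the field owner of exactly those keys.

```go
package example

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// syncNodeLabels server-side-applies a partial Node object that carries only
// the managed labels, so the controller owns exactly those keys and leaves
// labels set by other owners untouched.
func syncNodeLabels(ctx context.Context, c client.Client, nodeName string, labels map[string]string) error {
	node := &unstructured.Unstructured{}
	node.SetAPIVersion("v1")
	node.SetKind("Node")
	node.SetName(nodeName)
	node.SetLabels(labels)
	return c.Patch(ctx, node, client.Apply,
		client.FieldOwner("capi-machine-controller"), // illustrative owner name
		client.ForceOwnership,
	)
}
```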
/assign @fabriziopandini @ykakarap PTAL. I think this is pretty close to merge-ready.
/lgtm Looks good.
@@ -91,6 +91,7 @@ type Reconciler struct {
 	// nodeDeletionRetryTimeout determines how long the controller will retry deleting a node
 	// during a single reconciliation.
 	nodeDeletionRetryTimeout time.Duration
+	disableNodeLabelSync     bool
As a follow-up let's open an issue to track the migration of tests to envclient and the removal of this flag (another option might be to split up the func and then unit test the individual parts vs testing the entire controller).
In the meantime, we can use a follow-up PR to add a comment on this field and make it explicit that it is only for allowing partial testing of the controller logic using the fake client (because features based on SSA cannot be tested with it).
cc @ykakarap
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: fabriziopandini. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
What this PR does / why we need it:
Implements part of #6255. Propagate managed labels from Machines to Nodes.
/hold
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #