-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redesign device plugin reset #747
Redesign device plugin reset #747
Conversation
Thanks for your PR,
To skip the vendors CIs, Maintainers can use one of:
|
Pull Request Test Coverage Report for Build 11775513002Details
💛 - Coveralls |
a3e8a58
to
ba8d6ef
Compare
ba8d6ef
to
ef25135
Compare
@zeeke @adrianchiris @ykulazhenkov when you have time please take a look on this :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments
d719a8c
to
0981cec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments. Overall design looks good to me but we must be super sure about all of these annotation states. I'm pretty scared of having a deadlock in some production cluster :)
0981cec
to
a71ec00
Compare
146b706
to
95e9d84
Compare
8152db1
to
e2e2ad2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partial review.
left with drain controller and its helper.
its hard to review added logic since a bunch of code was moved while new code added in existing functions in the same commit :\
pkg/utils/cluster.go
Outdated
} | ||
|
||
if newObj.GetLabels()[key] != value { | ||
log.Log.V(2).Info("LabelObject(): Annotate object", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Annotate -> Label
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/utils/cluster.go
Outdated
err := c.Patch(ctx, | ||
newObj, patch) | ||
if err != nil { | ||
log.Log.Error(err, "annotateObject(): Failed to patch object") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: fix func name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/utils/cluster.go
Outdated
@@ -161,3 +162,40 @@ func AnnotateNode(ctx context.Context, nodeName string, key, value string, c cli | |||
|
|||
return AnnotateObject(ctx, node, key, value, c) | |||
} | |||
|
|||
// LabelObject adds label to a kubernetes object | |||
func LabelObject(ctx context.Context, obj client.Object, key, value string, c client.Client) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: keep it internal to the package for now ? since you only use LabelNode in controller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
pkg/platforms/openshift/openshift.go
Outdated
@@ -228,6 +228,18 @@ func (c *openshiftContext) OpenshiftAfterCompleteDrainNode(ctx context.Context, | |||
return false, err | |||
} | |||
|
|||
value, exist := mcp.Annotations[consts.MachineConfigPoolPausedAnnotation] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this change related to this PR ?
if not, do you prefer to submit a separate PR for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it from this PR
@@ -67,12 +67,17 @@ const ( | |||
MachineConfigPoolPausedAnnotationIdle = "Idle" | |||
MachineConfigPoolPausedAnnotationPaused = "Paused" | |||
|
|||
SriovDevicePluginLabel = "sriovnetwork.openshift.io/device-plugin" | |||
SriovDevicePluginLabelEnabled = "Enabled" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use lower case for label values ?
no special preference on my end. but i seldom see label values that are CamelCase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just copy the same we have on all the others
https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/747/files#diff-0e683dcc8c5eeb8433c366296d8cb81d12ff06fea8a08637949e040a3aa334e3R78
controllers/helper.go
Outdated
@@ -185,42 +173,17 @@ func syncPluginDaemonObjs(ctx context.Context, | |||
data.Data["ReleaseVersion"] = os.Getenv("RELEASEVERSION") | |||
data.Data["ResourcePrefix"] = vars.ResourcePrefix | |||
data.Data["ImagePullSecrets"] = GetImagePullSecrets() | |||
data.Data["NodeSelectorField"] = GetDefaultNodeSelector() | |||
data.Data["NodeSelectorField"] = GetDefaultNodeSelectorForDevicePlugin() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we merge this with the labels we set on sriov network config daemon ?
that way we have sriov device plugin deployed wherever sriov network config daemon is deployed AND has the device-plugin enabled label.
just in case we have leftovers OR we changed the selector for config daemon.
also should we clean this label (and possibly some annot while at it) from node obj if config daemon is not targeting them ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
General comment: to me it seems (much ?) simpler that the config daemon will be in charge of updating its own node label to evict DP pod. the drain controller will then handle only what it was supposed to do originally (handle cordon/drain related operations) we should discuss this. |
e2e2ad2
to
31c7fee
Compare
3d20dff
to
46affb0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
great design work!
always deploy sriov network device plugin and use a label to enable or disable it on the nodes Signed-off-by: Sebastian Sch <[email protected]>
46affb0
to
baa41c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. LGTM
This commit introduces a new redesign on how the operator resets the device plugin