Replace spec.Parallel field with spec.MaxUnavailable #715
Conversation
Signed-off-by: Radim Hrazdil <[email protected]>
Force-pushed from 936ae80 to 6835afa
Force-pushed from 6835afa to b9b1be6
Before reviewing this, I think we have to set the default to "50%" and remove the parallel lane. What do you think?
Force-pushed from b68843e to f64fbf1
First pass, skipped tests.
@@ -12,18 +14,20 @@ type NodeNetworkConfigurationPolicySpec struct {
	// The desired configuration of the policy
	DesiredState State `json:"desiredState,omitempty"`

	// When set to true, changes are applied to all nodes in parallel
	// MaxUnavailable specifies percentage or number
	// of machines that can be updating at a time. Default is 1.
Let's make the default "50%" so we have a balance between safe and fast. At CI this means:
- the normal lane forces MAX_UNAVAILABLE to 1
- the parallel lane tests with the default (which would be 50%)
I can set the default to 50%, but in CI we only have 3 nodes, so we'd need to force the parallel lane to 75% to have 2 nodes working in parallel.
Done:
- normal lane: NMSTATE_MAXUNAVAILABLE=1
- parallel lane: NMSTATE_MAXUNAVAILABLE=2 (for the reason above, 50% would only be 1 node)
Ack, let's do a follow-up PR with KUBEVIRT_NUM_NODES=5 (1 master + 4 workers) at CI, since most of the test NNCPs are applied to workers, and keep just one lane.
Since we will have just one lane, even with 5 nodes we will consume fewer resources (currently 2 lanes * 3 nodes = 6 nodes).
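For reference, a minimal sketch of how such a field can be modelled so it accepts either an integer or a percentage string (the "50%" default in the comment follows the discussion above; the sketch is illustrative, not necessarily the exact code in this PR):

```go
// Sketch of the relevant spec field only; other fields are omitted.
package v1beta1

import "k8s.io/apimachinery/pkg/util/intstr"

type NodeNetworkConfigurationPolicySpec struct {
	// MaxUnavailable specifies percentage or number of machines that can
	// be updating at a time. Per the discussion above, the default is "50%".
	MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`
}
```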
@@ -53,6 +54,10 @@ import (
	"k8s.io/apimachinery/pkg/types"
)

const (
	DEFAULT_MAXUNAVAILABLE = 1
ditto "50%"
done
maxUnavailable:
  anyOf:
  - type: integer
  - type: string
  description: MaxUnavailable specifies percentage or number of machines
    that can be updating at a time. Default is 1.
  x-kubernetes-int-or-string: true
Is this generated?
yes it is
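The generated `x-kubernetes-int-or-string: true` schema mirrors how `intstr.IntOrString` serializes. A small standalone program (not part of the PR; the struct name is made up) showing the two accepted forms:

```go
package main

import (
	"encoding/json"
	"fmt"

	"k8s.io/apimachinery/pkg/util/intstr"
)

// spec is a throwaway struct used only to show both JSON forms.
type spec struct {
	MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`
}

func main() {
	pct := intstr.FromString("50%")
	num := intstr.FromInt(3)

	asPercent, _ := json.Marshal(spec{MaxUnavailable: &pct})
	asNumber, _ := json.Marshal(spec{MaxUnavailable: &num})

	fmt.Println(string(asPercent)) // {"maxUnavailable":"50%"}
	fmt.Println(string(asNumber))  // {"maxUnavailable":3}
}
```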
if policy.Spec.MaxUnavailable != nil {
	intOrPercent = *policy.Spec.MaxUnavailable
}
maxUnavailable, err := intstr.GetValueFromIntOrPercent(&intOrPercent, len(nmstateNodes), true)
Looks like we have to use GetScaledValueFromIntOrPercent, since GetValueFromIntOrPercent is being deprecated:
https://github.com/kubernetes/apimachinery/blob/master/pkg/util/intstr/intstr.go#L141-L168
Right! I didn't see that because I had old vendor packages. Done
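For reference, a minimal sketch of the non-deprecated helper in use (the node count and variable names are illustrative):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Assume the policy's maxUnavailable is "50%" and 3 nodes run nmstate.
	maxUnavailable := intstr.FromString("50%")
	nmstateNodeCount := 3

	// GetScaledValueFromIntOrPercent replaces the deprecated
	// GetValueFromIntOrPercent; rounding up, "50%" of 3 nodes yields 2.
	scaled, err := intstr.GetScaledValueFromIntOrPercent(&maxUnavailable, nmstateNodeCount, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(scaled) // 2
}
```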
if err != nil {
	if apierrors.IsConflict(err) {
		return ctrl.Result{RequeueAfter: nodeRunningUpdateRetryTime}, err
	} else {
We don't need this else, since the previous branch ends in a return.
removed
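The flattened shape, as a self-contained sketch (the function name, constant value, and error plumbing are illustrative, not the PR's exact code):

```go
package controllers

import (
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
)

const nodeRunningUpdateRetryTime = 5 * time.Second // illustrative value

// handleClaimError returns early on conflicts, so no else branch is needed
// for the remaining error path.
func handleClaimError(err error) (ctrl.Result, error) {
	if err != nil {
		if apierrors.IsConflict(err) {
			return ctrl.Result{RequeueAfter: nodeRunningUpdateRetryTime}, err
		}
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}
```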
enactmentCount, err := r.enactmentsCountsByPolicy(instance)
if err != nil {
	log.Error(err, "Error getting enactment counts")
	return ctrl.Result{}, nil
We don't return the error here? Returning it would make Reconcile be called again.
AFAIK, returning an error triggers reconcile again:
https://cluster-api.sigs.k8s.io/developer/providers/implementers-guide/controllers_and_reconciliation.html#reconciliation
But are we sure we don't want Reconcile to be called again here?
Hmm, indeed we should reconcile.
Fixed
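Sketched as a fragment of the snippet above (returning the error is what makes controller-runtime requeue the request and call Reconcile again):

```go
// Fragment, not standalone: same variables as the diff above.
enactmentCount, err := r.enactmentsCountsByPolicy(instance)
if err != nil {
	log.Error(err, "Error getting enactment counts")
	// Propagating the error (instead of returning nil) triggers a requeue,
	// so Reconcile runs again rather than silently giving up.
	return ctrl.Result{}, err
}
// enactmentCount is consumed further down in Reconcile.
```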
err = r.Client.Status().Update(context.TODO(), policy)
if err != nil {
	return err
}
return nil
}

func (r *NodeNetworkConfigurationPolicyReconciler) releaseNodeRunningUpdate(policyKey types.NamespacedName) {
func (r *NodeNetworkConfigurationPolicyReconciler) releaseNodeRunningUpdate(policy *nmstatev1beta1.NodeNetworkConfigurationPolicy) {
	policyKey := types.NamespacedName{Name: policy.GetName(), Namespace: policy.GetNamespace()}
	instance := &nmstatev1beta1.NodeNetworkConfigurationPolicy{}
	_ = retry.RetryOnConflict(retry.DefaultRetry, func() error {
Let's at least print the error.
Done
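A rough sketch of what logging that error can look like (the function shape, message, and key/value pairs are illustrative):

```go
package controllers

import (
	"github.com/go-logr/logr"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/util/retry"
)

// releaseWithLogging keeps the conflict retry but surfaces the terminal error
// instead of discarding it with `_ =`.
func releaseWithLogging(log logr.Logger, policyKey types.NamespacedName, update func() error) {
	if err := retry.RetryOnConflict(retry.DefaultRetry, update); err != nil {
		log.Error(err, "failed to release node running update", "policy", policyKey)
	}
}
```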
Force-pushed from f64fbf1 to 2f870ec
Second pass, just nits; e2e tests look fine.
docs/user-guide/102-configuration.md
In such a case, `maxUnavailable` can be used to define portion size of a cluster
that can apply a policy configuration concurrently.
MaxUnavailable specifies percentage or a constant number of nodes that
can be progressing a policy at a time. Default is 1.
Suggested change:
- can be progressing a policy at a time. Default is 1.
+ can be progressing a policy at a time. Default is "50%".
Thanks, missed that. Fixed.
pkg/policyconditions/conditions.go
@@ -86,7 +86,7 @@ func SetPolicyFailedToConfigure(conditions *nmstate.ConditionList, message strin
)
}

func nodesRunningNmstate(cli client.Client) ([]corev1.Node, error) {
func NodesRunningNmstate(cli client.Client) ([]corev1.Node, error) {
Can we do some boy-scouting here and move this to pkg/node, with something like node.FindNmstateRunning, since this has nothing to do with policyconditions?
Done
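Whatever the final name, a helper of that shape in pkg/node could look roughly like this sketch (the handler label selector is an assumption for illustration, not taken from the PR):

```go
package node

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// NodesRunningNmstate lists the nmstate handler pods and keeps only the
// nodes that host one of them.
func NodesRunningNmstate(cli client.Client) ([]corev1.Node, error) {
	pods := corev1.PodList{}
	// Assumed label; the real handler pods may be selected differently.
	if err := cli.List(context.TODO(), &pods, client.MatchingLabels{"component": "kubernetes-nmstate-handler"}); err != nil {
		return nil, err
	}

	nodes := corev1.NodeList{}
	if err := cli.List(context.TODO(), &nodes); err != nil {
		return nil, err
	}

	filtered := []corev1.Node{}
	for _, n := range nodes.Items {
		for _, pod := range pods.Items {
			if n.Name == pod.Spec.NodeName {
				filtered = append(filtered, n)
				break
			}
		}
	}
	return filtered, nil
}
```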
Force-pushed from 2f870ec to a9dd869
Replace the parallel field with maxUnavailable, which allows more granular configuration of concurrent policy configuration. MaxUnavailable can be set either to an integer value or to a percentage string: `spec.maxUnavailable: "50%"` or `spec.maxUnavailable: 3`. Updated the user guide and added an example policy that specifies maxUnavailable. Signed-off-by: Radim Hrazdil <[email protected]>
Force-pushed from a9dd869 to 79736c6
Small nit about the function name.
	return ctrl.Result{}, nil
}

err = r.claimNodeRunningUpdate(instance)
We have to change the name of this function now, to something like increaseUnavailableNodeCount
Force-pushed changes renaming the function as suggested; also updated the related error messages to align with the code.
Update the test that checks NNCE conditions to tolerate both failing and aborted conditions when an invalid NNCP is created. Signed-off-by: Radim Hrazdil <[email protected]>
Force-pushed from 79736c6 to 955c66c
@rhrazdil: The following test failed:
Let's do the CI cleanup in a follow-up.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: qinqon. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Signed-off-by: Radim Hrazdil [email protected]
Updated the user guide and added an example policy that specifies maxUnavailable.
Is this a BUG FIX or a FEATURE ?:
What this PR does / why we need it:
Replace the parallel field with maxUnavailable, which allows more granular configuration of concurrent policy configuration.
Default behaviour remains unchanged, as maxUnavailable is set to 1 by default.
MaxUnavailable can be set either to an integer value or to a percentage string: `spec.maxUnavailable: "50%"` or `spec.maxUnavailable: 3`.
Special notes for your reviewer:
Release note: