Replace spec.Parallel field with spec.MaxUnavailable #715
Conversation
Signed-off-by: Radim Hrazdil <[email protected]>
Force-pushed from 936ae80 to 6835afa
Force-pushed from 6835afa to b9b1be6
Before reviewing this, I think we have to set the default to "50%" and remove the parallel lane. What do you think?
Force-pushed from b68843e to f64fbf1
First pass, skipped tests.
@@ -12,18 +14,20 @@ type NodeNetworkConfigurationPolicySpec struct {
	// The desired configuration of the policy
	DesiredState State `json:"desiredState,omitempty"`

	// When set to true, changes are applied to all nodes in parallel
	// MaxUnavailable specifies percentage or number
	// of machines that can be updating at a time. Default is 1.
Let's make the default "50%" so we have a balance between safe and fast. At CI this means:
- the normal lane forces MAX_UNAVAILABLE to 1
- the parallel lane tests with the default (which would be 50%)
I can set the default to 50%, but in CI we only have 3 nodes, so we'd need to force the parallel lane to 75% to have 2 nodes working in parallel.
Done:
- normal lane: NMSTATE_MAXUNAVAILABLE=1
- parallel lane: NMSTATE_MAXUNAVAILABLE=2 (for the reason above, 50% would only be 1 node)
Ack, let's do a follow-up PR with KUBEVIRT_NUM_NODES=5 (1 master + 4 workers) at CI, since most of the test NNCPs are applied to workers, and keep just one lane.
Since we will have just one lane, even with 5 nodes we will consume fewer resources (currently 2 lanes * 3 nodes = 6 nodes).
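For reference, a minimal sketch of how such a field can be modelled so it accepts either an integer or a percentage string (the "50%" default in the comment follows the discussion above; the sketch is illustrative, not necessarily the exact code in this PR):

```go
// Sketch of the relevant spec field only; other fields are omitted.
package v1beta1

import "k8s.io/apimachinery/pkg/util/intstr"

type NodeNetworkConfigurationPolicySpec struct {
	// MaxUnavailable specifies percentage or number of machines that can
	// be updating at a time. Per the discussion above, the default is "50%".
	MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`
}
```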
@@ -53,6 +54,10 @@ import (
	"k8s.io/apimachinery/pkg/types"
)

const (
	DEFAULT_MAXUNAVAILABLE = 1
ditto "50%"
done
maxUnavailable:
  anyOf:
  - type: integer
  - type: string
  description: MaxUnavailable specifies percentage or number of machines
    that can be updating at a time. Default is 1.
  x-kubernetes-int-or-string: true
Is this generated?
yes it is
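The generated `x-kubernetes-int-or-string: true` schema mirrors how `intstr.IntOrString` serializes. A small standalone program (not part of the PR; the struct name is made up) showing the two accepted forms:

```go
package main

import (
	"encoding/json"
	"fmt"

	"k8s.io/apimachinery/pkg/util/intstr"
)

// spec is a throwaway struct used only to show both JSON forms.
type spec struct {
	MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`
}

func main() {
	pct := intstr.FromString("50%")
	num := intstr.FromInt(3)

	asPercent, _ := json.Marshal(spec{MaxUnavailable: &pct})
	asNumber, _ := json.Marshal(spec{MaxUnavailable: &num})

	fmt.Println(string(asPercent)) // {"maxUnavailable":"50%"}
	fmt.Println(string(asNumber))  // {"maxUnavailable":3}
}
```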
if policy.Spec.MaxUnavailable != nil {
	intOrPercent = *policy.Spec.MaxUnavailable
}
maxUnavailable, err := intstr.GetValueFromIntOrPercent(&intOrPercent, len(nmstateNodes), true)
Looks like we have to use GetScaledValueFromIntOrPercent, since GetValueFromIntOrPercent is being deprecated:
https://github.com/kubernetes/apimachinery/blob/master/pkg/util/intstr/intstr.go#L141-L168
Right! I didn't see that because I had old vendor packages. Done
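For reference, a minimal sketch of the non-deprecated helper in use (the node count and variable names are illustrative):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Assume the policy's maxUnavailable is "50%" and 3 nodes run nmstate.
	maxUnavailable := intstr.FromString("50%")
	nmstateNodeCount := 3

	// GetScaledValueFromIntOrPercent replaces the deprecated
	// GetValueFromIntOrPercent; rounding up, "50%" of 3 nodes yields 2.
	scaled, err := intstr.GetScaledValueFromIntOrPercent(&maxUnavailable, nmstateNodeCount, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(scaled) // 2
}
```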
if err != nil {
	if apierrors.IsConflict(err) {
		return ctrl.Result{RequeueAfter: nodeRunningUpdateRetryTime}, err
	} else {
We don't need this else, since the previous branch ends in a return.
removed
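The flattened shape, as a self-contained sketch (the function name, constant value, and error plumbing are illustrative, not the PR's exact code):

```go
package controllers

import (
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
)

const nodeRunningUpdateRetryTime = 5 * time.Second // illustrative value

// handleClaimError returns early on conflicts, so no else branch is needed
// for the remaining error path.
func handleClaimError(err error) (ctrl.Result, error) {
	if err != nil {
		if apierrors.IsConflict(err) {
			return ctrl.Result{RequeueAfter: nodeRunningUpdateRetryTime}, err
		}
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}
```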
enactmentCount, err := r.enactmentsCountsByPolicy(instance)
if err != nil {
	log.Error(err, "Error getting enactment counts")
	return ctrl.Result{}, nil
We don't return the error here? Returning it would make Reconcile be called again.
AFAIK, returning an error triggers reconcile again:
https://cluster-api.sigs.k8s.io/developer/providers/implementers-guide/controllers_and_reconciliation.html#reconciliation
But are we sure we don't want Reconcile to be called again here?
Hmm, indeed we should reconcile.
Fixed
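Sketched as a fragment of the snippet above (returning the error is what makes controller-runtime requeue the request and call Reconcile again):

```go
// Fragment, not standalone: same variables as the diff above.
enactmentCount, err := r.enactmentsCountsByPolicy(instance)
if err != nil {
	log.Error(err, "Error getting enactment counts")
	// Propagating the error (instead of returning nil) triggers a requeue,
	// so Reconcile runs again rather than silently giving up.
	return ctrl.Result{}, err
}
// enactmentCount is consumed further down in Reconcile.
```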
err = r.Client.Status().Update(context.TODO(), policy)
if err != nil {
	return err
}
return nil
}

func (r *NodeNetworkConfigurationPolicyReconciler) releaseNodeRunningUpdate(policyKey types.NamespacedName) {
func (r *NodeNetworkConfigurationPolicyReconciler) releaseNodeRunningUpdate(policy *nmstatev1beta1.NodeNetworkConfigurationPolicy) {
	policyKey := types.NamespacedName{Name: policy.GetName(), Namespace: policy.GetNamespace()}
	instance := &nmstatev1beta1.NodeNetworkConfigurationPolicy{}
	_ = retry.RetryOnConflict(retry.DefaultRetry, func() error {
Let's at least print the error.
Done
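A rough sketch of what logging that error can look like (the function shape, message, and key/value pairs are illustrative):

```go
package controllers

import (
	"github.com/go-logr/logr"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/util/retry"
)

// releaseWithLogging keeps the conflict retry but surfaces the terminal error
// instead of discarding it with `_ =`.
func releaseWithLogging(log logr.Logger, policyKey types.NamespacedName, update func() error) {
	if err := retry.RetryOnConflict(retry.DefaultRetry, update); err != nil {
		log.Error(err, "failed to release node running update", "policy", policyKey)
	}
}
```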
Force-pushed from f64fbf1 to 2f870ec
Second pass, just nits; e2e tests look fine.
docs/user-guide/102-configuration.md
In such a case, `maxUnavailable` can be used to define portion size of a cluster
that can apply a policy configuration concurrently.
MaxUnavailable specifies percentage or a constant number of nodes that
can be progressing a policy at a time. Default is 1.
Suggested change:
- can be progressing a policy at a time. Default is 1.
+ can be progressing a policy at a time. Default is "50%".
Thanks, missed that. Fixed.
pkg/policyconditions/conditions.go
@@ -86,7 +86,7 @@ func SetPolicyFailedToConfigure(conditions *nmstate.ConditionList, message strin
)
}

func nodesRunningNmstate(cli client.Client) ([]corev1.Node, error) {
func NodesRunningNmstate(cli client.Client) ([]corev1.Node, error) {
Can we do some boy-scouting here and move this to pkg/node, with something like node.FindNmstateRunning, since this has nothing to do with policyconditions?
Done
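Whatever the final name, a helper of that shape in pkg/node could look roughly like this sketch (the handler label selector is an assumption for illustration, not taken from the PR):

```go
package node

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// NodesRunningNmstate lists the nmstate handler pods and keeps only the
// nodes that host one of them.
func NodesRunningNmstate(cli client.Client) ([]corev1.Node, error) {
	pods := corev1.PodList{}
	// Assumed label; the real handler pods may be selected differently.
	if err := cli.List(context.TODO(), &pods, client.MatchingLabels{"component": "kubernetes-nmstate-handler"}); err != nil {
		return nil, err
	}

	nodes := corev1.NodeList{}
	if err := cli.List(context.TODO(), &nodes); err != nil {
		return nil, err
	}

	filtered := []corev1.Node{}
	for _, n := range nodes.Items {
		for _, pod := range pods.Items {
			if n.Name == pod.Spec.NodeName {
				filtered = append(filtered, n)
				break
			}
		}
	}
	return filtered, nil
}
```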
Force-pushed from 2f870ec to a9dd869
Replace the parallel field with maxUnavailable, which allows more granular configuration of concurrent policy configuration. MaxUnavailable can be set either to an integer value or to a percentage string: `spec.maxUnavailable: "50%"` or `spec.maxUnavailable: 3`. Updated the user guide and added an example policy that specifies maxUnavailable. Signed-off-by: Radim Hrazdil <[email protected]>
Force-pushed from a9dd869 to 79736c6
Small nit about the function name.
	return ctrl.Result{}, nil
}

err = r.claimNodeRunningUpdate(instance)
We have to change the name of this function now, to something like increaseUnavailableNodeCount
Force-pushed changes renaming the function as suggested; also updated the related error messages to align with the code.
Update the test that checks NNCE conditions to tolerate both failing and aborted conditions when an invalid NNCP is created. Signed-off-by: Radim Hrazdil <[email protected]>
Force-pushed from 79736c6 to 955c66c
@rhrazdil: The following test failed:
Let's do the CI cleanup in a follow-up.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: qinqon. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Signed-off-by: Radim Hrazdil [email protected]
Updated the user guide and added an example policy that specifies maxUnavailable.
Is this a BUG FIX or a FEATURE ?:
What this PR does / why we need it:
Replace the parallel field with maxUnavailable, which allows more granular configuration of concurrent policy configuration.
Default behaviour remains unchanged, as maxUnavailable is set to 1 by default.
MaxUnavailable can be set either to an integer value or to a percentage string: `spec.maxUnavailable: "50%"` or `spec.maxUnavailable: 3`.
Special notes for your reviewer:
Release note: