
Change spec.Parallel field with spec.MaxUnavailable #715

Merged: 3 commits merged into nmstate:master on Mar 22, 2021

Conversation

rhrazdil (Collaborator):

Signed-off-by: Radim Hrazdil [email protected]

Updated the user guide and added an example policy
that specifies maxUnavailable.

Is this a BUG FIX or a FEATURE?:

/kind enhancement

What this PR does / why we need it:

Replace the parallel field with maxUnavailable, which allows
more granular control over how many nodes apply a policy
concurrently.

Default behaviour remains unchanged as maxUnavailable
is by default set to 1.

MaxUnavailable can be set either as an integer value or
as a percentage string:

spec:
  maxUnavailable: "50%"

or

spec:
  maxUnavailable: 3
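
For illustration, a complete policy using the new field could look like the sketch below; the API group/version and the bridge interface are illustrative assumptions taken from elsewhere in this PR, and only maxUnavailable is the point:

apiVersion: nmstate.io/v1beta1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: linux-bridge-worker-nodes
spec:
  maxUnavailable: "50%"
  desiredState:
    interfaces:
      - name: br1
        type: linux-bridge
        state: up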


Release note:

spec.parallel field is dropped from the API; spec.maxUnavailable is added as a substitute

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/enhancement dco-signoff: yes Indicates the PR's author has DCO signed all their commits. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L labels Mar 15, 2021
@kubevirt-bot kubevirt-bot requested review from oshoval and RamLavi March 15, 2021 09:45
Signed-off-by: Radim Hrazdil <[email protected]>
@kubevirt-bot kubevirt-bot added size/XL and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L labels Mar 16, 2021
@qinqon (Member) left a comment:

Before reviewing this, I think we have to set the default to "50%" and remove the parallel lane. What do you think?

@qinqon (Member) left a comment:

First pass, skipped tests.

@@ -12,18 +14,20 @@ type NodeNetworkConfigurationPolicySpec struct {
     // The desired configuration of the policy
     DesiredState State `json:"desiredState,omitempty"`

-    // When set to true, changes are applied to all nodes in parallel
+    // MaxUnavailable specifies percentage or number
+    // of machines that can be updating at a time. Default is 1.
Member:

Let's make the default 50% so we have a balance between safe and fast. At CI this means:

  • normal lane: force MAX_UNAVAILABLE to 1
  • parallel lane: test with the default (which would have to be 50%)

Collaborator (Author):

I can set the default to 50%, but in CI we only have 3 nodes, so we'll need to force the parallel lane to 75% to have 2 nodes working in parallel.

Collaborator (Author):

Done:

  • normal lane: NMSTATE_MAXUNAVAILABLE=1
  • parallel lane: NMSTATE_MAXUNAVAILABLE=2 (for the reason above, 50% would only be 1 node)

@qinqon (Member), Mar 17, 2021:

Ack. Let's do a follow-up PR with KUBEVIRT_NUM_NODES=5 (1 master + 4 workers) at CI, since most of the test NNCPs are applied to workers, and keep just one lane.

Since we will have just one lane, even with 5 nodes we will consume fewer resources than today (2 lanes * 3 nodes = 6 nodes).
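
For context, the field under discussion is a pointer to an IntOrString in the policy spec. A minimal sketch, assuming the json tag and using a stand-in State type (the project's actual spec has more fields and kubebuilder markers):

package v1beta1

import (
    "k8s.io/apimachinery/pkg/util/intstr"
)

// State is a stand-in for the project's desired-state type.
type State struct {
    Raw []byte `json:"raw,omitempty"`
}

// NodeNetworkConfigurationPolicySpec, trimmed to the relevant fields.
type NodeNetworkConfigurationPolicySpec struct {
    // The desired configuration of the policy.
    DesiredState State `json:"desiredState,omitempty"`

    // MaxUnavailable specifies percentage or number of machines that can be
    // updating at a time. A pointer lets "unset" fall back to the default.
    MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`
}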

@@ -53,6 +54,10 @@ import (
     "k8s.io/apimachinery/pkg/types"
 )

+const (
+    DEFAULT_MAXUNAVAILABLE = 1
Member:

ditto "50%"

Collaborator (Author):

done

Comment on lines 52 to 58:

maxUnavailable:
  anyOf:
    - type: integer
    - type: string
  description: MaxUnavailable specifies percentage or number of machines
    that can be updating at a time. Default is 1.
  x-kubernetes-int-or-string: true
Member:

Is this generated?

Collaborator (Author):

Yes, it is.

if policy.Spec.MaxUnavailable != nil {
    intOrPercent = *policy.Spec.MaxUnavailable
}
maxUnavailable, err := intstr.GetValueFromIntOrPercent(&intOrPercent, len(nmstateNodes), true)
Member:

Looks like we have to use GetScaledValueFromIntOrPercent since GetValueFromIntOrPercent is being deprecated,
https://github.com/kubernetes/apimachinery/blob/master/pkg/util/intstr/intstr.go#L141-L168

Collaborator (Author):

Right! I didn't see that because I had old vendor packages. Done
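
For reference, a minimal sketch of how the replacement helper behaves, with illustrative node counts: GetScaledValueFromIntOrPercent resolves a percentage string against the total and returns plain integers unchanged.

package main

import (
    "fmt"

    "k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
    // "50%" of 3 nodes: 1.5 rounds up to 2 (roundUp=true) or down to 1 (roundUp=false).
    pct := intstr.FromString("50%")
    up, _ := intstr.GetScaledValueFromIntOrPercent(&pct, 3, true)
    down, _ := intstr.GetScaledValueFromIntOrPercent(&pct, 3, false)
    fmt.Println(up, down) // 2 1

    // A plain integer is returned as-is, regardless of the total.
    abs := intstr.FromInt(3)
    n, _ := intstr.GetScaledValueFromIntOrPercent(&abs, 10, true)
    fmt.Println(n) // 3
}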

if err != nil {
    if apierrors.IsConflict(err) {
        return ctrl.Result{RequeueAfter: nodeRunningUpdateRetryTime}, err
    } else {
Member:

We don't need this else, since the previous branch ends with a return.

Collaborator (Author):

removed
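
A minimal sketch of the flattened shape after dropping the redundant else; the helper name and the retry constant here are stand-ins, not the project's actual code:

package sketch

import (
    "time"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    ctrl "sigs.k8s.io/controller-runtime"
)

// Stand-in for the project's retry interval constant.
const nodeRunningUpdateRetryTime = 5 * time.Second

// handleClaimError shows the shape: the conflict branch returns,
// so the remaining error handling needs no else block.
func handleClaimError(err error) (ctrl.Result, error) {
    if err != nil {
        if apierrors.IsConflict(err) {
            return ctrl.Result{RequeueAfter: nodeRunningUpdateRetryTime}, err
        }
        return ctrl.Result{}, err
    }
    return ctrl.Result{}, nil
}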

enactmentCount, err := r.enactmentsCountsByPolicy(instance)
if err != nil {
    log.Error(err, "Error getting enactment counts")
    return ctrl.Result{}, nil
Member:

We don't return the error? Returning it would call Reconcile again.

Collaborator (Author):

Member:

But do we really not want Reconcile to be called again?

Collaborator (Author):

Hmm, indeed we should reconcile.

Collaborator (Author):

Fixed
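
The reasoning here follows controller-runtime semantics: returning a non-nil error (or Requeue: true) makes the manager requeue the request with backoff, while returning (ctrl.Result{}, nil) drops it. A minimal sketch, with doWork standing in for the enactment-count call:

package sketch

import (
    ctrl "sigs.k8s.io/controller-runtime"
)

// reconcileStep surfaces the error so the manager retries with backoff;
// swallowing it and returning (ctrl.Result{}, nil) would end reconciliation here.
func reconcileStep(doWork func() error) (ctrl.Result, error) {
    if err := doWork(); err != nil {
        return ctrl.Result{}, err
    }
    return ctrl.Result{}, nil
}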

     err = r.Client.Status().Update(context.TODO(), policy)
     if err != nil {
         return err
     }
     return nil
 }

-func (r *NodeNetworkConfigurationPolicyReconciler) releaseNodeRunningUpdate(policyKey types.NamespacedName) {
+func (r *NodeNetworkConfigurationPolicyReconciler) releaseNodeRunningUpdate(policy *nmstatev1beta1.NodeNetworkConfigurationPolicy) {
+    policyKey := types.NamespacedName{Name: policy.GetName(), Namespace: policy.GetNamespace()}
     instance := &nmstatev1beta1.NodeNetworkConfigurationPolicy{}
     _ = retry.RetryOnConflict(retry.DefaultRetry, func() error {
Member:

Let's at least print the error.

Collaborator (Author):

Done
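
A minimal sketch of the agreed change, assuming a logr.Logger is available; the point is only that the RetryOnConflict result is logged rather than discarded:

package sketch

import (
    "github.com/go-logr/logr"
    "k8s.io/client-go/util/retry"
)

// release retries the status update on conflict and logs a final failure
// instead of silently dropping it; update stands in for the re-fetch + update closure.
func release(log logr.Logger, update func() error) {
    if err := retry.RetryOnConflict(retry.DefaultRetry, update); err != nil {
        log.Error(err, "failed to release node running update after retries")
    }
}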

@qinqon (Member) left a comment:

Second pass, just nits; e2e tests look fine.

In such a case, `maxUnavailable` can be used to define portion size of a cluster
that can apply a policy configuration concurrently.
MaxUnavailable specifies percentage or a constant number of nodes that
can be progressing a policy at a time. Default is 1.
Member:

Suggested change:
- can be progressing a policy at a time. Default is 1.
+ can be progressing a policy at a time. Default is "50%".

Collaborator (Author):

Thanks, missed that. Fixed.

@@ -86,7 +86,7 @@ func SetPolicyFailedToConfigure(conditions *nmstate.ConditionList, message strin
     )
 }

-func nodesRunningNmstate(cli client.Client) ([]corev1.Node, error) {
+func NodesRunningNmstate(cli client.Client) ([]corev1.Node, error) {
Member:

Can we do some boy-scouting here and move this to pkg/node as something like node.FindNmstateRunning, since this has nothing to do with policyconditions?

Collaborator (Author):

Done
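
A sketch of what the relocated helper might look like in pkg/node; the label selector and the listing strategy are assumptions, only the package move and the suggested name follow the review comment:

package node

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// FindNmstateRunning returns the nodes that currently host an nmstate
// handler pod. The "app: kubernetes-nmstate" selector is illustrative.
func FindNmstateRunning(cli client.Client) ([]corev1.Node, error) {
    pods := corev1.PodList{}
    if err := cli.List(context.TODO(), &pods, client.MatchingLabels{"app": "kubernetes-nmstate"}); err != nil {
        return nil, err
    }
    byNode := map[string]bool{}
    for _, pod := range pods.Items {
        byNode[pod.Spec.NodeName] = true
    }
    nodes := corev1.NodeList{}
    if err := cli.List(context.TODO(), &nodes); err != nil {
        return nil, err
    }
    running := []corev1.Node{}
    for _, node := range nodes.Items {
        if byNode[node.Name] {
            running = append(running, node)
        }
    }
    return running, nil
}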


Signed-off-by: Radim Hrazdil <[email protected]>
@qinqon (Member) left a comment:

Small nit on a function name.

        return ctrl.Result{}, nil
    }

    err = r.claimNodeRunningUpdate(instance)
Member:

We have to change the name of this function now, to something like increaseUnavailableNodeCount

Collaborator (Author):

Force-pushed changes with said function renamed; also updated the related error messages to be aligned with the code.
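
A hypothetical sketch of what the renamed helper conveys; the status field and error here are stand-ins used only to illustrate the budget check, not the project's actual implementation:

package sketch

import "errors"

// errMaxUnavailableReached signals the caller to back off and requeue.
var errMaxUnavailableReached = errors.New("maximum number of unavailable nodes reached")

// policyStatus is a stand-in holding the count of nodes currently updating.
type policyStatus struct {
    UnavailableNodeCount int
}

// increaseUnavailableNodeCount claims one slot of the unavailable-node budget,
// or refuses when maxUnavailable nodes are already progressing.
func increaseUnavailableNodeCount(status *policyStatus, maxUnavailable int) error {
    if status.UnavailableNodeCount >= maxUnavailable {
        return errMaxUnavailableReached
    }
    status.UnavailableNodeCount++
    return nil
}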

Update test that checks NNCE conditions to tolerate both
failing and aborted conditions when an invalid NNCP is created

Signed-off-by: Radim Hrazdil <[email protected]>
@kubevirt-bot (Collaborator):

@rhrazdil: The following test failed, say /retest to rerun all failed tests:

Test name: pull-kubernetes-nmstate-e2e-handler-k8s-future
Commit: 955c66c
Rerun command: /test pull-kubernetes-nmstate-e2e-handler-k8s-future

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@qinqon (Member) left a comment:

Let's do the CI cleanup in a follow-up.

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Mar 22, 2021
@qinqon (Member) commented Mar 22, 2021:

/approve

@kubevirt-bot (Collaborator):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qinqon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 22, 2021
@kubevirt-bot kubevirt-bot merged commit a0c316b into nmstate:master Mar 22, 2021