Continue node drain after reboot #232

Merged 2 commits on Jan 27, 2022

Collaborator

@e0ne e0ne commented Jan 20, 2022

Once the drain operation has started, we don't need to acquire the
drain lock for this node, because the node already has the required
annotation.

It's safe to continue the node drain procedure without the lock.

Closes: #230

Signed-off-by: Ivan Kolodyazhny [email protected]
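
As a minimal sketch of the idea in this description, the decision to skip the lock can be reduced to a check on the node's state annotation. The annotation key and values, the `node` type, and `needsDrainLock` are assumptions made for illustration and are not taken from this PR's diff.

```go
package main

// Annotation key and values are assumptions for illustration; the thread only
// mentions an 'Idle' value and a node-state annotation.
const (
	annoKey      = "sriovnetwork.openshift.io/state"
	annoIdle     = "Idle"
	annoDraining = "Draining"
)

// node is a hypothetical stand-in for the cached Kubernetes Node object.
type node struct {
	Annotations map[string]string
}

// needsDrainLock reports whether the daemon should compete for the drain lock
// before draining. A node that is already annotated as draining (for example,
// one that rebooted in the middle of a drain) obtained the lock earlier, so it
// can continue without taking it a second time.
func needsDrainLock(n node) bool {
	// Reading from a nil map is safe in Go and yields the empty string.
	return n.Annotations[annoKey] != annoDraining
}
```

Under these assumptions, a daemon restarting after a reboot would call `needsDrainLock` on its own node and go straight to draining when it returns false.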

@github-actions

Thanks for your PR,
To run vendors CIs use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
Best regards.

@e0ne
Collaborator Author

e0ne commented Jan 20, 2022

Let's keep this PR in a WIP state: it still needs additional testing, unit tests, and most probably additional fixes.

@e0ne e0ne force-pushed the multiple-policies branch from fdd02ab to 8f52b7c Compare January 20, 2022 12:27

@pliurh
Collaborator

pliurh commented Jan 21, 2022

I tested this patch on OCP. It works.

      if err := dn.completeDrain(); err != nil {
          glog.Errorf("nodeStateSyncHandler(): failed to complete draining: %v", err)
          return err
      }
- } else if !ok {
+ } else {
Collaborator

We are now calling dn.annotateNode() in the else clause even if the annotation is already set, which costs an extra GET to the k8s API.

IMO this should be called as before, only if the annotation is not set. WDYT?

Collaborator Author

There are only two places where we add the 'Idle' annotation: the line above and the 'completeDrain' function, so IMO it doesn't make sense to check for it.

We probably need to refactor the code that handles annotations to make it clearer.

Collaborator

@adrianchiris adrianchiris Jan 24, 2022

It's just to save a call to the k8s API in annotateNode().

Think of a case where nodeStateSyncHandler runs but (for some reason) does not require a drain, and the sriov-node-state annotation is already set to idle. In this case we don't really need to call annotateNode.

Not sure how often we might hit this.
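
The saving discussed in this comment can be sketched as a guard inside the annotate helper: compare the cached annotation first and only call the API when it differs. The `nodeClient` interface, the `daemon` struct, and `countingClient` are hypothetical stand-ins, not the operator's real types, and the cache is assumed to be refreshed elsewhere (for example by an informer).

```go
package main

// nodeClient is a hypothetical minimal interface standing in for the
// Kubernetes API client used to update a node.
type nodeClient interface {
	SetAnnotation(nodeName, key, value string) error
}

// daemon is a simplified stand-in for the config daemon. The cached map plays
// the role of the annotations on dn.node.
type daemon struct {
	nodeName string
	cached   map[string]string
	client   nodeClient
}

// hasAnnotation checks the cached node only; it makes no API call.
func (d *daemon) hasAnnotation(key, value string) bool {
	v, ok := d.cached[key]
	return ok && v == value
}

// annotateNode skips the API round trip when the cached node already carries
// the requested annotation value. It does not modify the cache itself.
func (d *daemon) annotateNode(key, value string) error {
	if d.hasAnnotation(key, value) {
		return nil
	}
	return d.client.SetAnnotation(d.nodeName, key, value)
}

// countingClient is a fake client that records how many API calls were made.
type countingClient struct{ calls int }

func (c *countingClient) SetAnnotation(_, _, _ string) error {
	c.calls++
	return nil
}
```

With the fake client, annotating a node whose cached state is already the requested value makes no calls, while a node without the annotation triggers exactly one.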

Collaborator Author

Got it, I'll fix it in the 'annotateNode' function.

Collaborator

Will there be a corner case where dn.node is not the latest if we don't get it from the annotateNode call?

Collaborator

@adrianchiris adrianchiris Jan 25, 2022

> Will there be a corner case where dn.node is not the latest if we don't get it from the annotateNode call?

dn.node will get updated via the informer. It will eventually be consistent (that's how the config daemon is designed).

Collaborator

@adrianchiris adrianchiris Jan 25, 2022

> Got it, I'll fix it in the 'annotateNode' function.

I think this else clause aims to add the idle annotation if the annotation does not exist in dn.node. I'd prefer to keep this logic (add a check such as hasSriovNodeStateAnnot() or similar).

Collaborator

> Will there be a corner case where dn.node is not the latest if we don't get it from the annotateNode call?
>
> dn.node will get updated via the informer.

I was wondering whether there could be a delay between the informer receiving the node update event and dn.node being updated, or between the node being updated in the API server and the informer receiving the event.

In that case dn.node is not the latest, which may make nodeHasAnnotation return an inaccurate result. (I'm just considering the possibility; I haven't tested it myself, though it may be easy to test.)

> It will eventually be consistent (that's how the config daemon is designed).

If nodeHasAnnotation returns an incorrect result, I don't see how it can resolve by itself. I guess the node will keep its old annotation forever unless there is a new policy or state event?

Collaborator

> If nodeHasAnnotation returns an incorrect result, I don't see how it can resolve by itself. I guess the node will keep its old annotation forever unless there is a new policy or state event?

I don't see how it's different from what we had before; the change just moved the logic to a separate function.

Your concern is valid in theory, but we have not encountered this, and it applies to the config daemon design in general.
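
The trade-off in this thread can be illustrated with a toy model of an informer-fed cache: a check against the cache may be stale until the next update event arrives, after which it agrees with the API server again. All types and names below are hypothetical and only model the behavior described in the discussion; they are not the client-go informer API.

```go
package main

// apiServer is a toy source of truth for a node's annotations.
type apiServer struct {
	annotations map[string]string
}

// nodeCache is a toy informer-fed copy of the node, playing the role of dn.node.
type nodeCache struct {
	annotations map[string]string
}

// onUpdate copies the server state into the cache, as an informer update
// event handler would when a node change is delivered.
func (c *nodeCache) onUpdate(s *apiServer) {
	c.annotations = make(map[string]string, len(s.annotations))
	for k, v := range s.annotations {
		c.annotations[k] = v
	}
}

// has answers from the cache only, so it can lag behind the server.
func (c *nodeCache) has(key, value string) bool {
	return c.annotations[key] == value
}
```

In this model, changing the server's annotation without delivering an event leaves `has` answering from the old value; the answer is corrected only once `onUpdate` runs, which mirrors the "eventually consistent" behavior and the window of staleness raised above.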

@zshi-redhat
Collaborator

/cc @SchSeba

@github-actions github-actions bot requested a review from SchSeba January 24, 2022 10:18
@e0ne e0ne changed the title from "WIP. Continue node drain after reboot" to "Continue node drain after reboot" Jan 24, 2022
@e0ne e0ne force-pushed the multiple-policies branch from 687cbcb to 031dcbc Compare January 25, 2022 14:32

@adrianchiris
Collaborator

/test-all

@Eoghan1232
Collaborator

/lgtm

@github-actions github-actions bot added the lgtm label Jan 26, 2022
@adrianchiris adrianchiris merged commit fb8ced0 into k8snetworkplumbingwg:master Jan 27, 2022
Successfully merging this pull request may close these issues.

Multiple SriovNetworkNodePolicy creation fails
5 participants