OCPBUGS-13558: fix: ensure LVMVolumeGroupNodeStatus is removed by dedicated cleanup controller in case of multi-node #372
Skipping CI for Draft Pull Request. |
@jakobmoellerdev: This pull request references Jira Issue OCPBUGS-13558, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
d9a86cb to 41f47ba
/test all |
Codecov Report
Additional details and impacted files

@@             Coverage Diff             @@
##             main     #372      +/-   ##
===========================================
+ Coverage   16.59%   57.01%   +40.42%
===========================================
  Files          24       28        +4
  Lines        2061     2138       +77
===========================================
+ Hits          342     1219      +877
+ Misses       1693      828      -865
- Partials       26       91       +65
/test all |
/hold as testing revealed that I potentially introduce a delete issue |
It seems there is an edge case where the VG is sometimes not removed. I'm not sure why, but it is unrelated to this PR. When I can reproduce it, I will open a separate issue. |
/hold |
acc4927 to 3982605
/jira refresh |
/lgtm |
/cc @suleymanakbas91 to get another opinion on that controller approach. |
59b26bc to 840c0de
840c0de to 827e320
/test all |
/test verify |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jakobmoellerdev, suleymanakbas91 The full list of commands accepted by this bot can be found here. The pull request process is described here
…controller in case of multi-node
827e320 to 422d3ae
/lgtm |
@jakobmoellerdev: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@jakobmoellerdev: Jira Issue OCPBUGS-13558: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-13558 has been moved to the MODIFIED state. In response to this:
When the operator is deployed in a multi-node environment, the LVMVolumeGroupNodeStatus is orphaned when a node is removed. This is unintended. We need logic that reliably removes this NodeStatus whenever a Node is removed.

Unfortunately, since the status update of LVMCluster currently assumes that the LVMVolumeGroupNodeStatus is present and correct, the only ways I found to fix this issue would be to:

1. Remove the LVMVolumeGroupNodeStatus in the status update of LVMCluster when the node no longer exists and the status check finds an orphaned status without a node. This is potentially expensive, since LVMCluster now has to compare all nodes and check whether the LVMVolumeGroupNodeStatus exists. The removal of the status will also lag behind the removal of the Node.
2. Remove the LVMVolumeGroupNodeStatus in a new reconcile loop that listens for node changes and uses a finalizer to block node deletion until the LVMVolumeGroupNodeStatus is cleaned up. This is the "clean" way to handle deletion. Note that this reconcile loop only needs to run outside of SNO; in SNO it can be disabled. The danger is that if the finalizer is not removed properly, node removal can fail. The upside is that, in theory, we can easily use the same hook later to remove the created VGs from the node if necessary.

Currently the PR uses a new controller.
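For comparison, the status-update alternative described above boils down to a set difference between existing nodes and existing status objects. The following is only a minimal sketch, not code from this PR: `orphanedStatuses` is a hypothetical helper, and it assumes each LVMVolumeGroupNodeStatus is named after the node it describes.

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedStatuses returns the names of status objects that have no matching
// node. Assumption for this sketch: a status object carries the same name as
// the node it belongs to.
func orphanedStatuses(nodeNames, statusNames []string) []string {
	existing := make(map[string]struct{}, len(nodeNames))
	for _, n := range nodeNames {
		existing[n] = struct{}{}
	}
	var orphans []string
	for _, s := range statusNames {
		if _, ok := existing[s]; !ok {
			orphans = append(orphans, s)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	nodes := []string{"worker-0", "worker-1"}
	statuses := []string{"worker-0", "worker-1", "worker-2"}
	// worker-2 was removed from the cluster, so its status is orphaned.
	fmt.Println(orphanedStatuses(nodes, statuses)) // prints [worker-2]
}
```

The cost concern is visible here: every status update has to walk both lists, and an orphan is only noticed on the next status update rather than when the node goes away.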
The main integration test now removes the node (which triggers removal of the status object) instead of removing the object directly, so this use case is covered automatically.
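The finalizer-based flow, and the reason that deleting the node in the test is enough to trigger the status cleanup, can be sketched with in-memory stand-ins. This is an illustration under stated assumptions, not the controller from this PR: the `node` and `cluster` types, `reconcileNode`, and the finalizer name are all hypothetical, and a real controller would talk to the Kubernetes API instead of maps.

```go
package main

import "fmt"

// cleanupFinalizer is a hypothetical finalizer name used only in this sketch.
const cleanupFinalizer = "example.io/node-status-cleanup"

// node is a minimal stand-in for a Kubernetes Node.
type node struct {
	name       string
	deleting   bool // models a set deletion timestamp
	finalizers []string
}

// cluster is an in-memory stand-in for the API server.
type cluster struct {
	nodes    map[string]*node
	statuses map[string]bool // status objects keyed by node name (assumption)
}

func hasFinalizer(n *node, f string) bool {
	for _, x := range n.finalizers {
		if x == f {
			return true
		}
	}
	return false
}

func removeFinalizer(n *node, f string) {
	kept := n.finalizers[:0]
	for _, x := range n.finalizers {
		if x != f {
			kept = append(kept, x)
		}
	}
	n.finalizers = kept
}

// reconcileNode models one pass of the cleanup loop for a single node.
func (c *cluster) reconcileNode(name string) {
	n, ok := c.nodes[name]
	if !ok {
		return // node already gone, nothing to do
	}
	if !n.deleting {
		// Protect the node so deletion waits for the cleanup below.
		if !hasFinalizer(n, cleanupFinalizer) {
			n.finalizers = append(n.finalizers, cleanupFinalizer)
		}
		return
	}
	// Node is being deleted: remove the status first, then release the node.
	delete(c.statuses, name)
	removeFinalizer(n, cleanupFinalizer)
	if len(n.finalizers) == 0 {
		delete(c.nodes, name) // models the API server completing the deletion
	}
}

func main() {
	c := &cluster{
		nodes:    map[string]*node{"worker-0": &node{name: "worker-0"}},
		statuses: map[string]bool{"worker-0": true},
	}
	c.reconcileNode("worker-0")         // adds the finalizer
	c.nodes["worker-0"].deleting = true // the node is deleted, as in the test
	c.reconcileNode("worker-0")         // cleans up the status, releases the node
	fmt.Println(len(c.nodes), len(c.statuses)) // prints 0 0
}
```

In this sketch the node only disappears once the finalizer has been removed, which mirrors the risk noted in the description: if that removal step never runs, the node deletion stays blocked.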