OCPEDGE-1165: fix: Ensure no racy CSI plugin registration, better startup behavior for vgmanager, failing test summary artifact #642

Merged
11 commits merged into openshift:main
Jun 26, 2024

Conversation

jakobmoellerdev
Contributor

@jakobmoellerdev jakobmoellerdev commented Jun 25, 2024

This stops vgmanager from reporting healthy too early: it must signal health only once the driver has started. It also fixes the racy registration server, which now starts and shuts down properly and is part of the health check used by vgmanager. This ensures that no vgmanager becomes ready unless it is registered with the kubelet. Finally, it ensures that vgmanager pods are gone before the LVMCluster is freed.
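
For illustration only, the readiness gating described above could be sketched roughly as follows. This is a minimal sketch, not the code from this PR: the `registrationGate` type and its field names are hypothetical, and it assumes that health should fail until both the registration server has started and the kubelet has acknowledged the plugin.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// registrationGate is a hypothetical helper (not the type used in the PR).
// It tracks the two conditions that must hold before the process may
// report itself as healthy.
type registrationGate struct {
	serverStarted atomic.Bool // set once the registration server is serving
	registered    atomic.Bool // set once the kubelet has acknowledged the plugin
}

// Check returns nil only when both conditions hold, so a health endpoint
// backed by it cannot turn green before registration has completed.
func (g *registrationGate) Check() error {
	if !g.serverStarted.Load() {
		return errors.New("registration server not started")
	}
	if !g.registered.Load() {
		return errors.New("plugin not registered with kubelet")
	}
	return nil
}

func main() {
	gate := &registrationGate{}
	fmt.Println("before start:", gate.Check())

	gate.serverStarted.Store(true)
	fmt.Println("after start:", gate.Check())

	gate.registered.Store(true)
	fmt.Println("after registration:", gate.Check())
}
```

A check of this shape could be wired into whatever health endpoint the pod's probes call; how the PR actually wires it is not shown in this conversation.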

Adds an artifact to the e2e step on failure that captures the LVMS cluster state at the moment of failure, in case the post-failure steps clean it up.

Validates in tests that all CSINodes have proper registrations for vgmanager, so that we don't get unexplained flakes in the future.

@jakobmoellerdev
Contributor Author

/test all

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 25, 2024
Contributor

openshift-ci bot commented Jun 25, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 25, 2024
Contributor

openshift-ci bot commented Jun 25, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jakobmoellerdev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 25, 2024
@codecov-commenter

codecov-commenter commented Jun 25, 2024

Codecov Report

Attention: Patch coverage is 24.48980% with 74 lines in your changes missing coverage. Please review.

Project coverage is 58.50%. Comparing base (b0c38da) to head (65137a4).
Report is 6 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #642      +/-   ##
==========================================
+ Coverage   58.00%   58.50%   +0.49%     
==========================================
  Files          54       54              
  Lines        4117     4258     +141     
==========================================
+ Hits         2388     2491     +103     
- Misses       1492     1521      +29     
- Partials      237      246       +9     
Files Coverage Δ
...rollers/lvmcluster/resource/vgmanager_daemonset.go 97.20% <100.00%> (+0.16%) ⬆️
internal/csi/grpc_runner.go 0.00% <0.00%> (ø)
...trollers/lvmcluster/resource/topolvm_csi_driver.go 76.00% <30.00%> (-10.05%) ⬇️
...ernal/controllers/lvmcluster/resource/vgmanager.go 68.23% <48.14%> (-11.77%) ⬇️
internal/csi/registrar.go 0.00% <0.00%> (ø)
cmd/vgmanager/vgmanager.go 0.00% <0.00%> (ø)

... and 6 files with indirect coverage changes

@jakobmoellerdev
Contributor Author

/test all

1 similar comment
@jakobmoellerdev
Contributor Author

/test all

@jakobmoellerdev
Contributor Author

/test all

1 similar comment
@jakobmoellerdev
Contributor Author

/test all

Signed-off-by: Jakob Möller <[email protected]>
@jakobmoellerdev
Contributor Author

/test all

@jakobmoellerdev jakobmoellerdev changed the title fix: Create startup probe and add lvmd configuration check to vgmanager healthiness OCPEDGE-1165: fix: Create startup probe and add lvmd configuration check to vgmanager healthiness Jun 25, 2024
@jakobmoellerdev jakobmoellerdev marked this pull request as ready for review June 25, 2024 20:02
@openshift-ci-robot

openshift-ci-robot commented Jun 25, 2024

@jakobmoellerdev: This pull request references OCPEDGE-1165 which is a valid jira issue.

In response to this:

This stops vgmanager from going healthy too early as we need to signal healthiness only when the driver has been started.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 25, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 25, 2024
@jakobmoellerdev
Contributor Author

/test e2e-aws
/test e2e-aws-single-node

1 similar comment
@jakobmoellerdev
Contributor Author

/test e2e-aws
/test e2e-aws-single-node

@jakobmoellerdev
Contributor Author

/hold
Putting this on hold until I've confirmed the flakiness is gone; I also added repeat tests that we don't want on main.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2024
@jakobmoellerdev jakobmoellerdev changed the title OCPEDGE-1165: fix: Create startup probe and add lvmd configuration check to vgmanager healthiness OCPEDGE-1165: fix: Ensure no racy CSI plugin registration as well as better startup behavior for vgmanager Jun 26, 2024
@jakobmoellerdev jakobmoellerdev changed the title OCPEDGE-1165: fix: Ensure no racy CSI plugin registration as well as better startup behavior for vgmanager OCPEDGE-1165: fix: Ensure no racy CSI plugin registration, better startup behavior for vgmanager, failing test summary artifact Jun 26, 2024
@openshift-ci-robot

openshift-ci-robot commented Jun 26, 2024

@jakobmoellerdev: This pull request references OCPEDGE-1165 which is a valid jira issue.


@jakobmoellerdev
Contributor Author

/hold

@jakobmoellerdev
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2024
Contributor

@suleymanakbas91 suleymanakbas91 left a comment

LGTM, just small nits

cmd/vgmanager/vgmanager.go (outdated, resolved)
cmd/vgmanager/vgmanager.go (resolved)
Signed-off-by: Jakob Möller <[email protected]>
@suleymanakbas91
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 26, 2024
Contributor

openshift-ci bot commented Jun 26, 2024

@jakobmoellerdev: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/lvm-operator-e2e-aws-sno f956eba link true /test lvm-operator-e2e-aws-sno
ci/prow/lvm-operator-e2e-aws f956eba link true /test lvm-operator-e2e-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 748fb1b into openshift:main Jun 26, 2024
9 checks passed
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
4 participants