-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPVE-627: fix: concurrent apply / status checks -> LVMCluster #391
OCPVE-627: fix: concurrent apply / status checks -> LVMCluster #391
Conversation
@jakobmoellerdev: This pull request references OCPVE-627 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@jakobmoellerdev: This pull request references OCPVE-627 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
||
var errs []error | ||
for i := 0; i < len(resources); i++ { | ||
if err := <-results; err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if something goes wrong and we will stuck there forever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at the code path there should not be a way to get stuck here. This will always return exactly the amount of times that a resource is attempted to be created. This means that there will always be the correct amount of results, no matter the amount of failures in the goroutines.
return ctrl.Result{}, err | ||
resourceSyncStart := time.Now() | ||
results := make(chan error, len(resources)) | ||
create := func(i int) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about using https://pkg.go.dev/golang.org/x/sync/errgroup ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Errgroup cancel the context on the first error. We dont want the creation of other resources to abort in case of an error on one of the resources.
4aca845
to
8c5e2bc
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #391 +/- ##
===========================================
+ Coverage 16.59% 56.37% +39.78%
===========================================
Files 24 26 +2
Lines 2061 2290 +229
===========================================
+ Hits 342 1291 +949
+ Misses 1693 904 -789
- Partials 26 95 +69
|
/test ci-index-lvm-operator-bundle |
/lgtm |
@jakobmoellerdev: This pull request references OCPVE-627 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
03b69bb
to
45ba774
Compare
/test lvm-operator-e2e-aws |
45ba774
to
098203c
Compare
/test lvm-operator-e2e-aws |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jakobmoellerdev, suleymanakbas91 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test lvm-operator-e2e-aws |
@jakobmoellerdev: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
This PR introduces concurrent processing for the ensureCreated operations for each resourceManager in the LVMClusterReconciler controller, which significantly improves the overall reconciliation time for the
LVMCluster
custom resource.The newly added
multierror.go
groups multiple errors into a single error, thereby helping to manage multiple errors generated from concurrent ensureCreated calls. This helps the concurrent apply since errors are not cancelled when one call fails, and multiple failures can occur at once.Moreover, readiness checks have been added after the creation of Deployment and DaemonSet resources in the
topolvm_controller.go
,topolvm_node.go
, andvgmanager.go
files respectively.These checks ensure that the deployment or
DaemonSet
has been completely initialized and is ready for use.It also adds
verifyDeploymentReadiness
andverifyDaemonSetReadiness
functions inutils.go
to verify the readiness of Deployment and DaemonSet resources respectively.Note: The checks currently do not get propagated to LVMCluster yet because the LVMCluster State is independant from the Readiness checks done during reconciliation. We will need an API Change to propagate this correctly (in a later change) as otherwise we would set LVMCluster to Degraded/Failed but couldnt show why (because the only status message fields in LVMCluster are for volume group failures, not any of the basic deployed components, and there is no vg present at the time of the readiness check)