DAOS-14467 chk: properly stop check scheduler #13181
Bug-tracker data:
LGTM. No errors found by checkpatch.
Test stage Python Bandit check completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/1/execution/node/147/log
c61f30d to bc871f1
LGTM. No errors found by checkpatch.
Test stage Python Bandit check completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/2/execution/node/148/log
bc871f1 to 2e93b64
LGTM. No errors found by checkpatch.
Test stage Python Bandit check completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/3/execution/node/147/log
2e93b64 to 550de9a
LGTM. No errors found by checkpatch.
Test stage Build on Leap 15.4 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/4/execution/node/387/log
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/4/execution/node/408/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/4/execution/node/393/log
Test stage Build RPM on Leap 15.4 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/4/execution/node/353/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/4/execution/node/349/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/4/execution/node/392/log
550de9a to e989308
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/5/execution/node/1388/log
9b69e2e to 2b6b53f
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/6/execution/node/1388/log
2b6b53f to ae17b92
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/7/execution/node/1363/log
ae17b92 to 36bfcdb
Test stage Build on Leap 15.4 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/12/execution/node/406/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/12/execution/node/396/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/12/execution/node/355/log
Test stage Build RPM on Leap 15.4 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/12/execution/node/321/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/12/execution/node/313/log
Test stage Build on Leap 15.4 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/13/execution/node/304/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/13/execution/node/266/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/13/execution/node/263/log
Test stage Build RPM on Leap 15.4 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/13/execution/node/259/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13181/13/execution/node/308/log
When someone wants to stop the current check instance, it needs to set ins->ci_sched_exiting to notify the related instance scheduler to exit. Originally, we used "ci_sched_running" for this purpose, but that made it hard to distinguish whether the scheduler had already exited or someone was still stopping the instance. Others could wrongly conclude that the check scheduler had already exited while it was actually still in the process of stopping, so a subsequent checker restart would fail.
Some code cleanup.
Signed-off-by: Fan Yong <[email protected]>
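The fix separates "a stop has been requested" from "the scheduler has actually exited". A minimal sketch of that two-flag protocol follows, assuming POSIX threads in place of the Argobots-based scheduler DAOS really uses. Only the field names `ci_sched_running` and `ci_sched_exiting` come from the PR; `struct check_instance`, `sched_init()`, `sched_start()`, `sched_stop()` and `sched_main()` are hypothetical illustrations, not the DAOS code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified check instance. The field names follow the PR
 * description; the real DAOS structure differs and its scheduler runs as an
 * Argobots ULT, not a pthread. */
struct check_instance {
	pthread_mutex_t ci_lock;
	pthread_cond_t  ci_cond;
	pthread_t       ci_sched;
	bool            ci_sched_running; /* scheduler has not exited yet */
	bool            ci_sched_exiting; /* someone has asked it to exit */
};

static void
sched_init(struct check_instance *ins)
{
	pthread_mutex_init(&ins->ci_lock, NULL);
	pthread_cond_init(&ins->ci_cond, NULL);
	ins->ci_sched_running = false;
	ins->ci_sched_exiting = false;
}

static void *
sched_main(void *arg)
{
	struct check_instance *ins = arg;

	pthread_mutex_lock(&ins->ci_lock);
	/* Real check work would run here; the sketch just waits for a stop request. */
	while (!ins->ci_sched_exiting)
		pthread_cond_wait(&ins->ci_cond, &ins->ci_lock);
	/* Only the scheduler clears "running", and only when it is really leaving. */
	ins->ci_sched_running = false;
	pthread_mutex_unlock(&ins->ci_lock);
	return NULL;
}

/* Returns 0 on success, -1 if a scheduler is still running or still stopping. */
static int
sched_start(struct check_instance *ins)
{
	int rc = 0;

	pthread_mutex_lock(&ins->ci_lock);
	if (ins->ci_sched_running) {
		rc = -1;
	} else {
		ins->ci_sched_exiting = false;
		ins->ci_sched_running = true;
		if (pthread_create(&ins->ci_sched, NULL, sched_main, ins) != 0) {
			ins->ci_sched_running = false;
			rc = -1;
		}
	}
	pthread_mutex_unlock(&ins->ci_lock);
	return rc;
}

/* Returns 0 once the scheduler has exited (or was not running), or -1 if
 * another caller is already stopping it. */
static int
sched_stop(struct check_instance *ins)
{
	pthread_t tid;
	int       rc;

	pthread_mutex_lock(&ins->ci_lock);
	if (!ins->ci_sched_running) {
		pthread_mutex_unlock(&ins->ci_lock);
		return 0;
	}
	if (ins->ci_sched_exiting) {
		pthread_mutex_unlock(&ins->ci_lock);
		return -1;
	}
	/* Request the exit without touching ci_sched_running. */
	ins->ci_sched_exiting = true;
	tid = ins->ci_sched; /* copy under the lock: a later start may overwrite it */
	pthread_cond_broadcast(&ins->ci_cond);
	pthread_mutex_unlock(&ins->ci_lock);

	rc = pthread_join(tid, NULL);
	assert(rc == 0);
	(void)rc;
	return 0;
}
```

The design point is that a stopper only ever sets `ci_sched_exiting`, while `ci_sched_running` is cleared by the scheduler itself on its way out. A restart can therefore tell "still stopping" apart from "already exited" and refuse or retry, instead of assuming the old scheduler is gone.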
4d6a902 to e945d45
LGTM. No errors found by checkpatch.
ftest LGTM
@@ -9,7 +9,7 @@ timeout: 600
 timeouts:
   test_daos_degraded_mode: 450
   test_daos_management: 110
-  test_daos_cat_recovery: 1800
+  test_daos_cat_recovery: 3000
Just pointing out that this adds 20 minutes to the timeout, for a total of 50 minutes. And since this is a PR test, it really slows down CI.
This isn't a comment about just this test, but more about the general trend of having too many long-running PR tests.
The test takes a long time mainly because the system is frequently started and stopped in every test case. So maybe we should remove it from the PR tests? If necessary, that can be done in the next patch; two other patches depend on this one.
In fact, the test is not always this slow; the slowness is random, mainly caused by the slow "dmg system start/stop" process, which is outside the control of this patch.
yes, please, otherwise it's going to be problematic.
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: