DAOS-6159 test: Run fault injection tests in parallel and prepare for IL testing. #3902
This changes the testing rather than simply adding to it, so it isn't ready for merge; however, it does find a large number of issues, including some assertion failures, double-frees and deadlocks. Resolve the worst of the issues. Skip-func-hw-test: true Quick-build: true Skip-checkpatch: true Signed-off-by: Ashley Pittman <[email protected]>
Test stage NLT completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/1/execution/node/526/log
Skip-checkpath: true Quick-build: true Skip-func-hw-test: true Signed-off-by: Ashley Pittman <[email protected]>
Style warning(s) for job https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-3902/2/
Please review https://wiki.hpdd.intel.com/display/DC/Coding+Rules
FYI: Errors found in lines not modified in the patch:
src/tests/ftest/cart/util/cart_logtest.py:480:
(pylint-line-too-long) Line too long (82/80)
src/tests/ftest/cart/util/cart_logtest.py:485:
(pylint-line-too-long) Line too long (96/80)
utils/node_local_test.py:421:
(lint) Remove this block when daos_server shutdown works.
utils/node_local_test.py:469:
(lint) Enable memleak checking when server shutdown works.
utils/node_local_test.py:1207:
(lint) This doesn't work with two pools, partly related to
utils/node_local_test.py:1252:
(lint) change this to something else, md5sum uses fread which isn't
utils/node_local_test.py:1465:
(pylint-line-too-long) Line too long (81/80)
utils/node_local_test.py:493:
(pylint-protected-access) Access to a protected member _daos of a client class
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/2/execution/node/57/log
Signed-off-by: Ashley Pittman <[email protected]>
Skip-func-hw-test: true Signed-off-by: Ashley Pittman <[email protected]>
LGTM. No errors found by checkpatch.
FYI: Errors found in lines not modified in the patch:
src/tests/ftest/cart/util/cart_logtest.py:480:
(pylint-line-too-long) Line too long (82/80)
src/tests/ftest/cart/util/cart_logtest.py:485:
(pylint-line-too-long) Line too long (96/80)
utils/node_local_test.py:421:
(lint) Remove this block when daos_server shutdown works.
utils/node_local_test.py:469:
(lint) Enable memleak checking when server shutdown works.
utils/node_local_test.py:1223:
(lint) This doesn't work with two pools, partly related to
utils/node_local_test.py:1267:
(lint) change this to something else, md5sum uses fread which isn't
utils/node_local_test.py:1480:
(pylint-line-too-long) Line too long (81/80)
utils/node_local_test.py:500:
(pylint-protected-access) Access to a protected member _daos of a client class
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/3/execution/node/302/log
Test stage Build RPM on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/3/execution/node/238/log
Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/3/execution/node/266/log
Skip-func-hw-test: true Signed-off-by: Ashley Pittman <[email protected]>
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/4/execution/node/57/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/4/execution/node/255/log
Test stage Build RPM on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/4/execution/node/266/log
Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/4/execution/node/275/log
Style warning(s) for job https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-3902/5/
Please review https://wiki.hpdd.intel.com/display/DC/Coding+Rules
FYI: Errors found in lines not modified in the patch:
src/tests/ftest/cart/util/cart_logtest.py:480:
(pylint-line-too-long) Line too long (82/80)
src/tests/ftest/cart/util/cart_logtest.py:485:
(pylint-line-too-long) Line too long (96/80)
utils/node_local_test.py:421:
(lint) Remove this block when daos_server shutdown works.
utils/node_local_test.py:469:
(lint) Enable memleak checking when server shutdown works.
utils/node_local_test.py:1223:
(lint) This doesn't work with two pools, partly related to
utils/node_local_test.py:1267:
(lint) change this to something else, md5sum uses fread which isn't
utils/node_local_test.py:1482:
(pylint-line-too-long) Line too long (81/80)
utils/node_local_test.py:500:
(pylint-protected-access) Access to a protected member _daos of a client class
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/57/log
Test stage Build RPM on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/244/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/309/log
Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/260/log
Test stage Build on CentOS 7 debug completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/328/log
Test stage Build on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/332/log
Test stage Build on CentOS 7 release completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/342/log
Test stage Build on Ubuntu 20.04 with Clang completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/263/log
Test stage Build on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/293/log
Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/312/log
b31ae64 to 1d215a5
LGTM. No errors found by checkpatch.
FYI: Errors found in lines not modified in the patch:
utils/node_local_test.py:1:
(pylint-missing-docstring) Missing module docstring
utils/node_local_test.py:3:
(pylint-pointless-string-statement) String statement has no effect
Style warning(s) for job https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-3902/40/
Please review https://wiki.hpdd.intel.com/display/DC/Coding+Rules
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/40/execution/node/67/log
Signed-off-by: Ashley Pittman <[email protected]>
LGTM. No errors found by checkpatch.
Signed-off-by: Ashley Pittman <[email protected]>
LGTM. No errors found by checkpatch.
I've updated the title and first comment of this PR and it should now be ready for review. This PR improves run-time and allows us to enable more testing; however, if that testing is enabled then multiple errors are encountered, which cause failures, so enabling this new test and fixing these errors is being handled separately.
fatal_errors = False

while not finished or active:
I personally think this is easier to read if you add parentheses around "not finished", so one doesn't need to remember the order-of-operations rules.
I think pylint would warn if I tried that, but I'll address it if I need to revise this.
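For reference, a small check of the precedence question raised in this thread: in Python, `not` binds more tightly than `or`, so the condition as written already groups as `(not finished) or active`. The function names below are illustrative only and do not come from the patch, and plain booleans stand in for whatever types the patch actually uses.

```python
def as_written(finished, active):
    # The condition exactly as it appears in the patch.
    return not finished or active


def explicit(finished, active):
    # The same condition with the suggested parentheses.
    return (not finished) or active


def misread(finished, active):
    # The grouping a reader might assume by mistake.
    return not (finished or active)


# Compare all four boolean combinations.
for finished in (False, True):
    for active in (False, True):
        assert as_written(finished, active) == explicit(finished, active)

# The two groupings differ, for example, once everything has been
# launched but work is still active.
assert as_written(True, True) is True
assert misread(True, True) is False
```

So the parentheses would change readability only, not behaviour.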
I had a few questions, but otherwise LGTM.
Update the fault injection testing to run multiple instances in
parallel, up to the number of cores per node. This dramatically
increases the speed of testing, and cuts down time-to-result.
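
The scheduling idea described above could be sketched roughly as follows. This is not the code from `utils/node_local_test.py`: `run_in_parallel` and its parameters are hypothetical names, and the only detail taken from the PR is the shape of the loop condition (`while not finished or active`). It assumes each test instance can be expressed as a command line and caps concurrency at the core count reported by `os.cpu_count()`.

```python
import os
import subprocess


def run_in_parallel(commands, max_parallel=None):
    """Run each command as a child process, at most max_parallel at once.

    Returns a list of (command, returncode) tuples in launch order.
    """
    if max_parallel is None:
        # Assumption: one instance per core reported by the OS.
        max_parallel = os.cpu_count() or 1

    pending = iter(commands)
    active = []       # (command, Popen) pairs that are currently running
    results = []
    finished = False  # becomes True once every command has been launched

    while not finished or active:
        # Top up the pool until it is full or nothing is left to start.
        while not finished and len(active) < max_parallel:
            try:
                cmd = next(pending)
            except StopIteration:
                finished = True
                break
            active.append((cmd, subprocess.Popen(cmd)))

        # Simplification: block on the oldest child rather than polling
        # all of them, so a free slot is only refilled after that child exits.
        if active:
            cmd, proc = active.pop(0)
            results.append((cmd, proc.wait()))

    return results
```

A caller would pass a list of argument lists, for example one command per fault-injection point, and inspect the return codes afterwards. Polling every child instead of waiting on the oldest would keep the cores busier, at the cost of a slightly longer loop.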
Add the ability to run a read through the interception library under
fault injection, but do not enable this yet as it still causes a number
of issues.
Resolve the worst/easiest of the issues with the interception library
and the array object code.
Signed-off-by: Ashley Pittman [email protected]