DAOS-6159 test: Run fault injection tests in parallel and prepare for IL testing. #3902

Merged
ashleypittman merged 49 commits into master from nlt-fi-il
Feb 2, 2021

Contributor

@ashleypittman ashleypittman commented Nov 18, 2020

Update to the fault injection testing to run multiple instances in
parallel, up to the number of cores per node. This dramatically
increases the speed of testing, and cuts down time-to-result.
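
As a rough illustration of the approach described above (not the code from this PR), the following Python sketch keeps at most one child process per core running and launches the next command whenever a slot frees up. The function name `run_in_parallel`, its `max_jobs` parameter, and the polling interval are assumptions made for the example; only the shape of the loop condition is taken from the `while not finished or active:` line quoted in the review.

```python
import os
import subprocess
import sys
import time


def run_in_parallel(commands, max_jobs=None):
    """Run each command as a subprocess, keeping at most max_jobs alive.

    Hypothetical helper for illustration only. Returns the list of return
    codes in the same order as commands.
    """
    if max_jobs is None:
        # Assumption: one job per core, as the description suggests.
        max_jobs = os.cpu_count() or 1

    pending = iter(enumerate(commands))
    results = [None] * len(commands)
    active = []        # (index, Popen) pairs that are still running
    finished = False   # True once every command has been launched

    while not finished or active:
        # Top up the pool until it is full or nothing is left to launch.
        while not finished and len(active) < max_jobs:
            try:
                idx, cmd = next(pending)
            except StopIteration:
                finished = True
                break
            active.append((idx, subprocess.Popen(cmd)))

        # Collect return codes from children that have exited.
        still_running = []
        for idx, proc in active:
            rc = proc.poll()
            if rc is None:
                still_running.append((idx, proc))
            else:
                results[idx] = rc
        active = still_running

        if active:
            time.sleep(0.01)  # Avoid a busy loop while children run.

    return results


if __name__ == "__main__":
    demo = [[sys.executable, "-c", "pass"] for _ in range(4)]
    print(run_in_parallel(demo))
```

Polling with `poll()` keeps the sketch simple; a real harness would probably also want per-job timeouts and log collection, which are left out here.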

Add the ability to run a read through the interception library under
fault injection, but do not enable this yet as it still causes a number
of issues.

Resolve the worst/easiest of the issues with the interception library
and the array object code.

Signed-off-by: Ashley Pittman [email protected]

This changes existing testing rather than simply adding to it, so it
isn't ready for merge. It does, however, find a large number of issues,
including some assertion failures, double-frees and deadlocks.

Resolve the worst of the issues.

Skip-func-hw-test: true
Quick-build: true
Skip-checkpatch: true

Signed-off-by: Ashley Pittman <[email protected]>
@daosbuild1
Collaborator

Skip-checkpath: true
Quick-build: true
Skip-func-hw-test: true

Signed-off-by: Ashley Pittman <[email protected]>
Collaborator

@daosbuild1 daosbuild1 left a comment

Style warning(s) for job https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-3902/2/
Please review https://wiki.hpdd.intel.com/display/DC/Coding+Rules

FYI: Errors found in lines not modified in the patch:

src/tests/ftest/cart/util/cart_logtest.py:480:
(pylint-line-too-long) Line too long (82/80)

src/tests/ftest/cart/util/cart_logtest.py:485:
(pylint-line-too-long) Line too long (96/80)

utils/node_local_test.py:421:
(lint) Remove this block when daos_server shutdown works.

utils/node_local_test.py:469:
(lint) Enable memleak checking when server shutdown works.

utils/node_local_test.py:1207:
(lint) This doesn't work with two pools, partly related to

utils/node_local_test.py:1252:
(lint) change this to something else, md5sum uses fread which isn't

utils/node_local_test.py:1465:
(pylint-line-too-long) Line too long (81/80)

utils/node_local_test.py:493:
(pylint-protected-access) Access to a protected member _daos of a client class

utils/node_local_test.py (outdated, resolved)
utils/node_local_test.py (resolved)
@daosbuild1
Collaborator

Signed-off-by: Ashley Pittman <[email protected]>
Skip-func-hw-test: true

Signed-off-by: Ashley Pittman <[email protected]>
@daosbuild1 daosbuild1 dismissed their stale review November 20, 2020 11:12

Updated patch

Collaborator

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

FYI: Errors found in lines not modified in the patch:

src/tests/ftest/cart/util/cart_logtest.py:480:
(pylint-line-too-long) Line too long (82/80)

src/tests/ftest/cart/util/cart_logtest.py:485:
(pylint-line-too-long) Line too long (96/80)

utils/node_local_test.py:421:
(lint) Remove this block when daos_server shutdown works.

utils/node_local_test.py:469:
(lint) Enable memleak checking when server shutdown works.

utils/node_local_test.py:1223:
(lint) This doesn't work with two pools, partly related to

utils/node_local_test.py:1267:
(lint) change this to something else, md5sum uses fread which isn't

utils/node_local_test.py:1480:
(pylint-line-too-long) Line too long (81/80)

utils/node_local_test.py:500:
(pylint-protected-access) Access to a protected member _daos of a client class

@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/3/execution/node/302/log

@daosbuild1
Collaborator

Test stage Build RPM on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/3/execution/node/238/log

@daosbuild1
Collaborator

Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/3/execution/node/266/log

Skip-func-hw-test: true

Signed-off-by: Ashley Pittman <[email protected]>
@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/4/execution/node/255/log

@daosbuild1
Collaborator

Test stage Build RPM on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/4/execution/node/266/log

@daosbuild1
Collaborator

Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/4/execution/node/275/log

Collaborator

@daosbuild1 daosbuild1 left a comment

Style warning(s) for job https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-3902/5/
Please review https://wiki.hpdd.intel.com/display/DC/Coding+Rules

FYI: Errors found in lines not modified in the patch:

src/tests/ftest/cart/util/cart_logtest.py:480:
(pylint-line-too-long) Line too long (82/80)

src/tests/ftest/cart/util/cart_logtest.py:485:
(pylint-line-too-long) Line too long (96/80)

utils/node_local_test.py:421:
(lint) Remove this block when daos_server shutdown works.

utils/node_local_test.py:469:
(lint) Enable memleak checking when server shutdown works.

utils/node_local_test.py:1223:
(lint) This doesn't work with two pools, partly related to

utils/node_local_test.py:1267:
(lint) change this to something else, md5sum uses fread which isn't

utils/node_local_test.py:1482:
(pylint-line-too-long) Line too long (81/80)

utils/node_local_test.py:500:
(pylint-protected-access) Access to a protected member _daos of a client class

src/client/array/dc_array.c (outdated, resolved)
src/client/dfuse/il/int_posix.c (outdated, resolved)
@daosbuild1
Collaborator

Test stage Build RPM on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/244/log

@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/309/log

@daosbuild1
Collaborator

Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/260/log

@daosbuild1
Collaborator

Test stage Build on CentOS 7 debug completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/328/log

@daosbuild1
Collaborator

Test stage Build on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/332/log

@daosbuild1
Collaborator

Test stage Build on CentOS 7 release completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/342/log

@daosbuild1
Collaborator

Test stage Build on Ubuntu 20.04 with Clang completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/263/log

@daosbuild1
Collaborator

Test stage Build on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/293/log

@daosbuild1
Collaborator

Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-3902/5/execution/node/312/log

Collaborator

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

FYI: Errors found in lines not modified in the patch:

utils/node_local_test.py:1:
(pylint-missing-docstring) Missing module docstring

utils/node_local_test.py:3:
(pylint-pointless-string-statement) String statement has no effect

Collaborator

@daosbuild1 daosbuild1 left a comment

utils/node_local_test.py (outdated, resolved)
@daosbuild1
Collaborator

@daosbuild1 daosbuild1 dismissed their stale review January 28, 2021 17:29

Updated patch

Collaborator

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

jolivier23 previously approved these changes Jan 29, 2021
Collaborator

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@ashleypittman ashleypittman changed the title from "DAOS-6159 test: Add IL to fault injection testing" to "DAOS-6159 test: Run fault injection tests in parallel and prepare for IL testing." Feb 1, 2021
@ashleypittman
Contributor Author

I've updated the title and first comment of this PR, and it should now be ready for review. This PR improves run-time and allows us to enable more testing. However, if that testing is enabled then multiple errors are encountered, which cause failures, so enabling the new test and fixing those errors is being handled separately.


fatal_errors = False

while not finished or active:

Contributor

I personally think this is easier to read if you add parentheses around "not finished" so one doesn't need to remember order-of-operations rules.

Contributor Author

I think pylint would warn if I tried that, but I'll address it if I need to revise this.
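
For reference, Python's `not` binds more tightly than `or`, so the condition under discussion already groups as `(not finished) or active`. The short check below compares the two spellings over every combination of inputs. The function names are made up for this example, and plain booleans stand in for the real `finished` and `active` values.

```python
import itertools


def unparenthesised(finished, active):
    # The form used in the patch.
    return not finished or active


def parenthesised(finished, active):
    # The form suggested in review.
    return (not finished) or active


def negated_whole(finished, active):
    # What the condition would mean if "or" bound more tightly than "not".
    return not (finished or active)


cases = list(itertools.product((False, True), repeat=2))

# The first two forms agree on every input.
same_everywhere = all(
    unparenthesised(f, a) == parenthesised(f, a) for f, a in cases
)

# The third form is a different condition.
differs_somewhere = any(
    unparenthesised(f, a) != negated_whole(f, a) for f, a in cases
)

print(same_everywhere, differs_somewhere)
```

The parentheses are therefore purely a readability choice and do not change behaviour.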

@ashleypittman ashleypittman requested a review from mjmac February 2, 2021 15:41
Contributor

@daltonbohning daltonbohning left a comment

I had a few questions, but otherwise LGTM.

src/client/array/dc_array.c (resolved)
src/client/dfuse/il/int_posix.c (resolved)
utils/node_local_test.py (resolved)
utils/node_local_test.py (resolved)
@ashleypittman ashleypittman requested a review from a team February 2, 2021 17:20
@ashleypittman ashleypittman merged commit ff8acc8 into master Feb 2, 2021
@ashleypittman ashleypittman deleted the nlt-fi-il branch February 2, 2021 17:34

7 participants