DAOS-11663 common: fault injection to simulate DAOS inconsistency #10933
This PR includes the following fault injection points:
1. DAOS_CHK_CONT_ORPHAN: takes effect when destroying the container. It leaves the container shards on the related targets, which simulates an orphan container.
2. DAOS_CHK_CONT_BAD_LABEL: takes effect when changing the container label. It causes the container label in the container service to differ from the one in the container property.
Signed-off-by: Fan Yong <[email protected]>
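The two fault points described above can be illustrated with a minimal, self-contained sketch. Everything here is a hypothetical stand-in (the `sketch_` names, the struct fields, the numeric IDs); it is not the DAOS implementation. It also assumes that, for the bad-label case, the property side is updated while the container service side is skipped, which the description does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fault IDs and state for illustration; NOT the DAOS definitions. */
#define SKETCH_CHK_CONT_ORPHAN    0xa1u
#define SKETCH_CHK_CONT_BAD_LABEL 0xa2u

uint32_t sketch_fail_loc; /* 0 means no fault is armed */

/* Returns true when the given fault location is currently armed. */
bool sketch_fail_check(uint32_t loc)
{
        return sketch_fail_loc == loc;
}

/* Toy model of a container: service-side record plus target-side shards. */
struct sketch_cont {
        bool svc_entry;       /* entry known to the container service */
        bool tgt_shards;      /* shards still present on the targets */
        char svc_label[32];   /* label as stored by the container service */
        char prop_label[32];  /* label as stored in the container property */
};

/* Destroy: with the orphan fault armed, the shards are left behind. */
int sketch_cont_destroy(struct sketch_cont *c)
{
        assert(c != NULL);
        c->svc_entry = false;
        if (!sketch_fail_check(SKETCH_CHK_CONT_ORPHAN))
                c->tgt_shards = false;
        return 0;
}

/* Set label: with the bad-label fault armed, only one side is updated
 * (assumed here to be the property side), so the two labels diverge. */
int sketch_cont_set_label(struct sketch_cont *c, const char *label)
{
        assert(c != NULL && label != NULL);
        snprintf(c->prop_label, sizeof(c->prop_label), "%s", label);
        if (!sketch_fail_check(SKETCH_CHK_CONT_BAD_LABEL))
                snprintf(c->svc_label, sizeof(c->svc_label), "%s", label);
        return 0;
}
```

Arming `sketch_fail_loc` before calling either function leaves the toy container in the inconsistent state that a checker would then be expected to detect; with `sketch_fail_loc` set to 0 both functions behave normally.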
LGTM. No errors found by checkpatch.
56b049a to a3b971a
LGTM. No errors found by checkpatch.
Test stage Functional on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-10933/2/testReport/(root)/
a3b971a to 299b081
LGTM. No errors found by checkpatch.
Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10933/3/execution/node/357/log
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10933/3/execution/node/372/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10933/3/execution/node/316/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10933/3/execution/node/324/log
Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10933/3/execution/node/321/log
299b081 to 9ff2d53
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10933/4/execution/node/1028/log
One small nit in the spelling of "label". I am not very familiar with daos_fail_loc, but I'll look into it.
src/include/daos/common.h
Outdated
@@ -825,6 +825,10 @@ enum {
#define DAOS_FORCE_EC_AGG_PEER_FAIL (DAOS_FAIL_UNIT_TEST_GROUP_LOC | 0x9a)
#define DAOS_FAIL_TX_CONVERT (DAOS_FAIL_UNIT_TEST_GROUP_LOC | 0x9b)

#define DAOS_CHK_POOL_ORPHAN (DAOS_FAIL_UNIT_TEST_GROUP_LOC | 0xa0)
#define DAOS_CHK_CONT_ORPHAN (DAOS_FAIL_UNIT_TEST_GROUP_LOC | 0xa1)
#define DAOS_CHK_CONT_BAD_LABLE (DAOS_FAIL_UNIT_TEST_GROUP_LOC | 0xa2)
typo: Should be DAOS_CHK_CONT_BAD_LABEL
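For readers unfamiliar with how these IDs are built, the sketch below shows one plausible way a group prefix and a per-fault ID combine with a bitwise OR, as the `#define` lines in the diff do. The shift width, the group number, and all `SKETCH_` names are assumptions for illustration; the real values of `DAOS_FAIL_UNIT_TEST_GROUP_LOC` are not shown in this diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: group number in the upper bits, fault ID in the
 * low 16 bits. These values are assumptions, not taken from common.h. */
#define SKETCH_GROUP_SHIFT         16
#define SKETCH_ID_MASK             0xffffu
#define SKETCH_UNIT_TEST_GROUP     1u
#define SKETCH_UNIT_TEST_GROUP_LOC (SKETCH_UNIT_TEST_GROUP << SKETCH_GROUP_SHIFT)

#define SKETCH_CHK_POOL_ORPHAN     (SKETCH_UNIT_TEST_GROUP_LOC | 0xa0)
#define SKETCH_CHK_CONT_ORPHAN     (SKETCH_UNIT_TEST_GROUP_LOC | 0xa1)
#define SKETCH_CHK_CONT_BAD_LABEL  (SKETCH_UNIT_TEST_GROUP_LOC | 0xa2)

/* Build a fault location from a group number and a per-fault ID. */
uint32_t sketch_loc_make(uint32_t group, uint32_t id)
{
        assert(id <= SKETCH_ID_MASK);
        return (group << SKETCH_GROUP_SHIFT) | id;
}

/* Extract the group number from a fault location. */
uint32_t sketch_loc_group(uint32_t loc)
{
        return loc >> SKETCH_GROUP_SHIFT;
}

/* Extract the per-fault ID from a fault location. */
uint32_t sketch_loc_id(uint32_t loc)
{
        return loc & SKETCH_ID_MASK;
}
```

Under this assumed layout, each new fault only needs an unused low-order ID within its group, which is why the three new definitions simply continue the sequence 0xa0, 0xa1, 0xa2.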
src/mgmt/srv_pool.c
Outdated
if (DAOS_FAIL_CHECK(DAOS_CHK_POOL_ORPHAN))
        D_GOTO(out, rc = 0);
Do we need this? We already have dmg faults mgmt-svc pool $pool POOL_NONEXIST_ON_MS which as far as I can tell does the same thing. It removes the pool entry from the MS so that the checker finds the orphaned pool.
Good to know. I will remove it.
I think we also need a fault injection to simulate POOL_NONEXIST_ON_ENGINE, that is, destroying the pool shards on the targets while keeping the pool entry on the MS.
if (!DAOS_FAIL_CHECK(DAOS_CHK_CONT_ORPHAN))
        rc = ds_cont_tgt_destroy(in->tdi_pool_uuid, in->tdi_uuid);
I think this is all we really need now, right? I think we can probably just use the daos_debug_set_params utility for this. I could make dmg shell out to that utility because it's not very nice to use, though.
I am not sure whether it is convenient for @shimizukko to use the C API directly in his CR demo scripts. Or do we have a utility named daos_debug_set_params?
I could add this to the daos tool, maybe. The dmg tool does not have any connection to the C API, and we don't want to add a new dependency just for this faults use case. I'll think about adding a new daos faults subcommand.
9ff2d53 to ea2fc7e
ea2fc7e to 85f257c
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10933/5/execution/node/174/log
LGTM. No errors found by checkpatch.
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10933/6/execution/node/373/log
Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10933/6/execution/node/368/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10933/6/execution/node/303/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10933/6/execution/node/300/log
Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10933/6/execution/node/343/log
85f257c to a76cb6d
LGTM. No errors found by checkpatch.
Add support for the following faults:
* daos faults DAOS_CHK_CONT_ORPHAN Fail to remove container during destroy
* DAOS_CHK_CONT_BAD_LABEL Fail to persist set-prop --properties label:$newval
Signed-off-by: Michael MacDonald <[email protected]>
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: