DAOS-12287 test: CR Pass 4 - Orphan container #13063
Test steps:
1. Create a pool and a container.
2. Inject fault to cause orphan container. i.e., container is left in the system, but doesn't appear with daos commands.
3. Check that the container doesn't appear with daos command.
4. Stop servers.
5. Use ddb to verify that the container is left in shards.
6. Enable the checker.
7. Set policy to --all-interactive.
8. Start the checker and query the checker until the fault is detected.
9. Repair by selecting the destroy option.
10. Query the checker until the fault is repaired.
11. Disable the checker.
12. Run the ddb command and verify that the container is removed from shard.

Create recovery_test_base.py to store methods used across the CR tests.
Add daos faults command support in daos_utils_base.py and daos_utils.py

Skip-unit-tests: true
Skip-fault-injection-test: true
Test-tag: test_orphan_container ddb_cmd
Test-repeat: 4
Required-githooks: true
Signed-off-by: Makito Kano <[email protected]>
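Steps 8 and 10 both amount to "query until a condition holds". A minimal sketch of such a polling loop follows. The helper name wait_for and its parameters are hypothetical and are not taken from this PR or from the DAOS test framework; the predicate would wrap whatever checker query the test actually performs.

```python
import time
from typing import Callable


def wait_for(predicate: Callable[[], bool], timeout: float = 60.0,
             interval: float = 5.0) -> bool:
    """Poll predicate until it returns True or timeout seconds elapse.

    Hypothetical helper for illustration only. Returns True if the
    predicate succeeded, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Returning a boolean instead of raising lets the caller decide whether a timeout is a test failure or just one more entry in a list of collected errors.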
Bug-tracker data: |
LGTM. No errors found by checkpatch.
Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: test_orphan_container ddb_cmd Test-repeat: 4 Required-githooks: true Signed-off-by: Makito Kano <[email protected]>
LGTM. No errors found by checkpatch.
Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: test_orphan_container ddb_cmd Test-repeat: 4 Required-githooks: true Signed-off-by: Makito Kano <[email protected]>
LGTM. No errors found by checkpatch.
Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13063/3/execution/node/750/log
Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: test_orphan_container ddb_cmd Test-repeat: 4 Required-githooks: true Signed-off-by: Makito Kano <[email protected]>
Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: test_orphan_container ddb_cmd Test-repeat: 4 Required-githooks: true
LGTM. No errors found by checkpatch.
cmd_out = run_pcmd(hosts=hosts, command=command)

# return vos_file
for file in cmd_out[0]["stdout"]:
It's better to use run_remote than run_pcmd. Something like this...
- cmd_out = run_pcmd(hosts=hosts, command=command)
- # return vos_file
- for file in cmd_out[0]["stdout"]:
+ from run_utils import run_remote
+ cmd_out = run_remote(self.log, hosts, command)
+ # return vos_file
+ for file in cmd_out.output[0].stdout:
Done. Thanks.
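The suggestion above changes how the output is read: from dictionary indexing (cmd_out[0]["stdout"]) to attribute access (cmd_out.output[0].stdout). The sketch below illustrates only that access pattern. HostOutput, RemoteResult, find_vos_file, and the "vos" marker are hypothetical stand-ins that mirror the attributes used in the suggestion; they are not the framework's actual classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HostOutput:
    """Hypothetical stand-in for one entry of result.output."""
    hosts: str
    stdout: List[str] = field(default_factory=list)


@dataclass
class RemoteResult:
    """Hypothetical stand-in for the object returned by a remote run."""
    output: List[HostOutput] = field(default_factory=list)


def find_vos_file(result: RemoteResult, marker: str = "vos") -> Optional[str]:
    """Return the first stdout line containing marker, or None.

    The marker value is an assumption made for illustration.
    """
    for line in result.output[0].stdout:
        if marker in line:
            return line.strip()
    return None
```

A call such as find_vos_file(result) would then replace the loop over cmd_out[0]["stdout"] in the original code.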
# Start server to prepare for the cleanup.
dmg_command.system_start()

report_errors(test=self, errors=errors)
Just pointing out that if system_start raises an exception and causes the test to fail, then we would miss the report_errors.
Added except CommandFailure to catch the system start exception.
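A minimal sketch of the pattern discussed in this thread: catch the failure from the system start, record it, and still reach the error report. The CommandFailure class and the cleanup_and_report function here are local stand-ins written for illustration, not the framework's own definitions or the code actually committed in this PR.

```python
from typing import Callable, List


class CommandFailure(Exception):
    """Stand-in for the test framework's exception of the same name."""


def cleanup_and_report(start_system: Callable[[], None],
                       report: Callable[[List[str]], None],
                       errors: List[str]) -> None:
    """Start the system for cleanup, then always report collected errors."""
    try:
        start_system()
    except CommandFailure as error:
        # Record the failure instead of letting it skip the report.
        errors.append(f"system start failed: {error}")
    report(errors)
```

With this shape, a failed restart shows up as one more reported error next to any verification failures collected earlier in the test.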
Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: test_orphan_container ddb_cmd Test-repeat: 4 Required-githooks: true
…y except Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: test_orphan_container ddb_cmd Test-repeat: 4 Required-githooks: true Signed-off-by: Makito Kano <[email protected]>
LGTM. No errors found by checkpatch.
Test stage Python Bandit check completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13063/5/execution/node/147/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13063/5/execution/node/409/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13063/5/execution/node/410/log
Test stage Build RPM on Leap 15.4 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13063/5/execution/node/394/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13063/5/execution/node/311/log
Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: test_orphan_container ddb_cmd Test-repeat: 4 Required-githooks: true Signed-off-by: Makito Kano <[email protected]>
LGTM. No errors found by checkpatch.
Test stage Python Bandit check completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13063/6/execution/node/147/log
Test stage Functional on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13063/6/execution/node/757/log
Blocked due to https://daosio.atlassian.net/browse/DAOS-14473 (Engine crash during |
Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: test_orphan_container ddb_cmd Test-repeat: 4 Required-githooks: true
LGTM. No errors found by checkpatch.
Test stage Python Bandit check completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13063/7/execution/node/147/log
Skip-unit-tests: true Skip-fault-injection-test: true Test-tag: pr ddb_cmd Required-githooks: true Signed-off-by: Makito Kano <[email protected]>
LGTM. No errors found by checkpatch.
Test stage Python Bandit check completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13063/8/execution/node/169/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13063/8/execution/node/884/log
The 8 CR test failures (CR20 to CR27) in daos_test/suite.py are not related to this PR. They are unexpected and Fan Yong is looking into them in DAOS-14513.