Merge master into CR at 20240305 #13604
Define the shared data for the interface between the control plane and the DAOS check engine, including the following: DAOS global inconsistency classes, the actions to repair inconsistencies, DAOS check scan phases, instance status, pool status, and so on. Signed-off-by: Fan Yong <[email protected]>
Define the dRPC protocol used by the control plane to control the DAOS check engine for the following use cases:
1) Start check - DRPC_METHOD_MGMT_CHK_START
2) Stop check - DRPC_METHOD_MGMT_CHK_STOP
3) Query check progress - DRPC_METHOD_MGMT_CHK_QUERY
4) Get check parameters and property - DRPC_METHOD_MGMT_CHK_PROP
5) Execute the action to repair the specified inconsistency under interaction mode - DRPC_METHOD_MGMT_CHK_ACT
Signed-off-by: Fan Yong <[email protected]>
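As a rough illustration of the five use cases above, the sketch below routes each method name to a handler. Only the `DRPC_METHOD_MGMT_CHK_*` names come from the commit message; the `ChkMethod` enum, its numeric values, the `HANDLERS` table, and `dispatch` are hypothetical and do not reflect the actual DAOS implementation.

```python
from enum import Enum, auto


class ChkMethod(Enum):
    """Method names from the commit message; the numeric values are made up."""
    DRPC_METHOD_MGMT_CHK_START = auto()
    DRPC_METHOD_MGMT_CHK_STOP = auto()
    DRPC_METHOD_MGMT_CHK_QUERY = auto()
    DRPC_METHOD_MGMT_CHK_PROP = auto()
    DRPC_METHOD_MGMT_CHK_ACT = auto()


# Hypothetical handlers: each one just names the use case it stands for.
HANDLERS = {
    ChkMethod.DRPC_METHOD_MGMT_CHK_START: lambda req: "start check",
    ChkMethod.DRPC_METHOD_MGMT_CHK_STOP: lambda req: "stop check",
    ChkMethod.DRPC_METHOD_MGMT_CHK_QUERY: lambda req: "query check progress",
    ChkMethod.DRPC_METHOD_MGMT_CHK_PROP: lambda req: "get check properties",
    ChkMethod.DRPC_METHOD_MGMT_CHK_ACT: lambda req: "execute repair action",
}


def dispatch(method_name, request=None):
    """Look up a method by name and run its handler."""
    try:
        method = ChkMethod[method_name]
    except KeyError:
        raise ValueError(f"unknown check method: {method_name}") from None
    return HANDLERS[method](request)
```

Keeping the table keyed by the enum means an unsupported method name fails early with a clear error instead of silently doing nothing.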
Define new dRPC upcalls to the control plane for the following use cases:
* DRPC_METHOD_CHK_LIST_POOL: obtain the list of known pools from the MS.
* DRPC_METHOD_CHK_REG_POOL: register the (orphan) pool with the MS.
* DRPC_METHOD_CHK_DEREG_POOL: deregister the (dangling) pool from the MS.
* DRPC_METHOD_CHK_REPORT: the DAOS check engine reports the inconsistency it found and the repair result to the control plane. If the repair action is CIA_INTERACT, it notifies the control plane to ask the admin for the repair decision.
Signed-off-by: Fan Yong <[email protected]>
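The list/register/deregister upcalls can be pictured as a reconciliation between the MS pool list and the pools the check engine actually finds. The sketch below is a minimal in-memory model, assuming that an orphan pool is one found on the engines but unknown to the MS, and a dangling pool is one the MS knows but the engines do not have. `PoolRegistry` and `reconcile` are hypothetical names; the real checker decides each repair according to its policy rather than always applying it.

```python
class PoolRegistry:
    """In-memory stand-in for the MS pool list (hypothetical)."""

    def __init__(self, known=()):
        self._pools = set(known)

    def list_pools(self):
        # Corresponds loosely to DRPC_METHOD_CHK_LIST_POOL.
        return sorted(self._pools)

    def register(self, uuid):
        # Corresponds loosely to DRPC_METHOD_CHK_REG_POOL.
        self._pools.add(uuid)

    def deregister(self, uuid):
        # Corresponds loosely to DRPC_METHOD_CHK_DEREG_POOL.
        self._pools.discard(uuid)


def reconcile(registry, found_on_engines):
    """Register orphan pools and deregister dangling ones.

    Assumption: orphan = found on engines but unknown to the MS;
    dangling = known to the MS but not found on any engine.
    """
    known = set(registry.list_pools())
    found = set(found_on_engines)
    orphans = sorted(found - known)
    dangling = sorted(known - found)
    for uuid in orphans:
        registry.register(uuid)
    for uuid in dangling:
        registry.deregister(uuid)
    return orphans, dangling
```

For example, with the MS knowing pools `a` and `b` while the engines hold `b` and `c`, `reconcile` registers `c` and deregisters `a`.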
The check module infrastructure and bootstrap sequence. New option for starting the engine in check mode: -C|--check: start the engine in check mode for a global consistency check. Signed-off-by: Fan Yong <[email protected]>
Adds a new --checker flag to `dmg system start` which will start the ranks in a special checker mode. All ranks must first be stopped before starting in checker mode. Signed-off-by: Michael MacDonald <[email protected]>
Adds new dmg commands to manage/query CR:
* dmg check start
* dmg check stop
* dmg check query
* dmg check prop
Signed-off-by: Michael MacDonald <[email protected]>
Ensure that deb/rpm packages know what to do with the file. Signed-off-by: Michael MacDonald <[email protected]>
Implement control plane handlers for the following engine checker dRPC upcalls:
* CheckerListPools
* CheckerRegisterPool
* CheckerDeregisterPool
Also fixes a slow test.
Signed-off-by: Michael MacDonald <[email protected]>
Shouldn't have been merged into previous commit; causes problems when trying to integrate with other branches. Signed-off-by: Michael MacDonald <[email protected]>
Refactor the system package to remove the direct dependencies on external raft/grpc packages in order to avoid bringing them in for unrelated tools (e.g. daos, dmg, etc). Features: control Signed-off-by: Michael MacDonald <[email protected]>
Signed-off-by: Michael MacDonald <[email protected]>
Setting the system name in the RPC callback is racy. Signed-off-by: Michael MacDonald <[email protected]>
The DAOS Debug Tool (ddb) is a new executable that allows a user to navigate through a file in the VOS format. It is similar to debugfs for ext2/3/4 and offers both a command-line mode and an interactive shell mode. This commit introduces the tool with just a couple of commands supported. For more details about the tool, see src/ddb/README.md. Signed-off-by: Ryon Jensen <[email protected]>
Broken after master was merged in. Signed-off-by: Michael MacDonald <[email protected]>
* Move CheckReport to chk package
  - Separate report request from payload
  - Add Actions and Details lists to allow the checker to specify defined actions that could be taken in response to the inconsistency report.
  - Rename chk/check -> chk/chk to ensure unique namespaces.
* Adjust srv/mgmt messages to use chk types directly
Signed-off-by: Michael MacDonald <[email protected]>
Provide a central repository for storing the system checker progress. Add handler for NotifyReport upcall to store checker reports for display to the admin. Add new `dmg check repair` command to allow the admin to select a repair action for an inconsistency.
Changes to chk.CheckReport message:
* Actions -> ActChoices
* Details -> ActDetails
* Adds ActMsgs array for human-formatted action descriptions
TODO: Add test coverage once this settles down.
Signed-off-by: Michael MacDonald <[email protected]>
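To make the report/repair flow above concrete, here is a small sketch of a report store in which the admin can only pick a repair action that the checker offered. The field names mirror ActChoices, ActDetails, and ActMsgs from the commit message; everything else (`CheckReport` as a Python dataclass, `seq`, `chosen`, `ReportStore`, and the assumption that `act_msgs` runs parallel to `act_choices`) is hypothetical and not taken from the DAOS code.

```python
from dataclasses import dataclass, field


@dataclass
class CheckReport:
    """Simplified stand-in for the chk.CheckReport message."""
    seq: int                                          # hypothetical report ID
    act_choices: list = field(default_factory=list)   # ActChoices
    act_details: list = field(default_factory=list)   # ActDetails
    act_msgs: list = field(default_factory=list)      # ActMsgs
    chosen: object = None                             # hypothetical: admin's pick


class ReportStore:
    """Hypothetical central store for checker reports."""

    def __init__(self):
        self._reports = {}

    def add(self, report):
        self._reports[report.seq] = report

    def repair(self, seq, action):
        """Record the admin's choice; reject actions the checker did not offer."""
        report = self._reports[seq]  # raises KeyError for an unknown report
        if action not in report.act_choices:
            raise ValueError(f"action {action!r} not offered for report {seq}")
        report.chosen = action
        # Assumption: act_msgs[i] describes act_choices[i].
        return report.act_msgs[report.act_choices.index(action)]
```

Validating the choice against `act_choices` keeps the store from recording a repair decision the checker never proposed.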
Add a new `dmg faults add-checker-report` command to allow manual injection of checker reports for prototyping and testing. The command and associated RPCs are not compiled into release builds. Signed-off-by: Michael MacDonald <[email protected]>
Small fix to always export the chk_pb variable so it can be used by server and client. Signed-off-by: Ryon Jensen <[email protected]>
* Introduces the dump, dump_ilog, dump_dtx, load, and rm commands for the ddb tool.
* Reworked the construction of the unit test lists so that the test function name is printed instead of a separate description.
* Added some filtering to the unit tests.
* Abstracted out the printing of the commands so that what is printed is more easily testable and to clean up the command functions a little.
Signed-off-by: Ryon Jensen <[email protected]>
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/17/execution/node/1173/log
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/18/execution/node/1405/log
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/18/execution/node/1451/log
ea25607 to 923dba0
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/20/execution/node/1150/log
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/21/execution/node/1173/log
923dba0 to ec21d5a
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/22/execution/node/288/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/22/execution/node/291/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/22/execution/node/350/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/22/execution/node/353/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/23/execution/node/319/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/23/execution/node/369/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/23/execution/node/330/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/23/execution/node/316/log
Signed-off-by: Li Wei <[email protected]> Required-githooks: true
Signed-off-by: Li Wei <[email protected]> Required-githooks: true
For lower layer primary group initialization. Signed-off-by: Fan Yong <[email protected]>
Signed-off-by: Fan Yong <[email protected]>
ec21d5a to 191e761
Bump version and add new changelog entry for CR. Signed-off-by: Fan Yong <[email protected]>
191e761 to 8b4d4c4
1. For test_lost_majority_ps_replicas, remove "rdb-pool" from two ranks that contain the pool service replica.
2. For test_dangling_rank_entry, the rebuild process after CR is unrelated to the CR logic but may cause a test timeout, so drop it.
3. More log messages when updating the CR bookmark.
4. More class tags for ddb tests.
Signed-off-by: Fan Yong <[email protected]>
8b4d4c4 to 48c6d6f
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13604/26/execution/node/1452/log
rebuild_simple timed out because of DAOS-15290.
Before requesting gatekeeper:
* Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: