-
Notifications
You must be signed in to change notification settings - Fork 302
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-15914: crt_reply_send_input_free() #14817
Conversation
- New crt_reply_send_input_free() API added which releases input buffer right after HG_Respond() instead of waiting until the handle is destroyed. - srv_obj.c calls changed to use new crt_reply_send_input_free() Required-githooks: true Signed-off-by: Alexander A Oganezov <[email protected]>
Errors are component not formatted correctly,Ticket number suffix is not a number. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments,Unable to load ticket data |
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14817/1/testReport/ |
- I/O context takes refcount on RPC - only release input buffer for target update Signed-off-by: Liang Zhen <[email protected]>
Signed-off-by: Liang Zhen <[email protected]>
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14817/2/testReport/ |
@@ -1516,6 +1516,16 @@ crt_hg_reply_send(struct crt_rpc_priv *rpc_priv) | |||
rc = crt_hgret_2_der(hg_ret); | |||
} | |||
|
|||
/* Release input buffer */ | |||
if (rpc_priv->crp_release_input_early && !rpc_priv->crp_forward) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will happen if we release RPC input buffer only with checking "if (!rpc_priv->crp_forward)"? that will allow us to release input buffer for most of RPCs (except corpc children) without changing the callers' logic (such as the RPC handlers for object/dtx in this patch)? Although the observed timeout RPCs in DAOS-15914 logs were for object update and DTX commit, it does not means that the trouble RPCs holding multiple recv buffer were all for object update and DTX commit, instead, they may be for others, we do not know. So if we can release the input buffer for more RPCs, that may be more helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Nasf-Fan we did that in the initial patch (https://github.com/daos-stack/daos/pull/14793/files) and ended up with random memory corruptions, suggesting that something in daos keeps using input buffer still during crt_hg_reply_send().
See Saurabh updates with this error in DAOS-15914:
aurora-daos-0318 ERROR 2024/07/25 00:38:05 daos_engine:0 07/25-00:38:05.15 aurora-daos-0318 DAOS[61709/14/30252] common EMRG src/common/mem.c:1383 set_offsets() Assertion '!OID_IS_NULL(root_oid)' failed: You must call pmemobj_root before umem_class_init daos_engine: src/common/mem.c:1383: set_offsets: Assertion
0' failed.`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I am not sure which RPC handler (since there are too many RPC handlers for kinds of modules) may access the input buffer after early release, but I think that releasing the input buffer only for tgt_update and dtx_commit RPCs seems not enough. I would suggest to do that for all OBJ/DTX sponsored RPCs. If still hit above memory corruption, then it is not difficult to fix related RPC handler (in such two modules).
There was one failure in https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-14817/2/artifact/Functional%20Hardware%20Medium/dfuse/daos_build.py/job.log with
According to https://daosio.atlassian.net/browse/DAOS-16215 this was fixed in release/2.6 by b6fc83e. |
Required-githooks: true
…os-stack/daos into aaoganez/DAOS-15914-srv-obj-reply-2.6
Master version of this patch: #15314 |
New crt_reply_send_input_free() API added which releases input buffer right after HG_Respond() instead of waiting until the handle is destroyed.
srv_obj.c calls changed to use new crt_reply_send_input_free()
Required-githooks: true
Before requesting gatekeeper:
Features:
(orTest-tag*
) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.Gatekeeper: