DAOS-8082 mgmt: missing collect req put #6340
Conversation
LGTM. No errors found by checkpatch.
Test stage Build on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-6340/2/execution/node/414/log
src/engine/init.c (Outdated)
@@ -783,6 +783,7 @@ server_init(int argc, char *argv[])
 static void
 server_fini(bool force)
 {
+	crt_rank_abort_all(NULL);
What problem is being addressed here? Isn't this abort call racing with all the other xstreams, which may send new requests after the abort call returns?
Oh, some RPC completion callbacks might need those modules, for example IV, so we need to abort all in-flight RPCs and execute all the completion callbacks before cleaning up the modules. Here is what I hit in my local test; clearly this IV RPC callback ran after all the modules had been cleaned up.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/home/wangdi/daos_m_20190301/daos/install/bin/daos_engine -t 8 -x 0 -g daos_ser'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000436294 in ivc_on_put (ivns=<optimized out>, iv_value=0x13c1aa00, priv=0x232f5d80) at src/engine/server_iv.c:618
618 rc = entry->iv_class->iv_class_ops->ivc_ent_put(entry,
Missing separate debuginfos, use: debuginfo-install glibc-2.17-323.el7_9.x86_64 hwloc-libs-1.11.8-4.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libatomic-4.8.5-44.el7.x86_64 libgcc-4.8.5-44.el7.x86_64 libibverbs-22.4-5.el7.x86_64 libnl3-3.2.28-4.el7.x86_64 librdmacm-22.4-5.el7.x86_64 libtool-ltdl-2.4.6-29.01.el7.x86_64 libunwind-1.2-2.el7.x86_64 libuuid-2.23.2-65.el7_9.1.x86_64 libyaml-0.1.4-11.el7_0.x86_64 lz4-1.8.3-1.el7.x86_64 numactl-libs-2.0.12-5.el7.x86_64
(gdb) bt
#0 0x0000000000436294 in ivc_on_put (ivns=<optimized out>, iv_value=0x13c1aa00, priv=0x232f5d80) at src/engine/server_iv.c:618
#1 0x00007fb5e7e8aab1 in handle_ivupdate_response (cb_info=0x2b671e0) at src/cart/crt_iv.c:2541
#2 handle_response_internal (arg=0x2b671e0) at src/cart/crt_iv.c:2714
#3 0x00007fb5e7e23f24 in crt_rpc_complete (rpc_priv=rpc_priv@entry=0x35df600, rc=rc@entry=-1018) at src/cart/crt_context.c:351
#4 0x00007fb5e7e2cb3f in crt_ctx_epi_abort (rlink=<optimized out>, arg=<optimized out>) at src/cart/crt_context.c:413
#5 0x00007fb5e8111dc2 in d_hash_table_traverse (htable=htable@entry=0x2b6d9c8, cb=cb@entry=0x7fb5e7e2c920 <crt_ctx_epi_abort>, arg=arg@entry=0x2b673b4) at src/gurt/hash.c:900
#6 0x00007fb5e7e2e99b in crt_context_destroy (crt_ctx=0x2b6d940, force=force@entry=1) at src/cart/crt_context.c:499
#7 0x000000000043c7f6 in dss_srv_handler (arg=0x2a78ad0) at src/engine/srv.c:502
#8 0x00007fb5e70c5daa in ABTD_ythread_func_wrapper () from /home/wangdi/daos_m_20190301/daos/install/bin/../prereq/dev/argobots/lib/libabt.so.1
#9 0x00007fb5e70c5f51 in make_fcontext () from /home/wangdi/daos_m_20190301/daos/install/bin/../prereq/dev/argobots/lib/libabt.so.1
#10 0x0000000000000000 in ?? ()
When this segfault happened, server_fini must have been calling dss_srv_fini, which was waiting for the xstreams to terminate; server_fini must not have called ds_iv_fini or dss_module_fini yet. So the class object should have been in a good state; the segfault might be due to "priv". Any idea how "priv" could point to freed memory? Due to ds_pool_stop -> ds_iv_ns_stop (I don't know)?
Yes, module_cleanup -> ds_pool_stop -> ds_iv_ns_stop -> iv_entry_free() will free all the IV entries, so it is better to abort all RPCs before that.
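To make the ordering concrete, here is a minimal, purely illustrative sketch (all names are hypothetical; this is not the DAOS code) of why a synchronous abort-and-drain has to precede module cleanup when completion callbacks touch module state:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for module state that an RPC completion callback touches. */
static bool iv_module_alive = true;

/* Stands in for an IV RPC completion callback such as ivc_on_put(). */
static void
iv_completion_cb(void)
{
	/* this assert is the segfault in the backtrace above if the
	 * module has already been cleaned up */
	assert(iv_module_alive);
}

/* Stands in for a synchronous crt_rank_abort_all(): abort every
 * in-flight RPC and run its completion callback before returning. */
static void
abort_all_rpcs_sync(void)
{
	iv_completion_cb();
}

int
main(void)
{
	abort_all_rpcs_sync();   /* first: drain all completions */
	iv_module_alive = false; /* then: module cleanup is safe */
	return 0;
}
```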
After the primary xstream completes the abort call, the system xstream may begin new IV operations, no? (More generally, any xstream may send new RPCs, no?)
Also, something to keep in mind: just because you issued an abort to all RPCs doesn't mean that the completion callbacks will have executed before the call returns. This is a non-blocking call, so a completion can happen much later. In theory, the reference counts on the various structs we use should protect against segfaults like the one listed above; even if a structure is marked as free, it should remain 'in use' as long as there is an RPC still referencing that data.
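A hedged sketch of that refcount discipline (hypothetical, simplified types; not the cart code): marking an object free only drops the owner's reference, and the memory survives until the last in-flight RPC drops its own:

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Toy refcounted entry; names are hypothetical. */
struct iv_entry {
	atomic_int ref;
	/* ... payload used by the completion callback ... */
};

static void
entry_addref(struct iv_entry *e)
{
	atomic_fetch_add(&e->ref, 1);
}

static void
entry_decref(struct iv_entry *e)
{
	/* only the last put actually frees the memory */
	if (atomic_fetch_sub(&e->ref, 1) == 1)
		free(e);
}

int
main(void)
{
	struct iv_entry *e = malloc(sizeof(*e));

	atomic_init(&e->ref, 1); /* owner's reference */
	entry_addref(e);         /* RPC issue path takes its own ref */
	entry_decref(e);         /* owner "frees" the entry early... */
	/* ...yet a late completion callback can still use it safely */
	entry_decref(e);         /* completion drops the last ref: freed */
	return 0;
}
```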
All IV is on xstream 0; if we abort all RPCs on xstream 0, then there should be no IV RPC or completion callback before module cleanup, as long as it does not yield.
@frostedcmos I saw it wait for all RPCs and completion callbacks in crt_ctx_epi_abort(), no? TBH, I think abort should be synchronous; at least there should be a synchronous version of abort.
Ah yes, this version does have a sync wait for all completions; I was thinking of a lower-level abort, which is a fire-and-forget type of command.
> All IV is on xstream 0; if we abort all RPCs on xstream 0, then there should be no IV RPC or completion callback before module cleanup, as long as it does not yield.

server_fini is called on the primary xstream, not the system xstream.
sm_cleanup implementations are not prohibited from yielding. E.g., does ds_rebuild_leader_stop_all yield? In the future, it's hard to say whether the rebuild, obj, and cont sm_cleanup might want to wait for certain resources to be released.
ds_rebuild_leader_stop_all() will yield, but only after it sets sp_stopping, which blocks all IV RPCs.
Hmm, ideally cart_abort_rank should actually abort and stop further RPC send/receive on all contexts; then it should be safe to start the cleanup process.
Let me remove this from the patch and land the leak fixes first.
Just rebased.
LGTM. No errors found by checkpatch.
src/cart/crt_iv.c (Outdated)
+	if (rc == 0)
+		IVNS_DECREF(ivns_internal);
That does not look correct. crt_ivf_pending_request_add() increments the ivns passed to it, as it keeps a reference to it in the list.
The corresponding decref is done in crt_ivf_pending_reqs_process(). If there is some scenario that causes ref counts to be missed there, that would be the proper place to add it.
But the ivns passed to crt_ivf_rpc_issue() needs to be decref'd: since crt_ivf_pending_request_add() already increases the refcount, we need the decref here for the rc == 0 case.
Yep, as discussed in the IM offline, you are correct. Good catch. Please add a comment here, though, indicating that this is the decref for the corresponding addref done prior to crt_ivf_rpc_issue().
LGTM. No errors found by checkpatch.
Otherwise okay.
src/cart/crt_iv.c (Outdated)
+	if (rc == 0) /* drop the ref crt_hdlr_iv_fetch_aux() */
+		IVNS_DECREF(ivns_internal);
@frostedcmos, do you think the real bug is that crt_ivf_pending_request_add should not take an extra ivns reference, because it is taking over cb_info instead of copying it? It looks to me like we should not insert a decref call here.
I actually prefer to take the extra ivns ref explicitly before adding it to a list or handing it over to other functions (ULTs); it's easier to follow. I.e., increasing the ref of ivns_internal in request_add() seems good to me, IMHO.
I don't have a very strong opinion on either way it is done. I think the current way is cleaner; I am not sure why it would matter whether we store cb_info vs. copy it for the purposes of ref counting. Copying would simply be less optimal, but fundamentally we are storing/saving cb_info->ivns_internal, and in my view we should be taking a ref count on it, which we do.
On lines 1138-1139, crt_ivf_pending_request_add is given cb_info, which includes an IV NS reference in cb_info->ifc_ivns_internal. It then puts cb_info on a list. Because crt_ivf_rpc_issue doesn't free cb_info afterwards, crt_ivf_pending_request_add is effectively taking over cb_info instead of copying it. Hence, it simply doesn't need to take any extra IV NS reference.
In other words, the IV NS reference here is owned by cb_info, not by crt_ivf_pending_request_add or crt_ivf_rpc_issue. The decref of ifc_ivns_internal (which "ivns_internal" aliases) shall always happen together with the D_FREE of the cb_info.
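The ownership argument can be pictured with a small hedged sketch (hypothetical, simplified types; not the cart code): the function that stores cb_info takes over the reference already inside it, and the decref travels with the free:

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the cart types. */
struct ivns { int ref; };

static void
ivns_decref(struct ivns *ns)
{
	if (--ns->ref == 0)
		free(ns);
}

struct cb_info {
	struct ivns    *ifc_ivns_internal; /* cb_info owns exactly one NS ref */
	struct cb_info *next;
};

/* Takes over cb_info -- and the NS reference inside it -- by linking
 * it into the pending list; no extra addref is needed. */
static void
pending_request_add(struct cb_info **list, struct cb_info *info)
{
	info->next = *list;
	*list = info;
}

/* The matching decref always travels with the free of cb_info, whether
 * the request completes or is dropped from the pending list. */
static void
cb_info_free(struct cb_info *info)
{
	ivns_decref(info->ifc_ivns_internal);
	free(info);
}

int
main(void)
{
	struct cb_info *pending = NULL;
	struct cb_info *info = calloc(1, sizeof(*info));

	info->ifc_ivns_internal = calloc(1, sizeof(struct ivns));
	info->ifc_ivns_internal->ref = 1;

	pending_request_add(&pending, info); /* ownership moves to the list */
	cb_info_free(pending);               /* decref + free happen together */
	return 0;
}
```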
Hmm, this actually makes sense, let me update the patch.
src/include/daos_srv/daos_engine.h (Outdated)
@@ -812,7 +812,7 @@ unsigned int dss_ctx_nr_get(void);

 /* Cache for container root */
 struct tree_cache_root {
-	struct btr_root	btr_root;
+	struct btr_root	*btr_root;
This looks useless to me.
hmm, this is used to free btr_root during tree destroy?
dbtree_create_inplace() will use the root directly instead of allocating a new one, so you have to allocate the root for a DRAM tree.
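For illustration, a self-contained sketch of the create-in-place ownership rule (a hypothetical toy API, not the real dbtree functions): the library initializes caller-provided root memory, so the caller both allocates and frees it:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy root; the real struct btr_root is opaque here. */
struct btr_root { char opaque[64]; };

/* "Create in place": the library initializes tree metadata inside
 * caller-provided root memory instead of allocating its own. */
static void
tree_create_inplace(struct btr_root *root)
{
	memset(root, 0, sizeof(*root));
}

/* Tears down the tree contents but does NOT free the root memory. */
static void
tree_destroy_inplace(struct btr_root *root)
{
	(void)root;
}

int
main(void)
{
	/* for a DRAM tree the caller must allocate the root... */
	struct btr_root *root = malloc(sizeof(*root));

	tree_create_inplace(root);
	/* ... use the tree ... */
	tree_destroy_inplace(root);
	free(root); /* ...and free it, since ownership never moved */
	return 0;
}
```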
I mean the 'btr_root' in this data structure is useless.
I merged all the leak fixes into this patch, please review. Blocker for 2.0 EC soak. Thanks.
LGTM. No errors found by checkpatch.
src/object/srv_obj_migrate.c (Outdated)
	*rootp = val_iov.iov_buf;
	D_ASSERT(*rootp != NULL);
out:
	if (rc < 0 && daos_handle_is_valid(root.root_hdl))
		dbtree_destroy(root.root_hdl, NULL);
Delete the inserted entry on error, instead of destroying the tree.
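The suggested error-path shape, as a tiny hedged sketch (hypothetical helpers; tree_insert/tree_delete are stand-ins, not the dbtree API): undo only this call's insert, since destroying the tree would discard entries from earlier, successful calls:

```c
#include <stdbool.h>

/* Hypothetical one-slot "tree", just to show the error-path shape. */
static int stored_key = -1;

static int  tree_insert(int key) { stored_key = key; return 0; }
static void tree_delete(int key) { if (stored_key == key) stored_key = -1; }

static int
add_entry(int key, bool later_step_fails)
{
	int rc = tree_insert(key);

	if (rc != 0)
		return rc;
	if (later_step_fails) {
		/* undo only what this call did -- destroying the whole
		 * tree would also discard entries inserted by earlier,
		 * successful calls */
		tree_delete(key);
		return -1;
	}
	return 0;
}

int
main(void)
{
	return add_entry(42, true) == -1 ? 0 : 1;
}
```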
ah, right
LGTM. No errors found by checkpatch.
src/object/cli_obj.c (Outdated)
	if (rc == 0) {
		*shard_ptr = obj_shard;
	} else {
This one looks incorrect to me, as now you take only one addref for both cases: 1) the first-time shard open, and 2) the reuse case where the obj shard is already opened.
This change will not cause a failure, but it will make the obj_shard open cache unusable (the opened obj_shard cannot be reused), which may affect performance a little.
The original intent of taking the addref twice (once in dc_obj_shard_open and again in obj_shard_addref below) is to make the opened obj_shard reusable until obj_close().
The addref below L187 will be released by obj_shard_open()'s corresponding obj_shard_close; the first-time addref in dc_obj_shard_open above will be released in obj_close -> obj_layout_free(), so the obj_shard can be reused between obj open/close.
Why do you want to change it? Did you observe a memory leak in this case?
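The two-reference scheme described above, as a minimal hedged sketch (hypothetical, simplified types; not the client object code):

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified shard. */
struct shard {
	int  ref;
	bool opened;
};

static void shard_addref(struct shard *s) { s->ref++; }

static void
shard_decref(struct shard *s)
{
	if (--s->ref == 0)
		free(s);
}

static struct shard *
shard_open(struct shard *s)
{
	if (!s->opened) {
		s->opened = true;
		/* cache ref: released only at obj_close() -> layout free,
		 * so the opened shard stays reusable in between */
		shard_addref(s);
	}
	shard_addref(s); /* caller ref: released by shard_close() */
	return s;
}

static void shard_close(struct shard *s) { shard_decref(s); }

int
main(void)
{
	struct shard *s = calloc(1, sizeof(*s)); /* ref == 0, not yet open */

	shard_close(shard_open(s)); /* first open: cache + caller refs */
	shard_close(shard_open(s)); /* reuse: only the caller ref this time */
	shard_decref(s);            /* obj_close() drops the cache ref: freed */
	return 0;
}
```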
Hmm, yes, the layout is not released in some cases. But I can probably revert this change for the moment. Actually, we probably should not reset do_obj to NULL in obj_shard_decref(), so that the cache can be reused, instead of messing with the refcount and open count.
src/object/cli_obj.c (Outdated)
	if (rc == 0) {
		*shard_ptr = obj_shard;
Minor: no need to assign the pointer back in the failure case.
1. Missing ivns reference put in crt_ivf_rpc_issue().
2. Free the btree root allocated in tree_cache_create_internal().
3. pool_target_addr_list is not freed in ds_pool_target_update_state().
4. Free the layout of reclaim in rebuild_obj_scan_cb().
5. Missing ranks and tgt_uuids free in ds_mgmt_tgt_pool_create_ranks().
6. Missing collective req put for a few mgmt handlers.
7. Missing obj_decref in dc_obj_layout_get().
8. Missing sc_snapshots free in cont_child_free_ref().
9. Missing free of upds allocated in swim_updates_prepare().
10. Add efi_stripe_sgls_nr to free efi_stripe_sgls in all cases.
11. Add do_shard_idx to dc_obj_shard, since do_shard may not be the offset of the shard in cob_shards during reintegration.
12. dkeys and akeys free are missing in cursor.

Signed-off-by: Di Wang <[email protected]>
LGTM. No errors found by checkpatch.
Test stage Unit Test completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-6340/9/execution/node/738/log
LGTM. No errors found by checkpatch.
Not familiar with the changes in src/object.c, but other changes look good to me.
Missing collective req put for a few mgmt handlers.
Signed-off-by: Di Wang <[email protected]>