
DAOS-8082 mgmt: missing collect req put #6340

Merged: 3 commits into master from daos_mgmt_leak, Aug 6, 2021

Conversation

@wangdi1 (Contributor) commented Aug 1, 2021:

Missing collective req put for a few mgmt handlers.

Signed-off-by: Di Wang [email protected]

@daosbuild1 (Collaborator) left a comment:

LGTM. No errors found by checkpatch.

liw previously approved these changes Aug 2, 2021
@daosbuild1 (Collaborator) left a comment:

LGTM. No errors found by checkpatch.

@daosbuild1 (Collaborator) commented:

Test stage Build on CentOS 7 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-6340/2/execution/node/414/log

@@ -783,6 +783,7 @@ server_init(int argc, char *argv[])
 static void
 server_fini(bool force)
 {
+	crt_rank_abort_all(NULL);
A reviewer (Contributor) commented:

What problem is being addressed here? Isn't this abort call racing with all the other xstreams, which may send new requests after the abort call returns?

wangdi1 (Contributor, author) replied:

Oh, some RPC completion callbacks might need those modules, for example IV, so it needs to abort all inflight RPCs and execute all their completion callbacks, then clean up the modules. Here is what I hit in my local test; clearly this IV RPC callback happened after all modules had been cleaned up.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/home/wangdi/daos_m_20190301/daos/install/bin/daos_engine -t 8 -x 0 -g daos_ser'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000436294 in ivc_on_put (ivns=, iv_value=0x13c1aa00, priv=0x232f5d80) at src/engine/server_iv.c:618
618 rc = entry->iv_class->iv_class_ops->ivc_ent_put(entry,
Missing separate debuginfos, use: debuginfo-install glibc-2.17-323.el7_9.x86_64 hwloc-libs-1.11.8-4.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libatomic-4.8.5-44.el7.x86_64 libgcc-4.8.5-44.el7.x86_64 libibverbs-22.4-5.el7.x86_64 libnl3-3.2.28-4.el7.x86_64 librdmacm-22.4-5.el7.x86_64 libtool-ltdl-2.4.6-29.01.el7.x86_64 libunwind-1.2-2.el7.x86_64 libuuid-2.23.2-65.el7_9.1.x86_64 libyaml-0.1.4-11.el7_0.x86_64 lz4-1.8.3-1.el7.x86_64 numactl-libs-2.0.12-5.el7.x86_64
(gdb) bt
#0 0x0000000000436294 in ivc_on_put (ivns=, iv_value=0x13c1aa00, priv=0x232f5d80) at src/engine/server_iv.c:618
#1 0x00007fb5e7e8aab1 in handle_ivupdate_response (cb_info=0x2b671e0) at src/cart/crt_iv.c:2541
#2 handle_response_internal (arg=0x2b671e0) at src/cart/crt_iv.c:2714
#3 0x00007fb5e7e23f24 in crt_rpc_complete (rpc_priv=rpc_priv@entry=0x35df600, rc=rc@entry=-1018) at src/cart/crt_context.c:351
#4 0x00007fb5e7e2cb3f in crt_ctx_epi_abort (rlink=, arg=) at src/cart/crt_context.c:413
#5 0x00007fb5e8111dc2 in d_hash_table_traverse (htable=htable@entry=0x2b6d9c8, cb=cb@entry=0x7fb5e7e2c920 <crt_ctx_epi_abort>, arg=arg@entry=0x2b673b4) at src/gurt/hash.c:900
#6 0x00007fb5e7e2e99b in crt_context_destroy (crt_ctx=0x2b6d940, force=force@entry=1) at src/cart/crt_context.c:499
#7 0x000000000043c7f6 in dss_srv_handler (arg=0x2a78ad0) at src/engine/srv.c:502
#8 0x00007fb5e70c5daa in ABTD_ythread_func_wrapper () from /home/wangdi/daos_m_20190301/daos/install/bin/../prereq/dev/argobots/lib/libabt.so.1
#9 0x00007fb5e70c5f51 in make_fcontext () from /home/wangdi/daos_m_20190301/daos/install/bin/../prereq/dev/argobots/lib/libabt.so.1
#10 0x0000000000000000 in ?? ()
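
For context, a minimal sketch of the ordering this comment argues for, assuming the crt_rank_abort_all() call from the diff above; the module-cleanup step is a hypothetical placeholder for the real teardown sequence in server_fini:

/*
 * Sketch only: abort inflight RPCs before module teardown, so that
 * completion callbacks such as ivc_on_put() still find the modules
 * they depend on (e.g. IV) initialized.
 */
static void
server_fini_sketch(bool force)
{
	/* Runs every RPC completion callback while modules are alive. */
	crt_rank_abort_all(NULL);

	/* Only then tear the modules down (hypothetical placeholder). */
	modules_cleanup_and_fini();

	/* ... rest of the real server_fini sequence ... */
}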

A reviewer (Contributor) replied:

When this segfault happened, server_fini must have been calling dss_srv_fini, which was waiting for the xstreams to terminate; server_fini must not have called ds_iv_fini or dss_module_fini yet. So the class object should be in a good state; the segfault might be due to "priv". Any idea how "priv" could point to freed memory? Due to ds_pool_stop -> ds_iv_ns_stop (I don't know)?

wangdi1 (Contributor, author) replied:

Yes, module_cleanup -> ds_pool_stop -> ds_iv_ns_stop -> iv_entry_free() will free all IV entries.

So it is better to abort all RPCs before that.

A reviewer (Contributor) replied:

After the primary xstream completes the abort call, the system xstream may begin new IV operations, no? (More generally, any xstream may send new RPCs, no?)

A reviewer (Contributor) replied:

Also something to keep in mind: just because you issued an abort to all RPCs doesn't mean the completion callbacks will be executed before the call returns. This is a non-blocking call, so a completion can happen much later. In theory, reference counts on the various structs we use should protect against segfaults like the one listed above; even if a structure is marked as freed, it should remain 'in use' as long as there is an RPC still referencing that data.

wangdi1 (Contributor, author) replied:

All IV is on xstream 0; if we abort all RPCs on xstream 0, then there should be no IV RPC or completion callback before module cleanup, as long as it does not yield.

@frostedcmos I saw it wait for all RPCs and completion callbacks in crt_ctx_epi_abort(), no? TBH, I think abort should be synchronous, or at least there should be a synchronous version of abort.

A reviewer (Contributor) replied:

Ah yes, this version does have sync to wait for all completions; I was thinking of a lower-level abort, which is just a fire-and-forget type of command.

A reviewer (Contributor) replied:

> all IV is on xstream 0, if we abort all RPCs on xstream 0, then there should be no IV RPC or completion callback before module cleanup, as long as it does not yield.

server_fini is called on the primary xstream, not the system xstream.

sm_cleanup implementations are not prohibited from yielding. E.g., does ds_rebuild_leader_stop_all yield? In the future, it's hard to say whether the rebuild, obj, and cont sm_cleanup implementations might want to wait for certain resources to be released.

wangdi1 (Contributor, author) replied:

ds_rebuild_leader_stop_all() will yield, but only after it sets sp_stopping, which will block all IV RPCs.

Hmm, ideally cart_abort_rank should actually abort and stop any further RPC send/receive on all contexts; then it would be safe to start the cleanup process.

Let me remove this from the patch and land those leak fixes first.

@wangdi1 wangdi1 requested a review from liw August 2, 2021 17:20
@wangdi1 (Contributor, author) commented Aug 2, 2021:

Just a rebase.

@daosbuild1 (Collaborator) left a comment:

LGTM. No errors found by checkpatch.

@wangdi1 wangdi1 requested a review from NiuYawei August 3, 2021 04:39
Comment on lines 1144 to 1145
if (rc == 0)
IVNS_DECREF(ivns_internal);
A reviewer (Contributor) commented:

That does not look correct: crt_ivf_pending_request_add() increments the refcount on the ivns passed to it, as it keeps a reference to it in the list.

The corresponding decref is done in crt_ivf_pending_reqs_process(). If some scenario causes ref counts to be missed there, that would be the proper place to add it.

wangdi1 (Contributor, author) replied:

But the ivns passed to crt_ivf_rpc_issue() needs to be decref'd: since crt_ivf_pending_request_add() already increases the refcount, the reference needs to be dropped here for the rc == 0 case.

A reviewer (Contributor) replied:

Yep, as discussed in IM offline, you are correct. Good catch. Please add a comment here, though, indicating that this is the decref for the corresponding addref done prior to crt_ivf_rpc_issue().

@daosbuild1 (Collaborator) left a comment:

LGTM. No errors found by checkpatch.

@daosbuild1 (Collaborator) left a comment:

LGTM. No errors found by checkpatch.

@liw (Contributor) left a comment:

Otherwise okay.

Comment on lines 1144 to 1145
if (rc == 0) /* drop the ref crt_hdlr_iv_fetch_aux() */
IVNS_DECREF(ivns_internal);
A reviewer (Contributor) commented:

@frostedcmos, do you think the real bug is that crt_ivf_pending_request_add should not take an extra ivns reference, because it is taking over cb_info instead of copying it? It looks to me like we should not insert a decref call here.

wangdi1 (Contributor, author) replied:

I actually prefer to take the extra ivns reference explicitly before adding to a list or handing over to other functions (ULTs); it's easier to follow. I.e., increasing the ref of ivns_internal in request_add() seems good to me, IMHO.

A reviewer (Contributor) replied:

I don't have a very strong opinion either way; I think the current way is cleaner. I am not sure why it would matter whether we store cb_info vs. copy it for ref-counting purposes; copying would simply be less optimal, but fundamentally we are storing/saving cb_info->ivns_internal, and in my view we should therefore be taking a ref count on it, which we do.

A reviewer (Contributor) replied:

On lines 1138-1139, crt_ivf_pending_request_add is given cb_info, which includes an IV NS reference in cb_info->ifc_ivns_internal. It then puts cb_info on a list. Because crt_ivf_rpc_issue doesn't free cb_info afterwards, crt_ivf_pending_request_add is effectively taking over cb_info, not copying it. Hence it simply doesn't need to take any extra IV NS reference.

In other words, the IV NS reference here is owned by cb_info, not by crt_ivf_pending_request_add or crt_ivf_rpc_issue. The decref of ifc_ivns_internal (which "ivns_internal" aliases) should always happen together with the D_FREE of the cb_info.
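
To make that ownership rule concrete, a simplified sketch (the struct layout and free helper are illustrative, not the real CART definitions; IVNS_DECREF and D_FREE are the macros named in this thread):

/*
 * Illustrative only: cb_info owns exactly one IV NS reference, taken
 * when cb_info is created. Handing cb_info to the pending list
 * transfers ownership, so no extra addref/decref pair is needed along
 * the way; the owned reference drops only when cb_info is freed.
 */
struct ivf_cb_info_sketch {			/* hypothetical layout */
	struct crt_ivns_internal *ifc_ivns_internal;	/* owned reference */
	/* ... other fetch-callback state ... */
};

static void
ivf_cb_info_free_sketch(struct ivf_cb_info_sketch *cb_info)
{
	IVNS_DECREF(cb_info->ifc_ivns_internal);	/* pairs with the creation addref */
	D_FREE(cb_info);
}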

wangdi1 (Contributor, author) replied:

Hmm, this actually makes sense; let me update the patch.

@wangdi1 wangdi1 requested review from liuxuezhao and liw August 4, 2021 16:03
@@ -812,7 +812,7 @@ unsigned int dss_ctx_nr_get(void);

 /* Cache for container root */
 struct tree_cache_root {
-	struct btr_root		btr_root;
+	struct btr_root		*btr_root;
A reviewer (Contributor) commented:

This looks useless to me.

wangdi1 (Contributor, author) replied:

Hmm, this is used to free btr_root during tree destroy?

wangdi1 (Contributor, author) added:

dbtree_create_inplace() will use the given root directly instead of allocating a new one, so you have to allocate the root for the DRAM tree.

The reviewer (Contributor) replied:

I mean the 'btr_root' in this data structure is useless.
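
A hedged sketch of the allocation pattern wangdi1 describes above (the tree class, order, and uma setup are placeholders; the real code lives in tree_cache_create_internal()):

	struct btr_root	*broot;
	daos_handle_t	 toh;
	int		 rc;

	/*
	 * dbtree_create_inplace() embeds the tree in the caller-provided
	 * root instead of allocating one, so a DRAM tree must allocate
	 * the root itself and free it again when the tree is destroyed.
	 */
	D_ALLOC_PTR(broot);
	if (broot == NULL)
		return -DER_NOMEM;

	rc = dbtree_create_inplace(tree_class /* placeholder */, 0 /* feats */,
				   tree_order /* placeholder */, &uma, broot,
				   &toh);
	if (rc != 0)
		D_FREE(broot);	/* must also be freed on tree destroy */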

@wangdi1 (Contributor, author) commented Aug 5, 2021:

I merged all the leak fixes into this patch; please review. This is a blocker for 2.0 EC soak. Thanks.

@daosbuild1 (Collaborator) left a comment:

LGTM. No errors found by checkpatch.

@daosbuild1 (Collaborator) left a comment:

LGTM. No errors found by checkpatch.

NiuYawei previously approved these changes Aug 5, 2021

*rootp = val_iov.iov_buf;
D_ASSERT(*rootp != NULL);
out:
if (rc < 0 && daos_handle_is_valid(root.root_hdl))
dbtree_destroy(root.root_hdl, NULL);
A reviewer (Contributor) commented:

Delete the inserted entry on error, instead of destroying the tree.

wangdi1 (Contributor, author) replied:

Ah, right.
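
A sketch of the suggested error path (the probe opcode, key iov, and "inserted" flag are illustrative; the exact dbtree_delete() arguments should follow the surrounding code):

out:
	/*
	 * On failure, remove only the entry inserted above; destroying
	 * the whole tree would throw away entries the cache still needs.
	 */
	if (rc < 0 && inserted /* hypothetical flag */)
		dbtree_delete(root.root_hdl, BTR_PROBE_EQ, &key_iov, NULL);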

@daosbuild1 (Collaborator) left a comment:

LGTM. No errors found by checkpatch.

NiuYawei previously approved these changes Aug 5, 2021

if (rc == 0) {
*shard_ptr = obj_shard;
} else {
A reviewer (Contributor) commented:

This one looks incorrect to me: now you take only one addref for both cases, 1) the first-time shard open, and 2) the reuse case where the obj shard is already opened. This change will not cause a failure, but it makes the obj_shard open cache unusable (the opened obj_shard cannot be reused), which may affect performance a little.

The original intent of taking addref twice (once in dc_obj_shard_open and again in obj_shard_addref below) was to keep the opened obj_shard reusable until obj_close(). The addref below at line 187 is released by the obj_shard_close() corresponding to obj_shard_open(); the first-time addref in dc_obj_shard_open above is released in obj_close -> obj_layout_free(), so the obj_shard can be reused between obj open/close.

Why did you want to change it? Did you observe a memory leak in this case?

wangdi1 (Contributor, author) replied:

Hmm, yes, the layout is not released in some cases, but I can probably revert this change for the moment. Actually, we probably should not reset do_obj to NULL in obj_shard_decref(), so as to reuse the cache instead of messing with the refcount and open count.
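
For reference, a sketch of the two-reference lifetime described above (the helper names, arguments, and the open check are hypothetical simplifications; only the addref/close pairing comes from the discussion):

static int
obj_shard_open_sketch(struct dc_object *obj, int idx,
		      struct dc_obj_shard **shard_ptr)
{
	struct dc_obj_shard	*shard = &obj->cob_shards->do_shards[idx];
	int			 rc = 0;

	if (!shard_is_opened(shard))	/* hypothetical check */
		/* Ref #1: held until obj_close() -> obj_layout_free(),
		 * which keeps the shard cached across open/close. */
		rc = dc_obj_shard_open(obj, shard);	/* simplified args */
	if (rc == 0) {
		/* Ref #2: per caller, paired with obj_shard_close(). */
		obj_shard_addref(shard);
		*shard_ptr = shard;
	}
	return rc;
}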


if (rc == 0) {
*shard_ptr = obj_shard;
A reviewer (Contributor) commented:

Minor: no need to assign the pointer back in the failure case.

1. missing ivns reference put in crt_ivf_rpc_issue().

2. free btree root allocated in tree_cache_create_internal().

3. pool_target_addr_list is not freed in ds_pool_target_update_state().

4. free layout of reclaim in rebuild_obj_scan_cb().

5. missing ranks and tgt_uuids free in ds_mgmt_tgt_pool_create_ranks().

6. Missing collective req put for a few mgmt handlers.

7. missing obj_decref in dc_obj_layout_get().

8. missing sc_snapshots free in cont_child_free_ref().

9. Missing free upds allocated in swim_updates_prepare().

10. Add efi_stripe_sgls_nr to free efi_stripe_sgls in all cases.

11. Add do_shard_idx to dc_obj_shard, since do_shard
may not be the offset of shard in cob_shards during
reintegration.

12. dkeys and akeys free are missing in cursor.

Signed-off-by: Di Wang <[email protected]>
@daosbuild1 (Collaborator) left a comment:

LGTM. No errors found by checkpatch.

@wangdi1 wangdi1 requested a review from NiuYawei August 5, 2021 16:23
@daosbuild1 (Collaborator) left a comment:

LGTM. No errors found by checkpatch.

@liw (Contributor) left a comment:

Not familiar with the changes in src/object.c, but the other changes look good to me.

@jolivier23 jolivier23 merged commit 27e8a49 into master Aug 6, 2021
@jolivier23 jolivier23 deleted the daos_mgmt_leak branch August 6, 2021 13:12