DAOS-9595 chk: consolidate pool membership #9865

Merged
merged 1 commit into from
Aug 12, 2022

@Nasf-Fan Nasf-Fan commented Aug 1, 2022

When DAOS check starts, all involved check engines report their
known pools' information, including the pool service replicas, pool
label and related storage allocation, to the check leader via reply.

After the pool list consolidation in pass_1, for each pool, the
check leader sends the related pool information to its pool service
leaders via a new RPC, CHK_POOL_MBS.

On the check engine side, the pool service leader compares the pool
map with the information pushed from the check leader and handles
the following cases:

  1. A target has some allocated storage but does not appear in the
    pool map. In this case, the associated space will be deleted
    from the engine by default.

  2. A target has some allocated storage and is marked as "DOWN" or
    "DOWNOUT" in the pool map. In this case, the administrator can
    decide to either remove it or leave it there.

  3. A target is referenced in the pool map ("NEW", "UP", "UPIN" or
    "DRAIN"), but no storage is actually allocated on this engine.
    In this case, the entry for the target in the pool map will
    be marked as "DOWN" (for an "UP", "UPIN" or "DRAIN" entry) or
    "DOWNOUT" (for a "NEW" entry).

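The three cases above can be read as a small decision table. The following is only a minimal sketch of that table in C, not the code from this patch: every identifier (tgt_state, mbs_action, classify_target) is hypothetical, and the "no action" result for combinations the description does not mention is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not the identifiers used in the patch. */
enum tgt_state {
	TGT_ABSENT, /* target does not appear in the pool map */
	TGT_NEW,
	TGT_UP,
	TGT_UPIN,
	TGT_DRAIN,
	TGT_DOWN,
	TGT_DOWNOUT
};

enum mbs_action {
	ACT_NONE,          /* nothing described for this combination (assumption) */
	ACT_DESTROY_SHARD, /* case 1: storage exists, target not in the pool map */
	ACT_ASK_ADMIN,     /* case 2: storage exists, target is DOWN or DOWNOUT */
	ACT_MARK_DOWN,     /* case 3: UP/UPIN/DRAIN entry without storage */
	ACT_MARK_DOWNOUT   /* case 3: NEW entry without storage */
};

/* Map (pool map state, storage present?) to the default repair action. */
static enum mbs_action
classify_target(enum tgt_state state, bool has_storage)
{
	assert(state <= TGT_DOWNOUT);

	if (has_storage) {
		switch (state) {
		case TGT_ABSENT:
			return ACT_DESTROY_SHARD;
		case TGT_DOWN:
		case TGT_DOWNOUT:
			return ACT_ASK_ADMIN;
		default:
			return ACT_NONE;
		}
	}

	switch (state) {
	case TGT_NEW:
		return ACT_MARK_DOWNOUT;
	case TGT_UP:
	case TGT_UPIN:
	case TGT_DRAIN:
		return ACT_MARK_DOWN;
	default:
		return ACT_NONE;
	}
}
```

Keeping the classification separate from the repair makes it straightforward to report the inconsistency first and apply the action only when the policy (or the administrator) allows it.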
Temporarily skip the code format check for src/chk/chk_internal.h
and src/mgmt/rpc.h to avoid spurious warning messages.

Signed-off-by: Fan Yong [email protected]

@Nasf-Fan Nasf-Fan requested a review from a team as a code owner August 1, 2022 03:06
@github-actions

github-actions bot commented Aug 1, 2022

Bug-tracker data:
Ticket title is 'pass2: scan & report allocated storage'
Status is 'In Review'
Labels: '531nth,triaged'
https://daosio.atlassian.net/browse/DAOS-9595

Collaborator

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-9595_1 branch from 009ab53 to 7f0aa42 Compare August 1, 2022 03:33
@Nasf-Fan Nasf-Fan requested a review from a team as a code owner August 1, 2022 03:33
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-9595_1 branch from 7f0aa42 to 40b3869 Compare August 1, 2022 03:47
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-9595_1 branch from 40b3869 to f9b9b36 Compare August 1, 2022 04:01
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-9595_1 branch from f9b9b36 to d5fca4d Compare August 1, 2022 04:11
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-9595_1 branch from d5fca4d to 3594b5d Compare August 1, 2022 04:23
@Nasf-Fan Nasf-Fan removed request for a team August 1, 2022 04:28
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-9595_1 branch from 3594b5d to b00ef00 Compare August 1, 2022 04:36
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-9595_1 branch from 93b2031 to c906da6 Compare August 3, 2022 15:52

}

int
ds_pool_svc_flush_map(struct ds_pool_svc *ds_svc, struct pool_map *map, uint32_t version)
Contributor

[Question] Is it intentional that we do not schedule rebuild jobs when updating the pool map?

Contributor Author

The pool map update is driven by the DAOS check rather than by regular rebuild. The logic is roughly as follows:
For each target reported as part of the pool, compare it with the pool map and fix any related inconsistency; then handle the pool map entries that were not visited during that comparison. This process may yield because of RPCs or interaction with the admin. In this step, all pool map fixes are made in DRAM only. Once everything is done, ds_pool_svc_flush_map() is called to persist the pool map change and broadcast it to the other pool shards.
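The flow described in this reply (visit reported targets, then sweep the unvisited map entries, then flush once) could be sketched as follows. This is an illustrative sketch only: the structures and function names are hypothetical stand-ins rather than DAOS types, the single version bump per pass is an assumption, and the `dirty` flag merely stands in for the one-time call to ds_pool_svc_flush_map().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not DAOS structures. */
enum entry_state { ST_NEW, ST_UP, ST_UPIN, ST_DRAIN, ST_DOWN, ST_DOWNOUT };

struct map_entry {
	int              rank;
	enum entry_state state;
	bool             visited; /* set when a shard report matched this entry */
};

struct mem_map {
	struct map_entry *entries;
	size_t            nr;
	unsigned int      version;
	bool              dirty;   /* in-memory fixes still waiting for the flush */
};

static struct map_entry *
map_find(struct mem_map *map, int rank)
{
	size_t i;

	for (i = 0; i < map->nr; i++)
		if (map->entries[i].rank == rank)
			return &map->entries[i];
	return NULL;
}

/* Returns the number of entries fixed in memory; nothing is persisted here. */
static int
consolidate(struct mem_map *map, const int *reported, size_t nr_reported)
{
	size_t i;
	int    fixed = 0;

	assert(map != NULL);

	/* Pass 1: mark every map entry for which some engine reported storage. */
	for (i = 0; i < nr_reported; i++) {
		struct map_entry *e = map_find(map, reported[i]);

		if (e != NULL)
			e->visited = true;
		/* Reports without a map entry (orphan storage) are out of scope here. */
	}

	/* Pass 2: sweep entries the map still references but nobody reported. */
	for (i = 0; i < map->nr; i++) {
		struct map_entry *e = &map->entries[i];

		if (e->visited || e->state == ST_DOWN || e->state == ST_DOWNOUT)
			continue;
		e->state = (e->state == ST_NEW) ? ST_DOWNOUT : ST_DOWN;
		fixed++;
	}

	if (fixed > 0) {
		map->version++;    /* assumption: one bump covers the whole pass */
		map->dirty = true; /* the caller persists and broadcasts once */
	}
	return fixed;
}
```

Because both passes touch only the in-memory copy, yielding in the middle (for an RPC or an admin prompt) never exposes a half-fixed map to other pool shards.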

Comment on lines 6880 to 7057
/*
* Have to resign to avoid handling future requests with stale pool map cache.
* Continue to distribute the new pool map to other pool shards since the RDB
* has already been updated.
*/
rdb_resign(svc->ps_rsvc.s_db, svc->ps_rsvc.s_term);
Contributor

Failing to update the local map implies that the local secondary group for this pool may not have the latest membership. In this state, it's simpler to just give up, instead of continuing to take some chance. When a new leader steps up, it will distribute the new pool map, schedule rebuild jobs, (and in the future also do what replace_failed_replicas does).

Contributor Author

Under DAOS check mode, if the PS leader itself or some other engine goes down while the pool membership is being checked, the DAOS check for this pool will be marked as aborted.
On the other hand, if the PS leader switches to another engine because of a short network split without any engine going down, the DAOS check will not fail, even if both the old PS leader and the new PS leader run the DAOS pool membership check in parallel. That may cause some redundant checking, but not an error.
So here, the current PS leader must have the latest membership. It has already updated the pool map in RDB, but failed to refresh the in-DRAM pool map (attached to the pool instance). If we give up, the DAOS check for this pool will fail; but if we try to distribute the update to other engines, we may have a chance to continue the DAOS check.

Contributor

The rest of the PS code assumes that the local secondary group always reflects the latest pool map. At the least, I'd suggest skipping the following map_dist and reconfigure calls in this case to avoid adding burden to non-chk code. Also, do you really intend to return the nonzero rc to chk like this patch does?

Contributor Author

@Nasf-Fan Nasf-Fan Aug 12, 2022

OK. If there is nothing else to change, I would prefer to adjust it in the subsequent patch #9867, which will be rebased after this one lands.

As for the return value, I think it is better to return it to the caller. It is the caller's duty to determine the next step. In the CHK case, if "failout" is not specified, the check will go ahead.

What do you think?

Contributor

OK, fine with me.

if (file == NULL) {
D_ERROR(DF_UUIDF": failed to allocate file name for shards status %d\n",
DP_UUID(uuid), i);
D_GOTO(out_path, rc = -DER_NOMEM);
Contributor

Do we leak clue->pc_tgt_status on the error paths?

Contributor Author

Right, I will refresh the patch to fix it.

Comment on lines +68 to +70
X(MGMT_TGT_SHARD_DESTROY, \
0, &CQF_mgmt_tgt_shard_destroy, \
ds_mgmt_hdlr_tgt_shard_destroy, NULL)
Contributor

By the way, do we need to bump DAOS_MGMT_VERSION above for this change?

Contributor Author

right

Contributor

hmm, do we need to add support for proto query like we did with object and pool RPCs?

Contributor Author

Currently, it seems unnecessary because we do not support interoperation among servers.


D_ALLOC_ARRAY(cpr->cpr_mbs, cpr->cpr_shard_nr);
if (cpr->cpr_mbs == NULL)
D_GOTO(out, rc = -DER_NOMEM);
Contributor

On this error path we'll finalize an rsvc_client that has not been initialized. Could we avoid doing this even if it might work at the moment, please?

Contributor Author

OK, I will refresh the patch.
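One common way to address this kind of concern is to give each acquired resource its own cleanup label, so an early failure never finalizes something that was not initialized yet. The sketch below illustrates the pattern only: `struct client`, `struct rec`, `rec_setup()` and the `fail_alloc` switch are hypothetical, `-ENOMEM` stands in for `-DER_NOMEM`, and none of it is the code that was eventually pushed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a client object that needs paired init/fini. */
struct client {
	bool initialized;
};

static int fini_calls_on_uninit; /* counts finalize calls on uninitialized clients */

static int
client_init(struct client *c)
{
	c->initialized = true;
	return 0;
}

static void
client_fini(struct client *c)
{
	if (!c->initialized)
		fini_calls_on_uninit++;
	c->initialized = false;
}

struct rec {
	int          *mbs;
	size_t        nr;
	struct client cli;
};

/* fail_alloc simulates an allocation failure so the error path can be exercised. */
static int
rec_setup(struct rec *r, size_t nr, bool fail_alloc)
{
	int rc;

	assert(nr > 0);

	r->mbs = fail_alloc ? NULL : calloc(nr, sizeof(*r->mbs));
	if (r->mbs == NULL) {
		rc = -ENOMEM;
		goto out; /* the client was never initialized: skip its cleanup */
	}

	rc = client_init(&r->cli);
	if (rc != 0)
		goto out_mbs; /* undo only what has been acquired so far */

	r->nr = nr;
	return 0;

out_mbs:
	free(r->mbs);
	r->mbs = NULL;
out:
	return rc;
}

static void
rec_teardown(struct rec *r)
{
	client_fini(&r->cli);
	free(r->mbs);
	r->mbs = NULL;
	r->nr = 0;
}
```

With the labels ordered in reverse acquisition order, the allocation-failure path returns without touching the client, while later failures still release the array.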

* Flush the pool map to persistent storage (if not under dryrun mode)
* and distribute the pool map to other pool shards.
*/
rc1 = ds_pool_svc_flush_map(svc, map, version);
Contributor

[Question] The map object already includes the version; the version parameter is unnecessary, isn't it? On the other hand, do you think we should pass a version for ds_pool_svc_flush_map to check before writing the new map? For instance,

read the map: version x
change the map
if the old map is not version x
    try reading and changing again
write the new map
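The read/change/verify/write loop outlined in the question above could look roughly like the following. It is a minimal single-threaded sketch under stated assumptions: `struct versioned_map`, `store_write_if_version()` and `update_with_retry()` are hypothetical names, the "stored" map is a plain struct rather than an RDB, and a real implementation would have to do the version check and the write inside one transaction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified map: a version plus some payload. */
struct versioned_map {
	unsigned int version;
	int          payload;
};

/*
 * Write new_map only if the stored copy is still at the expected version.
 * Returns false when the stored map has moved on, so the caller can re-read.
 */
static bool
store_write_if_version(struct versioned_map *stored, unsigned int expected,
		       const struct versioned_map *new_map)
{
	if (stored->version != expected)
		return false;
	*stored = *new_map;
	return true;
}

/* Read, change, and write back; retry when the version check fails. */
static int
update_with_retry(struct versioned_map *stored, int delta, int max_tries)
{
	int tries;

	assert(max_tries > 0);

	for (tries = 0; tries < max_tries; tries++) {
		struct versioned_map copy = *stored;       /* read the map: version x */
		unsigned int         seen = copy.version;

		copy.payload += delta;                     /* change the map */
		copy.version = seen + 1;

		if (store_write_if_version(stored, seen, &copy))
			return 0;                          /* write the new map */
		/* the old map is no longer version x: read and change again */
	}
	return -1;
}
```

In this sketch the caller never overwrites a map it has not seen; a stale writer fails the check and pays only for one extra read.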

Contributor Author

@Nasf-Fan Nasf-Fan Aug 5, 2022

> The map object already includes the version; the version parameter is unnecessary, isn't it?

Right, I will drop the redundant parameter.

> On the other hand, do you think we should pass a version for ds_pool_svc_flush_map to check before writing the new map? For instance,

Under check mode, we disable node eviction and do not allow reintegration. That means there will be no pool map refresh unless the DAOS check logic itself updates the pool map. So it is unnecessary to re-check the pool map version before ds_pool_svc_flush_map(). On the other hand, if ds_pool_svc_flush_map() is called under non-check mode, then the caller needs to do as you described.

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-9595_1 branch from c906da6 to cc381a0 Compare August 5, 2022 16:02
pool_map_bump_version(struct pool_map *map)
{
map->po_version++;
D_DEBUG(DB_TRACE, "Bumb pool map to version %u\n", map->po_version);
Collaborator

Suggested change
- D_DEBUG(DB_TRACE, "Bumb pool map to version %u\n", map->po_version);
+ D_DEBUG(DB_TRACE, "Bump pool map to version %u\n", map->po_version);

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-9595_1 branch from cc381a0 to 3d9c88a Compare August 5, 2022 16:07
@daosbuild1 daosbuild1 dismissed their stale review August 5, 2022 16:09

Updated patch

@Nasf-Fan Nasf-Fan requested review from liw and jolivier23 August 7, 2022 13:09
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-9595_1 branch from 3d9c88a to 929dccd Compare August 8, 2022 04:08
@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9865/14/execution/node/1046/log

Contributor

@jolivier23 jolivier23 left a comment

Question about proto query: if we have a new version of the mgmt RPC, do we need to add a query?

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-9595_1 branch from 929dccd to b668567 Compare August 9, 2022 04:03
@daosbuild1
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9865/15/execution/node/320/log

@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9865/15/execution/node/364/log

@daosbuild1
Collaborator

Test stage Build RPM on Leap 15 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9865/15/execution/node/323/log

@daosbuild1
Collaborator

Test stage Build on Leap 15 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9865/15/execution/node/342/log

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-9595_1 branch from b668567 to dd5b38f Compare August 9, 2022 10:08
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-9595_1 branch from dd5b38f to d096386 Compare August 10, 2022 01:25
@Nasf-Fan Nasf-Fan requested a review from jolivier23 August 10, 2022 14:35
@Nasf-Fan
Contributor Author

@liw @liuxuezhao @jolivier23, would you please help review the patch? Thanks!

Contributor

@jolivier23 jolivier23 left a comment

Do we need a proto query for the MGMT RPC change?

@Nasf-Fan Nasf-Fan requested a review from mjmac August 12, 2022 07:29
@Nasf-Fan
Contributor Author

@mjmac, would you please help land this one? Then I can rebase the subsequent patch, thanks!

@Nasf-Fan
Contributor Author

> Do we need a proto query for the MGMT RPC change?

Currently, it seems unnecessary because we do not support interoperation among servers.

@mjmac mjmac merged commit 2b6d5fd into feature/cat_recovery Aug 12, 2022
@mjmac mjmac deleted the Nasf-Fan/DAOS-9595_1 branch August 12, 2022 15:46