Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-9599 chk: handle inconsistent pool label #9524

Merged
merged 1 commit into from
Jul 11, 2022

Conversation

Nasf-Fan
Copy link
Contributor

The pool label is mainly used by MS to lookup pool UUID by label.
The PS recorded pool label is some kind of the backup. So if the
MS known pool label does not match the pool label recorded by PS,
then trust MS as long as MS known pool label is not NULL. Related
repair will not happen on check leader, instead, it will be done
on the PS leader during CHK__CHECK_SCAN_PHASE__CSP_POOL_CLEANUP.

If the admin wants to recover pool label on MS with the pool info
on PS, then it will use DRPC_METHOD_CHK_REG_POOL drpc upcall.

The patch also contains some other fixes:

  1. Only handle dangling pool within the given pools check list.

  2. Some code cleanup for "uint32_t" and "int32_t" usage in chk.

Signed-off-by: Fan Yong [email protected]

Required-githooks: true

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

The pool label is mainly used by MS to lookup pool UUID by label.
The PS recorded pool label is some kind of the backup. So if the
MS known pool label does not match the pool label recorded by PS,
then trust MS as long as MS known pool label is not NULL. Related
repair will not happen on check leader, instead, it will be done
on the PS leader during CHK__CHECK_SCAN_PHASE__CSP_POOL_CLEANUP.

If the admin wants to recover pool label on MS with the pool info
on PS, then it will use DRPC_METHOD_CHK_REG_POOL drpc upcall.

The patch also contains some other fixes:

1. Only handle dangling pool within the given pools check list.

2. If some engine failed to report some pool information, then
   related pool cannot be handled as no quorum case.

3. Some code cleanup for "uint32_t" and "int32_t" usage in chk.

Signed-off-by: Fan Yong <[email protected]>
@Nasf-Fan Nasf-Fan requested a review from a team as a code owner July 6, 2022 10:45
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@Nasf-Fan Nasf-Fan requested review from mjmac and liuxuezhao and removed request for a team July 7, 2022 08:30
Copy link
Contributor

@mjmac mjmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks fine to me, but it would be best to get another review from someone who is more familiar with the engine side of things. I am a better reviewer for control plane and protobuf updates.

@jolivier23 jolivier23 added the CR Catastrophic Recovery Feature label Jul 7, 2022
@Nasf-Fan Nasf-Fan requested review from jolivier23 and wangdi1 July 8, 2022 08:34
@Nasf-Fan Nasf-Fan requested review from mjmac and daosbuild1 July 11, 2022 09:30
@mjmac mjmac merged commit 082a664 into feature/cat_recovery Jul 11, 2022
@mjmac mjmac deleted the DAOS-9599_1 branch July 11, 2022 12:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CR Catastrophic Recovery Feature
Development

Successfully merging this pull request may close these issues.

4 participants