Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-9584 chk: Add srv handlers for checker upcalls #8579

Merged
merged 2 commits into from
Apr 4, 2022

Conversation

mjmac
Copy link
Contributor

@mjmac mjmac commented Mar 30, 2022

Implement control plane handlers for the following
engine checker dRPC upcalls:

  • CheckerListPools
  • CheckerRegisterPool
  • CheckerDeregisterPool

Signed-off-by: Michael MacDonald [email protected]

The TestDatabase only needs to be set up once for the
whole suite.

Signed-off-by: Michael MacDonald <[email protected]>
@mjmac mjmac changed the base branch from master to feature/cat_recovery March 30, 2022 19:37
@daosbuild1
Copy link
Collaborator

Implement control plane handlers for the following
engine checker dRPC upcalls:
  * CheckerListPools
  * CheckerRegisterPool
  * CheckerDeregisterPool

Signed-off-by: Michael MacDonald <[email protected]>
@mjmac mjmac force-pushed the mjmac/DAOS-9584 branch from 54d9063 to 2368ec7 Compare March 30, 2022 19:42
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@mjmac mjmac requested review from Nasf-Fan and kjacque March 31, 2022 01:53
@mjmac mjmac marked this pull request as ready for review March 31, 2022 12:34
@mjmac mjmac requested a review from a team as a code owner March 31, 2022 12:34
return
}

ps := &system.PoolService{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nasf-Fan: I think we can add it later, but we should probably also have the full set of known storage ranks as part of this information. The MS stores this information for later use during pool destroy/cleanup. We also can store some information about the per-rank tier storage allocations (scm/nvme), but we don't really do anything with it right now.

case drpc.MethodCheckerRegisterPool:
return mod.handleCheckerRegisterPool(ctx, req)
case drpc.MethodCheckerDeregisterPool:
return mod.handleCheckerDeregisterPool(ctx, req)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nasf-Fan: Hmm, thinking about this a little more, we probably need an upcall to fix a pool too, right? E.g. in the case where the label stored in the pool properties does not match the label stored in the MS. Or the svc ranks are wrong, etc.

I suppose the checker could also just ask the MS to deregister the old pool and register a new pool with the fixed values. Maybe that's simpler. But as I noted in the other comment we may lose information that way.

Copy link
Contributor

@kjacque kjacque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks reasonable. I found myself wondering about all that this checker does, though. Returning DER_MISC is vague at best--maybe it would be useful to add new DER codes? Or otherwise provide a string that the engine could log as well?

@mjmac
Copy link
Contributor Author

mjmac commented Apr 4, 2022

Overall looks reasonable. I found myself wondering about all that this checker does, though. Returning DER_MISC is vague at best--maybe it would be useful to add new DER codes? Or otherwise provide a string that the engine could log as well?

In many (most?) cases the upcall handlers do return more specific status codes for what are probably the most likely errors (e.g. DER_EXIST, DER_INVALID, etc). We could probably do more to inspect the errors for other failures (e.g. RemovePoolService() fails, return DER_NOTLEADER if it's because leadership was lost), but at the moment there's still a lot of undefined/undesigned behavior. What we have now seems reasonable as a starting point while we iterate/integrate.

@mjmac mjmac merged commit 57204f0 into feature/cat_recovery Apr 4, 2022
@mjmac mjmac deleted the mjmac/DAOS-9584 branch April 4, 2022 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants