-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-9595 chk: consolidate pool membership #9611
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style warning(s) for job https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9611/1/
Please review https://wiki.hpdd.intel.com/display/DC/Coding+Rules
* CHK_POOL_MBS: | ||
* From check leader to check engine to notify the pool members. | ||
*/ | ||
#define DAOS_ISEQ_CHK_POOL_MBS \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) Macros with complex values should be enclosed in parentheses
((d_string_t) (cpmi_label) CRT_VAR) \ | ||
((struct chk_pool_mbs) (cpmi_targets) CRT_ARRAY) \ | ||
|
||
#define DAOS_OSEQ_CHK_POOL_MBS \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) Macros with complex values should be enclosed in parentheses
src/common/pool_map.c
Outdated
@@ -413,6 +413,7 @@ pool_buf_attach(struct pool_buf *buf, struct pool_component *comps, | |||
buf->pb_domain_nr++; | |||
|
|||
buf->pb_comps[nr] = comps[0]; | |||
buf->pb_comps[nr].co_flags&= ~PO_COMPF_CHK_DONE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) spaces required around that '&=' (ctx:VxW)
src/common/pool_map.c
Outdated
@@ -413,6 +413,7 @@ pool_buf_attach(struct pool_buf *buf, struct pool_component *comps, | |||
buf->pb_domain_nr++; | |||
|
|||
buf->pb_comps[nr] = comps[0]; | |||
buf->pb_comps[nr].co_flags&= ~PO_COMPF_CHK_DONE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buf->pb_comps[nr].co_flags&= ~PO_COMPF_CHK_DONE; | |
buf->pb_comps[nr].co_flags &= ~PO_COMPF_CHK_DONE; |
@@ -214,4 +217,16 @@ CRT_RPC_DECLARE(mgmt_mark, DAOS_ISEQ_MGMT_MARK, DAOS_OSEQ_MGMT_MARK) | |||
CRT_RPC_DECLARE(mgmt_get_bs_state, DAOS_ISEQ_MGMT_GET_BS_STATE, | |||
DAOS_OSEQ_MGMT_GET_BS_STATE) | |||
|
|||
#define DAOS_ISEQ_MGMT_TGT_SHARD_DESTROY /* input fields */ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) Macros with complex values should be enclosed in parentheses
((int32_t) (tsdi_shard_idx) CRT_VAR) \ | ||
((uint32_t) (tsdi_padding) CRT_VAR) | ||
|
||
#define DAOS_OSEQ_MGMT_TGT_SHARD_DESTROY /* output fields */ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) Macros with complex values should be enclosed in parentheses
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9611/1/execution/node/132/log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style warning(s) for job https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-9611/2/
Please review https://wiki.hpdd.intel.com/display/DC/Coding+Rules
@@ -214,4 +217,16 @@ CRT_RPC_DECLARE(mgmt_mark, DAOS_ISEQ_MGMT_MARK, DAOS_OSEQ_MGMT_MARK) | |||
CRT_RPC_DECLARE(mgmt_get_bs_state, DAOS_ISEQ_MGMT_GET_BS_STATE, | |||
DAOS_OSEQ_MGMT_GET_BS_STATE) | |||
|
|||
#define DAOS_ISEQ_MGMT_TGT_SHARD_DESTROY /* input fields */ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) Macros with complex values should be enclosed in parentheses
((int32_t) (tsdi_shard_idx) CRT_VAR) \ | ||
((uint32_t) (tsdi_padding) CRT_VAR) | ||
|
||
#define DAOS_OSEQ_MGMT_TGT_SHARD_DESTROY /* output fields */ \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(style) Macros with complex values should be enclosed in parentheses
Test stage checkpatch completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-9611/2/execution/node/133/log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
When DAOS check start, all involved check engines will report their known pools' information, including the pool service replicas, pool label and related storage allocation, to the check leader via reply. After the pool list consolidation in the pass_1, for each pool, the check leader will send related pool information to its pool service leaders via new RPC - CHK_POOL_MBS. On the check engine side, the pool service leader compares the pool map with these information pushed from the check leader and handles the following cases: 1. An target has some allocated storage but does not appear in the pool map. Under such case, the associated space will be deleted from the engine by default. 2. An target has some allocated storage and is marked as "DOWN" or "DOWNOUT" in the pool map. For this case, the administrator can decide to either remove or leave it there. 3. An target is referenced in the pool map ("NEW", "UP", "UPIN" or "DRAIN"), but no storage is actually allocated on this engine. Under such case, the entry for the target in the pool map will be marked as "DOWN" (for the "UP", "UPIN" or "DRAIN" entry) or "DOWNOUT" (for the "NEW" entry). Temporarily skip code format check against src/chk/chk_internal.h and src/mgmt/rpc.h to avoid fake warning messages. Signed-off-by: Fan Yong <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
}; | ||
|
||
/* Pool service */ | ||
struct pool_svc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without a significantly strong reason, I don't think we should just expose the internal pool_svc type. Could we revert these pool_svc changes and do the pool map manipulations in some new PS RPC(s) instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have considered the pool_svc usage cases for a while, but did not found some suitable solution. One of the important reasons for exporting such structure is that all RDB related APIs need the "pool_svc" parameter to start the TX, lock, and so on. If we do not export "pool_svc", then related CHK logic have to be moved into pool/rsvc module(s) or introduce some complex (but non-general) callback, that will make related logic to be more dirty than current implementation. I do not think it is expected.
The usage "pool_svc" is not only in this patch but also in subsequent patches. For example, on the engine of PS leader, the CHK logic will do the following:
- Check whether current engine is the PS leader or not. If yes, then
- Parse all related pool shards reported information one by one, compare with its local pool map, and then do related reparation, such as refresh the pool map or destroy related pool shard, that may involve in the interaction with admin via CHK leader.
- Compare pool label from MBS with its local property, and repair if necessary.
In above example, the "pool_svc" is generated in the 1st step, then repeatedly used in the 2nd step, and then the 3rd step. If we do NOT export "pool_svc" in the 1st step, then means that both 2nd and 3rd steps will be wrapped into pool/rsvc, but they are quite different than other general pool/rsvc logic, looks very dirty and strange.
It is replaced by the patch #9865 |
When DAOS check start, all involved check engines will report their
known pools' information, including the pool service replicas, pool
label and related storage allocation, to the check leader via reply.
After the pool list consolidation in the pass_1, for each pool, the
check leader will send related pool information to its pool service
leaders via new RPC - CHK_POOL_MBS.
On the check engine side, the pool service leader compares the pool
map with these information pushed from the check leader and handles
the following cases:
An target has some allocated storage but does not appear in the
pool map. Under such case, the associated space will be deleted
from the engine by default.
An target has some allocated storage and is marked as "DOWN" or
"DOWNOUT" in the pool map. For this case, the administrator can
decide to either remove or leave it there.
An target is referenced in the pool map ("NEW", "UP", "UPIN" or
"DRAIN"), but no storage is actually allocated on this engine.
Under such case, the entry for the target in the pool map will
be marked as "DOWN" (for the "UP", "UPIN" or "DRAIN" entry) or
"DOWNOUT" (for the "NEW" entry).
Temporarily skip code format check against src/chk/chk_internal.h
and src/mgmt/rpc.h to avoid fake warning messages.
Signed-off-by: Fan Yong [email protected]