DAOS-10250 control: Get enabled and disabled ranks with dmg pool query #14436
Ticket title is 'Update engine to get enabled and disabled ranks with drpc pool query'
22d9e2f
to
1916065
Compare
Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/2/testReport/
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/2/testReport/
1916065
to
9d4d536
Compare
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/4/testReport/
9d4d536
to
f3d6fe5
Compare
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14436/5/execution/node/373/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14436/5/execution/node/358/log
f3d6fe5
to
5cec650
Compare
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14436/5/execution/node/318/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14436/5/execution/node/315/log
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/6/testReport/
Makes enabled and disabled ranks option of dmg compatible.
Update and add cmocka unit tests of engine management related functions.
Fix memory leaks of ranks string in function ds_mgmt_drpc_pool_query().
Features: control dmg
Required-githooks: true
Signed-off-by: Cedric Koch-Hofer <[email protected]>
5cec650
to
52da7cd
Compare
Nice work @knard-intel , comments are nonblocking and only minor suggestions.
src/mgmt/tests/mocks.c
Outdated
{
	/* If function is to return with an error, pool_info and ranks will not be filled. */
	if (ds_mgmt_pool_query_return != 0)
		return ds_mgmt_pool_query_return;

	uuid_copy(ds_mgmt_pool_query_uuid, pool_uuid);
	ds_mgmt_pool_query_info_ptr = (void *)pool_info;
	if (pool_info != NULL && (pool_info->pi_bits & DPI_ENGINES_ENABLED) != 0) {
nit: might look nicer to perform the pool_info NULL check once and indent the rest of the checks
- Improve code readability
Fixed with commit 29e8d0e
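For readers following the nit on the mock, the shape being suggested (test pool_info for NULL once, then nest the bit checks) might look roughly like the sketch below. It is not the code from commit 29e8d0e: the `sk_` names, the flag values, and the reduced structure are hypothetical stand-ins so that the snippet compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real flags and struct live in the DAOS headers. */
#define SK_ENGINES_ENABLED  (1ULL << 0)
#define SK_ENGINES_DISABLED (1ULL << 1)

struct sk_pool_info {
	uint64_t pi_bits; /* query bits set by the caller */
};

/* Returns a bitmask of the rank lists the mock would fill in. */
static unsigned int
sk_mock_requested_lists(const struct sk_pool_info *pool_info)
{
	unsigned int filled = 0;

	/* Single NULL check; the bit tests below no longer repeat it. */
	if (pool_info != NULL) {
		if ((pool_info->pi_bits & SK_ENGINES_ENABLED) != 0)
			filled |= 1u; /* would set the enabled ranks */
		if ((pool_info->pi_bits & SK_ENGINES_DISABLED) != 0)
			filled |= 2u; /* would set the disabled ranks */
	}

	return filled;
}
```

Calling `sk_mock_requested_lists(NULL)` returns 0, so a NULL pool_info is handled in exactly one place.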
src/pool/srv_cli.c
Outdated
if (ranks != NULL) {
	bool get_enabled = (info ? ((info->pi_bits & DPI_ENGINES_ENABLED) != 0) : false);
	if (info != NULL && (info->pi_bits & DPI_ENGINES_ENABLED) != 0) {
Is a NULL daos_pool_info_t valid here? If so, a GOTO label above pool_query_reply_to_info might be useful to skip to in the NULL case. It looks like we are using the info reference as both an input and an output; if so, should we update the documentation for dsc_pool_svc_query to mark the field as in/out? It could be made clearer by reading the input bit-fields from info before explicitly specifying output values in a separate info_out. This suggestion could be considered overkill, but I've occasionally found mixing input and output to be problematic.
Yep, I agree that the usage of the info var is confusing.
Thanks for your attention.
- Fix the dsc_pool_svc_query() documentation
- Refactor the info var
Fixed with commit 87ff98c
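The in/out concern raised in this thread can be illustrated with a small sketch: snapshot the input bits before anything writes to the structure, and document the parameter as in/out. This is only an illustration under stated assumptions, not the refactoring done in commit 87ff98c. `sk_pool_query`, `struct sk_pool_info`, and the `reply_nnodes` argument are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for DPI_ENGINES_ENABLED. */
#define SK_ENGINES_ENABLED (1ULL << 0)

struct sk_pool_info {
	uint64_t pi_bits;   /* in: which optional data the caller wants */
	uint32_t pi_nnodes; /* out: filled from the reply */
};

/*
 * Hypothetical query helper.
 * info [in,out]: pi_bits is read on entry, then the structure is rewritten
 *                with reply data. May be NULL.
 * want_enabled [out]: set to 1 if the caller asked for the enabled ranks.
 */
static int
sk_pool_query(struct sk_pool_info *info, uint32_t reply_nnodes, int *want_enabled)
{
	uint64_t pi_bits;

	if (want_enabled == NULL)
		return -1;

	/* Snapshot the input bits first, so later writes cannot clobber them. */
	pi_bits       = (info != NULL) ? info->pi_bits : 0;
	*want_enabled = (pi_bits & SK_ENGINES_ENABLED) != 0;

	if (info != NULL) {
		memset(info, 0, sizeof(*info)); /* output phase starts here */
		info->pi_bits   = pi_bits;
		info->pi_nnodes = reply_nnodes;
	}

	return 0;
}
```

Keeping the snapshot in a local variable means every later decision reads `pi_bits` rather than `info->pi_bits`, which is the separation the reviewer was asking for without needing a second `info_out` structure.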
Test stage NLT on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-14436/16/display/redirect
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/17/testReport/
…/daos-10250 Required-githooks: true
Fix reviewers comments:
- Change rc != -DER_SUCCESS to rc != 0
Features: control dmg
Required-githooks: true
Signed-off-by: Cedric Koch-Hofer <[email protected]>

Fix reviewers comments:
- Remove invalid assert
Features: control dmg
Required-githooks: true
Signed-off-by: Cedric Koch-Hofer <[email protected]>
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/18/testReport/
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/19/testReport/
LGTM. My comments are minor non-blocking suggestions. Good work.
src/pool/srv_cli.c
Outdated
}

if ((pi_bits & DPI_ENGINES_ENABLED) != 0) {
	D_ASSERT(enabled_ranks != NULL);
	if (enabled_ranks == NULL) {
		DL_ERROR(-DER_INVAL, DF_UUID ": query pool with invalid params",
minor - might be nice to be a little more descriptive here. Something like "query pool requested enabled ranks, but ptr is NULL"?
- Fix error message
Fixed with commit 533bf93
src/pool/srv_cli.c
Outdated
	}
	D_DEBUG(DB_MD, DF_UUID ": found %" PRIu32 " enabled ranks in pool map\n",
		DP_UUID(pool_uuid), enabled_rank_list->rl_nr);
}

if ((pi_bits & DPI_ENGINES_DISABLED) != 0) {
	D_ASSERT(disabled_ranks != NULL);
	if (disabled_ranks == NULL) {
		DL_ERROR(-DER_INVAL, DF_UUID ": query pool with invalid params",
same comment as above
- Fix error message
Fixed with commit 533bf93
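A standalone sketch of the check discussed in the two threads above: reject the call when a rank list is requested but its output pointer is NULL, and say which one in the message. The wording follows the reviewer's suggestion. The `sk_` names, the flag values, the `SK_ERR_INVAL` placeholder, and the use of `fprintf` in place of the DAOS logging macros are all assumptions made so the example is self-contained; the text actually used in commit 533bf93 may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the DAOS query bits and error code. */
#define SK_ENGINES_ENABLED  (1ULL << 0)
#define SK_ENGINES_DISABLED (1ULL << 1)
#define SK_ERR_INVAL        (-1) /* placeholder, not the real DER_INVAL value */

/* Validate that every requested rank list has somewhere to be stored. */
static int
sk_check_rank_outputs(uint64_t pi_bits, void **enabled_ranks, void **disabled_ranks)
{
	if ((pi_bits & SK_ENGINES_ENABLED) != 0 && enabled_ranks == NULL) {
		fprintf(stderr, "query pool requested enabled ranks, but ptr is NULL\n");
		return SK_ERR_INVAL;
	}

	if ((pi_bits & SK_ENGINES_DISABLED) != 0 && disabled_ranks == NULL) {
		fprintf(stderr, "query pool requested disabled ranks, but ptr is NULL\n");
		return SK_ERR_INVAL;
	}

	return 0;
}
```

Returning an error instead of asserting keeps a bad caller from taking the engine down, which is consistent with the "Remove invalid assert" commit earlier in this conversation.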
Fix reviewers comments:
- Fix error message
Features: control dmg
Required-githooks: true
Signed-off-by: Cedric Koch-Hofer <[email protected]>
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/20/testReport/
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14436/20/execution/node/1189/log
Looks good, thanks for addressing this. At Google this was briefly a problem because our service control team wanted to query both enabled and disabled ranks and were confused by the "not supported" error.
char *enabled_ranks_str = NULL;
char *disabled_ranks_str = NULL;
I don't feel too strongly either way, but just a thought -- I wonder if it makes more sense to convert the response message to return a pair of uint32 arrays instead of the range strings. It would simplify the engine-side code and let the caller decide how to present the lists.
I don't recall how many ranks can be in a pool. If it's a lot, the string may actually be the better way to transmit that value...
Well, in the case of Aurora, a pool could in theory span 2048 ranks. As far as I'm aware there's no hard-coded limit on the number of ranks in a pool, so it could go higher for larger systems.
You make a good point about the ranklist string being a more compact representation. As I said, I don't feel very strongly about this, I guess I was just reacting a little to what seemed like presentation-layer logic living in the drpc handler.
From my side, I have to admit that I am also not fond of using a string, and I would indeed prefer to use a list of intervals.
However, I would prefer to do that in a follow-up PR with a dedicated ticket.
Does that make sense?
Lists of intervals might be a nice compromise here. If I'm understanding correctly, you're proposing something like:
enabled_ranks: [0,7,9,15] (equivalent to "[0-7,9-15]")
Or, best-case:
enabled_ranks: [0,15] (equivalent to "[0-15]")
Right? This would still be reasonably compact, and would still require some interpretation, but would at least eliminate the string manipulation steps for encode/decode.
In any case, I am fine with doing that as a follow-up rather than forcing it into this patch.
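To make the interval proposal concrete, here is a minimal sketch of the encoding described above: a sorted, duplicate-free rank array is collapsed into flat [first, last] pairs, so ranks 0 to 7 and 9 to 15 become [0,7,9,15]. The function name `sk_ranks_to_intervals` is hypothetical and this is not the implementation planned for the follow-up; it only shows that the encode step needs no string handling.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Collapse a sorted, duplicate-free rank array into flat [first, last] pairs.
 * `out` must have room for 2 * nr entries (worst case: no two ranks adjacent).
 * Returns the number of uint32_t values written to `out`.
 */
static size_t
sk_ranks_to_intervals(const uint32_t *ranks, size_t nr, uint32_t *out)
{
	size_t n = 0;
	size_t i = 0;

	while (i < nr) {
		size_t j = i;

		/* Extend the run while the next rank is consecutive. */
		while (j + 1 < nr && ranks[j + 1] == ranks[j] + 1)
			j++;

		out[n++] = ranks[i]; /* first rank of the interval */
		out[n++] = ranks[j]; /* last rank of the interval */
		i        = j + 1;
	}

	return n;
}
```

A pool whose ranks are all contiguous always encodes to two values, which is the best case mentioned in the comment; a single isolated rank r encodes as the pair [r, r].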
Keeping this simple and transferring uint32 enabled_ranks over protobuf would be my preferred solution if the range string is not acceptable. Given the infrequency of the operation, would ~2k*uint32s introduce a problem?
@knard-intel IMO better to solve this problem in a new PR associated with a new ticket. I think it's OK to leave the strings for now.
@tanabarr Fair point... I think it would be a good idea to test and see what the impact is.
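As a rough starting point for that test, the wire size of an explicit rank list can be estimated offline. Protobuf encodes uint32 values as base-128 varints, so small rank numbers cost one or two bytes rather than four. The sketch below sums the varint sizes for ranks 0 to nr-1. It assumes the ranks are numbered contiguously from 0 and that the field would use packed encoding, and it ignores the field tag and length prefix; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode v as a protobuf base-128 varint (1 to 5 for 32 bits). */
static size_t
sk_varint_size(uint32_t v)
{
	size_t n = 1;

	while (v >= 0x80) {
		v >>= 7;
		n++;
	}

	return n;
}

/* Payload bytes for ranks 0..nr-1 sent as varints (tag and length prefix excluded). */
static size_t
sk_rank_payload_size(uint32_t nr)
{
	size_t total = 0;

	for (uint32_t r = 0; r < nr; r++)
		total += sk_varint_size(r);

	return total;
}
```

Under these assumptions, `sk_rank_payload_size(2048)` gives 3968 bytes (128 one-byte values plus 1920 two-byte values), compared with 8192 bytes for 2048 fixed-width 32-bit integers. Whether that matters for an infrequent query is the open question in this thread.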
From my side, both solutions (integer intervals and an explicit list of ranks) make sense.
Integer intervals should not be hard to implement on the engine side, as it is already building a list of intervals.
If it is OK for you, I will submit this PR as is for landing (as soon as the CI is OK), and I will continue investigating this point in a follow-up PR.
If it is not OK for you, do not hesitate to ask for modifications.
Follow-up action regarding decoupling improvement (update of the protobuf structure) will be done in the ticket DAOS-15987
…/daos-10250 Features: control dmg Required-githooks: true Signed-off-by: Cedric Koch-Hofer <[email protected]>
Minor: spelling PP-PR in commit message
@daos-stack/daos-gatekeeper, could you please land this PR with the following commit message: Title: Body:
#14436) Allow enabled and disabled ranks option to be used simultaneously (DAOS-10250). Update and add cmocka unit tests of engine management related functions (DAOS-10253). Fix memory leaks of ranks string in function ds_mgmt_drpc_pool_query(). Required-githooks: true Change-Id: I27f5b3acb003faea2d53697e83f0afeb0e284080 Signed-off-by: Cedric Koch-Hofer <[email protected]>
#14436) (#14548) Allow enabled and disabled ranks option to be used simultaneously (DAOS-10250). Update and add cmocka unit tests of engine management related functions (DAOS-10253). Fix memory leaks of ranks string in function ds_mgmt_drpc_pool_query(). Signed-off-by: Cedric Koch-Hofer <[email protected]>
#14436) Allow enabled and disabled ranks option to be used simultaneously (DAOS-10250). Update and add cmocka unit tests of engine management related functions (DAOS-10253). Fix memory leaks of ranks string in function ds_mgmt_drpc_pool_query(). Required-githooks: true Signed-off-by: Cedric Koch-Hofer <[email protected]>
Description
This PP is fixing the following issues:
Before requesting gatekeeper:
- Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: