Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-10250 control: Get enabled and disabled ranks with dmg pool query #14436

Merged
merged 18 commits into from
Jun 11, 2024

Conversation

knard-intel
Copy link
Contributor

@knard-intel knard-intel commented May 24, 2024

Description

This PP is fixing the following issues:

  • DAOS-10250: Makes enabled and disabled ranks option of dmg compatible.
  • DAOS-10253: Update and add cmocka unit tests of engine management related functions.
  • Extra: Fix memory leaks of ranks string in function ds_mgmt_drpc_pool_query(). There is no JIRA for this issue as it was discover during the dev of this PR.

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follows the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Qu10250ick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

Copy link

github-actions bot commented May 24, 2024

Ticket title is 'Update engine to get enabled and disabled ranks with drpc pool query'
Status is 'In Review'
https://daosio.atlassian.net/browse/DAOS-10250

@knard-intel knard-intel force-pushed the ckochhof/fix/master/daos-10250 branch from 22d9e2f to 1916065 Compare May 24, 2024 09:30
@knard-intel knard-intel self-assigned this May 24, 2024
@daosbuild1
Copy link
Collaborator

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/2/testReport/

@daosbuild1
Copy link
Collaborator

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/2/testReport/

@knard-intel knard-intel force-pushed the ckochhof/fix/master/daos-10250 branch from 1916065 to 9d4d536 Compare May 24, 2024 11:19
@daosbuild1
Copy link
Collaborator

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/4/testReport/

@knard-intel knard-intel force-pushed the ckochhof/fix/master/daos-10250 branch from 9d4d536 to f3d6fe5 Compare May 24, 2024 19:46
@daosbuild1
Copy link
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14436/5/execution/node/373/log

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14436/5/execution/node/358/log

@knard-intel knard-intel force-pushed the ckochhof/fix/master/daos-10250 branch from f3d6fe5 to 5cec650 Compare May 24, 2024 19:57
@daosbuild1
Copy link
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14436/5/execution/node/318/log

@daosbuild1
Copy link
Collaborator

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14436/5/execution/node/315/log

@daosbuild1
Copy link
Collaborator

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/6/testReport/

Makes enabled and disabled ranks option of dmg compatible.
Update and add cmocka unit tests of engine management related functions.
Fix memory leaks of ranks string in function ds_mgmt_drpc_pool_query().

Features: control dmg
Required-githooks: true

Signed-off-by: Cedric Koch-Hofer <[email protected]>
@knard-intel knard-intel force-pushed the ckochhof/fix/master/daos-10250 branch from 5cec650 to 52da7cd Compare May 24, 2024 22:42
@knard-intel knard-intel marked this pull request as ready for review May 27, 2024 08:21
@knard-intel knard-intel requested review from a team as code owners May 27, 2024 08:21
tanabarr
tanabarr previously approved these changes May 28, 2024
Copy link
Contributor

@tanabarr tanabarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @knard-intel , comments are nonblocking and only minor suggestions.

{
/* If function is to return with an error, pool_info and ranks will not be filled. */
if (ds_mgmt_pool_query_return != 0)
return ds_mgmt_pool_query_return;

uuid_copy(ds_mgmt_pool_query_uuid, pool_uuid);
ds_mgmt_pool_query_info_ptr = (void *)pool_info;
if (pool_info != NULL && (pool_info->pi_bits & DPI_ENGINES_ENABLED) != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might look nicer to perform the pool_info NULL check once and indent the rest of the checks

Copy link
Contributor Author

@knard-intel knard-intel May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Improve code readability

Copy link
Contributor Author

@knard-intel knard-intel May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Improve code readability

Fixed with commit 29e8d0e


if (ranks != NULL) {
bool get_enabled = (info ? ((info->pi_bits & DPI_ENGINES_ENABLED) != 0) : false);
if (info != NULL && (info->pi_bits & DPI_ENGINES_ENABLED) != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a NULL daos_pool_info_t valid here? if so then maybe GOTO label above pool_query_reply_to_info might be useful to skip to in the case of NULL. It looks like we are using the info reference as both input and an output, if so then should we update the documentation for dsc_pool_svc_query to mark the field as in/out? It could be made more clear by reading the input bit-fields from info before explicitly specifying output values in a separate info_out. This suggestion could be considered as overkill but I've occasionally found mixing input output to be problematic.

Copy link
Contributor Author

@knard-intel knard-intel May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I agree that the usage of the info var is confusing.
Thanks for your attention.

  • Fix the dsc_pool_svc_query() documentation
  • Refactor the info var

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I agree that the usage of the info var is confusing. Thanks for your attention.

  • Fix the dsc_pool_svc_query() documentation
  • Refactor the info var

Fixed with commit 87ff98c

@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-14436/16/display/redirect

@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/17/testReport/

kanard38 added 3 commits June 3, 2024 19:12
Fix reviewers comments:
- Change rc != -DER_SUCCESS to rc != 0

Features: control dmg
Required-githooks: true

Signed-off-by: Cedric Koch-Hofer <[email protected]>
Fix reviewers comments:
- Remove invalid assert

Features: control dmg
Required-githooks: true

Signed-off-by: Cedric Koch-Hofer <[email protected]>
@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/18/testReport/

@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/19/testReport/

kjacque
kjacque previously approved these changes Jun 3, 2024
Copy link
Contributor

@kjacque kjacque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. My comments are minor non-blocking suggestions. Good work.

}

if ((pi_bits & DPI_ENGINES_ENABLED) != 0) {
D_ASSERT(enabled_ranks != NULL);
if (enabled_ranks == NULL) {
DL_ERROR(-DER_INVAL, DF_UUID ": query pool with invalid params",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor - might be nice to be a little more descriptive here. Something like "query pool requested enabled ranks, but ptr is NULL"?

Copy link
Contributor Author

@knard-intel knard-intel Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Fix error message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Fix error message

Fixed with commit 533bf93

}
D_DEBUG(DB_MD, DF_UUID ": found %" PRIu32 " enabled ranks in pool map\n",
DP_UUID(pool_uuid), enabled_rank_list->rl_nr);
}

if ((pi_bits & DPI_ENGINES_DISABLED) != 0) {
D_ASSERT(disabled_ranks != NULL);
if (disabled_ranks == NULL) {
DL_ERROR(-DER_INVAL, DF_UUID ": query pool with invalid params",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Copy link
Contributor Author

@knard-intel knard-intel Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Fix error message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Fix error message

Fixed with commit 533bf93

Fix reviewers comments:
- Fix error message

Features: control dmg
Required-githooks: true

Signed-off-by: Cedric Koch-Hofer <[email protected]>
@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-14436/20/testReport/

@daosbuild1
Copy link
Collaborator

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14436/20/execution/node/1189/log

Copy link
Contributor

@mjmac mjmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for addressing this. At google this was briefly a problem because our service control team wanted to query both enabled and disabled and were confused by the "not supported" error.

Comment on lines +1749 to +1750
char *enabled_ranks_str = NULL;
char *disabled_ranks_str = NULL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel too strongly either way, but just a thought -- I wonder if it makes more sense to convert the response message to return a pair of uint32 arrays instead of the range strings. It would simplify the engine-side code and let the caller decide how to present the lists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall how many ranks can be in a pool. If it's a lot, the string may actually be the better way to transmit that value...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in the case of Aurora, a pool could in theory span 2048 ranks. As far as I'm aware there's no hard-coded limit on the number of ranks in a pool, so it could go higher for larger systems.

You make a good point about the ranklist string being a more compact representation. As I said, I don't feel very strongly about this, I guess I was just reacting a little to what seemed like presentation-layer logic living in the drpc handler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my side, I have to admit that I am also not found of using a string, and I will indeed prefer to use a list of intervals.
However, I will prefer to do that in a follow up PR with a dedicated ticket.
Does it makes sense.

Copy link
Contributor

@mjmac mjmac Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lists of intervals might be a nice compromise here. If I'm understanding correctly, you're proposing something like:
enabled_ranks: [0,7,9,15] (equivalent to "[0-7,9-15]")

Or, best-case:
enabled_ranks: [0,15] (equivalent to "[0-15]")

Right? This would still be reasonably compact, and would still require some interpretation, but would at least eliminate the string manipulation steps for encode/decode.

In any case, I am fine with doing that as a follow-up rather than forcing it into this patch.

Copy link
Contributor

@tanabarr tanabarr Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping this simple and transferring uint32 enabled_ranks over protobuf would be my preferred solution if range string is not acceptable. Given the infrequency of the operation would ~2k*uint32s introduce a problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@knard-intel IMO better to solve this problem in a new PR associated with a new ticket. I think it's OK to leave the strings for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanabarr Fair point... I think it would be a good idea to test and see what the impact is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my side both solution (integer interval and explicit list of ranks) make sense.
Interval of integer should not be so hard to implement on the engine side as it is already building a list of intervals.
If it is OK for you, I will submit this PR as it for landing (as soon as the CI is OK), and I will continue investigation on this point in a follow-up PR.
If it is not OK for you, do not hesitate to ask for modifications.

Copy link
Contributor Author

@knard-intel knard-intel Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up action regarding decoupling improvement (update of the protobuf structure) will be done in the ticket DAOS-15987

…/daos-10250

Features: control dmg
Required-githooks: true

Signed-off-by: Cedric Koch-Hofer <[email protected]>
…/daos-10250

Features: control dmg
Required-githooks: true

Signed-off-by: Cedric Koch-Hofer <[email protected]>
Copy link
Contributor

@tanabarr tanabarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: spelling PP-PR in commit message

@knard-intel knard-intel requested review from daltonbohning, liw and a team June 11, 2024 06:47
@knard-intel
Copy link
Contributor Author

@daos-stack/daos-gatekeeper , please could you lend this PR with the following commit message:

Title: DAOS-10250 control: Get enabled and disabled ranks with dmg pool query

Body:

Allow enabled and disabled ranks option to be used simultaneously (DAOS-10250).
Update and add cmocka unit tests of engine management related functions (DAOS-10253).

Fix memory leaks of ranks string in function ds_mgmt_drpc_pool_query().

@mjmac mjmac merged commit 5c8868f into master Jun 11, 2024
70 of 87 checks passed
@mjmac mjmac deleted the ckochhof/fix/master/daos-10250 branch June 11, 2024 12:59
mjmac pushed a commit that referenced this pull request Jun 11, 2024
#14436)

Allow enabled and disabled ranks option to be used simultaneously (DAOS-10250).
Update and add cmocka unit tests of engine management related functions (DAOS-10253).

Fix memory leaks of ranks string in function ds_mgmt_drpc_pool_query().

Required-githooks: true

Change-Id: I27f5b3acb003faea2d53697e83f0afeb0e284080
Signed-off-by: Cedric Koch-Hofer <[email protected]>
mjmac pushed a commit that referenced this pull request Jun 11, 2024
#14436)

Allow enabled and disabled ranks option to be used simultaneously (DAOS-10250).
Update and add cmocka unit tests of engine management related functions (DAOS-10253).

Fix memory leaks of ranks string in function ds_mgmt_drpc_pool_query().

Required-githooks: true

Change-Id: I27f5b3acb003faea2d53697e83f0afeb0e284080
Signed-off-by: Cedric Koch-Hofer <[email protected]>
mjmac added a commit that referenced this pull request Jun 14, 2024
#14436) (#14548)

Allow enabled and disabled ranks option to be used simultaneously (DAOS-10250).
Update and add cmocka unit tests of engine management related functions (DAOS-10253).

Fix memory leaks of ranks string in function ds_mgmt_drpc_pool_query().

Signed-off-by: Cedric Koch-Hofer <[email protected]>
mjmac pushed a commit that referenced this pull request Dec 12, 2024
#14436)

Allow enabled and disabled ranks option to be used simultaneously (DAOS-10250).
Update and add cmocka unit tests of engine management related functions (DAOS-10253).

Fix memory leaks of ranks string in function ds_mgmt_drpc_pool_query().

Required-githooks: true

Signed-off-by: Cedric Koch-Hofer <[email protected]>
@mjmac mjmac mentioned this pull request Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

8 participants