DAOS-15739 engine: Add single-engine, multi-socket support #14311

Merged
jolivier23 merged 4 commits into google/2.4 from jvolivie/multisocket on May 8, 2024


jolivier23 (Contributor) commented May 3, 2024

Backport of the following patches:
DAOS-13380 engine: refine tgt_nr check (#12405)
DAOS-15739 engine: Add multi-socket support (#14234)

  • DAOS-13380 engine: refine tgt_nr check
  1. Without DAOS_TARGET_OVERSUBSCRIBE, fail to start the engine if there are not enough cores.
  2. With DAOS_TARGET_OVERSUBSCRIBE, allow the engine to be force-started.
  In either case, the number of helper xstreams (nr_xs_helpers) may be reduced.
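The two rules above can be sketched as a small decision function. This is a minimal sketch, not the DAOS implementation: it assumes one core per xstream, ignores system xstreams, and treats "not enough cores" as "fewer cores than targets". The names check_tgt_nr and struct xs_plan are hypothetical, and the exact helper-reduction rule in the engine may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type; not a DAOS identifier. */
struct xs_plan {
	int  nr_tgts;    /* I/O target xstreams */
	int  nr_helpers; /* helper xstreams, possibly reduced */
	bool ok;         /* false means the engine must not start */
};

/*
 * Sketch of the refined tgt_nr check (assumption: one core per xstream,
 * system xstreams ignored).
 */
static struct xs_plan
check_tgt_nr(int nr_cores, int nr_tgts, int nr_helpers, bool oversubscribe)
{
	struct xs_plan plan = { nr_tgts, nr_helpers, true };

	assert(nr_cores > 0 && nr_tgts > 0 && nr_helpers >= 0);

	/* Everything fits: nothing to adjust. */
	if (nr_tgts + nr_helpers <= nr_cores)
		return plan;

	/* Rule 1: without oversubscription, too few cores for the targets is fatal. */
	if (!oversubscribe && nr_tgts > nr_cores) {
		plan.ok = false;
		return plan;
	}

	/* Either case: shrink helpers to the cores left after the targets (min 0). */
	plan.nr_helpers = nr_cores > nr_tgts ? nr_cores - nr_tgts : 0;
	return plan;
}
```

For example, 14 targets and 4 helpers on 16 cores start with 2 helpers, while 16 targets on 8 cores start only when oversubscription is requested.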

  • DAOS-15739 engine: Add multi-socket support (#14234)

Add a simple multi-socket mode for use cases where a single engine must be used. This avoids having all helper xstreams automatically assigned to a single NUMA node, which improves the efficiency of synchronization between I/O and helper xstreams.
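The PR text does not spell out how xstreams are laid out in this mode, so the following is only a sketch of the idea under stated assumptions: targets and helpers are each split into equal contiguous blocks per NUMA node, and a target offloads to a helper on its own node. The functions tgt_numa_node, helper_numa_node and local_helper are hypothetical.

```c
#include <assert.h>

/* NUMA node of a target, assuming equal contiguous blocks per node. */
static int
tgt_numa_node(int tgt_id, int nr_tgts, int nr_numa)
{
	assert(nr_numa > 0 && nr_tgts % nr_numa == 0);
	return tgt_id / (nr_tgts / nr_numa);
}

/* NUMA node of a helper xstream, same block layout. */
static int
helper_numa_node(int helper_id, int nr_helpers, int nr_numa)
{
	assert(nr_numa > 0 && nr_helpers % nr_numa == 0);
	return helper_id / (nr_helpers / nr_numa);
}

/* Choose a helper on the target's own node, round-robin within that node. */
static int
local_helper(int tgt_id, int nr_tgts, int nr_helpers, int nr_numa)
{
	int node      = tgt_numa_node(tgt_id, nr_tgts, nr_numa);
	int per_node  = nr_helpers / nr_numa;
	int tgts_node = nr_tgts / nr_numa;

	assert(per_node > 0); /* need at least one helper per node */
	return node * per_node + (tgt_id % tgts_node) % per_node;
}
```

With 16 targets and 4 helpers on 2 nodes, targets 0 to 7 use helpers 0 and 1 and targets 8 to 15 use helpers 2 and 3, so no offload crosses a socket.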

It is the default behavior if all of the following are true:

  • Neither pinned_numa_node nor first_core is used.
  • No oversubscription is requested.
  • Every NUMA node has the same number of cores.
  • Targets and helpers divide evenly among the NUMA nodes.
  • There is more than one NUMA node.
Also update the server config logic to ensure first_core is passed on to the engine if it is set, while keeping the existing behavior when both first_core: 0 and pinned_numa_node are set.

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is the master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

Signed-off-by: Jeff Olivier <[email protected]>
Signed-off-by: Xuezhao Liu <[email protected]>
Signed-off-by: Tom Nabarro <[email protected]>
jolivier23 changed the title from "Add single-engine, multi-socket support" to "DAOS-15739 Add single-engine, multi-socket support" May 3, 2024

github-actions bot commented May 3, 2024

Bug-tracker data:
Ticket title is 'support a single engine, multi-socket configuration'
Status is 'Resolved'
https://daosio.atlassian.net/browse/DAOS-15739

daosbuild1 (Collaborator) left a comment


LGTM. No errors found by checkpatch.

jolivier23 changed the title from "DAOS-15739 Add single-engine, multi-socket support" to "DAOS-15739 engine: Add single-engine, multi-socket support" May 3, 2024

daosbuild1 (Collaborator)

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14311/1/execution/node/1474/log

jolivier23 requested review from mjmac and techbasset May 6, 2024 20:22
jolivier23 added 2 commits May 7, 2024 19:20
Required-githooks: true

Change-Id: I92f65924f9b4b3dce6a756e01e5bfc9e584af0b6
Signed-off-by: Jeff Olivier <[email protected]>
Required-githooks: true

Change-Id: Iffb8046df03d0e6eb59475245786770ca5310f75
Signed-off-by: Jeff Olivier <[email protected]>
daosbuild1 (Collaborator) left a comment


LGTM. No errors found by checkpatch.

Required-githooks: true

Change-Id: I96bed7f1a8aa8ce546129064bbe562d7e34cd8b2
Signed-off-by: Jeff Olivier <[email protected]>
daosbuild1 (Collaborator) left a comment


LGTM. No errors found by checkpatch.


check:
D_ASSERT(target < DSS_XS_NR_TOTAL && target >= dss_sys_xs_nr);
offload = target + 17; /* Seed next selection */
Contributor


Is this specific to our config? Maybe it should be in a #define?

Contributor Author


It could probably be 1. I was trying to make the choice of which ULT it uses next somewhat random without calling rand.
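A sketch of how a fixed stride seeds the next selection, following the reviewer's suggestion to name the constant. It assumes, hypothetically, that helper xstream ids are contiguous in [sys_xs_nr, xs_nr_total) as the D_ASSERT range suggests, and that the seed is reduced relative to sys_xs_nr; OFFLOAD_SEED_STRIDE and pick_helper are illustrative names, not DAOS code.

```c
#include <assert.h>

/* Hypothetical name for the literal 17 in the reviewed snippet. */
#define OFFLOAD_SEED_STRIDE 17

/*
 * Map a seed (>= sys_xs_nr) to a helper xstream id in
 * [sys_xs_nr, xs_nr_total), the range the D_ASSERT checks.
 */
static int
pick_helper(int seed, int sys_xs_nr, int xs_nr_total)
{
	int nr = xs_nr_total - sys_xs_nr; /* number of helper xstreams */
	int target;

	assert(nr > 0 && seed >= sys_xs_nr);
	target = sys_xs_nr + (seed - sys_xs_nr) % nr;
	assert(target < xs_nr_total && target >= sys_xs_nr);
	return target;
}
```

Used as `target = pick_helper(seed, ...); seed = target + OFFLOAD_SEED_STRIDE;`, each pick advances by the stride modulo the helper count. Under these assumptions any stride coprime to the helper count, including 1, visits every helper before repeating; a stride of 17 degenerates to always choosing the same helper when the helper count is a multiple of 17.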

jolivier23 merged commit f16a7dd into google/2.4 May 8, 2024
33 of 36 checks passed
jolivier23 deleted the jvolivie/multisocket branch May 8, 2024 15:43
4 participants