DAOS-15739 engine: Add single-engine, multi-socket support #14311

jolivier23 · 2024-05-03T22:33:13Z

Backport of the following patches:
DAOS-13380 engine: refine tgt_nr check (#12405)
DAOS-15739 engine: Add multi-socket support (#14234)

DAOS-13380 engine: refine tgt_nr check

1. For the non-DAOS_TARGET_OVERSUBSCRIBE case, fail to start the engine if there are not enough cores.
2. For the DAOS_TARGET_OVERSUBSCRIBE case, allow the engine to be force-started.

In either case, the number of helper xstreams (nr_xs_helpers) may be reduced.

DAOS-15739 engine: Add multi-socket support (#14234)

Add a simple multi-socket mode for use cases where a single engine must be used. It avoids having all helper xstreams automatically assigned to a single NUMA node, which makes synchronization between I/O and helper xstreams more efficient.
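
To illustrate the placement goal, here is a hypothetical sketch (the names `xs_numa` and `helper_for_target` are illustrative, not engine APIs, and the block-per-node layout is an assumption) that splits targets and helpers into equal contiguous blocks per NUMA node and pairs each target with a helper on its own node. It assumes both counts divide evenly by the number of NUMA nodes, which is one of the conditions for this mode.

```c
#include <assert.h>

/*
 * Hypothetical: NUMA node of xstream idx when nr xstreams of one kind
 * (targets or helpers) are split into equal contiguous blocks per node.
 */
static int
xs_numa(int idx, int nr, int nr_numa)
{
	assert(nr_numa > 0 && nr >= nr_numa && nr % nr_numa == 0);
	assert(idx >= 0 && idx < nr);
	return idx / (nr / nr_numa);
}

/*
 * Hypothetical: pick a helper on the same NUMA node as target tgt,
 * spreading that node's targets round-robin over the node's helpers.
 */
static int
helper_for_target(int tgt, int nr_tgts, int nr_helpers, int nr_numa)
{
	int tgts_per_node, helpers_per_node, node, local;

	assert(nr_numa > 0 && nr_tgts % nr_numa == 0 && nr_helpers % nr_numa == 0);
	assert(nr_tgts >= nr_numa && nr_helpers >= nr_numa);
	tgts_per_node    = nr_tgts / nr_numa;
	helpers_per_node = nr_helpers / nr_numa;
	node             = xs_numa(tgt, nr_tgts, nr_numa);
	local            = tgt % tgts_per_node;
	return node * helpers_per_node + local % helpers_per_node;
}
```

With 16 targets, 4 helpers and 2 NUMA nodes, targets 0 to 7 map to helpers 0 and 1 on node 0, and targets 8 to 15 map to helpers 2 and 3 on node 1, so no target offloads work across sockets.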

It is the default behavior when all of the following are true:

Neither pinned_numa_node nor first_core is used.
No oversubscription is requested.
All NUMA nodes have the same number of cores.
Targets and helpers divide evenly among NUMA nodes.
There is more than one NUMA node.

Update the server config logic so that first_core is passed on to the engine when it is set, while keeping the existing behavior when both first_core: 0 and pinned_numa_node are set.

Before requesting gatekeeper:

Two review approvals and any prior change requests have been resolved.
Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Commit messages follow the project's commit message guidelines.
Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

Signed-off-by: Jeff Olivier <[email protected]>
Signed-off-by: Xuezhao Liu <[email protected]>
Signed-off-by: Tom Nabarro <[email protected]>

github-actions · 2024-05-03T22:33:30Z

Bug-tracker data:
Ticket title is 'support a single engine, multi-socket configuration'
Status is 'Resolved'
https://daosio.atlassian.net/browse/DAOS-15739

daosbuild1

LGTM. No errors found by checkpatch.

daosbuild1 · 2024-05-04T18:26:14Z

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-14311/1/execution/node/1474/log

daosbuild1

LGTM. No errors found by checkpatch.

daosbuild1

LGTM. No errors found by checkpatch.

mjmac · 2024-05-08T14:20:51Z

src/engine/ult.c


 check:
 	D_ASSERT(target < DSS_XS_NR_TOTAL && target >= dss_sys_xs_nr);
+	offload = target + 17; /* Seed next selection */


Is this specific to our config? Maybe it should be in a #define?

jolivier23: Could be 1, probably. I was trying to make the choice of which ULT it uses next somewhat random without calling rand().

jolivier23 changed the title from "Add single-engine, multi-socket support" to "DAOS-15739 Add single-engine, multi-socket support" on May 3, 2024

daosbuild1 reviewed May 3, 2024

jolivier23 changed the title from "DAOS-15739 Add single-engine, multi-socket support" to "DAOS-15739 engine: Add single-engine, multi-socket support" on May 3, 2024

jolivier23 requested review from mjmac and techbasset May 6, 2024 20:22

mjmac approved these changes May 6, 2024

jolivier23 added 2 commits May 7, 2024 19:20

Fix an issue

369e72b

Required-githooks: true Change-Id: I92f65924f9b4b3dce6a756e01e5bfc9e584af0b6 Signed-off-by: Jeff Olivier <[email protected]>

Skip-func-hw-test: true

c3c4f77

Required-githooks: true Change-Id: Iffb8046df03d0e6eb59475245786770ca5310f75 Signed-off-by: Jeff Olivier <[email protected]>

daosbuild1 reviewed May 8, 2024

Skip-func-hw: true

eb2de1b

Required-githooks: true Change-Id: I96bed7f1a8aa8ce546129064bbe562d7e34cd8b2 Signed-off-by: Jeff Olivier <[email protected]>

daosbuild1 reviewed May 8, 2024

mjmac reviewed May 8, 2024

jolivier23 merged commit f16a7dd into google/2.4 May 8, 2024
33 of 36 checks passed

jolivier23 deleted the jvolivie/multisocket branch May 8, 2024 15:43
