[CPU] Support HBM flatmode and fakenuma mode #3918
Merged
This PR adds support for CPU systems with HBM memory installed and for systems running in fakenuma mode. With this PR, DeepSpeed run with the switch
--bind_cores_to_rank
can better utilize HBM memory bandwidth and works correctly on these systems. See the Intel Xeon CPU Max Series configuration and tuning guide (https://cdrdv2-public.intel.com/769060/354227-intel-xeon-cpu-max-series-configuration-and-tuning-guide.pdf).
In flat mode, HBM memory has its own NUMA nodes, which have no CPU cores associated with them. The output of
numactl -H
on a computer in flat mode could look like the following:
In HBM flat mode, we want to use
numactl -p <hbm-node-id>
so that each CPU node prefers to allocate memory from its related HBM node. So a worker running on CPU node 0 would prefer to allocate memory on node 2, and a worker on CPU node 1 would prefer to allocate from node 3.
worker0:
numactl -C 0-55 -p 2 python ...
worker1:
numactl -C 56-111 -p 3 python ...
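The flat-mode binding above can be sketched in Python. This is only an illustrative sketch, not the code in this PR: the function names `format_core_range` and `flat_mode_commands` are hypothetical, the topology is passed in as a plain dictionary rather than read from `numactl -H`, and it assumes the i-th CPU node is paired with the i-th HBM node (a real system may need to consult NUMA distances instead).

```python
from typing import Dict, List


def format_core_range(cores: List[int]) -> str:
    """Render a non-empty core list in the form numactl -C accepts."""
    cores = sorted(cores)
    if len(cores) == 1:
        return str(cores[0])
    # Contiguous cores collapse to 'first-last', e.g. '0-55'.
    if cores == list(range(cores[0], cores[-1] + 1)):
        return f"{cores[0]}-{cores[-1]}"
    # Otherwise fall back to an explicit comma-separated list.
    return ",".join(str(c) for c in cores)


def flat_mode_commands(node_cores: Dict[int, List[int]]) -> List[str]:
    """Build one numactl prefix per worker for an HBM flat-mode layout.

    node_cores maps NUMA node id -> CPU cores on that node. Nodes with an
    empty core list are treated as HBM nodes.
    Assumption (hypothetical pairing rule): the i-th CPU node prefers the
    i-th HBM node.
    """
    cpu_nodes = sorted(n for n, cores in node_cores.items() if cores)
    hbm_nodes = sorted(n for n, cores in node_cores.items() if not cores)
    if len(cpu_nodes) != len(hbm_nodes):
        raise ValueError("expected one HBM node per CPU node")
    return [
        f"numactl -C {format_core_range(node_cores[cpu])} -p {hbm}"
        for cpu, hbm in zip(cpu_nodes, hbm_nodes)
    ]


# Topology matching the example above: nodes 0 and 1 hold the cores,
# nodes 2 and 3 are HBM-only.
topology = {0: list(range(0, 56)), 1: list(range(56, 112)), 2: [], 3: []}
for rank, cmd in enumerate(flat_mode_commands(topology)):
    print(f"worker{rank}: {cmd} python ...")
```

Using `-p` rather than `-m` means allocations can still fall back to other nodes when the preferred HBM node is full.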
The output of
numactl -H
on a system with fakenuma looks like the following:
With fakenuma, multiple NUMA nodes are associated with the same set of CPU cores, so these CPUs have minimal distance to all of the NUMA nodes associated with them. With fakenuma, each worker is bound to all of the NUMA nodes associated with its CPU cores.
worker0:
numactl -m 0,1,2,3 -C 0-47 python ...
worker1:
numactl -m 4,5,6,7 -C 48-95 python ...
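The fakenuma binding above can be sketched the same way. Again, this is a hedged illustration and not the PR's implementation: `fakenuma_commands` is a hypothetical name, the topology is supplied as a dictionary, and the sketch assumes every node has a non-empty, contiguous core list.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fakenuma_commands(node_cores: Dict[int, List[int]]) -> List[str]:
    """Build one numactl prefix per worker for a fakenuma layout.

    node_cores maps NUMA node id -> CPU cores on that node. Nodes that
    share exactly the same core set are grouped, and each group becomes
    one worker bound to all of its nodes.
    Assumption: each core set is non-empty and contiguous.
    """
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for node in sorted(node_cores):
        groups[tuple(sorted(node_cores[node]))].append(node)

    commands = []
    # Sorting by core tuple orders workers by their lowest core id.
    for cores, nodes in sorted(groups.items()):
        mem_nodes = ",".join(str(n) for n in nodes)
        commands.append(f"numactl -m {mem_nodes} -C {cores[0]}-{cores[-1]}")
    return commands


# Topology matching the example above: nodes 0-3 share cores 0-47,
# nodes 4-7 share cores 48-95.
fake_topology = {n: list(range(0, 48)) for n in range(0, 4)}
fake_topology.update({n: list(range(48, 96)) for n in range(4, 8)})
for rank, cmd in enumerate(fakenuma_commands(fake_topology)):
    print(f"worker{rank}: {cmd} python ...")
```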