ch4/ofi: Lazily register FI_MULTI_RECV buffers #6422

raffenet · 2023-02-27T18:11:15Z

Pull Request Description

Avoid consuming GPU resources during initialization by using regular malloc for FI_MULTI_RECV buffers. Do lazy registration when the first GPU communication is detected.
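
As a rough illustration of the idea described above (not the code in this PR), lazy registration can be sketched as follows. The names `am_buf_t`, `am_buf_init`, `am_buf_on_comm`, and `fake_gpu_register_host` are hypothetical, and the registration call is a counting stub that stands in for whatever the GPU runtime provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GPU runtime's host-registration call.
 * It only counts invocations so the behavior can be observed. */
static int register_calls = 0;

static int fake_gpu_register_host(void *buf, size_t size)
{
    (void) buf;
    (void) size;
    register_calls++;
    return 0;
}

/* Hypothetical descriptor for a receive buffer. */
typedef struct {
    void *buf;
    size_t size;
    bool registered;
} am_buf_t;

/* Init path: plain malloc, so the GPU runtime is never touched here. */
static int am_buf_init(am_buf_t *b, size_t size)
{
    b->buf = malloc(size);
    b->size = size;
    b->registered = false;
    return b->buf ? 0 : -1;
}

/* Communication path: register exactly once, the first time a GPU
 * buffer is involved. Host-only traffic never triggers registration. */
static int am_buf_on_comm(am_buf_t *b, bool is_gpu_buffer)
{
    if (is_gpu_buffer && !b->registered) {
        int rc = fake_gpu_register_host(b->buf, b->size);
        if (rc != 0)
            return rc;
        b->registered = true;
    }
    return 0;
}

/* A real implementation would also unregister before freeing. */
static void am_buf_free(am_buf_t *b)
{
    free(b->buf);
    b->buf = NULL;
    b->registered = false;
}
```

With this shape, a program that only calls the init path never invokes the registration stub, which is the property checked later in the thread with nvidia-smi.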

Author Checklist

Provide Description
Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
Commits Follow Good Practice
Commits are self-contained and do not do two things at once.
Commit message is of the form: module: short description
Commit message explains what's in the commit.
Passes All Tests
Whitespace checker. Warnings test. Additional tests via comments.
Contribution Agreement
For non-Argonne authors, check contribution agreement.
If necessary, request an explicit comment from your company's PR approval manager.

raffenet · 2023-02-27T20:13:20Z

test:mpich/ch4/ofi

raffenet · 2023-02-27T22:33:44Z

test:mpich/ch4/ofi
test:mpich/ch4/gpu/ofi

raffenet · 2023-02-28T16:04:39Z

test:mpich/ch4/ofi
test:mpich/ch4/gpu/ofi

raffenet · 2023-02-28T16:05:33Z

TODO: move CVAR for tmp buf registration into mpir_gpu.h utils. Will add patch and then this PR should be good.

raffenet · 2023-02-28T16:20:36Z

test:mpich/ch4/ofi
test:mpich/ch4/gpu/ofi

raffenet · 2023-02-28T16:27:44Z

test:mpich/ch4/ofi
test:mpich/ch4/gpu

hzhou · 2023-03-01T18:13:18Z

@raffenet Can we do a gpu test with the CVAR disabling the host registration?

raffenet · 2023-03-01T20:06:36Z

Actually the CVAR disables registration by default. I can add a dummy commit to re-enable registration and re-run, if desired.

hzhou · 2023-03-01T21:13:08Z

I see. I was hoping some of the GPU test failures could be addressed by not registering the buffer. Bummer.

Can you confirm that we fixed the GPU memory issue? Since registration is now lazy, I think we can leave the CVAR on by default, right?

raffenet · 2023-03-02T14:57:29Z

Yes, I think we can default it to on. I'll double check a hello world program and confirm we don't take up any resources.

raffenet · 2023-03-02T15:00:24Z

test:mpich/ch4/ofi
test:mpich/ch4/gpu

raffenet · 2023-03-02T15:10:16Z

nvidia-smi output from a program held in a wait loop after MPI_Init. Looks like we are no longer creating any GPU resources.

[raffenet@pmrs-gpu-240-01]~% nvidia-smi
Thu Mar  2 09:09:08 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 4000     Off  | 00000000:5E:00.0 Off |                  N/A |
| 30%   32C    P8    10W / 125W |      3MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 4000     Off  | 00000000:D8:00.0 Off |                  N/A |
| 30%   37C    P8     8W / 125W |      3MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

raffenet · 2023-03-02T19:31:57Z

Test results are consistent with registration turned back on. Outstanding question is, do we want to have an additional switch in ch4/ofi to disable registration of FI_MULTI_RECV buffers in case, say, the provider is already handling it?

hzhou · 2023-03-02T20:01:34Z

I see. You mean the case where the provider registers the multi-recv buffer itself? I tend to believe that CUDA or any GPU runtime will cache the address, so an additional registration should be a no-op. In any case, I suggest we not worry about such cases until they actually come up, and decide then.

hzhou

LGTM

raffenet · 2023-03-02T20:04:55Z

Works for me.

raffenet added the WIP label Feb 27, 2023

raffenet force-pushed the ofi-am-bufs branch from 6dcc9b5 to 7c2f833 February 27, 2023 18:15

hzhou reviewed Feb 28, 2023

src/mpid/ch4/netmod/ofi/ofi_am_impl.h (outdated, resolved)

raffenet force-pushed the ofi-am-bufs branch from 2075cd2 to f848889 February 28, 2023 16:02

raffenet changed the title ~~ch4/ofi: Use regular malloc of FI_MULTI_RECV buffers~~ ch4/ofi: Lazily register FI_MULTI_RECV buffers Feb 28, 2023

raffenet force-pushed the ofi-am-bufs branch from d9d68ff to a387ea6 February 28, 2023 16:27

raffenet removed the WIP label Mar 1, 2023

raffenet requested a review from hzhou March 1, 2023 16:57

hzhou reviewed Mar 1, 2023

src/include/mpir_gpu.h (outdated, resolved)

raffenet added 4 commits March 2, 2023 08:59

ch4/ofi: Use regular malloc of FI_MULTI_RECV buffers

5a26793

Avoid consuming GPU resources during initialization by using regular malloc for FI_MULTI_RECV buffers. It may be possible to register the buffers later if we detect they are being used to copy data to the GPU.

ch4/ofi: Use a single buffer for FI_MULTI_RECV messages

b168465

Rather than allocate a bunch of buffers, just use one big one with offsets.
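
A minimal sketch of the "one big buffer with offsets" layout, under assumed sizes. `NUM_SLOTS`, `SLOT_SIZE`, and the `pool_*` helpers are hypothetical names chosen for illustration and are not taken from the MPICH source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_SLOTS 8                 /* hypothetical number of receive slots */
#define SLOT_SIZE (1024 * 1024)     /* hypothetical size of each slot in bytes */

/* One contiguous allocation that backs every slot. */
typedef struct {
    char *base;
} multi_recv_pool_t;

/* Single allocation instead of NUM_SLOTS separate ones. */
static int pool_init(multi_recv_pool_t *p)
{
    p->base = (char *) malloc((size_t) NUM_SLOTS * SLOT_SIZE);
    return p->base ? 0 : -1;
}

/* Slot i is simply an offset into the shared allocation. */
static void *pool_slot(const multi_recv_pool_t *p, int i)
{
    return p->base + (size_t) i * SLOT_SIZE;
}

static void pool_free(multi_recv_pool_t *p)
{
    free(p->base);
    p->base = NULL;
}
```

One consequence of this layout is that a later registration step only has to cover a single address range (`base`, `NUM_SLOTS * SLOT_SIZE`) rather than one range per buffer.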

ch4/ofi: Register FI_MULTI_RECV buffers at first GPU communication

fa0dd4c

gpu: Add cvar to control gpu buffer registration

48b0181

MPIR_gpu_register_host is used to register buffers on the host with the GPU. Use a single CVAR to control buffer registration instead of scattering in various parts of the code.
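
The pattern described here, one switch checked inside the registration wrapper rather than at every call site, might look roughly like the sketch below. The variable `cvar_enable_host_registration`, the wrapper `gpu_register_host_sketch`, and the counting backend are all hypothetical; the real CVAR name and the `MPIR_gpu_register_host` signature are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the CVAR; 1 = registration enabled. */
static int cvar_enable_host_registration = 1;

/* Hypothetical backend call; counts invocations instead of talking to a GPU. */
static int backend_calls = 0;

static int backend_register(void *ptr, size_t size)
{
    (void) ptr;
    (void) size;
    backend_calls++;
    return 0;
}

/* Single choke point: callers always call the wrapper, and the switch is
 * consulted in exactly one place. When disabled, the call is a no-op that
 * reports success. */
static int gpu_register_host_sketch(void *ptr, size_t size)
{
    if (!cvar_enable_host_registration)
        return 0;
    return backend_register(ptr, size);
}
```

Call sites then need no conditionals of their own, which is the "instead of scattering" point in the commit message.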

raffenet force-pushed the ofi-am-bufs branch from a387ea6 to 48b0181 March 2, 2023 15:00

hzhou approved these changes Mar 2, 2023

raffenet merged commit 86660ae into pmodels:main Mar 2, 2023

raffenet deleted the ofi-am-bufs branch March 2, 2023 20:05
