
[Bug] Memory Allocation Anomaly Across Devices in OrtCUDAProviderOptions #20544

Closed
hashJoe opened this issue May 2, 2024 · 5 comments · Fixed by #20549
Labels
api:Java (issues related to the Java API) · ep:CUDA (issues related to the CUDA execution provider)

Comments

hashJoe commented May 2, 2024

Describe the issue

I am sharing below my OrtCUDAProviderOptions, which I use to select the GPU device for computation on a server with multiple GPUs.

When setting the deviceId, I encounter buggy memory allocations.

  1. For example, setting the following OrtCUDAProviderOptions:

        OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions(6);
        cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");
        options.addCUDA(cudaOptions);

results in:

GPU0: 1545MiB / 24576MiB
GPU6: 3MiB / 24576MiB

It discards the deviceId set to 6 and uses device 0 instead.

  2. Whereas commenting out cudaOptions.add("cudnn_conv_algo_search", "DEFAULT"); to become:

        OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions(6);
        // cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");
        options.addCUDA(cudaOptions);

results in:

GPU0: 545MiB / 24576MiB (0% Util)
GPU6: 1545MiB / 24576MiB

where it selects the correct GPU, but 545MiB is still allocated on device 0 without that device ever being utilized.

  3. Finally, keeping cudaOptions.add("cudnn_conv_algo_search", "DEFAULT"); but adding cudaOptions.add("device_id", String.valueOf(6)); to select the device instead of specifying it directly in the constructor, as in:

        OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions();
        cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");      // this has no effect anymore! The flag is not considered
        cudaOptions.add("device_id", String.valueOf(6));
        options.addCUDA(cudaOptions);

results in:

GPU0: 545MiB / 24576MiB (0% Util)
GPU6: 1545MiB / 24576MiB

which is the same as example 2.

To confirm whether cudaOptions.add("cudnn_conv_algo_search", "DEFAULT"); is executed or ignored in example 3, I ran some experiments: it turns out the setting is no longer considered and is shadowed by the cudaOptions.add("device_id", String.valueOf(6)); call added afterwards.

There are two problems here:

  1. Using cudaOptions.add("cudnn_conv_algo_search", "DEFAULT"); results in selecting the wrong device, in this case always device 0.
  2. Some initial memory is allocated on device 0, even though the correct deviceId is used for the computation.

As a workaround, I export only one visible CUDA device to avoid this problem.
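One way to apply that workaround is to restrict device visibility via the CUDA_VISIBLE_DEVICES environment variable when launching the inference process. A minimal sketch (hypothetical helper; "app.jar" and the device index are placeholders, not part of the original report):

```java
import java.util.Map;

public class VisibleDeviceWorkaround {
    // Hypothetical helper: expose only one GPU to the child process via
    // the CUDA_VISIBLE_DEVICES environment variable.
    static ProcessBuilder restrictToDevice(ProcessBuilder pb, int deviceId) {
        Map<String, String> env = pb.environment();
        env.put("CUDA_VISIBLE_DEVICES", String.valueOf(deviceId));
        return pb;
    }

    public static void main(String[] args) {
        // "app.jar" stands in for the actual inference application
        ProcessBuilder pb = restrictToDevice(
                new ProcessBuilder("java", "-jar", "app.jar"), 6);
        System.out.println(pb.environment().get("CUDA_VISIBLE_DEVICES"));  // prints 6
    }
}
```

Note that inside the restricted process the single visible GPU is enumerated as device 0, so the default deviceId works as-is.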

To reproduce

Unfortunately the model cannot be provided, but I can write a toy example plus model and supply it if needed.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04.4 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

v1.17.3

ONNX Runtime API

Java

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

cuda: 11.2, cudnn: 8.1.1

Model File

No response

Is this a quantized model?

No

github-actions bot added the api:Java and ep:CUDA labels May 2, 2024
@hashJoe hashJoe changed the title [Bug] OrtCUDAProviderOptions allocates memory on different devices [Bug] Memory Allocation Anomaly Across Devices in OrtCUDAProviderOptions May 2, 2024
Craigacp (Contributor) commented May 2, 2024

Ok, I think I understand what's going on there. I had expected the C API's UpdateCUDAProviderOptions function to append options to a CUDA options struct, but what it actually does is delete the old struct and set only the options specified in the update call (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cuda/cuda_provider_factory.cc#L236). The Java code calls UpdateCUDAProviderOptions each time the add method is called, so each new option overwrites all the old ones. That's pretty annoying, but I can fix it on the Java side. I'll put a fix together next week.
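The overwrite behaviour described above can be illustrated with a small self-contained sketch (plain Java, no ORT dependency; the native struct and update function are simulated, not the real API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UpdateSemanticsDemo {
    // Simulated native CUDA options struct (hypothetical stand-in)
    static Map<String, String> nativeStruct = new LinkedHashMap<>();

    // Mirrors the described C API behaviour: the update resets the struct
    // to defaults, then applies ONLY the keys supplied in this call.
    static void updateCUDAProviderOptions(Map<String, String> update) {
        nativeStruct.clear();
        nativeStruct.put("device_id", "0");   // default device
        nativeStruct.putAll(update);
    }

    // Mirrors the old Java binding: each add() triggers a one-key update,
    // so the latest call wipes out every option set before it.
    static void add(String key, String value) {
        updateCUDAProviderOptions(Map.of(key, value));
    }

    public static void main(String[] args) {
        add("device_id", "6");                     // set via the constructor
        add("cudnn_conv_algo_search", "DEFAULT");  // this call resets device_id
        System.out.println(nativeStruct.get("device_id"));  // prints 0, not 6
    }
}
```

This reproduces example 1 from the report: the later cudnn_conv_algo_search update silently drops the device selection.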

hashJoe (Author) commented May 2, 2024

Thanks for the clarification!
Regarding the 2nd problem, does the call to UpdateCUDAProviderOptions allocate some memory on device 0 (maybe for initialization) before setting the main device to 6 and proceeding with the main computation? And is it possible to pass multiple options at once to avoid calling the add method several times?

Craigacp (Contributor) commented May 2, 2024

It is possible in the native code to pass multiple options at once, but not with how I've written the Java binding to that native code. The Java object tracks all the options that are set, so I need to modify the SessionOptions.addCUDA call to invoke a new method on OrtCUDAProviderOptions which calls update once with the aggregated parameters before the options are passed in.
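The aggregation described here could be sketched as follows (a hypothetical outline of the fix, not the actual OrtCUDAProviderOptions implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AggregatedOptionsSketch {
    // Java-side cache of every option set so far (hypothetical sketch)
    private final Map<String, String> pending = new LinkedHashMap<>();

    void add(String key, String value) {
        pending.put(key, value);   // record only; no native update yet
    }

    // Called once when the options are attached to the session: all
    // accumulated pairs are applied in a single update, so nothing set
    // earlier can be overwritten by a later one-key update.
    Map<String, String> applyAll() {
        return Map.copyOf(pending);   // stands in for one native update call
    }

    public static void main(String[] args) {
        AggregatedOptionsSketch opts = new AggregatedOptionsSketch();
        opts.add("device_id", "6");
        opts.add("cudnn_conv_algo_search", "DEFAULT");
        Map<String, String> applied = opts.applyAll();
        System.out.println(applied.get("device_id") + " "
                + applied.get("cudnn_conv_algo_search"));  // prints 6 DEFAULT
    }
}
```

With this deferred scheme both the device selection and the cuDNN search mode survive, regardless of call order.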

WRT the memory allocation on GPU zero, that might be an artifact of how CUDA & ORT work; the primary GPU tends to end up with some driver- and code-related allocations in general, but someone with more CUDA expertise might be able to help there.

hashJoe (Author) commented May 3, 2024

I did additional tests regarding the memory allocation problem on GPU 0, where I simply inserted the following CUDA API calls before calling anything ORT-related:

    int status = cudart.cudaSetDevice(6);      // set cuda device
    checkCuda(cudart.CUDA_SUCCESS, status, "cudaSetDevice");      // check for exception

and the ORT code still allocates on deviceId=6:

    OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions();
    cudaOptions.add("device_id", String.valueOf(6));
    options.addCUDA(cudaOptions);

and the problem above is solved: nothing is allocated on GPU 0 anymore. However, the same memory size is still allocated on GPU 6 (1545MiB). If the earlier allocation were CUDA-driver-related, I would have expected the memory previously allocated on GPU 0 to be aggregated here, becoming something like ~2000MiB.

In other words, I assume there is also a bug where ORT starts allocating on gpu 0 for main computation before detecting the deviceId set by the user?

hashJoe (Author) commented May 3, 2024

Additionally, without the cuda code:

    int status = cudart.cudaSetDevice(6);      // set cuda device
    checkCuda(cudart.CUDA_SUCCESS, status, "cudaSetDevice");      // check for exception

When GPU 0 is free, 545MiB is allocated on it; however, when it is occupied by another process and not much space is left, a smaller portion (~100MiB) is allocated instead.

jywu-msft pushed a commit that referenced this issue May 5, 2024
### Description
I misunderstood how UpdateCUDAProviderOptions and
UpdateTensorRTProviderOptions work in the C API: I had assumed that they
updated the options struct, but they re-initialize the struct to the
defaults and then apply only the values in the update. I've rewritten the
Java bindings for those classes so that they aggregate all the updates
and apply them in one go. I also updated the C API documentation to note
that these classes have this behaviour. I've not checked whether any of the
other providers with an options struct behave this way; we only
expose CUDA and TensorRT's options in Java.

There's a small unrelated update to add a private constructor to the
Fp16Conversions classes to remove a documentation warning (they
shouldn't be instantiated anyway as they are utility classes containing
static methods).

### Motivation and Context
Fixes #20544.
TedThemistokleous pushed a commit to TedThemistokleous/onnxruntime that referenced this issue May 7, 2024
yihonglyu pushed a commit that referenced this issue May 9, 2024
2 participants