
[Bug] Memory Allocation Anomaly Across Devices in OrtCUDAProviderOptions #20544

Closed
hashJoe opened this issue May 2, 2024 · 5 comments · Fixed by #20549
Labels
api:Java (issues related to the Java API) · ep:CUDA (issues related to the CUDA execution provider)

Comments

hashJoe commented May 2, 2024

Describe the issue

I am sharing below my OrtCUDAProviderOptions, which I use to select the GPU device for computation on a server with multiple GPUs.

When setting the deviceId, I encounter buggy memory allocations.

  1. For example, setting the following OrtCUDAProviderOptions:

        OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions(6);
        cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");
        options.addCUDA(cudaOptions);

results in:

GPU0: 1545MiB / 24576MiB
GPU6: 3MiB / 24576MiB

It discards the deviceId set to 6 and uses device 0 instead.

  2. Whereas commenting out cudaOptions.add("cudnn_conv_algo_search", "DEFAULT"); to become:

        OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions(6);
        // cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");
        options.addCUDA(cudaOptions);

results in:

GPU0: 545MiB / 24576MiB (0% Util)
GPU6: 1545MiB / 24576MiB

where it selects the correct GPU, but 545MiB is still allocated on device 0 without that device ever being utilized.

  3. Finally, keeping cudaOptions.add("cudnn_conv_algo_search", "DEFAULT"); but adding cudaOptions.add("device_id", String.valueOf(6)); to select the device instead of specifying it directly in the constructor, as in:

        OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions();
        cudaOptions.add("cudnn_conv_algo_search", "DEFAULT");      // this has no effect anymore! The flag is not considered
        cudaOptions.add("device_id", String.valueOf(6));
        options.addCUDA(cudaOptions);

results in:

GPU0: 545MiB / 24576MiB (0% Util)
GPU6: 1545MiB / 24576MiB

which is the same as example 2.

To confirm whether cudaOptions.add("cudnn_conv_algo_search", "DEFAULT"); is executed or ignored in example 3, I ran some experiments: it turns out the setting is no longer considered and is shadowed by the cudaOptions.add("device_id", String.valueOf(6)); call added afterwards.

There are two problems here:

  1. Using cudaOptions.add("cudnn_conv_algo_search", "DEFAULT"); results in selecting the wrong device, in this case always device 0.
  2. Some initial memory is allocated on device 0, even though the correct deviceId is used for the computation.

As a workaround, I export only one visible CUDA device to avoid this problem.
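One way to apply that workaround is to restrict device visibility via the CUDA_VISIBLE_DEVICES environment variable when launching the inference process. A minimal sketch (hypothetical helper; "app.jar" and the device index are placeholders, not part of the original report):

```java
import java.util.Map;

public class VisibleDeviceWorkaround {
    // Hypothetical helper: expose only one GPU to the child process via
    // the CUDA_VISIBLE_DEVICES environment variable.
    static ProcessBuilder restrictToDevice(ProcessBuilder pb, int deviceId) {
        Map<String, String> env = pb.environment();
        env.put("CUDA_VISIBLE_DEVICES", String.valueOf(deviceId));
        return pb;
    }

    public static void main(String[] args) {
        // "app.jar" stands in for the actual inference application
        ProcessBuilder pb = restrictToDevice(
                new ProcessBuilder("java", "-jar", "app.jar"), 6);
        System.out.println(pb.environment().get("CUDA_VISIBLE_DEVICES"));  // prints 6
    }
}
```

Note that inside the restricted process the single visible GPU is enumerated as device 0, so the default deviceId works as-is.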

To reproduce

Unfortunately the model cannot be provided, but I can write a toy example plus model and supply it if needed.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04.4 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

v1.17.3

ONNX Runtime API

Java

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

cuda: 11.2, cudnn: 8.1.1

Model File

No response

Is this a quantized model?

No

github-actions bot added the api:Java and ep:CUDA labels May 2, 2024
@hashJoe hashJoe changed the title [Bug] OrtCUDAProviderOptions allocates memory on different devices [Bug] Memory Allocation Anomaly Across Devices in OrtCUDAProviderOptions May 2, 2024
Craigacp (Contributor) commented May 2, 2024

Ok, I think I understand what's going on there. I had expected the C API's UpdateCUDAProviderOptions function to append options to a CUDA options struct, but what it actually does is delete the old struct and set only the options specified in the update call (https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cuda/cuda_provider_factory.cc#L236). The Java code calls UpdateCUDAProviderOptions each time the add method is called, so each new option overwrites all the old ones. That's pretty annoying, but I can fix it on the Java side. I'll put a fix together next week.
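The overwrite behaviour described above can be illustrated with a small self-contained sketch (plain Java, no ORT dependency; the native struct and update function are simulated, not the real API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UpdateSemanticsDemo {
    // Simulated native CUDA options struct (hypothetical stand-in)
    static Map<String, String> nativeStruct = new LinkedHashMap<>();

    // Mirrors the described C API behaviour: the update resets the struct
    // to defaults, then applies ONLY the keys supplied in this call.
    static void updateCUDAProviderOptions(Map<String, String> update) {
        nativeStruct.clear();
        nativeStruct.put("device_id", "0");   // default device
        nativeStruct.putAll(update);
    }

    // Mirrors the old Java binding: each add() triggers a one-key update,
    // so the latest call wipes out every option set before it.
    static void add(String key, String value) {
        updateCUDAProviderOptions(Map.of(key, value));
    }

    public static void main(String[] args) {
        add("device_id", "6");                     // set via the constructor
        add("cudnn_conv_algo_search", "DEFAULT");  // this call resets device_id
        System.out.println(nativeStruct.get("device_id"));  // prints 0, not 6
    }
}
```

This reproduces example 1 from the report: the later cudnn_conv_algo_search update silently drops the device selection.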

hashJoe (Author) commented May 2, 2024

Thanks for the clarification!
Regarding the 2nd problem, does the call to UpdateCUDAProviderOptions allocate some memory on device 0 (maybe for initialization) before setting the main device to 6 and proceeding with the main computation? And is it possible to pass multiple options at once to avoid calling the add method several times?

Craigacp (Contributor) commented May 2, 2024

It is possible in the native code to pass multiple options at once, but not with how I've written the Java binding to that native code. The Java object tracks all the options that are set, so I need to modify the SessionOptions.addCUDA call to invoke a new method on OrtCUDAProviderOptions which calls update once with the aggregated parameters before the options are passed in.
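The aggregation described here could be sketched as follows (a hypothetical outline of the fix, not the actual OrtCUDAProviderOptions implementation):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AggregatedOptionsSketch {
    // Java-side cache of every option set so far (hypothetical sketch)
    private final Map<String, String> pending = new LinkedHashMap<>();

    void add(String key, String value) {
        pending.put(key, value);   // record only; no native update yet
    }

    // Called once when the options are attached to the session: all
    // accumulated pairs are applied in a single update, so nothing set
    // earlier can be overwritten by a later one-key update.
    Map<String, String> applyAll() {
        return Map.copyOf(pending);   // stands in for one native update call
    }

    public static void main(String[] args) {
        AggregatedOptionsSketch opts = new AggregatedOptionsSketch();
        opts.add("device_id", "6");
        opts.add("cudnn_conv_algo_search", "DEFAULT");
        Map<String, String> applied = opts.applyAll();
        System.out.println(applied.get("device_id") + " "
                + applied.get("cudnn_conv_algo_search"));  // prints 6 DEFAULT
    }
}
```

With this deferred scheme both the device selection and the cuDNN search mode survive, regardless of call order.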

WRT the memory allocation on GPU zero, that might be an artifact of how CUDA & ORT work; the primary GPU tends to end up with some driver- and code-related allocations in general, but someone with more CUDA expertise might be able to help there.

hashJoe (Author) commented May 3, 2024

I did additional tests regarding the memory allocation problem on GPU 0, where I simply inserted the following CUDA API calls before calling anything ORT-related:

    int status = cudart.cudaSetDevice(6);      // set cuda device
    checkCuda(cudart.CUDA_SUCCESS, status, "cudaSetDevice");      // check for exception

and the ORT code still allocates on deviceId=6:

    OrtCUDAProviderOptions cudaOptions = new OrtCUDAProviderOptions();
    cudaOptions.add("device_id", String.valueOf(6));
    options.addCUDA(cudaOptions);

and the problem above is solved: nothing is allocated on GPU 0 anymore. However, the same memory size is still allocated on GPU 6 (1545MiB). If the earlier allocation were CUDA-driver-related, I would have expected the memory previously allocated on GPU 0 to be aggregated here, becoming something like ~2000MiB.

In other words, I assume there is also a bug where ORT starts allocating on gpu 0 for main computation before detecting the deviceId set by the user?

hashJoe (Author) commented May 3, 2024

Additionally, without the cuda code:

    int status = cudart.cudaSetDevice(6);      // set cuda device
    checkCuda(cudart.CUDA_SUCCESS, status, "cudaSetDevice");      // check for exception

When GPU 0 is free, 545MiB is allocated on it; however, when it is occupied by another process and not much space is left, a smaller portion (~100MiB) is allocated instead.

jywu-msft pushed a commit that referenced this issue May 5, 2024
### Description
I misunderstood how UpdateCUDAProviderOptions and
UpdateTensorRTProviderOptions work in the C API: I had assumed that they
updated the options struct, but they re-initialize the struct to the
defaults and then apply only the values in the update. I've rewritten the
Java bindings for those classes so that they aggregate all the updates
and apply them in one go. I also updated the C API documentation to note
that these classes have this behaviour. I've not checked whether any of the
other providers with an options struct behave this way; we only
expose CUDA and TensorRT's options in Java.

There's a small unrelated update to add a private constructor to the
Fp16Conversions classes to remove a documentation warning (they
shouldn't be instantiated anyway as they are utility classes containing
static methods).

### Motivation and Context
Fixes #20544.
TedThemistokleous pushed a commit to TedThemistokleous/onnxruntime that referenced this issue May 7, 2024
yihonglyu pushed a commit that referenced this issue May 9, 2024
2 participants