Out Of Memory with BusSpeedReadback #49

Open
Frenetique opened this issue May 20, 2016 · 0 comments


Frenetique commented May 20, 2016

Hi everyone!

I am using SHOC to characterize some aspects of a board built around a Tegra K1 SoC, and I ran into an issue with the CUDA version of the BusSpeedReadback benchmark.

When I run it, the process gets killed by the OS kernel because of an Out Of Memory (OOM) situation.
I tried to pinpoint where it actually fails and found the following line (line 108 of the benchmark source):

```cpp
cudaMalloc((void**)&device, sizeof(float) * numMaxFloats);
```

This cudaMalloc call apparently triggers the OOM killer and ends the run.
I am new to CUDA and GPGPU in general, but as I understand it, this line allocates a block of device-side (GPU) memory. The block is roughly 132MB, and the GPU has over 1GB of memory, so I don't understand the issue.

I also ran the same benchmark on a different system with about the same amount of GPU memory (even less, actually), but using OpenCL, and had no issues.

Thanks in advance for any insight.

Best regards,

Marc

EDIT: BusSpeedDownload runs fine, even though it has an almost identical code structure. All of the other benchmarks also run fine.

EDIT2: I found that the OpenCL version checks the available memory before trying to allocate any memory block, and uses it to set the maximum block size. The CUDA version has no such check, so the code can allocate memory blocks of up to 512MB. In some configurations this is too much and triggers an OOM situation.
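A minimal sketch of a comparable guard on the CUDA side, assuming the goal is simply to shrink the largest buffer until it fits within a fraction of the free memory. The function name `capFloatsToFreeMemory` and the 50% budget used below are hypothetical and not taken from SHOC; in a real CUDA build, `freeBytes` could be obtained with `cudaMemGetInfo(&freeBytes, &totalBytes)`. The code is plain host-side C++, so it also compiles inside a `.cu` file.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical guard (not SHOC code): halve the number of floats until the
// buffer fits within `fraction` of the free device memory.
// In a CUDA build, freeBytes could come from cudaMemGetInfo(&freeBytes, &totalBytes).
std::size_t capFloatsToFreeMemory(std::size_t requestedFloats,
                                  std::size_t freeBytes,
                                  double fraction)
{
    assert(fraction > 0.0 && fraction <= 1.0);
    const double budget = static_cast<double>(freeBytes) * fraction;
    std::size_t n = requestedFloats;
    // Halve until the buffer size in bytes fits within the budget (or reaches zero).
    while (n > 0 && static_cast<double>(n * sizeof(float)) > budget) {
        n /= 2;
    }
    return n;
}
```

For example, requesting 512MB (134217728 floats) with 300MB reported free and a 0.5 budget halves the request from 512MB to 256MB to 128MB and returns 33554432 floats. Halving is just one simple policy; capping to an exact byte count would work as well.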

I also found something odd about the values reported by the CUDA version of BusSpeedDownload.
For a block size of 4096kB, the mean kernel time is 1.67431ms.
So the bandwidth should be about 2.5GB/s, but SHOC reports a mean value of 3GB/s for this data size.
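To double-check the arithmetic, here is a small host-side C++ helper (function names are hypothetical) that converts a transfer size and time into bandwidth. It assumes 1 kB = 1024 bytes and reports both decimal GB/s and binary GiB/s, since the figure depends on the unit convention; it makes no claim about how SHOC itself computes its number.

```cpp
#include <cassert>

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) for a transfer of
// `sizeKB` kilobytes (assumed 1 kB = 1024 bytes) taking `timeMs` milliseconds.
double bandwidthGBps(double sizeKB, double timeMs)
{
    assert(timeMs > 0.0);
    const double bytes = sizeKB * 1024.0;
    const double seconds = timeMs / 1000.0;
    return bytes / seconds / 1.0e9;
}

// Same measurement expressed in GiB/s (1 GiB = 2^30 bytes).
double bandwidthGiBps(double sizeKB, double timeMs)
{
    return bandwidthGBps(sizeKB, timeMs) * 1.0e9 / (1024.0 * 1024.0 * 1024.0);
}
```

For 4096kB in 1.67431ms this gives about 2.505 GB/s (about 2.333 GiB/s); taking 1 kB = 1000 bytes instead gives about 2.446 GB/s. None of these conventions reaches 3 GB/s.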

I observed the same with the CUDA version of the BusSpeedReadback benchmark.
The OpenCL version gives consistent values.

Did I miss something?

Best,

Marc
