ruby numo-linalg + clblast: OpenCL error: clCreateContext: -6 #524

Open
dpblnt opened this issue Jan 16, 2024 · 1 comment

dpblnt commented Jan 16, 2024

I'm using ruby-dnn on top of numo-linalg, into which I load libclblast.so in the hope that it works as a drop-in replacement for OpenBLAS.

# clinfo -l
Platform #0: NVIDIA CUDA
 +-- Device #0: NVIDIA GeForce GTX 770
 `-- Device #1: NVIDIA GeForce GT 730

sci-libs/clblast-1.5.2-r1 built with -cuda +opencl

Running a basic XOR example in ruby-dnn on my first card, which has 2 GB of memory, 1 GB of which is already in use:

CUDA_VISIBLE_DEVICES=0  ruby xor.rb 
nil
Epoch 1/20000
[========================================]  4/4 loss: -0.0000, accuracy: 0.2500
Epoch 2/20000
[========================================]  4/4 loss: -0.0000, accuracy: 0.0000
Epoch 3/20000
[========================================]  4/4 loss: -0.0000, accuracy: 0.0000
Epoch 4/20000
[========================================]  4/4 loss: -0.0000, accuracy: 0.0000
Epoch 5/20000
terminate called after throwing an instance of 'clblast::CLCudaAPIError'
  what():  OpenCL error: clCreateContext: -6
Aborted

Running the same script on the second device, which is unused and has all 2 GB of memory free:

CUDA_VISIBLE_DEVICES=1  ruby xor.rb 
nil
Epoch 1/20000
[========================================]  4/4 loss: -0.0000, accuracy: 0.5000
Epoch 2/20000
...
Epoch 15/20000
[========================================]  4/4 loss: -0.0000, accuracy: 0.0000
Epoch 16/20000
[========================================]  4/4 loss: -0.0000, accuracy: 0.0000
Epoch 17/20000
terminate called after throwing an instance of 'clblast::CLCudaAPIError'
  what():  OpenCL error: clCreateContext: -6
Aborted

While doing this, I observe the GPU memory usage climbing to 100% in nvtop; when the process aborts, the GPU memory is freed.
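For reference, the -6 in the message is an OpenCL status code. A minimal Ruby sketch for decoding such messages follows. The numeric values are the ones defined in the standard OpenCL header (CL/cl.h); the constant `OPENCL_ERRORS` and the helper `decode_opencl_error` are hypothetical names introduced here for illustration, and the table covers only a handful of codes.

```ruby
# Subset of OpenCL status codes as defined in CL/cl.h.
OPENCL_ERRORS = {
  0  => "CL_SUCCESS",
  -1 => "CL_DEVICE_NOT_FOUND",
  -2 => "CL_DEVICE_NOT_AVAILABLE",
  -3 => "CL_COMPILER_NOT_AVAILABLE",
  -4 => "CL_MEM_OBJECT_ALLOCATION_FAILURE",
  -5 => "CL_OUT_OF_RESOURCES",
  -6 => "CL_OUT_OF_HOST_MEMORY"
}.freeze

# Hypothetical helper: pull the failing call and the status code out of a
# message shaped like "OpenCL error: <call>: <code>".
def decode_opencl_error(message)
  match = message.match(/OpenCL error: (\w+): (-?\d+)/)
  return nil unless match

  code = Integer(match[2])
  { call: match[1], code: code, name: OPENCL_ERRORS.fetch(code, "unknown") }
end

p decode_opencl_error("what():  OpenCL error: clCreateContext: -6")
```

Under this table, -6 corresponds to CL_OUT_OF_HOST_MEMORY, raised here by clCreateContext. That name points at a host-side allocation failure, although it does not by itself rule out device memory exhaustion as the underlying cause.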

As far as I know, Numo::Linalg does not explicitly support CLBlast.
https://github.com/ruby-numo/numo-linalg/blob/master/doc/select-backend.md

To run these tests, I simply loaded libclblast.so instead of OpenBLAS:

require "numo/linalg/linalg"
Numo::Linalg::Blas.dlopen("/usr/lib64/libclblast.so") 

The data fed into the NN is very small, so I can't yet understand what eats up gigabytes of GPU memory, or why.

x = SFloat[[0, 0], [1, 0], [0, 1], [1, 1]]
y = SFloat[[0], [1], [1], [0]]

Am I right to suspect that everything works as expected until the GPU runs out of memory, and that all I need is to free GPU memory after each training epoch?
Is there any way I could do that, maybe from outside of Numo::Linalg?
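One possible workaround from outside Numo::Linalg, sketched here purely as an assumption, builds on the observation above that the GPU memory is freed once the process ends: run each chunk of epochs in a short-lived child process and pass the state back to the parent. The helper `run_isolated` is a hypothetical name, the training step is a stand-in, and the sketch assumes a Unix-like system (for `fork`), a model state that can be serialized with `Marshal`, and that libclblast.so is loaded inside the child rather than before the fork.

```ruby
# Hypothetical helper: run a block in a forked child process and return its
# Marshal-able result. Memory held by the child, including anything a native
# library allocated, is released by the OS when the child exits.
def run_isolated
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.write(Marshal.dump(yield))
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  writer.close
  payload = reader.read
  reader.close
  _, status = Process.wait2(pid)
  raise "child failed: #{status.inspect}" unless status.success?

  Marshal.load(payload)
end

# Stand-in for "train a few epochs and return the updated state". A real
# script would load the BLAS backend and run the model inside the block.
state = { epoch: 0 }
3.times do
  state = run_isolated { { epoch: state[:epoch] + 5 } }
end
p state
```

This only sidesteps the growth in memory use; it does not explain it, and the cost of restarting the backend for every chunk may outweigh any benefit for a model this small.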

CNugteren (Owner) commented Feb 17, 2024

Indeed, CLBlast provides a Netlib BLAS API (see here), but it is not recommended for speed because it performs a lot of memory copies; for level-1 and level-2 routines in particular, it will typically cause a slowdown. It is also not widely used.

However, it does exist and should work in theory. Unfortunately, it remains a mystery to me why so much memory is being allocated, even for a very small example. There might be a bug somewhere in the CLBlast BLAS API, but it is difficult to debug this way, and I'm not familiar with numo-linalg. One path forward could be to compile your own CLBlast from source and pass -DVERBOSE=1 to CMake. That enables a lot of extra print statements, which will give us a log of what CLBlast is doing when you call your numo-linalg code.
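Once a verbose build is in place, the resulting output needs to be saved so it can be attached to the issue. A minimal sketch of one way to do that from Ruby follows; `capture_run` and the log file name are hypothetical, the command shown is a harmless stand-in for the real script, and stdout and stderr are captured together on the assumption that it is not known in advance which stream the extra print statements use.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run a command, capture stdout and stderr together,
# save the combined output to a file, and return it with the exit status.
def capture_run(command, log_path, env: {})
  output, status = Open3.capture2e(env, *command)
  File.write(log_path, output)
  [output, status]
end

# Stand-in command; for the real run this could be [RbConfig.ruby, "xor.rb"].
command = [RbConfig.ruby, "-e", "puts 'to stdout'; warn 'to stderr'"]
output, status = capture_run(command, "clblast_verbose.log")
puts "exit status: #{status.exitstatus.inspect}, #{output.lines.size} log lines saved"
```

The device selection used in the original runs can be passed through the `env:` argument, for example `env: { "CUDA_VISIBLE_DEVICES" => "0" }`. If the child process aborts, `status.exitstatus` is nil, but whatever was printed before the abort is still written to the log file.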
