cublas tests fail #2
I've narrowed down this issue a bit. The dot and nrm2 operations seem to be the core of the problem. |
Awesome! I would love to know what exactly is going wrong there :) Note that @hobofan suspected that the allocated memory does not satisfy some required constraints - I have not verified this yet. |
Can you assign me here? |
@Anton-4 right now it seems I can only assign team members, so I am going to do some magic to make this work and add you to a subteam named @spearow/contributor - as soon as you accept that, I can assign issues to you :) I am aware that this is not easy. I'd recommend starting with the API docs NVIDIA provides; maybe unknown is not exactly correct and is only there because not all error codes are differentiated properly. Memory alignment can be checked with |
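For the alignment suspicion, a minimal sketch of how a raw address could be checked from Rust, assuming the buffer address is available as a raw pointer. The helper name `is_aligned`, the 256-byte value, and the sample address are all hypothetical and only there for illustration; they are not taken from coaster or from the cuBLAS docs.

```rust
/// Returns true if `ptr` is aligned to `align` bytes.
/// `align` must be a non-zero power of two.
/// (Hypothetical helper, not part of coaster or rust-cublas.)
fn is_aligned<T>(ptr: *const T, align: usize) -> bool {
    assert!(align.is_power_of_two(), "alignment must be a power of two");
    // For a power-of-two alignment, the low bits of the address must be zero.
    (ptr as usize) & (align - 1) == 0
}

fn main() {
    // Host-side buffer: a Vec<f32> is aligned to at least align_of::<f32>().
    let host = vec![0.0f32; 16];
    println!(
        "host buffer f32-aligned: {}",
        is_aligned(host.as_ptr(), std::mem::align_of::<f32>())
    );

    // Made-up device address, only to show the call; 256 is an assumed
    // alignment requirement, not a value confirmed in this thread.
    let device_addr = 0x7000_0100usize as *const u8;
    println!("device buffer 256-aligned: {}", is_aligned(device_addr, 256));
}
```

The same check works on any address that can be cast to `usize`, so it could be dropped into a failing test right before the BLAS call.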
Some updates:
Getting a little bit closer every day :) |
Did you try This sounds awfully similar to coaster-nn/#11, which seems to point even more towards coaster ... specifically Thanks for taking care of this. Much appreciated! |
relevant output for
0x705c40000 points to an assembly instruction ( |
updates:
|
|
|
Using Fedora
So this file is present on my test machine, yet this happens too. Did you figure out a way to map the instruction back to a Rust function? If there are only a few suspects, I am currently tied to a GTX 460 but I can test later this week on a GTX 1050 if it makes any difference. |
From what I've read NO_BINARY_FOR_GPU can have many causes. |
The cublas doc also states that |
Running the dot test by itself succeeds otherwise but fails when setting
|
I've seen some cases where the CUDA_ERROR_NO_BINARY_FOR_GPU happens when no compute capability is specified and a default value is used. |
In our case there is nothing we can specify. There is no nvcc pass. We only use the compiled binary. So the next step would be to figure out which targets are compiled into cublas.
So far I did not find anything related in the cublas doc. |
The minimal compute capability for CUBLAS is 3.5 (I have 3.5), I also ran the |
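Taking the 3.5 minimum quoted above at face value, the comparison against a given card can be sketched like this. The type alias, constant, and function names are hypothetical, and the compute capabilities listed for the GTX 460 (2.1) and GTX 1050 (6.1) are the values from NVIDIA's published tables as far as I know, not something measured in this thread.

```rust
/// Compute capability as (major, minor). Hypothetical alias for illustration.
type ComputeCapability = (u32, u32);

/// Minimum compute capability for cuBLAS as quoted in this thread (assumption).
const MIN_CUBLAS_CC: ComputeCapability = (3, 5);

/// Tuples compare lexicographically, so (3, 5) >= (3, 0) and (2, 1) < (3, 5).
fn meets_minimum(device: ComputeCapability, minimum: ComputeCapability) -> bool {
    device >= minimum
}

fn main() {
    // Cards mentioned in this thread plus a device sitting exactly at 3.5.
    let devices = [("GTX 460", (2, 1)), ("CC 3.5 device", (3, 5)), ("GTX 1050", (6, 1))];
    for (name, cc) in devices {
        println!(
            "{} (compute capability {}.{}): meets minimum = {}",
            name,
            cc.0,
            cc.1,
            meets_minimum(cc, MIN_CUBLAS_CC)
        );
    }
}
```

If the 3.5 figure is right, a card below it would be expected to hit NO_BINARY_FOR_GPU regardless of anything coaster does, which is worth ruling out before digging further.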
Alright, that also explains why the CI fails - since I will get rid of the |
Does this explain all cases of failing tests though? |
I think the |
It does indeed occur for every test. It's so strange that the dot product can be calculated while the error reports that there is no binary. |
Tests with the 4x
|
As expected, that's how it is with me too. |
I wasn't aware that the raw, coaster-free version succeeded reproducibly. In essence, the above findings are nice and good for moving forward but are not, or only indirectly, related. Right now I think the sanest next step is to figure out the difference in memory allocation between the tests. This must give a hint, since that is the only difference I can see right now between those tests. |
I used the nvidia profiler
I have also done the same for a working c++ binary doing the sgemm operation:
Glad I found this :) |
That's nice, I did not know this tool yet! |
I wrote some C++ in which I call asum with the same arguments as in our test, then ran the profiler, yielding:
I also ran only the asum test from
Looking at the difference in API calls, I think |
Essentially what we should try to use |
I ran a full trace of all API calls with
for
In |
The thing is that each test runs in a fork afaik and I am not sure how the profiler handles that. Can you try using something like |
The context is to the best of my memory wrapped in a |
All other tests were already filtered out. I think I misinterpreted context creation, it is indeed only |
I think |
which when reading the cublas specification and explained in more detail. This might be unrelated, but this implies that sync needs to be called. This is what fails in
vs. rust-cublas
which works reliably. So the difference here is the Copying the memory to the host seems to be safe (I've never seen this fail). Things to try:
|
According to the docs, to synchronize the memory from/to the device, |
I've also taken a look at the API calls for rust-cublas vs coaster-blas, the only difference is rust-cublas
vs for coaster-blas:
In rust-cublas you can see the 3 |
SideNote: I am not 100% sure I understand what you are implying. The memory leaks won't explain the behaviour of a context corrupting |
I'm just happy we're getting close to the solution 😃 |
|
Not the results I hoped for, but results after all. This leaves pretty much only hope for |
I think I understood a little more about why this is failing. I.e. The
There are benchmarks using
So I guess this is the next thing to do: create FFI bindings for |
@Anton-4 did you start digging into it any further? |
No sorry, I didn't have the time. I'll see if I can find some spare time this weekend. |
I added a call to Do you know where the |
No, unfortunately I don't right this second, I will check tonight. I guess the next step would be to call the |
They are not failing all the time; sometimes the first few pass, sometimes they all pass.