-
-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dgemm bug for certain matrix dimensions where Sgemm works OK #51
Comments
Curious! SGEMM and DGEMM follow the same code path, the only differ through a template parameter in C++ and some defines in the OpenCL kernel. Have you run the included CLBlast tests? Do they pass for GEMM? |
Are you sure you copy back the full matrix C? Perhaps you are seeing old results: the double-precision version requires twice as wide CPU-GPU copies. I have just modified the
|
I am pretty sure that there is no copying issue, since I use it through the library that takes care of that, which works for most examples, and also with the custom opencl blas engine I already had. Of course, I cannot be completely sure, so I'll start from the example you've provided. How do I compile the code from the samples? Is there some hook in (c)make, or these are standalone that I should compile with vanilla gcc? And, can you please share what would be the parameters to that call? I have already installed libclblast, and it is visible globally on my system. I didn't run the tests, since I tried to avoid a pretty messy clblas build and installation. Maybe it is not THAT complicated, but it looks like it would be from the docs. That's the last resort :) |
If you want to use the CMake infrastructure, you can just set
About the tests: they now also work without clBLAS, they can be linked against a normal CPU BLAS installation as well. |
Yes, I can confirm that it works well on my machine, too, when called from C. Thanks. |
@CNugteren Cross-post from JOCLBlast. The problem also appears in CLBlast Your example works... on card 0 (AMD R9 270X), but on card 1 (AMD R9 290X) the problem is the same as when I called it from Clojure. It is not due to the defective card, because my third card (also 290X) has the same error, and my custom dgemm kernels (OpenCL 2.0) compute the result right. I tried Cedric's example, and it also has the same error on cards 1 and 2, while computes well on card 0 (which is the cheaper one)! So, the problem is either some subtle bug in CLBlast or some problem in AMD's Hawaii (but I doubt since the kernels that I wrote following Cedric's blog work well). Here is the output of your example: CL_DEVICE_NAME: Hawaii
A:
11.0 12.0 13.0 14.0 15.0
21.0 22.0 23.0 24.0 25.0
31.0 32.0 33.0 34.0 35.0
41.0 42.0 43.0 44.0 45.0
B:
11.0 12.0 13.0
21.0 22.0 23.0
31.0 32.0 33.0
41.0 42.0 43.0
51.0 52.0 53.0
C:
11.0 12.0 13.0
21.0 22.0 23.0
31.0 32.0 33.0
41.0 42.0 43.0
Result of C = 10.0 * A * B + 20.0 * C:
21370.0 240.0 260.0
37070.0 440.0 460.0
620.0 640.0 660.0
820.0 840.0 860.0 You'll notice that it computes beta * C, but leaves out alpha * AB Cedric's example from C: ./clblast_sample_sgemm
1 2
3 4
5 6
1 2 3
4 5 6
1 2
3 4
Completed SGEMM with status 0
47.000
6.000
107.000
12.000 |
Thanks for all the info! Unfortunately I can't reproduce it myself, so you'll have to help me debug it. A couple of possible sources for the bug:
To get some more insight, I would ask you to run the GEMM tests included with CLBlast. They will show in which cases it works and in which it doesn't (e.g. requiring a pre-processing transpose matrix kernel or not). Afterwards, you can play around with the values in the CLBlast database, for example by changing |
A couple days ago I have already reported problems with Hawaii tuning, but I didn't get any reports on which combinations fail. I only got segmentation faults. I'll try editing the database and will report the results. |
Replacing the tuning parameters for Dgemm in xgemm.h solved the problem: As for the performance, you can see some measurements for sgemm (they are excellent and poor at the same time :) in #53, and this is not related to this problem. |
I could update the parameters, but that doesn't satisfy me. If the tuner produced parameters that lead to incorrect results, then something is wrong somewhere. Unfortunately I don't have access to such a device myself, so could you perhaps run the CLBlast GEMM tests with the old parameters and see which of the tests fail? Some should pass at least: the tuner didn't notice any issues. And then I am also interested to see whether just changing the |
OK. The tests build with the newest development branch. I'll send the test log with the current settings this evening, and will try to experiment with SA and SB, and to tune the library again and test that tomorrow evening. |
Hawaii and Pitcairn with the current development database: |
In the previous results, Hawaii Dgemm parameters were not the official ones, but the ones that I copied from Sgemm. Your hunch about parameters was all right: changing SA and SB to 0 gives correct results in the test case that originally failed for me. Here are CLBlast test results with official Hawaii parameters with changed SA=0 and SB=0 (hawaii0.test) |
I cannot see any SGEMM or DGEMM failures in any of the tests you uploaded. Or am I missing something? I do see some failures in other routines: that's not right! Have you noticed them as well? I now re-ran the tests on the AMD R9 M370X GPU, and it shows failures as well. I'll investigate them. |
I've fixed some tests that failed previously on my machine, I believe they should fix some of the errors you saw as well. Some failures fail on AMD GPUs because of specific GEMM parameters, e.g. SYRK and SYR2K. This seems to be related to your issue and remains still unsolved. |
I've fixed a bug related to events not properly set in the GEMM routine in some cases, which had as result that |
I tested with the development branch and now it works OK with my basic tests, with official dgemm Hawaii settings. I couldn't compile the CLBlast tests, though. I got Error 2. Thank you very much for fixing this so quickly! :) |
Good to hear that correctness is fixed again! I'll close this issue. Could you perhaps open a separate issue and post the compilation error you get when compiling the tests? |
My matrix tests revealed a very, very subtle issue with Dgemm:
Most of my tests work, but this one passes for floats and falls for doubles:
A: 2x3 column matrix ((1 2) (3 4) (5 6))
B: 3x2 column ((1 2 3) (4 5 6))
C: 2x2 column ((1 2) (3 4))
xgemm 2.0 * AB + 3.0 * C should update C to: ((47 62) (107 140))
and Sgemm does!
but Dgemm gives ((47 62) (9 12))
I would have sent you a test case in code, but I am doing this in Clojure, and the code is not more complex than the example I've given here.
I guess that the problem is that you do some pre-transformations and paddings for some matrices, so in some cases, such as this one, some operations are done in "wrong" places in memory.
Or maybe I am not using the API well, but I doubt that, given that all other tests work well, and, what especially puzzles me, that Sgemm works all right in the same example!
The text was updated successfully, but these errors were encountered: