rocblas_hgemm gives incorrect results #340

Closed
deven-amd opened this issue Aug 23, 2018 · 9 comments

@deven-amd

How to reproduce

  • unzip the Makefile.gz and main.cpp.gz into an empty folder
  • Run make float to run the testcase with float datatype
  • Run make half to run the testcase with half datatype

The testcase does a simple matmul of two 3x3 matrices.
It can be run with either of these datatypes:

  • float datatype (rocblas_sgemm is called) (gives correct answer)
  • half datatype (rocblas_hgemm is called) (gives incorrect answer)

What is the expected behavior

The make float results are the expected (correct) ones.
A and B are the input matrices, C is the output.

root@deven-dev01:/common/hip_examples/rocblas/gemm_fp16# make float
rm -rf ./a.out
/opt/rocm/bin/hipcc -I/opt/rocm/rocblas/include -std=c++11 -lrocblas -lhip_hcc main.cpp
./a.out
hgemm example
NN: m, n, k, lda, ldb, ldc = 3, 3, 3, 3, 3, 3
A : [1, 2, 3, 4, 5, 6, 7, 8, 9, ]
B : [9, 8, 7, 6, 5, 4, 3, 2, 1, ]
C : [90, 114, 138, 54, 69, 84, 18, 24, 30, ]

What actually happens

The make half results are the incorrect ones.

Notice that every third value of C is incorrect: each one repeats the value before it!

root@deven-dev01:/common/hip_examples/rocblas/gemm_fp16# make half
rm -rf ./a.out
/opt/rocm/bin/hipcc -I/opt/rocm/rocblas/include -std=c++11 -lrocblas -lhip_hcc main.cpp -DTEST_HALF
./a.out
hgemm example
NN: m, n, k, lda, ldb, ldc = 3, 3, 3, 3, 3, 3
A : [1, 2, 3, 4, 5, 6, 7, 8, 9, ]
B : [9, 8, 7, 6, 5, 4, 3, 2, 1, ]
C : [90, 114, 114, 54, 69, 69, 18, 24, 24, ]

Makefile.gz
main.cpp.gz

@amcamd
Contributor

amcamd commented Aug 23, 2018

Can you please check the version of the rocblas library that you are using? I did this for the executable rocblas-bench using ldd below:

> ldd ./rocblas-bench | grep -i rocblas
        librocblas.so.0 => /home/achapman/repos/ROCmSoftwarePlatform/master/rocBLAS/build/release/library/src/librocblas.so.0 (0x00007f20d8da6000)

> ls -l  /home/achapman/repos/ROCmSoftwarePlatform/master/rocBLAS/build/release/library/src/librocblas.so.0
lrwxrwxrwx 1 achapman achapman 22 Aug 23 13:48 /home/achapman/repos/ROCmSoftwarePlatform/master/rocBLAS/build/release/library/src/librocblas.so.0 -> librocblas.so.0.14.1.1 

In this case, the version is 0.14.1.1.

I ask because I have just tested with the command below, and the rocblas-bench test passes for the sizes in your bug report. Note that the rocblas-bench test itself may be incorrect, so there may still be an error.

> ./rocblas-bench -f gemm -r h -m 3 -n 3 -k 3 --lda 3 --ldb 3 --ldc 3 -v 1
Query device success: there are 1 devices
Device ID 0 : Device 6863 ------------------------------------------------------
with 17.2 GB memory, clock rate 1600MHz @ computing capability 3.0
maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64
-------------------------------------------------------------------------
transA,transB,M,N,K,alpha,lda,ldb,beta,ldc,rocblas-Gflops,us,CPU-Gflops,us,norm-error
N,N,3,3,3,15360,3,3,0,3,0.00284211,19,0.00771429,7,0

You could also run the test with rocblas-bench using the 4 commands below (the ./install.sh command will take several minutes to complete):

> git clone -b master https://github.com/ROCmSoftwarePlatform/rocBLAS.git
> cd rocBLAS
> ./install.sh -c
> build/release/clients/staging/rocblas-bench -f gemm -r h -m 3 -n 3 -k 3 --lda 3 --ldb 3 --ldc 3 -v 1

For more information on the rocblas-bench command, run:

> build/release/clients/staging/rocblas-bench --help

@deven-amd
Author

The rocblas version I am using is

root@deven-dev01:/root/tensorflow# dpkg -l | grep rocblas
ii  rocblas                                0.14.1.0                              amd64        Radeon Open Compute BLAS library

@deven-amd
Author

Tried to follow the steps you provided to clone the repo and run rocblas-bench, but ran into the following error during the install step (step #3):

CMake Error at clients/benchmarks/CMakeLists.txt:25 (find_package):
  Could not find a package configuration file provided by "cblas" with any of
  the following names:

    cblasConfig.cmake
    cblas-config.cmake

  Add the installation prefix of "cblas" to CMAKE_PREFIX_PATH or set
  "cblas_DIR" to a directory containing one of the above files.  If "cblas"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!

@deven-amd
Author

Attempting ./install.sh -dc runs into a different error:

Reading package lists... Done                     
E: The repository 'http://172.27.226.104/artifactory/list/rocm-dkms-release-1.8-ubuntu xenial Release' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

@amcamd
Contributor

amcamd commented Aug 23, 2018

I omitted the -d flag from the install.sh command. Can you try again with:

> rm -rf build
> ./install.sh -dc

The -d flag should install the cblas dependency.

@amcamd
Contributor

amcamd commented Aug 23, 2018

Please install the latest version of rocBLAS. You can either install it on your machine with:

> git clone -b master https://github.com/ROCmSoftwarePlatform/rocBLAS.git
> cd rocBLAS
> ./install.sh -ic

or install from the Debian file rocblas-0.14.1.101-Linux.deb at https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/v14.1.1

@deven-amd
Author

Finally got the install script to work, and the new version does give correct results.

thanks

deven-amd reopened this Aug 23, 2018
@deven-amd
Author

Tried running the full TF unit tests with the new rocblas version and ran into other errors.
Many subtests fail, and the failures all seem related.

Two of the failures can be reproduced as follows
(same symptom as before: every third value of the output C is incorrect for half):

  1. (in main.cpp) change the defines for (DIM1, DIM2, DIM3) from (3,3,3) to (3,1,5)
root@deven-dev01:/common/hip_examples/rocblas/gemm_fp16# make float
rm -rf ./a.out
/opt/rocm/bin/hipcc -I/opt/rocm/rocblas/include -std=c++11 -lrocblas -lhip_hcc main.cpp
./a.out
hgemm example
NN: m, n, k, lda, ldb, ldc = 3, 1, 5, 3, 5, 3
A : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ]
B : [5, 4, 3, 2, 1, ]
C : [75, 90, 105, ]
root@deven-dev01:/common/hip_examples/rocblas/gemm_fp16# make half
rm -rf ./a.out
/opt/rocm/bin/hipcc -I/opt/rocm/rocblas/include -std=c++11 -lrocblas -lhip_hcc main.cpp -DTEST_HALF
./a.out
hgemm example
NN: m, n, k, lda, ldb, ldc = 3, 1, 5, 3, 5, 3
A : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ]
B : [5, 4, 3, 2, 1, ]
C : [75, 90, 90, ]
  2. (in main.cpp) change the defines for (DIM1, DIM2, DIM3) from (3,3,3) to (3,3,1)
root@deven-dev01:/common/hip_examples/rocblas/gemm_fp16# make float
rm -rf ./a.out
/opt/rocm/bin/hipcc -I/opt/rocm/rocblas/include -std=c++11 -lrocblas -lhip_hcc main.cpp
./a.out
hgemm example
NN: m, n, k, lda, ldb, ldc = 3, 3, 1, 3, 1, 3
A : [1, 2, 3, ]
B : [3, 2, 1, ]
C : [3, 6, 9, 2, 4, 6, 1, 2, 3, ]
root@deven-dev01:/common/hip_examples/rocblas/gemm_fp16# make half
rm -rf ./a.out
/opt/rocm/bin/hipcc -I/opt/rocm/rocblas/include -std=c++11 -lrocblas -lhip_hcc main.cpp -DTEST_HALF
./a.out
hgemm example
NN: m, n, k, lda, ldb, ldc = 3, 3, 1, 3, 1, 3
A : [1, 2, 3, ]
B : [3, 2, 1, ]
C : [3, 6, 6, 2, 4, 4, 1, 2, 2, ]

@bragadeesh
Contributor

Closing for now; please reopen if the issue is still present with the latest versions.
