rocblas_hgemm gives incorrect results #340

deven-amd · 2018-08-23T17:26:15Z

How to reproduce

Extract Makefile.gz and main.cpp.gz into an empty folder.
Run make float to run the testcase with the float datatype.
Run make half to run the testcase with the half datatype.

The testcase does a simple matmul of two 3x3 matrices.
The testcase can be run for two datatypes:

float datatype (rocblas_sgemm is called): gives the correct answer
half datatype (rocblas_hgemm is called): gives an incorrect answer

What is the expected behavior

The make float results are the expected (correct) ones.
A and B are the input matrices; C is the output.

root@deven-dev01:/common/hip_examples/rocblas/gemm_fp16# make float
rm -rf ./a.out
/opt/rocm/bin/hipcc -I/opt/rocm/rocblas/include -std=c++11 -lrocblas -lhip_hcc main.cpp
./a.out
hgemm example
NN: m, n, k, lda, ldb, ldc = 3, 3, 3, 3, 3, 3
A : [1, 2, 3, 4, 5, 6, 7, 8, 9, ]
B : [9, 8, 7, 6, 5, 4, 3, 2, 1, ]
C : [90, 114, 138, 54, 69, 84, 18, 24, 30, ]

What actually happens

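The make half results are the incorrect ones.

Notice that every third value of C is incorrect (the third column when the array is read three values per line): 114, 69, and 24 instead of 138, 84, and 30. Each wrong value repeats the value before it.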

root@deven-dev01:/common/hip_examples/rocblas/gemm_fp16# make half
rm -rf ./a.out
/opt/rocm/bin/hipcc -I/opt/rocm/rocblas/include -std=c++11 -lrocblas -lhip_hcc main.cpp -DTEST_HALF
./a.out
hgemm example
NN: m, n, k, lda, ldb, ldc = 3, 3, 3, 3, 3, 3
A : [1, 2, 3, 4, 5, 6, 7, 8, 9, ]
B : [9, 8, 7, 6, 5, 4, 3, 2, 1, ]
C : [90, 114, 114, 54, 69, 69, 18, 24, 24, ]

Makefile.gz
main.cpp.gz


amcamd · 2018-08-23T18:54:25Z

Can you please check the version of the rocblas library that you are using? I did this for the rocblas-bench executable using ldd, as shown below:

> ldd ./rocblas-bench | grep -i rocblas
        librocblas.so.0 => /home/achapman/repos/ROCmSoftwarePlatform/master/rocBLAS/build/release/library/src/librocblas.so.0 (0x00007f20d8da6000)

> ls -l  /home/achapman/repos/ROCmSoftwarePlatform/master/rocBLAS/build/release/library/src/librocblas.so.0
lrwxrwxrwx 1 achapman achapman 22 Aug 23 13:48 /home/achapman/repos/ROCmSoftwarePlatform/master/rocBLAS/build/release/library/src/librocblas.so.0 -> librocblas.so.0.14.1.1

In this case, the version is 0.14.1.1.

I ask because I have just run the command below, and the rocblas-bench test for the sizes in your bug report passes. Note that the rocblas-bench test itself may be incorrect.

> ./rocblas-bench -f gemm -r h -m 3 -n 3 -k 3 --lda 3 --ldb 3 --ldc 3 -v 1
Query device success: there are 1 devices
Device ID 0 : Device 6863 ------------------------------------------------------
with 17.2 GB memory, clock rate 1600MHz @ computing capability 3.0
maxGridDimX 2147483647, sharedMemPerBlock 65.5 KB, maxThreadsPerBlock 1024, warpSize 64
-------------------------------------------------------------------------
transA,transB,M,N,K,alpha,lda,ldb,beta,ldc,rocblas-Gflops,us,CPU-Gflops,us,norm-error
N,N,3,3,3,15360,3,3,0,3,0.00284211,19,0.00771429,7,0

You could also run the test with rocblas-bench using the 4 commands below (./install.sh will take several minutes to complete):

> git clone -b master https://github.com/ROCmSoftwarePlatform/rocBLAS.git
> cd rocBLAS
> ./install.sh -c
> build/release/clients/staging/rocblas-bench -f gemm -r h -m 3 -n 3 -k 3 --lda 3 --ldb 3 --ldc 3 -v 1

For more information on rocblas-bench, run

> build/release/clients/staging/rocblas-bench --help

deven-amd · 2018-08-23T19:05:11Z

The rocblas version I am using is

root@deven-dev01:/root/tensorflow# dpkg -l | grep rocblas
ii  rocblas                                0.14.1.0                              amd64        Radeon Open Compute BLAS library

deven-amd · 2018-08-23T19:10:01Z

I tried to follow the steps you provided to clone the repo and run rocblas-bench, but ran into the following error during the install step (step #3):

CMake Error at clients/benchmarks/CMakeLists.txt:25 (find_package):
  Could not find a package configuration file provided by "cblas" with any of
  the following names:

    cblasConfig.cmake
    cblas-config.cmake

  Add the installation prefix of "cblas" to CMAKE_PREFIX_PATH or set
  "cblas_DIR" to a directory containing one of the above files.  If "cblas"
  provides a separate development package or SDK, be sure it has been
  installed.


-- Configuring incomplete, errors occurred!

deven-amd · 2018-08-23T19:16:11Z

Attempting to run ./install.sh -dc results in a different error:

Reading package lists... Done                     
E: The repository 'http://172.27.226.104/artifactory/list/rocm-dkms-release-1.8-ubuntu xenial Release' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.

amcamd · 2018-08-23T19:16:37Z

I omitted the -d flag for install.sh. Can you try again with:

> rm -rf build
> ./install.sh -dc

The -d flag should install the cblas dependency.

amcamd · 2018-08-23T19:28:44Z

Please install the latest version of rocBLAS. You can either install it on your machine with:

> git clone -b master https://github.com/ROCmSoftwarePlatform/rocBLAS.git
> cd rocBLAS
> ./install.sh -ic

or install from the Debian file rocblas-0.14.1.101-Linux.deb at https://github.com/ROCmSoftwarePlatform/rocBLAS/releases/tag/v14.1.1

deven-amd · 2018-08-23T19:46:47Z

I finally got the install script to work, and the new version does give correct results.

Thanks.

deven-amd · 2018-08-23T21:02:02Z

I tried running the full TF unittest with the new rocblas version and ran into other errors.
Many subtests fail, and the failures all seem related.

Two of the failures can be reproduced as follows
(same symptom as before: the last value of each group of three in the output C is incorrect for half).

In main.cpp, change the defines for (DIM1, DIM2, DIM3) from (3,3,3) to (3,1,5):

root@deven-dev01:/common/hip_examples/rocblas/gemm_fp16# make float
rm -rf ./a.out
/opt/rocm/bin/hipcc -I/opt/rocm/rocblas/include -std=c++11 -lrocblas -lhip_hcc main.cpp
./a.out
hgemm example
NN: m, n, k, lda, ldb, ldc = 3, 1, 5, 3, 5, 3
A : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ]
B : [5, 4, 3, 2, 1, ]
C : [75, 90, 105, ]
root@deven-dev01:/common/hip_examples/rocblas/gemm_fp16# make half
rm -rf ./a.out
/opt/rocm/bin/hipcc -I/opt/rocm/rocblas/include -std=c++11 -lrocblas -lhip_hcc main.cpp -DTEST_HALF
./a.out
hgemm example
NN: m, n, k, lda, ldb, ldc = 3, 1, 5, 3, 5, 3
A : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ]
B : [5, 4, 3, 2, 1, ]
C : [75, 90, 90, ]

In main.cpp, change the defines for (DIM1, DIM2, DIM3) from (3,3,3) to (3,3,1):

root@deven-dev01:/common/hip_examples/rocblas/gemm_fp16# make float
rm -rf ./a.out
/opt/rocm/bin/hipcc -I/opt/rocm/rocblas/include -std=c++11 -lrocblas -lhip_hcc main.cpp
./a.out
hgemm example
NN: m, n, k, lda, ldb, ldc = 3, 3, 1, 3, 1, 3
A : [1, 2, 3, ]
B : [3, 2, 1, ]
C : [3, 6, 9, 2, 4, 6, 1, 2, 3, ]
root@deven-dev01:/common/hip_examples/rocblas/gemm_fp16# make half
rm -rf ./a.out
/opt/rocm/bin/hipcc -I/opt/rocm/rocblas/include -std=c++11 -lrocblas -lhip_hcc main.cpp -DTEST_HALF
./a.out
hgemm example
NN: m, n, k, lda, ldb, ldc = 3, 3, 1, 3, 1, 3
A : [1, 2, 3, ]
B : [3, 2, 1, ]
C : [3, 6, 6, 2, 4, 4, 1, 2, 2, ]

bragadeesh · 2020-08-14T01:19:33Z

Closing for now; please reopen if the issue is still present with the latest versions.

deven-amd assigned whchung and amcamd Aug 23, 2018

deven-amd closed this as completed Aug 23, 2018

deven-amd reopened this Aug 23, 2018

deven-amd mentioned this issue Aug 24, 2018

adding fp16 support for batched blas gemm ROCm/tensorflow-upstream#143

Merged

fshi98 mentioned this issue Dec 1, 2018

Performance comparsion: AMD with ROCm vs NVIDIA with cuDNN? ROCm/tensorflow-upstream#173

Open

bragadeesh closed this as completed Aug 14, 2020

mlse-lib-jenkins pushed a commit that referenced this issue Sep 30, 2020

SWDEV-253944-expand -test-logging (#340)

b390c97
