[cuSOLVER] `trd` tests are failed with latest oneMKL and LLVM #231

yhmtsai · 2022-09-26T18:08:13Z

Summary

With our Docker image ginkgohub/oneapi:cuda11.6, 16 *trd* tests in the cuSOLVER backend fail on NVIDIA devices (I tried TitanX and A100).

Environment

HW you use: A100 and TitanX
Backend library version: CUDA 11.6.2
oneMKL version: 9abb8cf
Compiler version: e1794b668cd2e539672f500fcb0e6bcbc9766077

Steps to reproduce

Use the NVIDIA Docker runner to run the image named in the Summary above.
cd /var/tmp/oneMKL
mkdir build_cusolver && cd build_cusolver

cmake -DCMAKE_CXX_COMPILER=clang++ -DENABLE_MKLCPU_BACKEND=OFF -DENABLE_MKLGPU_BACKEND=OFF \
-DENABLE_NETLIB_BACKEND=OFF -DENABLE_CUBLAS_BACKEND=OFF -DENABLE_CUSOLVER_BACKEND=ON \
-DENABLE_CURAND_BACKEND=OFF -DBUILD_FUNCTIONAL_TESTS=ON \
-DREF_LAPACK_ROOT=/var/tmp/lapack-release/build ..

make -j24
ctest -R trd --output-on-failure

Observed behavior

16 tests fail:

The following tests FAILED:
	177 - LAPACK/RT/Hetrd/HetrdAccuracyUsm.ComplexSinglePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	178 - LAPACK/RT/Hetrd/HetrdAccuracyUsm.ComplexDoublePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	179 - LAPACK/RT/Hetrd/HetrdAccuracyBuffer.ComplexSinglePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	180 - LAPACK/RT/Hetrd/HetrdAccuracyBuffer.ComplexDoublePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	323 - LAPACK/RT/Sytrd/SytrdAccuracyUsm.RealSinglePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	324 - LAPACK/RT/Sytrd/SytrdAccuracyUsm.RealDoublePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	325 - LAPACK/RT/Sytrd/SytrdAccuracyBuffer.RealSinglePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	326 - LAPACK/RT/Sytrd/SytrdAccuracyBuffer.RealDoublePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	575 - LAPACK/CT/Hetrd/HetrdAccuracyUsm.ComplexSinglePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	576 - LAPACK/CT/Hetrd/HetrdAccuracyUsm.ComplexDoublePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	577 - LAPACK/CT/Hetrd/HetrdAccuracyBuffer.ComplexSinglePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	578 - LAPACK/CT/Hetrd/HetrdAccuracyBuffer.ComplexDoublePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	721 - LAPACK/CT/Sytrd/SytrdAccuracyUsm.RealSinglePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	722 - LAPACK/CT/Sytrd/SytrdAccuracyUsm.RealDoublePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	723 - LAPACK/CT/Sytrd/SytrdAccuracyBuffer.RealSinglePrecision/NVIDIA_A100_SXM4_40GB (Failed)
	724 - LAPACK/CT/Sytrd/SytrdAccuracyBuffer.RealDoublePrecision/NVIDIA_A100_SXM4_40GB (Failed)

The corresponding test output:
oneMKL_cuSOLVER_A100_error.txt
oneMKL_cuSOLVER_TitanX_error.txt
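
The failing list above can be tallied per test group to check that the failures cover every Hetrd and Sytrd variant evenly. The following is only a minimal sketch: the helper name `summarize_failed` is hypothetical (it is not part of oneMKL or CTest), and it assumes the summary lines keep the `<id> - LAPACK/<RT|CT>/<Routine>/... (Failed)` shape shown above.

```shell
#!/bin/sh
# summarize_failed (hypothetical helper): read a ctest failure summary on
# stdin and count failed tests per "<RT|CT>/<Routine>" group.
summarize_failed() {
    awk '/\(Failed\)/ {
        # Field 3 is the full test name, e.g.
        # LAPACK/RT/Hetrd/HetrdAccuracyUsm.ComplexSinglePrecision/<device>
        split($3, parts, "/")
        count[parts[2] "/" parts[3]]++
    }
    END {
        for (key in count) print key, count[key]
    }' | sort
}
```

It could be used as `ctest -R trd 2>&1 | summarize_failed`. For the list in this report it would be expected to print four groups (CT/Hetrd, CT/Sytrd, RT/Hetrd, RT/Sytrd) with 4 failures each, which adds up to the 16 reported.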

Expected behavior

All tests should pass.


mmeterel assigned ericlars Sep 26, 2022

sknepper mentioned this issue May 24, 2024

[tests][LAPACK] Avoid cuSOLVER bug in *trd tests #498

Merged

4 tasks

sknepper closed this as completed in #498 Jun 4, 2024
