Update to CUDA 12.4.0 #9046

fwyzard
Contributor

@fwyzard fwyzard commented Mar 6, 2024

Update CMake to version 3.28.3.

  • backport a fix for CMake 3.27 and later from Geant4 v11.2.0

Update to CUDA 12.4.0:

  • CUDA runtime version 12.4.99
  • NVIDIA drivers version 550.54.14

See https://docs.nvidia.com/cuda/archive/12.4.0/cuda-toolkit-release-notes/index.html for the full CUDA 12.4.0 release notes and change log.

Force CUDA to consider only the major/minor version of GCC, to work around a problem in the GCC version checks inside cudafe++, where GCC 12.3.1 is not recognised as equivalent to GCC 12.3.0.
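
To illustrate the intent of the comparison described above, here is a minimal Python sketch. It is not the cudafe++ check nor the change made in the spec files; the function names major_minor and same_compiler_release are hypothetical and exist only for this example.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a dotted version string such as '12.3.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def same_compiler_release(a: str, b: str) -> bool:
    """Treat two GCC versions as equivalent when major and minor match.

    The patch level (the third component) is deliberately ignored.
    """
    return major_minor(a) == major_minor(b)


if __name__ == "__main__":
    # Patch level differs, major/minor match: treated as equivalent.
    print(same_compiler_release("12.3.1", "12.3.0"))
    # Minor version differs: not equivalent.
    print(same_compiler_release("12.3.1", "12.2.0"))
```

Comparing only the first two components is what makes GCC 12.3.1 and GCC 12.3.0 interchangeable for the purpose of the version check.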

Rename the CUDA-related flags used in the spec files to avoid potential problems when adding, removing or reordering them.

Update cuDNN to version 8.9:

See https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-897/release-notes/index.html for the release notes and change log.

Update NVIDIA gdrcopy to v2.4.1:

  • Add support for new hardware (Grace Hopper, BlueField 3 NICs and DPUs).
  • Add support for recent OS (kernel, RHEL, SUSE).
  • Various bug fixes.

See https://github.com/NVIDIA/gdrcopy/releases/tag/v2.4 and https://github.com/NVIDIA/gdrcopy/releases/tag/v2.4.1 for the release notes and change log.

Update ONNX runtime to version 1.17.1:

Older versions of ONNX runtime fail to compile with CUDA 12.4.
On the other hand, ONNX runtime 1.17.1 requires CMake 3.26 or later, and cuDNN 8.9 or later.

This version requires two more workarounds:

  • -Wno-error=maybe-uninitialized is needed to avoid a (hopefully false positive) warning about a potentially uninitialised variable with cuDNN 8.9 and 9.0
  • -Donnxruntime_NVCC_THREADS=0 is needed because the default is ON, causing nvcc to be called as nvcc ... --threads ON ..., which causes an error.
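
The second workaround comes down to nvcc's --threads option taking a number rather than a boolean-like value. The following Python sketch shows that validation in isolation; it is an illustration only, not the ONNX runtime CMake logic, and the helper name nvcc_threads_args is hypothetical.

```python
from typing import List


def nvcc_threads_args(value: str) -> List[str]:
    """Build the nvcc `--threads` arguments from a configuration value.

    The value must parse as a non-negative integer; anything else
    (for example the string "ON") is rejected before nvcc is called.
    """
    try:
        count = int(value)
    except ValueError:
        raise ValueError(
            f"--threads expects a non-negative integer, got {value!r}"
        ) from None
    if count < 0:
        raise ValueError(f"--threads expects a non-negative integer, got {value!r}")
    return ["--threads", str(count)]


if __name__ == "__main__":
    # The value set by the workaround is accepted.
    print(nvcc_threads_args("0"))
    # A boolean-like default such as "ON" is rejected.
    try:
        nvcc_threads_args("ON")
    except ValueError as err:
        print(err)
```

Setting the value explicitly to 0 keeps the generated command line numeric, which is the point of passing -Donnxruntime_NVCC_THREADS=0.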

This version requires an update inside CMSSW, implemented in cms-sw/cmssw#44354.

Add missing thrust include in PyTorch

Backport of pytorch/pytorch#121421.

@fwyzard
Contributor Author

fwyzard commented Mar 6, 2024

enable gpu

@fwyzard
Contributor Author

fwyzard commented Mar 6, 2024

please test

@cmsbuild
Contributor

cmsbuild commented Mar 6, 2024

A new Pull Request was created by @fwyzard for branch IB/CMSSW_14_1_X/master.

@iarspider, @smuzaffar, @aandvalenzuela can you please review it and eventually sign? Thanks.
@antoniovilela, @sextonkennedy, @rappoccio you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Mar 6, 2024

cms-bot internal usage

@fwyzard
Contributor Author

fwyzard commented Mar 6, 2024

I can confirm that CUDA 12.4.0 fixes the last two bugs we reported to NVIDIA:

@fwyzard
Contributor Author

fwyzard commented Mar 6, 2024

please test for CMSSW_14_1_CPP20_X

@cmsbuild
Contributor

cmsbuild commented Mar 6, 2024

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0aeba6/37917/summary.html
COMMIT: 6819667
CMSSW: CMSSW_14_1_X_2024-03-05-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9046/37917/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

[958/1148] /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/bin/nvcc -forward-unknown-to-host-compiler -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_CPU_FP16_TRAINING_OPS -DNSYNC_ATOMIC_CPP11 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DORT_ENABLE_STREAM -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_FLASH_ATTENTION=1 -Donnxruntime_providers_cuda_EXPORTS -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/include/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/include/onnxruntime/core/session -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/pytorch_cpuinfo-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/google_nsync-src/public -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/abseil_cpp-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/safeint-src 
-I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/gsl-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/onnx-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/onnx-build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/protobuf/3.21.9-999e041f1a53b3ff94ee65a9cc8b7a2c/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/flatbuffers-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cudnn/8.8.0.121-acaa18b242f7c97b443d981c456c94c4/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/cutlass-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/cutlass-src/examples -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/eigen-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/mp11-src/include -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe "--diag_suppress=bad_friend_decl" -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -Xcudafe 
"--diag_suppress=expr_has_no_effect" -O3 -DNDEBUG --generate-code=arch=compute_60,code=[compute_60,sm_60] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] -Xcompiler=-fPIC --diag-suppress 554 --compiler-options -Wall --compiler-options -Wno-deprecated-copy --compiler-options -Wno-nonnull-compare -Xcompiler -Wno-nonnull-compare --threads "" -Xcompiler -Wno-reorder -Xcompiler -Wno-error=sign-compare -Werror all-warnings -MD -MT CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_softmax.cu.o -MF CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_softmax.cu.o.d -x cu -c /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_softmax.cu -o CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_softmax.cu.o
[959/1148] /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/bin/nvcc -forward-unknown-to-host-compiler -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_CPU_FP16_TRAINING_OPS -DNSYNC_ATOMIC_CPP11 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DORT_ENABLE_STREAM -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_FLASH_ATTENTION=1 -Donnxruntime_providers_cuda_EXPORTS -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/include/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/include/onnxruntime/core/session -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/pytorch_cpuinfo-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/google_nsync-src/public -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/abseil_cpp-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/safeint-src 
-I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/gsl-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/onnx-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/onnx-build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/protobuf/3.21.9-999e041f1a53b3ff94ee65a9cc8b7a2c/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/flatbuffers-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cudnn/8.8.0.121-acaa18b242f7c97b443d981c456c94c4/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/cutlass-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/cutlass-src/examples -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/eigen-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/mp11-src/include -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe "--diag_suppress=bad_friend_decl" -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -Xcudafe 
"--diag_suppress=expr_has_no_effect" -O3 -DNDEBUG --generate-code=arch=compute_60,code=[compute_60,sm_60] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] -Xcompiler=-fPIC --diag-suppress 554 --compiler-options -Wall --compiler-options -Wno-deprecated-copy --compiler-options -Wno-nonnull-compare -Xcompiler -Wno-nonnull-compare --threads "" -Xcompiler -Wno-reorder -Xcompiler -Wno-error=sign-compare -Werror all-warnings -MD -MT CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_impl.cu.o -MF CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_impl.cu.o.d -x cu -c /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_impl.cu -o CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_impl.cu.o
[960/1148] /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/bin/nvcc -forward-unknown-to-host-compiler -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_CPU_FP16_TRAINING_OPS -DNSYNC_ATOMIC_CPP11 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DORT_ENABLE_STREAM -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_FLASH_ATTENTION=1 -Donnxruntime_providers_cuda_EXPORTS -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/include/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/include/onnxruntime/core/session -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/pytorch_cpuinfo-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/google_nsync-src/public -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/abseil_cpp-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/safeint-src 
-I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/gsl-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/onnx-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/onnx-build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/protobuf/3.21.9-999e041f1a53b3ff94ee65a9cc8b7a2c/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/flatbuffers-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cudnn/8.8.0.121-acaa18b242f7c97b443d981c456c94c4/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/cutlass-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/cutlass-src/examples -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/eigen-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/mp11-src/include -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe "--diag_suppress=bad_friend_decl" -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -Xcudafe 
"--diag_suppress=expr_has_no_effect" -O3 -DNDEBUG --generate-code=arch=compute_60,code=[compute_60,sm_60] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] -Xcompiler=-fPIC --diag-suppress 554 --compiler-options -Wall --compiler-options -Wno-deprecated-copy --compiler-options -Wno-nonnull-compare -Xcompiler -Wno-nonnull-compare --threads "" -Xcompiler -Wno-reorder -Xcompiler -Wno-error=sign-compare -Werror all-warnings -MD -MT CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/ngram_repeat_block_impl.cu.o -MF CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/ngram_repeat_block_impl.cu.o.d -x cu -c /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/ngram_repeat_block_impl.cu -o CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/ngram_repeat_block_impl.cu.o
[961/1148] /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/bin/nvcc -forward-unknown-to-host-compiler -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_CPU_FP16_TRAINING_OPS -DNSYNC_ATOMIC_CPP11 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DORT_ENABLE_STREAM -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_FLASH_ATTENTION=1 -Donnxruntime_providers_cuda_EXPORTS -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/include/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/include/onnxruntime/core/session -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/pytorch_cpuinfo-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/google_nsync-src/public -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/abseil_cpp-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/safeint-src 
-I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/gsl-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/onnx-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/onnx-build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/protobuf/3.21.9-999e041f1a53b3ff94ee65a9cc8b7a2c/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/flatbuffers-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cudnn/8.8.0.121-acaa18b242f7c97b443d981c456c94c4/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/cutlass-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/cutlass-src/examples -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/eigen-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/build/_deps/mp11-src/include -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe "--diag_suppress=bad_friend_decl" -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -Xcudafe 
"--diag_suppress=expr_has_no_effect" -O3 -DNDEBUG --generate-code=arch=compute_60,code=[compute_60,sm_60] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] -Xcompiler=-fPIC --diag-suppress 554 --compiler-options -Wall --compiler-options -Wno-deprecated-copy --compiler-options -Wno-nonnull-compare -Xcompiler -Wno-nonnull-compare --threads "" -Xcompiler -Wno-reorder -Xcompiler -Wno-error=sign-compare -Werror all-warnings -MD -MT CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_global_impl.cu.o -MF CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_global_impl.cu.o.d -x cu -c /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_global_impl.cu -o CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-fcb60177485aaddd6dfade8bf135aeb9/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_global_impl.cu.o
ninja: build stopped: subcommand failed.
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.4U1zR7 (%build)


RPM build errors:
line 37: It's not recommended to have unversioned Obsoletes: Obsoletes: external+onnxruntime+1.14.1-fcb60177485aaddd6dfade8bf135aeb9
Macro expanded in comment on line 369: %{pkginstroot}/${PYTHON3_LIB_SITE_PACKAGES}


@cmsbuild
Contributor

cmsbuild commented Mar 6, 2024

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0aeba6/37918/summary.html
COMMIT: 6819667
CMSSW: CMSSW_14_1_CPP20_X_2024-03-04-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/9046/37918/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation error when building:

[957/1148] /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/bin/nvcc -forward-unknown-to-host-compiler -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_CPU_FP16_TRAINING_OPS -DNSYNC_ATOMIC_CPP11 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DORT_ENABLE_STREAM -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_FLASH_ATTENTION=1 -Donnxruntime_providers_cuda_EXPORTS -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/include/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/include/onnxruntime/core/session -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/pytorch_cpuinfo-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/google_nsync-src/public -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/abseil_cpp-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/safeint-src 
-I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/gsl-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/onnx-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/onnx-build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/protobuf/3.21.9-5756e26c053df6e5fff69be83ae27020/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/flatbuffers-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cudnn/8.8.0.121-acaa18b242f7c97b443d981c456c94c4/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/cutlass-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/cutlass-src/examples -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/eigen-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/mp11-src/include -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe "--diag_suppress=bad_friend_decl" -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -Xcudafe 
"--diag_suppress=expr_has_no_effect" -O3 -DNDEBUG --generate-code=arch=compute_60,code=[compute_60,sm_60] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] -Xcompiler=-fPIC --diag-suppress 554 --compiler-options -Wall --compiler-options -Wno-deprecated-copy --compiler-options -Wno-nonnull-compare -Xcompiler -Wno-nonnull-compare --threads "" -Xcompiler -Wno-reorder -Xcompiler -Wno-error=sign-compare -Werror all-warnings -MD -MT CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/fast_gelu_impl.cu.o -MF CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/fast_gelu_impl.cu.o.d -x cu -c /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/fast_gelu_impl.cu -o CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/fast_gelu_impl.cu.o
[958/1148] /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/bin/nvcc -forward-unknown-to-host-compiler -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_CPU_FP16_TRAINING_OPS -DNSYNC_ATOMIC_CPP11 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DORT_ENABLE_STREAM -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_FLASH_ATTENTION=1 -Donnxruntime_providers_cuda_EXPORTS -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/include/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/include/onnxruntime/core/session -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/pytorch_cpuinfo-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/google_nsync-src/public -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/abseil_cpp-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/safeint-src 
-I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/gsl-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/onnx-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/onnx-build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/protobuf/3.21.9-5756e26c053df6e5fff69be83ae27020/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/flatbuffers-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cudnn/8.8.0.121-acaa18b242f7c97b443d981c456c94c4/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/cutlass-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/cutlass-src/examples -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/eigen-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/mp11-src/include -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe "--diag_suppress=bad_friend_decl" -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -Xcudafe 
"--diag_suppress=expr_has_no_effect" -O3 -DNDEBUG --generate-code=arch=compute_60,code=[compute_60,sm_60] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] -Xcompiler=-fPIC --diag-suppress 554 --compiler-options -Wall --compiler-options -Wno-deprecated-copy --compiler-options -Wno-nonnull-compare -Xcompiler -Wno-nonnull-compare --threads "" -Xcompiler -Wno-reorder -Xcompiler -Wno-error=sign-compare -Werror all-warnings -MD -MT CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_impl.cu.o -MF CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_impl.cu.o.d -x cu -c /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_impl.cu -o CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_impl.cu.o
[959/1148] /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/bin/nvcc -forward-unknown-to-host-compiler -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_CPU_FP16_TRAINING_OPS -DNSYNC_ATOMIC_CPP11 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DORT_ENABLE_STREAM -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_FLASH_ATTENTION=1 -Donnxruntime_providers_cuda_EXPORTS -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/include/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/include/onnxruntime/core/session -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/pytorch_cpuinfo-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/google_nsync-src/public -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/abseil_cpp-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/safeint-src 
-I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/gsl-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/onnx-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/onnx-build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/protobuf/3.21.9-5756e26c053df6e5fff69be83ae27020/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/flatbuffers-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cudnn/8.8.0.121-acaa18b242f7c97b443d981c456c94c4/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/cutlass-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/cutlass-src/examples -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/eigen-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/mp11-src/include -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe "--diag_suppress=bad_friend_decl" -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -Xcudafe 
"--diag_suppress=expr_has_no_effect" -O3 -DNDEBUG --generate-code=arch=compute_60,code=[compute_60,sm_60] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] -Xcompiler=-fPIC --diag-suppress 554 --compiler-options -Wall --compiler-options -Wno-deprecated-copy --compiler-options -Wno-nonnull-compare -Xcompiler -Wno-nonnull-compare --threads "" -Xcompiler -Wno-reorder -Xcompiler -Wno-error=sign-compare -Werror all-warnings -MD -MT CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_softmax.cu.o -MF CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_softmax.cu.o.d -x cu -c /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_softmax.cu -o CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_attention_softmax.cu.o
[960/1148] /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/bin/nvcc -forward-unknown-to-host-compiler -DCPUINFO_SUPPORTED_PLATFORM=1 -DEIGEN_MPL2_ONLY -DEIGEN_USE_THREADS -DENABLE_CPU_FP16_TRAINING_OPS -DNSYNC_ATOMIC_CPP11 -DONNX_ML=1 -DONNX_NAMESPACE=onnx -DORT_ENABLE_STREAM -DPLATFORM_POSIX -DPROTOBUF_USE_DLLS -DUSE_CUDA=1 -DUSE_FLASH_ATTENTION=1 -Donnxruntime_providers_cuda_EXPORTS -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/include/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/include/onnxruntime/core/session -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/pytorch_cpuinfo-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/google_nsync-src/public -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/abseil_cpp-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/safeint-src 
-I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/gsl-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/onnx-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/onnx-build -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/protobuf/3.21.9-5756e26c053df6e5fff69be83ae27020/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/flatbuffers-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cudnn/8.8.0.121-acaa18b242f7c97b443d981c456c94c4/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/cutlass-src/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/cutlass-src/examples -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/eigen-src -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc12/external/cuda/12.4.0-db00bd44f20c40655446378926308f3f/include -I/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/build/_deps/mp11-src/include -cudart shared --expt-relaxed-constexpr --Werror default-stream-launch -Xcudafe "--diag_suppress=bad_friend_decl" -Xcudafe "--diag_suppress=unsigned_compare_with_zero" -Xcudafe 
"--diag_suppress=expr_has_no_effect" -O3 -DNDEBUG --generate-code=arch=compute_60,code=[compute_60,sm_60] --generate-code=arch=compute_70,code=[compute_70,sm_70] --generate-code=arch=compute_75,code=[compute_75,sm_75] --generate-code=arch=compute_80,code=[compute_80,sm_80] --generate-code=arch=compute_89,code=[compute_89,sm_89] -Xcompiler=-fPIC --diag-suppress 554 --compiler-options -Wall --compiler-options -Wno-deprecated-copy --compiler-options -Wno-nonnull-compare -Xcompiler -Wno-nonnull-compare --threads "" -Xcompiler -Wno-reorder -Xcompiler -Wno-error=sign-compare -Werror all-warnings -MD -MT CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_global_impl.cu.o -MF CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_global_impl.cu.o.d -x cu -c /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_global_impl.cu -o CMakeFiles/onnxruntime_providers_cuda.dir/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/onnxruntime/1.14.1-cf84b80b2cf5cb66235e6037f7304faa/onnxruntime-1.14.1/onnxruntime/contrib_ops/cuda/bert/longformer_global_impl.cu.o
ninja: build stopped: subcommand failed.
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.cFSkcs (%build)


RPM build errors:
line 37: It's not recommended to have unversioned Obsoletes: Obsoletes: external+onnxruntime+1.14.1-cf84b80b2cf5cb66235e6037f7304faa
Macro expanded in comment on line 369: %{pkginstroot}/${PYTHON3_LIB_SITE_PACKAGES}


@fwyzard
Contributor Author

fwyzard commented Mar 6, 2024

please test

@cmsbuild
Contributor

cmsbuild commented Mar 6, 2024

Pull request #9046 was updated.

@fwyzard fwyzard force-pushed the IB/CMSSW_14_1_X/master_cuda-12.4.0 branch from 4e99bf1 to bcbd910 Compare March 6, 2024 17:51
@cmsbuild
Contributor

cmsbuild commented Mar 6, 2024

Pull request #9046 was updated.

@fwyzard
Contributor Author

fwyzard commented Mar 6, 2024

please test

@fwyzard
Contributor Author

fwyzard commented Mar 6, 2024

please test for CMSSW_14_1_CPP20_X

@valsdav
Contributor

valsdav commented Mar 12, 2024

@valsdav it looks like TensorFlow is somewhat unhappy about this update, too.

Running locally:

/cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9046/38020/install.sh
cd CMSSW_14_1_X_2024-03-10-0000/test/el8_amd64_gcc12
cmsenv
./testTFVisibleDevicesCUDA

now fails with

Instructions for updating:
This API was designed for TensorFlow v1. See https://www.tensorflow.org/guide/migrate for instructions on how to migrate your code to TensorFlow v2.
WARNING:tensorflow:From /cvmfs/cms-ci.cern.ch/week0/PR_788615ea/el8_amd64_gcc12/external/py3-tensorflow/2.12.0-ec68dc3c71c1d46813d062de5dd4889e/lib/python3.9/site-packages/tensorflow/python/framework/convert_to_constants.py:952: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This API was designed for TensorFlow v1. See https://www.tensorflow.org/guide/migrate for instructions on how to migrate your code to TensorFlow v2.

46.0

CUDA service enabled: 1
Testing CUDA backend
2024-03-11 18:19:49.499176: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-11 18:19:49.514454: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:353] MLIR V1 optimization pass is not enabled
0 /job:localhost/replica:0/task:0/device:CPU:0 type: CPU
Available devices: 1
F

testVisibleDevicesCUDA.cc:83:Assertion
Test name: testVisibleDevicesCUDA::test
assertion failed
- Expression: response.size() == 2

Failures !!!
Run: 1   Failure total: 1   Failures: 1   Errors: 0

Could you look into it ?

The test is failing because TensorFlow is currently not compiled with GPU support, due to problems with the CUDA 12 integration (see https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_14_1_X/master/tensorflow-requires.file#L7).

The framework sees the GPU card and executes the test, but TF does not see the card, so the test fails.
Should we disable the test until we can re-enable GPU support for TF? I will try to update it to a newer version that supports CUDA 12.
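
The mismatch described above can be illustrated with a short sketch. It assumes that the test's `response` holds one device name per visible device, in the form printed in the log (`.../device:CPU:0`), and that the expected size of 2 corresponds to one CPU plus one GPU; both points are inferences from the log, not confirmed by the test source. The helper name `count_device_types` is hypothetical and is not part of CMSSW or TensorFlow.

```python
from collections import Counter


def count_device_types(device_names):
    """Count devices per type from names like
    '/job:localhost/replica:0/task:0/device:CPU:0'."""
    counts = Counter()
    for name in device_names:
        # The last path component has the form 'device:<TYPE>:<index>'.
        last = name.rsplit("/", 1)[-1]
        parts = last.split(":")
        if len(parts) == 3 and parts[0] == "device":
            counts[parts[1]] += 1
    return counts


# Device list reported in the failing run: only the CPU is visible to TF.
observed = ["/job:localhost/replica:0/task:0/device:CPU:0"]
counts = count_device_types(observed)
print(len(observed), dict(counts))
```

With a TensorFlow build that has GPU support, the same list would presumably also contain a `device:GPU:0` entry, which would bring the size to the value the assertion expects.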

@fwyzard
Contributor Author

fwyzard commented Mar 12, 2024

If this test is already failing on GPU machines, can we just ignore it for the time being, and go ahead with this update ?

@valsdav
Contributor

valsdav commented Mar 12, 2024

If this test is already failing on GPU machines, can we just ignore it for the time being, and go ahead with this update ?

@fwyzard can you test again adding this PR cms-sw/cmssw#44376 ?

@smuzaffar
Contributor

ignore tests-rejected with ib-failure

No need to re-run the tests. testTFVisibleDevicesCUDA is already failing in GPU IBs (https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el8_amd64_gcc12/CMSSW_14_1_GPU_X_2024-03-11-2300/unitTestLogs/PhysicsTools/TensorFlow#/89-89)

@makortel
Contributor

Should we disable the test until we can re-enable GPU support for TF?

I think we should come up with a more general mechanism for scram to skip tests that depend on TensorFlow and CUDA when TensorFlow is built without (NVIDIA) GPU support.
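
One way to picture such a mechanism is as a comparison between what a test requires and what the build provides. The sketch below is purely illustrative: the feature names (`cuda`, `tensorflow`, `tensorflow-cuda`) and the function `missing_features` are hypothetical, and nothing here reflects how scram actually declares or evaluates test dependencies.

```python
def missing_features(test_requirements, build_features):
    """Return the sorted list of required features the build lacks.

    A non-empty result means the test should be skipped rather than run.
    """
    return sorted(set(test_requirements) - set(build_features))


# Hypothetical feature names, chosen only for this illustration.
build_features = {"cuda", "tensorflow"}  # TensorFlow built without GPU support
test_requirements = {"cuda", "tensorflow", "tensorflow-cuda"}

missing = missing_features(test_requirements, build_features)
if missing:
    print("skip: missing " + ", ".join(missing))
else:
    print("run")
```

The point of returning the missing features, rather than a plain boolean, is that the skip reason can be reported in the test summary instead of showing up as a failure.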

@smuzaffar
Contributor

@makortel, I proposed cms-sw/cmssw#44376 (comment)

@fwyzard
Contributor Author

fwyzard commented Mar 12, 2024

OK, so

  • all unittests/gpu failures are due to testTFVisibleDevicesCUDA, which is a known failure
  • the el9 unit test failure is due to SiStripDAQ_O2O_test, which looks like a timeout (it ends with TestTime:3600)
  • the aarch64 failures also look like timeouts (all three end with TestTime:3600)
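
The timeout check used above (a log ending with `TestTime:3600`) can be sketched as follows. This assumes that 3600 seconds is the time limit, which is inferred from the log endings quoted in this thread, and that the log contains a `TestTime:<seconds>` token; the function name `looks_like_timeout` and the sample log text are made up for illustration.

```python
import re

# Assumed limit, inferred from the 'TestTime:3600' endings quoted above.
TIMEOUT_SECONDS = 3600


def looks_like_timeout(log_text, limit=TIMEOUT_SECONDS):
    """Return True if the last 'TestTime:<n>' entry in the log reaches the limit."""
    times = re.findall(r"TestTime:(\d+)", log_text)
    return bool(times) and int(times[-1]) >= limit


# Made-up sample logs; real unit test logs contain more than this.
timed_out_log = "some test output\nTestTime:3600\n"
normal_log = "some test output\nTestTime:42\n"

print(looks_like_timeout(timed_out_log))
print(looks_like_timeout(normal_log))
```

A check like this only flags candidates; a test that fails for another reason just before the limit would still need a look at the log itself.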

Everything else looks good.

Do we need to investigate or re-run the timeouts, or are we OK to merge this?

@iarspider
Contributor

  • the el9 unit test failure is due to SiStripDAQ_O2O_test, which looks like a timeout (it ends with TestTime:3600)

  • the aarch64 failures also look like timeouts (all three end with TestTime:3600)

These tests time out in IBs as well, so I think it's ok to merge. @smuzaffar ?

@smuzaffar
Contributor

+externals

This looks good to go in 14.1.X. @antoniovilela @rappoccio please merge this along with cms-sw/cmssw#44354

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_14_1_X/master IBs (test failures were overridden). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @rappoccio, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)
Notice This PR was tested with additional Pull Request(s), please also merge them if necessary: cms-sw/cmssw#44354

@cmsbuild
Contributor

REMINDER @rappoccio, @antoniovilela, @sextonkennedy: This PR was tested with cms-sw/cmssw#44354, please check if they should be merged together

@antoniovilela

+1

@cmsbuild cmsbuild merged commit bd55ddd into cms-sw:IB/CMSSW_14_1_X/master Mar 13, 2024
35 of 42 checks passed
@fwyzard fwyzard deleted the IB/CMSSW_14_1_X/master_cuda-12.4.0 branch April 17, 2024 20:55
@fwyzard fwyzard mentioned this pull request Oct 30, 2024