Update CUDA 11.8, cuDNN and PyCUDA #8295
The main change since CUDA 11.5.x is the support for the Lovelace (sm_89) and Hopper (sm_90) architectures. See https://docs.nvidia.com/cuda/archive/11.8.0/cuda-toolkit-release-notes/index.html for the full CUDA 11.8.0 release notes and change log.
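For readers who want to see what targeting the new architectures looks like on the compiler command line, here is a minimal sketch. It assumes the standard nvcc `-gencode arch=compute_XX,code=sm_XX` flag form and that Ada Lovelace is compute capability 8.9 and Hopper is 9.0. The helper name `gencode_flags` and the `NEW_IN_CUDA_11_8` table are hypothetical and are not part of cmsdist or of this PR.

```python
# Hypothetical illustration only: this is not how cmsdist configures CUDA.
# Compute capabilities whose support is new in CUDA 11.8 (assumed mapping).
NEW_IN_CUDA_11_8 = {
    "89": "Ada Lovelace",
    "90": "Hopper",
}


def gencode_flags(capabilities):
    """Return one nvcc -gencode flag per compute capability string."""
    return [
        f"-gencode arch=compute_{cc},code=sm_{cc}"
        for cc in capabilities
    ]


if __name__ == "__main__":
    # Print the flag that would target each newly supported architecture.
    for cc, name in NEW_IN_CUDA_11_8.items():
        print(f"{name}: {gencode_flags([cc])[0]}")
```

Which architectures a given build actually targets is decided by the build configuration, which this sketch does not inspect.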
enable gpu |
please test |
A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_13_0_X/master. @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks. |
please test for el8_ppc64le_gcc11 |
please test for el8_aarch64_gcc11 |
hold |
Pull request has been put on hold by @fwyzard |
In stand-alone tests we have observed a small drop in performance with all CUDA versions after 11.5, so we should check the impact on the HLT performance before merging this. |
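One way to quantify such a drop when comparing two toolkit versions is the relative change in measured throughput. The sketch below assumes that throughput is available as events per second for each version; the function name `relative_change` and the numbers in the usage example are placeholders, not measurements from this PR.

```python
def relative_change(baseline, candidate):
    """Return the fractional change of candidate relative to baseline.

    A negative value means the candidate is slower than the baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (candidate - baseline) / baseline


if __name__ == "__main__":
    # Placeholder throughputs in events/s, purely illustrative.
    old_toolkit = 100.0
    new_toolkit = 98.0
    change = relative_change(old_toolkit, new_toolkit)
    print(f"relative change: {change:+.1%}")
```

Repeating the measurement several times and comparing the spread with the observed change helps decide whether a small drop is significant.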
-1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-522667/30438/summary.html
External Build
I found compilation error when building:
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 1138.545s, Critical Path: 175.74s
INFO: 3613 processes: 778 internal, 2835 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
error: Bad exit status from /data/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.8DrGJF (%build)
RPM build errors:
line 37: It's not recommended to have unversioned Obsoletes: Obsoletes: external+tensorflow-sources+2.6.4-166022720595f2d89e239d15b7dfe73d
Bad exit status from /data/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.8DrGJF (%build) |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-522667/30437/summary.html The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic: You can see more details here: |
-1
Failed Tests: UnitTests
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here:
Unit Tests
I found errors in the following unit tests:
---> test testTriggerMonitors had ERRORS
Comparison Summary
Summary:
GPU Comparison Summary
Summary: |
please test with #8258,cms-sw/cmssw#40645 |
-1
Failed Tests: Build ClangBuild
The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:
You can see more details here:
Build
I found compilation error when building:
>> Leaving Package Utilities/StaticAnalyzers
>> Package Utilities/StaticAnalyzers built
Copying tmp/el8_amd64_gcc11/src/DataFormats/SoATemplate/test/testRocmSoALayoutAndView_t/libtestRocmSoALayoutAndView_t_rocm.a to productstore area:
cp: cannot stat 'tmp/el8_amd64_gcc11/src/DataFormats/SoATemplate/test/testRocmSoALayoutAndView_t/libtestRocmSoALayoutAndView_t_rocm.a': No such file or directory
>> Deleted: tmp/el8_amd64_gcc11/src/DataFormats/SoATemplate/test/testRocmSoALayoutAndView_t/libtestRocmSoALayoutAndView_t_rocm.a
gmake: *** [config/SCRAM/GMake/Makefile.rules:1740: tmp/el8_amd64_gcc11/src/DataFormats/SoATemplate/test/testRocmSoALayoutAndView_t/libtestRocmSoALayoutAndView_t_rocm.a] Error 1
>> Entering Package Configuration/DataProcessing
>> Leaving Package Configuration/DataProcessing
>> Package Configuration/DataProcessing built
>> Compiling /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_0_X_2023-02-07-2300/src/Configuration/DataProcessing/test/TestCfg.cpp
>> Building binary TestConfigDP
Clang Build
I found compilation error while trying to compile with clang. Command used:
>> Entering Package Validation/TrackerDigis
>> Entering Package Validation/TrackerHits
>> Entering Package Validation/TrackerRecHits
>> Entering Package Validation/TrackingMCTruth
>> Compile sequence completed for CMSSW CMSSW_13_0_X_2023-02-07-2300
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 1
+ eval scram build outputlog '&&' '(python3' /data/cmsbld/jenkins/workspace/ib-run-pr-tests/cms-bot/buildLogAnalyzer.py --logDir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_0_X_2023-02-07-2300/tmp/el8_amd64_gcc11/cache/log/src '||' 'true)'
++ scram build outputlog
>> Entering Package Alignment/OfflineValidation
>> Compiling /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_0_X_2023-02-07-2300/src/Alignment/OfflineValidation/bin/DMRmerge.cc
>> Compiling /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_0_X_2023-02-07-2300/src/Alignment/OfflineValidation/bin/Options.cc |
@smuzaffar it seems that making a local installation out of the bot's build no longer works? I ran
/cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8295/30498/install.sh
cd CMSSW_13_0_X_2023-02-07-2300/src
cmsenv
cd DataFormats/SoATemplate/test
scram b
and I get an error like
I get the same in other packages and from a different |
Also, the error seems unrelated to CUDA |
please test |
Ah, maybe it was because I had a |
-1
Failed Tests: UnitTests
Unit Tests
I found errors in the following unit tests:
---> test testTriggerMonitors had ERRORS
Comparison Summary
Summary:
GPU Comparison Summary
Summary: |
please test |
-1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-522667/30989/summary.html
External Build
I found compilation error when building:
INFO: Found applicable config definition build:cuda in file /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc11/external/tensorflow-sources/2.6.4-45e5831316b52b33d56b259a07cc3cdd/tensorflow-2.6.4/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
ERROR: @local_config_cuda//:enable_cuda :: Error loading option @local_config_cuda//:enable_cuda: Repository command failed
Expected even number of arguments
error: Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.H5fzGa (%build)
RPM build errors:
line 37: It's not recommended to have unversioned Obsoletes: Obsoletes: external+tensorflow-sources+2.6.4-45e5831316b52b33d56b259a07cc3cdd
Bad exit status from /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/rpm-tmp.H5fzGa (%build) |
Update to CUDA 11.8 and related software: