Keep the PTX in CUDA binaries, so it can be JIT'ted for newer devices #6851

Conversation

fwyzard
Contributor

@fwyzard fwyzard commented Apr 28, 2021

No description provided.

@fwyzard
Contributor Author

fwyzard commented Apr 28, 2021

please test

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_12_0_X/master.

@smuzaffar, @mrodozov can you please review it and eventually sign? Thanks.
cms-bot commands are listed here

@fwyzard
Contributor Author

fwyzard commented Apr 28, 2021

enable gpu

@fwyzard
Contributor Author

fwyzard commented Apr 28, 2021

please test

@fwyzard
Contributor Author

fwyzard commented Apr 28, 2021

please test for slc7_aarch64_gcc9

@fwyzard
Contributor Author

fwyzard commented Apr 28, 2021

please test for slc7_ppc64le_gcc9

@cmsbuild
Contributor

-1

Failed Tests: UnitTests RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0fc616/14660/summary.html
COMMIT: a622ce3
CMSSW: CMSSW_12_0_X_2021-04-27-2300/slc7_ppc64le_gcc9
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6851/14660/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test import-yaml had ERRORS

RelVals

  • 11634.911: 11634.911_TTbar_14TeV+2021_DD4hep+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA/step1_TTbar_14TeV+2021_DD4hep+TTbar_14TeV_TuneCP5_GenSim+Digi+Reco+HARVEST+ALCA.log

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0fc616/14659/summary.html
COMMIT: a622ce3
CMSSW: CMSSW_12_0_X_2021-04-27-2300/slc7_aarch64_gcc9
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6851/14659/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test import-yaml had ERRORS

@fwyzard
Contributor Author

fwyzard commented Apr 28, 2021

===== Test "import-yaml" ====
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/cms-ib.cern.ch/jenkins-env/python/shared/yaml/__init__.py", line 2, in <module>
    from error import *
ModuleNotFoundError: No module named 'error'

---> test import-yaml had ERRORS
 
^^^^ End Test import-yaml ^^^^

Unrelated?

@mrodozov
Contributor

mrodozov commented Apr 28, 2021

Unrelated. For some weird reason this test is failing on ARM and PPC.
error is a file inside the yaml package: if you start python3 interactively and import yaml, it works,
but if you run python3 -c 'import yaml', it fails.
And that is after we changed the test to use python3.
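
For reference, the traceback above is what Python 3 reports for a Python 2 style implicit relative import (`from error import *` inside a package's `__init__.py`). Whether that is the actual cause on these platforms is not established in this thread. The sketch below only reproduces the error message with a throwaway package; the name `fakeyaml_demo` and its contents are made up for illustration.

```python
import importlib
import os
import sys
import tempfile


def try_import_py2_style_package():
    """Import a throwaway package whose __init__.py uses a Python 2 implicit relative import.

    Returns "imported" on success, or the error message on failure.
    """
    with tempfile.TemporaryDirectory() as tmp:
        pkg = os.path.join(tmp, "fakeyaml_demo")  # hypothetical package name
        os.mkdir(pkg)
        # A sibling module, analogous to yaml/error.py
        with open(os.path.join(pkg, "error.py"), "w") as f:
            f.write("class YAMLError(Exception):\n    pass\n")
        # Python 2 resolved this relative to the package; Python 3 does not
        with open(os.path.join(pkg, "__init__.py"), "w") as f:
            f.write("from error import *\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("fakeyaml_demo")
            return "imported"
        except ModuleNotFoundError as exc:
            return str(exc)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("fakeyaml_demo", None)


if __name__ == "__main__":
    print(try_import_py2_style_package())
```

In Python 3 the package-relative form would be `from .error import *`.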

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0fc616/14657/summary.html
COMMIT: a622ce3
CMSSW: CMSSW_12_0_X_2021-04-27-2300/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6851/14657/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 9559
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 9559
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 38
  • DQMHistoTests: Total histograms compared: 2877605
  • DQMHistoTests: Total failures: 5
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2877578
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 37 files compared)
  • Checked 160 log files, 37 edm output root files, 38 DQM output files
  • TriggerResults: no differences found

@makortel
Contributor

Out of curiosity, how much does the PTX add to the (shared) object file size? Even a rough order-of-magnitude estimate would help.

@fwyzard
Contributor Author

fwyzard commented Apr 28, 2021

Here is a comparison of the libraries from a working area I had around.

before (no PTX)
1.4M    pluginEventFilterEcalRawToDigiPlugins.so
724K    pluginEventFilterHcalRawToDigiGPUPlugins.so
600K    pluginHeterogeneousCoreCUDATestPlugins.so
8.2M    pluginRecoLocalCaloEcalRecProducersPlugins.so
1.8M    pluginRecoLocalCaloHGCalRecProducersPlugins.so
6.7M    pluginRecoLocalCaloHcalRecProducers.so
1.5M    pluginRecoLocalTrackerSiPixelClusterizerPlugins.so
1.9M    pluginRecoLocalTrackerSiPixelRecHitsPlugins.so
13M     pluginRecoPixelVertexingPixelTripletsPlugins.so
2.1M    pluginRecoPixelVertexingPixelVertexFindingPlugins.so
38M     total
after (with PTX)
1.4M    pluginEventFilterEcalRawToDigiPlugins.so
748K    pluginEventFilterHcalRawToDigiGPUPlugins.so
616K    pluginHeterogeneousCoreCUDATestPlugins.so
8.6M    pluginRecoLocalCaloEcalRecProducersPlugins.so
1.9M    pluginRecoLocalCaloHGCalRecProducersPlugins.so
7.0M    pluginRecoLocalCaloHcalRecProducers.so
1.6M    pluginRecoLocalTrackerSiPixelClusterizerPlugins.so
2.0M    pluginRecoLocalTrackerSiPixelRecHitsPlugins.so
15M     pluginRecoPixelVertexingPixelTripletsPlugins.so
2.3M    pluginRecoPixelVertexingPixelVertexFindingPlugins.so
41M     total

So the CUDA-enabled plugins grow by a bit less than 10% overall (38M to 41M in total)?
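
As a rough cross-check of that estimate, here is a minimal sketch that sums the per-library sizes listed above and computes the relative growth. The sizes are the rounded `du -h` style values from the listing, so the result is only approximate; the helper names are illustrative.

```python
def to_kib(size):
    """Convert a du -h style size such as '724K' or '1.4M' to KiB."""
    units = {"K": 1, "M": 1024, "G": 1024 * 1024}
    return float(size[:-1]) * units[size[-1]]


# Per-library sizes copied from the listing above, in the same order
BEFORE = ["1.4M", "724K", "600K", "8.2M", "1.8M", "6.7M", "1.5M", "1.9M", "13M", "2.1M"]
AFTER = ["1.4M", "748K", "616K", "8.6M", "1.9M", "7.0M", "1.6M", "2.0M", "15M", "2.3M"]


def relative_increase(before, after):
    """Fractional growth of the summed sizes, e.g. 0.10 for +10%."""
    total_before = sum(to_kib(s) for s in before)
    total_after = sum(to_kib(s) for s in after)
    return (total_after - total_before) / total_before


if __name__ == "__main__":
    print(f"{relative_increase(BEFORE, AFTER):.1%}")
```

With these rounded inputs the growth comes out between 8% and 9%, in line with the 38M and 41M totals.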

@makortel
Contributor

Thanks. So noticeable but not large. Do you think the JIT'ting would be intended only for cases where a new architecture appears after the build, or should we think of JIT'ting as the main way to build the code?

@fwyzard
Contributor Author

fwyzard commented Apr 29, 2021

For the moment I would consider it as a fallback option for new architectures (e.g. Ampere), and to possibly improve the performance on less used architectures (e.g. JIT'ting for sm_61 may have some benefit for a GeForce GTX 1080 instead of using the binary for sm_60).

Somewhere in the NVIDIA documentation I've seen a suggestion that we should build with -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=[sm_86,compute_86] ... which would significantly increase both compilation times and library sizes.

I consider this PR a reasonable compromise.
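
To illustrate the trade-off, here is a small sketch that builds `-gencode` options for a short list of architectures and keeps the PTX only for the newest one, using the `code=[sm_XX,compute_XX]` form quoted above. The architecture list and the function name are assumptions for illustration, not the flags actually set by this PR.

```python
def gencode_flags(archs, keep_ptx_for_last=True):
    """Build nvcc -gencode arguments for the given compute capabilities.

    Each architecture gets a native binary (sm_XX). If keep_ptx_for_last is
    set, the last one also embeds its PTX (compute_XX) so the driver can JIT
    it on devices newer than any listed architecture.
    """
    flags = []
    for i, arch in enumerate(archs):
        code = f"sm_{arch}"
        if keep_ptx_for_last and i == len(archs) - 1:
            code = f"[sm_{arch},compute_{arch}]"
        flags += ["-gencode", f"arch=compute_{arch},code={code}"]
    return flags


if __name__ == "__main__":
    # Hypothetical architecture list, chosen only for the example
    print(" ".join(gencode_flags([60, 70, 75])))
```

Fewer native binaries keep build times and library sizes down, while the embedded PTX covers architectures that are not listed.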

@makortel
Contributor

> I consider this PR a reasonable compromise.

Sure, thanks. I was thinking more about the future, e.g. should we add sm_80 etc. there at some point?

@smuzaffar
Contributor

+externals

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_12_0_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@fwyzard
Contributor Author

fwyzard commented Apr 29, 2021

> Sure, thanks. I was thinking more about the future, e.g. should we add sm_80 etc. there at some point?

Sure - well, maybe once we actually have some Ampere card to test :-)

@smuzaffar
Contributor

test parameters:

  • addpkg = HeterogeneousCore

@smuzaffar
Contributor

please test

@smuzaffar
Contributor

abort

We need to wait for the next IB with SCRAMV3 for external PR testing.

@smuzaffar
Contributor

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-0fc616/14730/summary.html
COMMIT: a622ce3
CMSSW: CMSSW_12_0_X_2021-04-29-2300/slc7_amd64_gcc900
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/6851/14730/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 9559
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 9559
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: no differences found

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 37
  • DQMHistoTests: Total histograms compared: 2662646
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2662623
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 36 files compared)
  • Checked 155 log files, 37 edm output root files, 37 DQM output files
  • TriggerResults: no differences found

@smuzaffar smuzaffar merged commit bd26e28 into cms-sw:IB/CMSSW_12_0_X/master Apr 30, 2021
@fwyzard fwyzard deleted the IB/CMSSW_12_0_X/master_CUDA_keep_PTX branch May 10, 2021 21:17
5 participants