Add a test for CUDA library build rules #40605

Merged (2 commits) on Feb 1, 2023


@fwyzard (Contributor) commented Jan 24, 2023

PR description:

This PR adds new packages to test the implementation and usage of CUDA libraries for device code in CMSSW plugins and executables.

The package HeterogeneousTest/CUDADevice implements a library that defines and exports CUDA device-only functions, and a plugin and test that use them.

The package HeterogeneousTest/CUDAKernel implements a library that imports device functions from HeterogeneousTest/CUDADevice to define and export CUDA kernels, and a plugin and test that use them.

The package HeterogeneousTest/CUDAWrapper implements a library that imports kernels from HeterogeneousTest/CUDAKernel to define and export host-only wrappers around them, usable by non-CUDA libraries, plugins and applications, and implements a plugin and test that use them.

The package HeterogeneousTest/CUDAOpaque implements a library that uses the wrappers from HeterogeneousTest/CUDAWrapper to define and export host-only functions around the whole CUDA section, usable by libraries, plugins and applications that are not CUDA-aware, and implements a plugin and test that use them.
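To make the layering above concrete, here is a minimal single-file CUDA sketch of the four levels. It is an illustration under stated assumptions, not the code from this PR: everything lives in one translation unit and is built with a plain `nvcc -o layers layers.cu`, whereas in the PR each layer is a separate CMSSW library, which additionally requires relocatable device code and a device-link step. All function names (`device_add`, `kernel_add_vectors`, `wrapper_add_vectors`, `opaque_add_vectors`) are hypothetical.

```cuda
// Single-file sketch of the four-layer structure (hypothetical names).
// Assumes a CUDA toolkit and a GPU; build with: nvcc -o layers layers.cu
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                 \
  do {                                                                   \
    cudaError_t err_ = (call);                                           \
    if (err_ != cudaSuccess) {                                           \
      std::fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_)); \
      std::exit(EXIT_FAILURE);                                           \
    }                                                                    \
  } while (0)

// Layer 1 (like CUDADevice): a device-only function, callable only from GPU code.
__device__ float device_add(float a, float b) { return a + b; }

// Layer 2 (like CUDAKernel): a kernel built on the device function.
__global__ void kernel_add_vectors(const float* a, const float* b, float* sum, int size) {
  // Grid-stride loop: any launch configuration covers every element.
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < size; i += blockDim.x * gridDim.x) {
    sum[i] = device_add(a[i], b[i]);
  }
}

// Layer 3 (like CUDAWrapper): a host function that launches the kernel.
// Its signature uses no CUDA types, so it could be declared in a plain C++
// header, but the pointers must already refer to device memory.
void wrapper_add_vectors(const float* d_a, const float* d_b, float* d_sum, int size) {
  kernel_add_vectors<<<32, 256>>>(d_a, d_b, d_sum, size);
  CUDA_CHECK(cudaGetLastError());
}

// Layer 4 (like CUDAOpaque): a host-only function on host memory that hides
// allocation, copies, the kernel launch and synchronisation.
void opaque_add_vectors(const float* a, const float* b, float* sum, int size) {
  const size_t bytes = static_cast<size_t>(size) * sizeof(float);
  float* d_a = nullptr;
  float* d_b = nullptr;
  float* d_sum = nullptr;
  CUDA_CHECK(cudaMalloc(&d_a, bytes));
  CUDA_CHECK(cudaMalloc(&d_b, bytes));
  CUDA_CHECK(cudaMalloc(&d_sum, bytes));
  CUDA_CHECK(cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice));
  wrapper_add_vectors(d_a, d_b, d_sum, size);
  // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
  CUDA_CHECK(cudaMemcpy(sum, d_sum, bytes, cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(d_a));
  CUDA_CHECK(cudaFree(d_b));
  CUDA_CHECK(cudaFree(d_sum));
}

int main() {
  const int size = 1 << 20;
  std::vector<float> a(size), b(size), sum(size);
  for (int i = 0; i < size; ++i) {
    a[i] = static_cast<float>(i);
    b[i] = 2.0f * static_cast<float>(i);
  }
  // The caller sees only host memory and plain C++ types.
  opaque_add_vectors(a.data(), b.data(), sum.data(), size);
  int mismatches = 0;
  for (int i = 0; i < size; ++i) {
    if (sum[i] != a[i] + b[i]) ++mismatches;
  }
  std::printf("%d mismatches out of %d elements\n", mismatches, size);
  return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

The signatures carry the point of the layering: only code compiled by a CUDA compiler can call layers 1 and 2, layer 3 needs no CUDA types but still expects device pointers, and layer 4 exposes nothing but host memory, so its callers need no CUDA awareness at all.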

PR validation:

The new unit tests compile and pass.

@fwyzard (Contributor, author) commented Jan 24, 2023

@makortel this almost works: it only needs a fix in the build command.

@fwyzard (Contributor, author) commented Jan 24, 2023

@smuzaffar at least locally, this fails to link because scram does not pass -lHeterogeneousCoreCUDATestDeviceLib to gcc, even though the build file states

    <use name="HeterogeneousCore/CUDATestDeviceLib"/>

The resulting command is

    >> Building binary testCudaDeviceAddition
    /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02768/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/bin/c++ -O2 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -std=c++1z -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -msse3 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-deprecated-copy -Wno-unused-parameter -Wunused -Wparentheses -Wno-deprecated -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option -Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -DUSE_CMS_DEPRECATED -DBOOST_DISABLE_ASSERTS -fPIC tmp/el8_amd64_gcc11/src/HeterogeneousCore/CUDATestDeviceLib/test/testCudaDeviceAddition/testDeviceAddition.cu.o tmp/el8_amd64_gcc11/src/HeterogeneousCore/CUDATestDeviceLib/test/testCudaDeviceAddition/testCudaDeviceAddition_cudadlink.o -Wl,-E -Wl,--hash-style=gnu -L/data/user/fwyzard/CMSSW_13_0_X_2023-01-21-1100/biglib/el8_amd64_gcc11 -L/data/user/fwyzard/CMSSW_13_0_X_2023-01-21-1100/lib/el8_amd64_gcc11 -L/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_0_X_2023-01-21-1100/biglib/el8_amd64_gcc11 -L/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_0_X_2023-01-21-1100/lib/el8_amd64_gcc11 -L/cvmfs/cms-ib.cern.ch/sw/x86_64/week0/el8_amd64_gcc11/cms/cmssw-patch/CMSSW_13_0_X_2023-01-21-1100/external/el8_amd64_gcc11/lib -L/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02768/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_X_2023-01-19-2300/lib/el8_amd64_gcc11 -L/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02768/el8_amd64_gcc11/cms/cmssw/CMSSW_13_0_X_2023-01-19-2300/biglib/el8_amd64_gcc11 -L/cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02768/el8_amd64_gcc11/external/cuda/11.5.2-66a9473808e7d5863d5bbec0824e2c4a/lib64/stubs -lcudart -lcudadevrt -lnvToolsExt -lcuda -o tmp/el8_amd64_gcc11/src/HeterogeneousCore/CUDATestDeviceLib/test/testCudaDeviceAddition/testCudaDeviceAddition

and it fails with

    /cvmfs/cms-ib.cern.ch/sw/x86_64/nweek-02768/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/bin/../lib/gcc/x86_64-redhat-linux-gnu/11.2.1/../../../../x86_64-redhat-linux-gnu/bin/ld: tmp/el8_amd64_gcc11/src/HeterogeneousCore/CUDATestDeviceLib/test/testCudaDeviceAddition/testCudaDeviceAddition_cudadlink.o: in function `__cudaRegisterLinkedBinary_c84dd51f_17_DeviceAddition_cu_bcd44480':
    link.stub:(.text+0x121): undefined reference to `__fatbinwrap_c84dd51f_17_DeviceAddition_cu_bcd44480'
    collect2: error: ld returned 1 exit status

If I add -lHeterogeneousCoreCUDATestDeviceLib by hand, the link succeeds and the test actually runs:

    $ ./tmp/el8_amd64_gcc11/src/HeterogeneousCore/CUDATestDeviceLib/test/testCudaDeviceAddition/testCudaDeviceAddition
    ===============================================================================
    All tests passed (2097152 assertions in 1 test case)

@fwyzard (Contributor, author) commented Jan 24, 2023

please test

@cmsbuild (Contributor)

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40605/33862

  • This PR adds an extra 12KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@smuzaffar (Contributor)

@fwyzard, cms-sw/cmsdist#8264 should fix this issue. The problem was a restriction that only packages with at least one .cxx file could export their libraries. HeterogeneousCore/CUDATestDeviceLib contains only .cu files, which is why its library was not exported. With cms-sw/cmsdist#8264, packages with any kind of source file (C++, C, Fortran or CUDA) export their libraries.

@fwyzard (Contributor, author) commented Jan 25, 2023

Understood... thanks!

@fwyzard (Contributor, author) commented Jan 25, 2023

please test with cms-sw/cmsdist#8264

@cmsbuild (Contributor)

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40605/33864

  • This PR adds an extra 12KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild (Contributor)

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40605/33979

  • This PR adds an extra 16KB to repository

@cmsbuild (Contributor)

Pull request #40605 was updated. @makortel, @fwyzard can you please check and sign again.

@fwyzard (Contributor, author) commented Jan 31, 2023

+heterogeneous

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@cmsbuild (Contributor)

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-8dc05e/30294/summary.html
COMMIT: 816674f
CMSSW: CMSSW_13_0_X_2023-01-31-1100/el8_amd64_gcc11
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/40605/30294/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 5 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3555495
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3555470
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (48 files compared)
  • Checked 211 log files, 162 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 9 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19862
  • DQMHistoTests: Total failures: 272
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19590
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: found differences in 2 / 3 workflows

@rappoccio (Contributor)

+1

  • Tests are now successful.
