Improve the error message when trying to call a kernel from a .cc file #43163

Merged on Nov 3, 2023 (1 commit)

@fwyzard fwyzard commented Nov 1, 2023

PR description:

If somebody tries to launch an alpaka kernel from a .cc file, the resulting error message is hard to understand, e.g.:

/data/user/fwyzard/alpaka-test/CMSSW_13_3_0_pre4/src/HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlgo.cc:55:24:   required from here
/data/cmssw/el8_amd64_gcc12/external/alpaka/develop-20230621-328794fca9695cfc66a84565d03106ee/include/alpaka/acc/AccGpuUniformCudaHipRt.hpp:277:24: error: invalid use of incomplete type 'class alpaka::TaskKernelGpuUniformCudaHipRt<alpaka::ApiCudaRt, alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::integral_constant<long unsigned int, 1>, unsigned int>, std::integral_constant<long unsigned int, 1>, unsigned int, alpaka_cuda_async::TestAlgoKernel, portabletest::TestSoALayout<>::ViewTemplateFreeParams<128, false, true, false>&, int, double&>'
  277 |                 return TaskKernelGpuUniformCudaHipRt<
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  278 |                     TApi,
      |                     ~~~~~
  279 |                     AccGpuUniformCudaHipRt<TApi, TDim, TIdx>,
      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  280 |                     TDim,
      |                     ~~~~~
  281 |                     TIdx,
      |                     ~~~~~
  282 |                     TKernelFnObj,
      |                     ~~~~~~~~~~~~~
  283 |                     TArgs...>(workDiv, kernelFnObj, std::forward<TArgs>(args)...);
      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/data/cmssw/el8_amd64_gcc12/external/alpaka/develop-20230621-328794fca9695cfc66a84565d03106ee/include/alpaka/acc/AccGpuUniformCudaHipRt.hpp:43:11: note: declaration of 'class alpaka::TaskKernelGpuUniformCudaHipRt<alpaka::ApiCudaRt, alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::integral_constant<long unsigned int, 1>, unsigned int>, std::integral_constant<long unsigned int, 1>, unsigned int, alpaka_cuda_async::TestAlgoKernel, portabletest::TestSoALayout<>::ViewTemplateFreeParams<128, false, true, false>&, int, double&>'
   43 |     class TaskKernelGpuUniformCudaHipRt;
      |           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With these changes, a more understandable error message appears instead:

/data/user/fwyzard/alpaka-test/CMSSW_13_3_0_pre4/src/HeterogeneousCore/AlpakaTest/plugins/alpaka/TestAlgo.cc:55:24:   required from here
/data/user/fwyzard/alpaka-test/CMSSW_13_3_0_pre4/src/HeterogeneousCore/AlpakaInterface/interface/config.h:64:63: error: static assertion failed: You should move this files to a .dev.cc file under the alpaka/ subdirectory.
   64 |     static_assert(std::is_same_v<TApi, alpaka::ApiCudaRt> and BOOST_LANG_CUDA,
      |                                                               ^~~~~~~~~~~~~~~

PR validation:

Unit tests pass (they are not affected by this change).

@cmsbuild cmsbuild added this to the CMSSW_13_3_X milestone Nov 1, 2023

fwyzard commented Nov 1, 2023

enable gpu

fwyzard commented Nov 1, 2023

please test

fwyzard commented Nov 1, 2023

@makortel can you think of any problems with this approach?

After discussing this with @ericcano we think it should be safe:

  • .dev.cc files do not define ALPAKA_HOST_ONLY, so they see only the original alpaka definition;
  • .cc files that do not try to launch an alpaka kernel do not try to instantiate TaskKernelGpuUniformCudaHipRt, so there should not be any trace of it;
  • .cc files that try to launch an alpaka kernel try to instantiate TaskKernelGpuUniformCudaHipRt, find this version, and fail with a static_assert.

So, there should never be any conflicting definition for any instantiation of TaskKernelGpuUniformCudaHipRt.
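
The three cases above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the names SKETCH_HOST_ONLY, SKETCH_LANG_CUDA, sketch::TaskKernelGpu, sketch::createTaskKernel, sketch::DummyKernel and sketch::hostOnlyWork are hypothetical stand-ins for ALPAKA_HOST_ONLY, BOOST_LANG_CUDA and the alpaka classes, and the assertion text is made up. The sketch assumes C++17 and simulates a host-only .cc file.

```cpp
#include <type_traits>
#include <utility>

// Simulate a host-only .cc translation unit. SKETCH_HOST_ONLY and
// SKETCH_LANG_CUDA are hypothetical stand-ins for ALPAKA_HOST_ONLY and
// BOOST_LANG_CUDA; they are not real alpaka or Boost macros.
#define SKETCH_HOST_ONLY
#define SKETCH_LANG_CUDA 0

namespace sketch {

  struct ApiCudaRt {};    // stand-in for alpaka::ApiCudaRt
  struct DummyKernel {};  // stand-in for a user kernel functor

  // Library side: a host-only file sees the declaration but no definition.
  template <typename TApi, typename TKernel, typename... TArgs>
  class TaskKernelGpu;

#ifdef SKETCH_HOST_ONLY
  // Extra definition visible only to host-only files. The condition depends
  // on TApi, so it is checked only when the class is actually instantiated.
  template <typename TApi, typename TKernel, typename... TArgs>
  class TaskKernelGpu {
  public:
    static_assert(std::is_same_v<TApi, ApiCudaRt> and SKETCH_LANG_CUDA,
                  "Kernel launched from a host-only file: move it to a .dev.cc file.");

    template <typename... Ts>
    explicit TaskKernelGpu(Ts&&...) {}
  };
#endif

  // Stand-in for the library function that builds the kernel task.
  // Its body names TaskKernelGpu only through dependent types, so nothing
  // is instantiated until somebody calls it.
  template <typename TApi, typename TKernel, typename... TArgs>
  auto createTaskKernel(TKernel const& kernel, TArgs&&... args) {
    return TaskKernelGpu<TApi, TKernel, TArgs...>(kernel, std::forward<TArgs>(args)...);
  }

  // Host-only code that never launches a kernel compiles untouched.
  constexpr int hostOnlyWork(int x) { return 2 * x; }

}  // namespace sketch

// Uncommenting the next line in this host-only file would instantiate
// TaskKernelGpu and stop the build at the static_assert above:
// auto task = sketch::createTaskKernel<sketch::ApiCudaRt>(sketch::DummyKernel{}, 42);
```

In this sketch a file that never calls createTaskKernel compiles as before, while a file that does call it fails with the readable assertion text rather than an incomplete-type error. A file that does not define SKETCH_HOST_ONLY never sees the extra definition at all, which mirrors the first bullet.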

cmsbuild commented Nov 1, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43163/37482

  • This PR adds an extra 12KB to repository

cmsbuild commented Nov 1, 2023

A new Pull Request was created by @fwyzard (Andrea Bocci) for master.

It involves the following packages:

  • HeterogeneousCore/AlpakaInterface (heterogeneous)

@fwyzard, @makortel can you please review it and eventually sign? Thanks.
@makortel, @missirol, @rovere this is something you requested to watch as well.
@rappoccio, @sextonkennedy, @antoniovilela you are the release manager for this.


makortel commented Nov 1, 2023

On first thought, one possible problem is that these classes add a dependence on Alpaka internals (or at least I view the Task* classes as Alpaka's implementation details, because they are not visible in the alpaka::exec<TAcc>() call). What if these classes evolve (e.g. change interface, get renamed) in Alpaka? (This could well be an acceptable risk, though.)

fwyzard commented Nov 1, 2023

> What if these classes evolve (e.g. change interface, get renamed) in Alpaka?

I'd say if that happens, we fix their counterparts here accordingly.

cmsbuild commented Nov 1, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a046f2/35549/summary.html
COMMIT: 8365722
CMSSW: CMSSW_13_3_X_2023-11-01-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/43163/35549/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 2 lines from the logs
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3362691
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3362663
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 214 log files, 167 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 3
  • DQMHistoTests: Total histograms compared: 39740
  • DQMHistoTests: Total failures: 19
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 39721
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 2 files compared)
  • Checked 8 log files, 10 edm output root files, 3 DQM output files
  • TriggerResults: no differences found

fwyzard commented Nov 2, 2023

+heterogeneous

cmsbuild commented Nov 2, 2023

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @rappoccio, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@rappoccio

+1

@cmsbuild cmsbuild merged commit 5f46c28 into cms-sw:master Nov 3, 2023
@fwyzard fwyzard deleted the better_kernel_error_message branch January 30, 2024 11:11