CMSSW Integration of LST #45117

@VourMa VourMa commented Jun 1, 2024

This PR integrates the LST (Line Segment Tracking) algorithm into CMSSW. A summary of the algorithm and its scope can be found in the recent LST presentation at the Phase 2 Software Days (April 2024).

The PR includes the following additions/modifications:

  • New package w/ the LST algorithm code (RecoTracker/LSTCore):
    • interface/alpaka:
      The interface exposed to CMSSW.
    • src/alpaka:
      The actual LST code.
    • standalone:
      Scripts to be used for compiling, using & testing LST outside of the full CMSSW framework.
      Not relevant for CMSSW review.
    • Minimal/at most header-only dependency of LSTCore on other CMSSW packages
      ⇒ Preserve ability to run in standalone.
  • New package w/ CMSSW modules related to LST (RecoTracker/LST):
    • interface:
      The input & output data formats for LST.
    • plugins:
      The producers:
      • Converting to/from the LST data formats (ED).
      • Loading the LST custom geometry files (ES).
      • Running LST to produce CMSSW collections (ED).
    • python:
      The configuration files needed for running LST.
    • src:
      Class definitions and ES producer supporting files.
    • test:
      Scripts for local testing
      → Dropped in favor of a proper workflow.
  • New process modifiers to test LST (changes in multiple existing packages):
    • trackingIters01:
      Runs only the first two iterations of tracking (initialStep & highPtTripletStep).
      Useful for comparisons, as LST (for now) replaces only those two tracking iterations.
    • trackingLST:
      Runs the LST algorithm instead of KalmanFilter for track building/seeding.
      Whether the gpu process modifier is present determines the hardware the algorithm runs on (GPU if present, CPU otherwise).

There is a single change not strictly related to the above categories and a dedicated comment will be made on it.

In general, we prefer to have minimal or at most header-only dependency of LSTCore on other CMSSW packages to preserve the ability to run with standalone scripts.

This is a large PR, so we are opening it as an RFC with the main batch of files. The following updates are expected in the coming days so that the PR can be merged:

  • Removal of test scripts and introduction of workflow.
  • Extraction of the LST data files from the proper directories (bot tests will probably not work currently).
  • Modifications to the standalone scripts → Not to be reviewed.

Goes together with cms-data/RecoTracker-LSTCore#1 (now merged).

@slava77 @ariostas


List of unresolved comments (to be updated in batches - last update: 2024/08/19):
SegmentLinking#75

cmsbuild commented Jun 1, 2024

cms-bot internal usage

cmsbuild commented Jun 1, 2024

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45117/40456

  • This PR adds an extra 788KB to repository

cmsbuild commented Jun 1, 2024

A new Pull Request was created by @VourMa for master.

It involves the following packages:

  • Configuration/ProcessModifiers (operations)
  • RecoTracker/ConversionSeedGenerators (reconstruction)
  • RecoTracker/FinalTrackSelectors (reconstruction)
  • RecoTracker/IterativeTracking (reconstruction)
  • RecoTracker/LST (****)
  • RecoTracker/LSTCore (reconstruction)

The following packages do not have a category, yet:

RecoTracker/LST
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbuild, @rappoccio, @jfernan2, @davidlange6, @mandrenguyen, @fabiocos, @antoniovilela can you please review it and eventually sign? Thanks.
@VourMa, @missirol, @gpetruc, @rovere, @GiacomoSguazzoni, @VinInn, @Martin-Grunewald, @mmusich, @mtosi, @dgulhan, @JanFSchulte, @fabiocos, @felicepantaleo, @makortel this is something you requested to watch as well.
@rappoccio, @sextonkennedy, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

slava77 commented Jun 1, 2024

test parameters:

slava77 commented Jun 1, 2024

@cmsbuild please test

cmsbuild commented Jun 1, 2024

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39661/summary.html
COMMIT: 891eb11
CMSSW: CMSSW_14_1_X_2024-06-01-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45117/39661/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 2 errors in the following unit tests:

---> test TestDQMOnlineClient-hlt_dqm_sourceclient had ERRORS
---> test testTrackingResolution had ERRORS

Comparison Summary

Summary:

  • You potentially added 16 lines to the logs
  • Reco comparison results: 10 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3445370
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3445344
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 206 log files, 170 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

slava77 commented Jun 2, 2024

> I found 2 errors in the following unit tests:

both are apparently related to LST
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39661/unitTests/failed.html

slava77 commented Jun 3, 2024

> > I found 2 errors in the following unit tests:
>
> both are apparently related to LST https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39661/unitTests/failed.html

An exception of category 'PluginNotFound' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Unable to find plugin 'LSTModulesDevESProducer@alpaka' in category 'CMS EDM Framework ESModule'. Please check spelling of name.

it's not obvious how this dependency comes about from looking at https://github.com/cms-sw/cmssw/blob/master/DQM/TrackingMonitorSource/test/testTrackResolution_cfg.py (a Run3 test)

@makortel
do you see a clear way how the LST ES dependency makes it through here?

mmusich commented Jun 3, 2024

assign heterogeneous

cmsbuild commented Jun 3, 2024

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/42957/summary.html
COMMIT: 04a1165
CMSSW: CMSSW_14_2_X_2024-11-19-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45117/42957/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53058
  • DQMHistoTests: Total failures: 83
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 52975
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

Moanwar commented Nov 19, 2024

+Upgrade

@mandrenguyen

+1

@mandrenguyen

merge
re-sign of alca is assumed in order to catch the 11PM IB

@cmsbuild merged commit 48600da into cms-sw:master on Nov 19, 2024 (15 checks passed)

@perrotta

+alca

@cmsbuild

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.

smuzaffar commented Nov 20, 2024

> @smuzaffar Any thoughts what happened?

@iarspider is looking into it

@iarspider

> > @smuzaffar Any thoughts what happened?
>
> @iarspider is looking into it

The bot worked as expected: when processing a new commit, we reset signatures for the categories of the changed files and for the assigned categories. In this case, commit 04a1165 changed files belonging to pdmv and upgrade (which hadn't signed at that time), and the PR was assigned to alca and heterogeneous, so these four signatures were reset.

@makortel

> > > @smuzaffar Any thoughts what happened?
> >
> > @iarspider is looking into it
>
> The bot worked as expected: when processing a new commit, we reset signatures for the categories of the changed files and for the assigned categories. In this case, commit 04a1165 changed files belonging to pdmv and upgrade (which hadn't signed at that time), and the PR was assigned to alca and heterogeneous, so these four signatures were reset.

Thanks @iarspider for the analysis! I forgot alca and heterogeneous were assigned explicitly rather than via modified files.
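
The reset rule described above can be sketched as a set union. This is only an illustration under stated assumptions: `Categories` and `signaturesToReset` are hypothetical names, and the snippet is not cms-bot code.

```cpp
#include <set>
#include <string>

using Categories = std::set<std::string>;

// Hypothetical helper (not cms-bot code): on a new commit, signatures are reset
// for the categories of the changed files plus any explicitly assigned categories.
inline Categories signaturesToReset(const Categories& changedFileCategories,
                                    const Categories& assignedCategories) {
  Categories toReset = changedFileCategories;
  toReset.insert(assignedCategories.begin(), assignedCategories.end());
  return toReset;
}
```

With changed-file categories {pdmv, upgrade} and assigned categories {alca, heterogeneous}, the union contains the four categories named in the analysis.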

@smuzaffar

@VourMa, GCC 13 shows some build errors [a]. Can you please look into these and provide a fix? This looks like a valid error: for layer==6, the subscript is past the end of the 6-element array.

[a]

>> Compiling alpaka/cuda src/RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc
/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/cuda/12.4.1-377bccc815261ebddfd7f411953412af/bin/nvcc -x cu -MMD -MF tmp/el9_amd64_gcc13/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/LSTEvent.dev.cc.d -dc -DGNU_GCC -D_GNU_SOURCE -DTBB_USE_GLIBCXX_VERSION=130200 -DTBB_SUPPRESS_DEPRECATED_MESSAGES -DTBB_PREVIEW_RESUMABLE_TASKS=1 -DTBB_PREVIEW_TASK_GROUP_EXTENSIONS=1 -DBOOST_SPIRIT_THREADSAFE -DPHOENIX_THREADSAFE -DBOOST_MATH_DISABLE_STD_FPCLASSIFY -DBOOST_UUID_RANDOM_PROVIDER_FORCE_POSIX -DCMSSW_GIT_HASH='CMSSW_15_0_X_2024-11-22-2300' -DPROJECT_NAME='CMSSW' -DPROJECT_VERSION='CMSSW_15_0_X_2024-11-22-2300' -Isrc -I/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/alpaka/1.1.0-7ae37b81630ceac16bb76f6e494ad52f/include -I/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/pcre/8.43-042dbc4f51740fc51eb97282dbf28734/include -I/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/boost/1.80.0-61186ae1666445ba179e478c9cad3e59/include -I/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/bz2lib/1.0.6-452db86d4d93dcf1690037af99319ec9/include -I/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/cuda/12.4.1-377bccc815261ebddfd7f411953412af/include -I/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/libuuid/2.34-543db95ed1650444408e40e641f34da9/include -I/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/lcg/root/6.30.09-7f534092486f34025a74194fb75c2c04/include -I/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/tbb/v2021.9.0-71668544ccc37d9a4726850e93c95366/include -I/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/xz/5.2.5-b6caa493ffc2cdd51db397ec2ac3b210/include -I/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/zlib/1.2.11-cee431f26b09cbe7117839bdc0dd365e/include 
-I/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/fmt/10.2.1-d463a05da49f5c55c9ae0ea93d9eb317/include -I/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/md5/1.0.0-2f660b2cdbedb9b324f19c722f640619/include -std=c++20 -O3 --generate-line-info --source-in-ptx --display-error-number --expt-relaxed-constexpr --extended-lambda -gencode arch=compute_60,code=[sm_60,compute_60] -gencode arch=compute_70,code=[sm_70,compute_70] -gencode arch=compute_75,code=[sm_75,compute_75] -gencode arch=compute_80,code=[sm_80,compute_80] -gencode arch=compute_89,code=[sm_89,compute_89] -Wno-deprecated-gpu-targets -diag-suppress=3012 -diag-suppress=3189 -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -Xcudafe --gnu_version=130200 --cudart shared -DALPAKA_DEFAULT_HOST_MEMORY_ALIGNMENT=128 -DALPAKA_DISABLE_VENDOR_RNG -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_ACC_GPU_CUDA_ONLY_MODE -UALPAKA_HOST_ONLY --compiler-options '-O3 -pthread -pipe -Werror=main -Werror=pointer-arith -Werror=overlength-strings -Wno-vla -Werror=overflow -ftree-vectorize -Werror=array-bounds -Werror=format-contains-nul -Werror=type-limits -fvisibility-inlines-hidden -fno-math-errno --param vect-max-version-for-alias-checks=50 -Xassembler --compress-debug-sections -fuse-ld=bfd -march=x86-64-v2 -felide-constructors -fmessage-length=0 -Wall -Wno-non-template-friend -Wno-long-long -Wreturn-type -Wextra -Wpessimizing-move -Wclass-memaccess -Wno-cast-function-type -Wno-unused-but-set-parameter -Wno-ignored-qualifiers -Wno-unused-parameter -Wunused -Wparentheses -Werror=return-type -Werror=missing-braces -Werror=unused-value -Werror=unused-label -Werror=address -Werror=format -Werror=sign-compare -Werror=write-strings -Werror=delete-non-virtual-dtor -Werror=strict-aliasing -Werror=narrowing -Werror=unused-but-set-variable -Werror=reorder -Werror=unused-variable -Werror=conversion-null -Werror=return-local-addr -Wnon-virtual-dtor -Werror=switch -fdiagnostics-show-option 
-Wno-unused-local-typedefs -Wno-attributes -Wno-psabi -Wno-error=unused-variable -DALPAKA_DEFAULT_HOST_MEMORY_ALIGNMENT=128 -DALPAKA_DISABLE_VENDOR_RNG -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_ACC_GPU_CUDA_ONLY_MODE -DALPAKA_HOST_ONLY -DBOOST_DISABLE_ASSERTS -DPT_CUT=0.8 -std=c++20 -fPIC ' src/RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc -o tmp/el9_amd64_gcc13/src/RecoTracker/LSTCore/src/alpaka/RecoTrackerLSTCoreCudaAsync/LSTEvent.dev.cc.o
In member function 'constexpr std::array<_Tp, _Nm>::value_type& std::array<_Tp, _Nm>::operator[](size_type) [with _Tp = unsigned int; long unsigned int _Nm = 6]',
    inlined from 'unsigned int alpaka_cuda_async::lst::LSTEvent::getNumberOfMiniDoubletsByLayer(unsigned int)' at src/RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc:1276:47:
  /data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/gcc/13.2.0-b4f157aad5ba3fefd6a4401833585549/include/c++/13.2.0/array:203:25: error: array subscript 6 is above array bounds of 'std::__array_traits<unsigned int, 6>::_Type' {aka 'unsigned int [6]'} [-Werror=array-bounds=]
   203 |         return _M_elems[__n];
      |               ~~~~~~~~~~^
/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/gcc/13.2.0-b4f157aad5ba3fefd6a4401833585549/include/c++/13.2.0/array: In member function 'unsigned int alpaka_cuda_async::lst::LSTEvent::getNumberOfMiniDoubletsByLayer(unsigned int)':
/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/gcc/13.2.0-b4f157aad5ba3fefd6a4401833585549/include/c++/13.2.0/array:109:44: note: while referencing 'std::array<unsigned int, 6>::_M_elems'
  109 |       typename __array_traits<_Tp, _Nm>::_Type        _M_elems;
      |                                            ^~~~~~~~
In member function 'constexpr std::array<_Tp, _Nm>::value_type& std::array<_Tp, _Nm>::operator[](size_type) [with _Tp = unsigned int; long unsigned int _Nm = 6]',
    inlined from 'unsigned int alpaka_cuda_async::lst::LSTEvent::getNumberOfSegmentsByLayer(unsigned int)' at src/RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc:1303:43:
  /data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/gcc/13.2.0-b4f157aad5ba3fefd6a4401833585549/include/c++/13.2.0/array:203:25: error: array subscript 6 is above array bounds of 'std::__array_traits<unsigned int, 6>::_Type' {aka 'unsigned int [6]'} [-Werror=array-bounds=]
   203 |         return _M_elems[__n];
      |               ~~~~~~~~~~^
/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/gcc/13.2.0-b4f157aad5ba3fefd6a4401833585549/include/c++/13.2.0/array: In member function 'unsigned int alpaka_cuda_async::lst::LSTEvent::getNumberOfSegmentsByLayer(unsigned int)':
/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/gcc/13.2.0-b4f157aad5ba3fefd6a4401833585549/include/c++/13.2.0/array:109:44: note: while referencing 'std::array<unsigned int, 6>::_M_elems'
  109 |       typename __array_traits<_Tp, _Nm>::_Type        _M_elems;
      |                                            ^~~~~~~~
In member function 'constexpr std::array<_Tp, _Nm>::value_type& std::array<_Tp, _Nm>::operator[](size_type) [with _Tp = unsigned int; long unsigned int _Nm = 6]',
    inlined from 'unsigned int alpaka_cuda_async::lst::LSTEvent::getNumberOfTripletsByLayer(unsigned int)' at src/RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc:1330:43:
  /data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/gcc/13.2.0-b4f157aad5ba3fefd6a4401833585549/include/c++/13.2.0/array:203:25: error: array subscript 6 is above array bounds of 'std::__array_traits<unsigned int, 6>::_Type' {aka 'unsigned int [6]'} [-Werror=array-bounds=]
   203 |         return _M_elems[__n];
      |               ~~~~~~~~~~^
/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/gcc/13.2.0-b4f157aad5ba3fefd6a4401833585549/include/c++/13.2.0/array: In member function 'unsigned int alpaka_cuda_async::lst::LSTEvent::getNumberOfTripletsByLayer(unsigned int)':
/data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/gcc/13.2.0-b4f157aad5ba3fefd6a4401833585549/include/c++/13.2.0/array:109:44: note: while referencing 'std::array<unsigned int, 6>::_M_elems'
  109 |       typename __array_traits<_Tp, _Nm>::_Type        _M_elems;
      |                                            ^~~~~~~~
In member function 'constexpr std::array<_Tp, _Nm>::value_type& std::array<_Tp, _Nm>::operator[](size_type) [with _Tp = unsigned int; long unsigned int _Nm = 6]',
    inlined from 'unsigned int alpaka_cuda_async::lst::LSTEvent::getNumberOfQuintupletsByLayer(unsigned int)' at src/RecoTracker/LSTCore/src/alpaka/LSTEvent.dev.cc:1378:46:
  /data/cmsbld/jenkins/workspace/build-any-ib/w/el9_amd64_gcc13/external/gcc/13.2.0-b4f157aad5ba3fefd6a4401833585549/include/c++/13.2.0/array:203:25: error: array subscript 6 is above array bounds of 'std::__array_traits<unsigned int, 6>::_Type' {aka 'unsigned int [6]'} [-Werror=array-bounds=]
   203 |         return _M_elems[__n];

slava77 commented Nov 25, 2024

It looks like the *ByLayer() methods can simply be removed. Their implementation doesn't really make sense, and they are not used.
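
For reference, the diagnostic in the log concerns indexing a `std::array<unsigned int, 6>` with subscript 6, while the valid indices are 0 to 5. Below is a minimal sketch of a bounds-checked accessor, assuming a 6-element per-layer counter like the one named in the log. `LayerCounts`, `get`, and `makeExample` are hypothetical names; this is not the LSTEvent code and not a proposed fix.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for a per-layer counter; not the actual LSTEvent member.
constexpr std::size_t kNumLayers = 6;

struct LayerCounts {
  std::array<unsigned int, kNumLayers> byLayer{};

  // Valid indices are 0..5. Index 6 is one past the end, which is the access
  // that -Werror=array-bounds rejects in the log.
  std::optional<unsigned int> get(unsigned int layer) const {
    if (layer >= byLayer.size()) {
      return std::nullopt;  // reject out-of-range layers instead of reading past the array
    }
    return byLayer[layer];
  }
};

// Fills the counters with 10, 20, ..., 60 so the accessor can be exercised.
inline LayerCounts makeExample() {
  LayerCounts counts;
  for (std::size_t i = 0; i < counts.byLayer.size(); ++i) {
    counts.byLayer[i] = static_cast<unsigned int>(10 * (i + 1));
  }
  return counts;
}
```

Returning `std::optional` makes the out-of-range case explicit to the caller. If the methods are unused, removing them as suggested avoids the question entirely.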
