PPS diamond mapping fix for 2023 run (backport) [13_0_X] #41188

Merged

@grzanka (Contributor) commented Mar 25, 2023

PR description:

Backport of #41187

@cmsbuild (Contributor) commented Mar 25, 2023

A new Pull Request was created by @grzanka (Leszek Grzanka) for CMSSW_13_0_X.

It involves the following packages:

  • CondFormats/PPSObjects (alca)
  • DQM/CTPPS (dqm)

@pmandrik, @emanueleusai, @tvami, @cmsbuild, @saumyaphor4252, @syuvivida, @rvenditti, @micsucmed, @francescobrivio can you please review it and eventually sign? Thanks.
@tocheng, @fabferro, @mmusich, @seemasharmafnal this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@francescobrivio (Contributor)

type ctpps

@francescobrivio (Contributor)

backport of #41187

@francescobrivio (Contributor)

test parameters:

  • workflows = 136.793,1043,1044

@francescobrivio (Contributor)

@cmsbuild please test

@cmsbuild (Contributor)

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-102eff/31598/summary.html
COMMIT: c287ebb
CMSSW: CMSSW_13_0_X_2023-03-26-0000/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/41188/31598/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 9 lines from the logs
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 51
  • DQMHistoTests: Total histograms compared: 3555131
  • DQMHistoTests: Total failures: 8
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3555101
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 50 files compared)
  • Checked 221 log files, 166 edm output root files, 51 DQM output files
  • TriggerResults: no differences found

@emanueleusai (Member)

@grzanka this seems to be touching online DQM code. Can you confirm you need this deployed in DQM machines at P5?

@vavati commented Mar 27, 2023

@emanueleusai Yes, the bug in DQM prevents the DQM for the strips from working; also, the new XML for the timing detector will be important online, so that at least we can debug the detectors. So deployment at P5 before the first stable beams (in fact, before the alignment run) is mandatory. Thanks

@tvami (Contributor) commented Mar 27, 2023

+alca

@tvami (Contributor) commented Mar 27, 2023

@cmsbuild ping

@perrotta (Contributor)

type bug-fix

@tvami (Contributor) commented Mar 28, 2023

urgent

  • as discussed at ORP this should go to 13_0_2

@emanueleusai (Member)

> urgent
>
>   • as discussed at ORP this should go to 13_0_2

This needs to be tested at P5, so it cannot be approved right away.

@vavati commented Mar 28, 2023

@emanueleusai
1) For the XML part, you can test with any recent global run taken with the CTPPS_TOT partition included.
2) For the DQM bug fix: it affects the TOTEM strip detectors, which are rarely included in runs; that's why the bug was not spotted before. Recent (test) global runs are 365241 and 365244, with the DQM sequence "ctppsDQMCalibrationSource" active (otherwise these detectors are not reconstructed). Without the bug fix you should get an error.
These detectors will be taking data in the PPS alignment run in less than 2 weeks.

@rappoccio (Contributor)

Hi @emanueleusai, can the tests be done in parallel with the merge? We need to cut 13_0_2 ASAP (tomorrow would be ideal).

@emanueleusai (Member)

alright

@emanueleusai (Member)

+1

@cmsbuild (Contributor)

This pull request is fully signed and it will be integrated in one of the next CMSSW_13_0_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_13_1_X is complete. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta (Contributor)

+1

  • @emanueleusai in any case the fix to totemRPDQMSource_cfi.py is quite obvious, and I don't expect bad surprises from a possible test at P5

@cmsbuild merged commit 62d8071 into cms-sw:CMSSW_13_0_X on Mar 29, 2023
@perrotta (Contributor)

>   • @emanueleusai in any case the fix to totemRPDQMSource_cfi.py is quite obvious, and I don't expect bad surprises from a possible test at P5

... and with a fillDescriptions method in the TotemRPDQMSource plugin, the bug would have been detected immediately, without the need to fix it afterwards: please consider implementing it @grzanka (or someone else from @cms-sw/ctpps-dpg-l2)
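To illustrate why a declared parameter description catches configuration bugs early, here is a minimal Python sketch of the idea. It is not the CMSSW API: in CMSSW the real mechanism is a static C++ `fillDescriptions(edm::ConfigurationDescriptions&)` method that declares each parameter through `edm::ParameterSetDescription`, against which the framework validates the module configuration. The class, exception, and parameter names below (`ParameterDescription`, `ConfigurationError`, `tagStatus`, `tagDigi`, `verbosity`) are hypothetical placeholders.

```python
# Hypothetical sketch of description-based parameter validation.
# NOT the CMSSW API: it only mimics the idea behind fillDescriptions.


class ConfigurationError(Exception):
    """Raised when a configuration does not match its declared description."""


class ParameterDescription:
    def __init__(self):
        self._params = {}  # parameter name -> (expected type, default value)

    def add(self, name, ptype, default):
        """Declare an allowed parameter with its type and default value."""
        self._params[name] = (ptype, default)
        return self  # allow chained calls

    def validate(self, config):
        """Return the full config (defaults filled in) or raise on a mismatch."""
        unknown = set(config) - set(self._params)
        if unknown:
            # A misspelled or obsolete parameter name is rejected immediately.
            raise ConfigurationError(f"unknown parameter(s): {sorted(unknown)}")
        result = {}
        for name, (ptype, default) in self._params.items():
            value = config.get(name, default)
            if not isinstance(value, ptype):
                raise ConfigurationError(
                    f"parameter '{name}' expects {ptype.__name__}, "
                    f"got {type(value).__name__}")
            result[name] = value
        return result


# Hypothetical description for a DQM source module (placeholder names/values).
desc = (ParameterDescription()
        .add("tagStatus", str, "placeholderStatus")
        .add("tagDigi", str, "placeholderDigi")
        .add("verbosity", int, 0))

if __name__ == "__main__":
    print(desc.validate({"verbosity": 1}))
    try:
        desc.validate({"tagDigis": "someLabel"})  # typo: extra "s"
    except ConfigurationError as err:
        print("rejected:", err)
```

The point of the design is that a typo in a `_cfi.py` file fails at configuration time instead of silently falling back to a default and surfacing only when the affected detectors take data.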

@micsucmed
Hello, I was about to test this PR on the online playback DQM machines at P5, but there was a compilation error when adding the PR on top of 13_0_0 + PRs 40920+41002+41049. Compiling without this PR works fine, so could you take a look? Here is an example of the errors we are getting:

>> Compiling edm plugin /cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0404_CMSSW_13_0_0_40920_41002_41049_41188/src/HeterogeneousCore/ROCmServices/plugins/ROCmService.cc
/cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0404_CMSSW_13_0_0_40920_41002_41049_41188/src/HeterogeneousCore/ROCmServices/plugins/ROCmMonitoringService.cc:3:10: fatal error: hip/hip_runtime.h: No such file or directory
    3 | #include <hip/hip_runtime.h>
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
>> Compiling  /cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0404_CMSSW_13_0_0_40920_41002_41049_41188/src/HeterogeneousCore/ROCmServices/test/testROCmService.cpp
/cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0404_CMSSW_13_0_0_40920_41002_41049_41188/src/HeterogeneousCore/ROCmServices/plugins/ROCmService.cc:8:10: fatal error: hip/hip_runtime.h: No such file or directory
    8 | #include <hip/hip_runtime.h>
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
/cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0404_CMSSW_13_0_0_40920_41002_41049_41188/src/HeterogeneousCore/ROCmServices/test/testROCmService.cpp:7:10: fatal error: hip/hip_runtime.h: No such file or directory
    7 | #include <hip/hip_runtime.h>
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.
>> Compiling  /cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0404_CMSSW_13_0_0_40920_41002_41049_41188/src/HeterogeneousCore/ROCmServices/test/test_main.cpp
gmake: *** [tmp/slc7_amd64_gcc11/src/HeterogeneousCore/ROCmServices/plugins/HeterogeneousCoreROCmServicesPlugins/ROCmMonitoringService.cc.o] Error 1
>> Compiling edm plugin /cmsnfsdqmdata/dqmdata/dqm_cmssw/playback_0404_CMSSW_13_0_0_40920_41002_41049_41188/src/HeterogeneousCore/SonicTriton/plugins/TritonService.cc
gmake: *** [tmp/slc7_amd64_gcc11/src/HeterogeneousCore/ROCmServices/test/testROCmService/testROCmService.cpp.o] Error 1
>> Cuda Device Link tmp/slc7_amd64_gcc11/src/HeterogeneousTest/CUDADevice/plugins/HeterogeneousTestCUDADevicePlugins/HeterogeneousTestCUDADevicePlugins_cudadlink.o
gmake: *** [tmp/slc7_amd64_gcc11/src/HeterogeneousCore/ROCmServices/plugins/HeterogeneousCoreROCmServicesPlugins/ROCmService.cc.o] Error 1
>> Cuda Device Link tmp/slc7_amd64_gcc11/src/HeterogeneousTest/CUDAKernel/plugins/HeterogeneousTestCUDAKernelPlugins/HeterogeneousTestCUDAKernelPlugins_cudadlink.o
>> Building shared library tmp/slc7_amd64_gcc11/src/HeterogeneousTest/CUDAKernel/src/HeterogeneousTestCUDAKernel/libHeterogeneousTestCUDAKernel.so
Copying tmp/slc7_amd64_gcc11/src/HeterogeneousTest/CUDAKernel/src/HeterogeneousTestCUDAKernel/libHeterogeneousTestCUDAKernel.so to productstore area:
>> ROCM Device Code Obj tmp/slc7_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionAlgo.hip.cc.o.rocm_o
/opt/offline/slc7_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/bin/objcopy: 'tmp/slc7_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionAlgo.hip.cc.o': No such file
gmake: *** [tmp/slc7_amd64_gcc11/src/HeterogeneousTest/ROCmDevice/plugins/HeterogeneousTestROCmDevicePlugins/ROCmTestDeviceAdditionAlgo.hip.cc.o.rocm_o] Error 1
>> ROCM Device Code Obj tmp/slc7_amd64_gcc11/src/HeterogeneousTest/ROCmKernel/plugins/HeterogeneousTestROCmKernelPlugins/ROCmTestKernelAdditionAlgo.hip.cc.o.rocm_o
/opt/offline/slc7_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/bin/objcopy: 'tmp/slc7_amd64_gcc11/src/HeterogeneousTest/ROCmKernel/plugins/HeterogeneousTestROCmKernelPlugins/ROCmTestKernelAdditionAlgo.hip.cc.o': No such file
gmake: *** [tmp/slc7_amd64_gcc11/src/HeterogeneousTest/ROCmKernel/plugins/HeterogeneousTestROCmKernelPlugins/ROCmTestKernelAdditionAlgo.hip.cc.o.rocm_o] Error 1

@perrotta (Contributor) commented Apr 4, 2023

@micsucmed this PR evidently has nothing to do with the issue you are encountering in the online DQM runs

@grzanka (Contributor, Author) commented Apr 4, 2023

> testROCmService

To me this also has nothing to do with this PR.

After a quick look, the change most closely related to this error seems to be the following commit:
5ebe8d8
and PR: #40832

This is, however, far from my knowledge of the framework.

@syuvivida (Contributor) commented Apr 5, 2023

Hello, just to note that our deployment by default uses git cms-merge-topic 41188. With git cms-merge-topic, even though this PR #41188 changes only 2 files, we actually check out 92 additional packages, including HeterogeneousCore.
Using git cherry-pick would avoid pulling in the other packages. But this requires more work, since for online DQM production we need to deploy 4 PRs in total, and each cherry-pick would require us to find out which files are updated and check out those packages one by one.

To avoid error-prone manual steps, we would like to propose NOT deploying this PR, and instead waiting until CMSSW_13_0_3 is deployed (which automatically includes this PR and the other PRs we already included).
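The manual step described above, working out which packages the changed files of each PR belong to before checking them out, can be sketched in Python. This assumes the usual CMSSW layout where a package is the first two path components (`Subsystem/Package`). The helper names (`packages_for`, `addpkg_commands`) and the file paths are hypothetical. The `git cms-addpkg` commands are only printed, not run.

```python
# Sketch: collect the CMSSW packages touched by the files of several PRs,
# assuming the standard Subsystem/Package/... layout. Names are hypothetical.


def packages_for(paths):
    """Return the sorted, de-duplicated Subsystem/Package names for file paths."""
    packages = set()
    for path in paths:
        parts = path.strip("/").split("/")
        if len(parts) >= 3:  # need at least Subsystem/Package/<file>
            packages.add("/".join(parts[:2]))
    return sorted(packages)


def addpkg_commands(changed_files_per_pr):
    """Build one 'git cms-addpkg' command per package, merged over all PRs."""
    all_paths = [p for files in changed_files_per_pr.values() for p in files]
    return [f"git cms-addpkg {pkg}" for pkg in packages_for(all_paths)]


if __name__ == "__main__":
    # Hypothetical changed-file lists; in practice they come from each PR's diff.
    changes = {
        "41188": ["CondFormats/PPSObjects/src/SomeFile.cc",
                  "DQM/CTPPS/python/someSource_cfi.py"],
        "another PR (hypothetical)": ["DQM/CTPPS/plugins/SomePlugin.cc"],
    }
    for cmd in addpkg_commands(changes):
        print(cmd)
```

After the packages are checked out, each PR's commits would still have to be applied with `git cherry-pick`. Merging the package lists over all PRs first means a package touched by several PRs is checked out only once.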

@fabferro (Contributor) commented Apr 5, 2023

> Hello, just to note that our deployment by default is using git cms-merge-topic 41188. With git cms-merge-topic, even though this PR #41188 has only 2 changes of files, we actually checkout 92 additional packages, including HeterogeneousCore. If we use git cherry-pick, this pick up of other packages could be avoided. But this requires more work, since for the online DQM production, we need to deploy 4 PRs in total, and each cherry-pick would require us to find out which files are updated and checkout those packages one by one.
>
> To avoid human-prone errors, we would like to propose NOT to deploy this PR, but wait until CMSSW_13_0_3 is deployed (which automatically merge this PR and other PRs we already included).

I see the point. Fine with me, provided that 13_0_3 comes out in a few days, as discussed in the last (emergency) ORP meeting.

@syuvivida (Contributor)

Dear CTPPS experts (@grzanka @fabferro @vavati),
Thanks to the help of @francescobrivio, we built a playback release using cherry-pick to include this PR 41188. It's probably hard to check the output with the playback run files we have (run 365045, in which CTPPS was included in DAQ but in standby). But just in case, could you please take a look at this DQMGUI and check whether everything is as you expected?
Thanks!!
https://cmsweb.cern.ch/dqm/online-playback/session/5dHDFL
