Noisy ECAL DQM in gpu validation workflows #42720
assign dqm, ecal-dpg |
A new Issue was created by @mmusich Marco Musich. @Dr15Jones, @rappoccio, @smuzaffar, @makortel, @sextonkennedy, @antoniovilela can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
New categories assigned: dqm,ecal-dpg @tjavaid,@micsucmed,@nothingface0,@wang0jin,@rvenditti,@emanueleusai,@syuvivida,@thomreis,@pmandrik you have been requested to review this Pull request/Issue and eventually sign? Thanks |
@mmusich Do we expect the ECAL GPU validation plots to be filled for these validation tests? If not, can we reduce the number of warnings to avoid spam? |
Yes, we do. That's the whole purpose of the validation workflow. |
assign heterogeneous |
@mmusich I think what we wanted to clarify was whether the GPU validation plots are indeed getting filled for these tests despite these warnings. If the plots are getting filled, then we could suppress the number of warnings to reduce the noise. |
That test uses DQM streamer files as input, which is exactly what the online DQM sees at P5. So the input collections should be adapted in order to cope with that. Also, what about the relval validation workflow that uses regular repacked RAW data, introduced in #42674?
I can't really comment on that. I would invite you to run one of these tests locally and check. Hope this helps. |
@mmusich I was able to fix a bug causing the warnings for the ECAL GPU validation task and verified it using the tests introduced in #42542, but I was unable to reproduce the error in #42674 on the lxplus-gpu node. I keep getting errors like this when trying to use CMSSW on the lxplus-gpu node:
Some of the warnings for #42674 don't seem to be related to the GPU validation module, so I would like to reproduce the error myself in order to find the culprit. |
When developing for #42674 I was using an |
I just noticed that some (all ?) |
I was able to run wf |
For wf
cmssw/DQM/EcalMonitorTasks/python/CollectionTags_cfi.py Lines 56 to 59 in bac24a4
Rec hit GPU validation was turned on in the GPU module by default, but it is manually turned off in the Online ECAL DQM GPU client
If we are only running the GPU validation module (except for rec hits), then the only input tag complaining is for
This collection is unused by the GPU module, but this input tag is always checked to run any ECAL DQM processing for historical reasons. The ECAL digis for GPU validation use the same collection, but the input tags include the cmssw/DQM/EcalMonitorTasks/python/CollectionTags_cfi.py Lines 48 to 51 in bac24a4
These tags are being used and are not complaining, so would it be okay to customize
cmssw/DQM/EcalMonitorTasks/python/CollectionTags_cfi.py Lines 42 to 43 in bac24a4
If they are being used here, are these the correct input tags? If not, then we don't need to worry about these. The warning should go away by only enabling the GPU module. |
Hi @alejands, does it work for the ECAL-only configurations (.511 for MC)? |
As far as I can tell, similar warnings are visible in .511 workflows in IBs too: https://cmssdt.cern.ch/SDT/cgi-bin/logreader/el9_amd64_gcc11/CMSSW_13_3_X_2023-09-06-2300/pyRelValMatrixLogs/run/11634.511_TTbar_14TeV+2021_Patatrack_ECALOnlyCPU/step3_TTbar_14TeV+2021_Patatrack_ECALOnlyCPU.log#/ |
I was able to modify one of the cfg files produced by WF For the ECAL Trigger Primitives, there are no emulated digi collections available, only the standard ones.
There are also no Reduced Ecal Rec Hits available. Fortunately for both of these cases, we can easily toggle these off and eliminate the warnings.
There is the question of at what level do we want to change these flags? (eg. in the ECAL DQM However, for the case of
It appears that this collection is more important for several ECAL DQM modules, but as far as I know, ECAL DQM is not directly in charge of which collections are produced. |
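The toggling described above (switching off features whose input collections are not present in the event) can be sketched in plain Python. This is only an illustration under assumptions: it does not use the CMSSW configuration API, and every feature name, collection label and flag below is a hypothetical placeholder, not an actual ECAL DQM parameter.

```python
# Hypothetical illustration only: plain Python, not the CMSSW configuration API.
# All feature names and collection labels below are placeholders.

def missing_inputs(expected_tags, available):
    """Map each feature to its collection label when that collection is absent."""
    return {
        feature: label
        for feature, label in expected_tags.items()
        if label not in available
    }


def disable_missing(flags, expected_tags, available):
    """Return a copy of `flags` with features turned off if their input is absent."""
    missing = missing_inputs(expected_tags, available)
    return {
        feature: enabled and feature not in missing
        for feature, enabled in flags.items()
    }


# Collections each (hypothetical) feature would try to read.
expected_tags = {
    "emulatedTrigPrims": "emulTPDigis",
    "reducedRecHits": "reducedRecHits",
    "gpuDigiValidation": "gpuDigis",
}

# Collections assumed to be present in the input for this workflow.
available = {"gpuDigis", "cpuDigis"}

# Default flags: everything enabled.
flags = {
    "emulatedTrigPrims": True,
    "reducedRecHits": True,
    "gpuDigiValidation": True,
}

new_flags = disable_missing(flags, expected_tags, available)
print(new_flags)
```

The open question from the discussion, namely at which level such flags should be changed, is not addressed by this sketch; it only shows that a feature with no available input can be identified and disabled instead of warning on every event.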
I guess this depends on whether these flags are useful in other setups (and only ECAL DQM knows that)
maybe (but ECAL DPG at large certainly is). |
I have updated the ECAL unpacker CPUDigis module to produce dummy collections for the |
I implemented the changes discussed earlier in #42848. Before knowing about the fix for |
I believe this issue is now resolved and can be closed. |
It is lacking signatures from the involved groups |
+ecal-dpg |
When adding a new workflow for patatrack validation on 2023 data (see PR #42674), the ECAL DQM for the gpu task is very noisy, emitting warnings of this sort several times per event:
These also appear in the DQM/Integration unit tests for the ECAL GPU client (introduced in PR #42542), which are run in IBs: see e.g. a log for CMSSW_13_3_X_2023-09-04-1100
Could core DQM / ECAL DQM experts have a look?