Problem in DQM Harvesting step with EgHLTOfflineClient #38970
A new Issue was created by @rvenditti. @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here
assign dqm
FYI @cms-sw/hlt-l2 @cms-sw/egamma-pog-l2
New categories assigned: dqm @jfernan2,@ahmad3213,@micsucmed,@rvenditti,@emanueleusai,@pmandrik you have been requested to review this Pull request/Issue and eventually sign? Thanks
I've seen the … Btw, looking at the log files of the two runs, I see several other error messages. For example:
Could any of these trigger the memory issue? @rvenditti
[1] https://cms-talk.web.cern.ch/t/replay-for-testing-the-run-3-collisions-setup/10676
Hi @swagata87, thanks for the comment. Indeed, we have also asked the CTPPS experts (#38969) to have a look. @germanfgv, are there any other files in the job report folder from which we can access the stack trace for this job?
@rvenditti we don't have access to the stack trace of the job at the moment of termination. You can find 3 sets of log files in the tarball:
Other than that, no more information is available.
Maybe (re-)stating the obvious: the HLT-related warnings are unrelated to the main issue, i.e. #38976. I had a look at the warnings, and I think their origin is clear: the Harvesting modules in question, i.e. instances of …
In this particular example (this may not hold in other cases), one could try to improve these Harvesting modules by extracting the relevant filter/path names from the available input histograms; this way, the module could work both (1) on DQMIO inputs and (2) when DQM and Harvesting run on EDM inputs. Before updating the plugins, though, it should be clarified whether they are actually important and worth updating. Only EGM (@swagata87) and DQM experts can answer this. (Reminder: the workflows of the HLT offline DQM are maintained by DQM and mostly developed by POGs; they are not under the direct watch of the HLT L2s.)
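To illustrate the idea of deriving filter names from the input histograms instead of from the HLT configuration, here is a minimal sketch. It is not the EgHLTOfflineClient code (which is a C++ CMSSW plugin) and does not use any DQMStore API; it works on plain histogram path strings. The function name `filter_names_from_histograms` and the folder layout `HLT/EgOffline/<filterName>/<histogram>` are hypothetical assumptions made only for this example.

```python
def filter_names_from_histograms(histogram_paths, base_folder="HLT/EgOffline"):
    """Return the sorted, unique folder names found directly under base_folder.

    Hypothetical assumption: each filter has its own subfolder, i.e. histograms
    are stored as "<base_folder>/<filterName>/<histogram>".
    """
    prefix = base_folder.rstrip("/") + "/"
    names = set()
    for path in histogram_paths:
        # Ignore histograms that live outside the (assumed) E/gamma HLT folder.
        if not path.startswith(prefix):
            continue
        parts = path[len(prefix):].split("/")
        # Require "<filterName>/<histogram>", so that histograms stored directly
        # in base_folder (e.g. summary plots) are not mistaken for filters.
        if len(parts) >= 2 and parts[0]:
            names.add(parts[0])
    return sorted(names)
```

With this approach the list of filters comes from whatever the input file actually contains, so it no longer depends on HLTConfigProvider finding the 'HLT' process in the registry, which is not available when harvesting DQMIO inputs.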
As a follow-up to "Express job killed at T0 for memory issues at harvesting step in runs 356381 and 356615" (link), we found that the log file shows a problem in the HLT-Egamma client.
The message is:
%MSG-e HLTConfigProvider: EgHLTOfflineClient:egHLTOffDQMClient@beginRun 29-Jul-2022 10:57:14 CEST Run: 356381
Falling back to ProcessName-only init using ProcessName 'HLT' !
%MSG
%MSG-e HLTConfigProvider: EgHLTOfflineClient:egHLTOffDQMClient@beginRun 29-Jul-2022 10:57:14 CEST Run: 356381
Process name 'HLT' not found in registry!
%MSG
We believe that this could lead to the memory issue seen in the Express reconstruction. Can HLT DQM experts have a look?