Huge histograms in HGCAL Validation #30853

Closed
schneiml opened this issue Jul 21, 2020 · 19 comments

Comments

@schneiml
Contributor

Recently, we experienced some problems with DQMIO merge jobs running out of memory on Phase2 workflows (various threads on this issue exist). These problems are caused by two histograms:

HGCAL/HGCalSimHitsV/HitValidation/heeRecVsSimZ
HGCAL/HGCalSimHitsV/HitValidation/hefRecVsSimZ

Booked here: https://github.com/cms-sw/cmssw/blob/master/Validation/HGCalValidation/plugins/HGCalHitValidation.cc#L156-L163

Each consumes a significant fraction of the total memory used by the affected jobs.

Those huge histograms also triggered the bug ROOT-10927 [1], which led us to uncover this issue.

It would be highly appreciated if these histograms could be replaced by less memory-hungry ones.

@srimanob FYI

[1] https://sft.its.cern.ch/jira/browse/ROOT-10927

@cmsbuild
Contributor

A new Issue was created by @schneiml Marcel Schneider.

@Dr15Jones, @silviodonato, @dpiparo, @smuzaffar, @makortel can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@Dr15Jones
Contributor

assign dqm, upgrade

@cmsbuild
Contributor

New categories assigned: dqm,upgrade

@jfernan2,@andrius-k,@schneiml,@fioriNTU,@kmaeshima,@kpedro88 you have been requested to review this Pull request/Issue and eventually sign? Thanks

@srimanob
Contributor

FYI @bsunanda @rovere @hatakeyamak

@rovere
Contributor

rovere commented Jul 22, 2020

@apsallid @lecriste
would you mind taking a look and reducing the number of bins to a reasonable number?

@apsallid
Contributor

@bsunanda @rovere @schneiml Is halving the number of bins (e.g. 8200 to 4100) acceptable for all sides, or is a larger reduction needed? I don't know whether this is a trial-and-error procedure.

@bsunanda
Contributor

bsunanda commented Jul 22, 2020 via email

@rovere
Contributor

rovere commented Jul 22, 2020

@bsunanda the level of reduction should be derived from the usefulness of the plots and the information they are supposed to store. If I were DQM convener, I'd refuse anything less than a factor 10 reduction in the number of bins (for each axis). This is validation, not visualization.
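
To illustrate why the reduction is requested per axis: the cell count of a 2D histogram is the product of the two axis bin counts, so a factor f on each axis shrinks the histogram by roughly f squared. A minimal sketch, assuming a hypothetical 8200 x 8200 binning (only the 8200 figure appears in this thread; the second axis and the function name are illustrative) and ROOT's convention of one underflow and one overflow bin per axis:

```python
def cells_2d(nx: int, ny: int) -> int:
    """Total cells of a 2D histogram, counting underflow and overflow on each axis."""
    return (nx + 2) * (ny + 2)


# Hypothetical binning: 8200 is the example figure quoted in this thread,
# applying it to both axes is an assumption made for illustration only.
NX = NY = 8200

original = cells_2d(NX, NY)
for factor in (2, 10, 20):
    reduced = cells_2d(NX // factor, NY // factor)
    print(f"factor {factor:>2} per axis: {reduced:>10,} cells "
          f"({original / reduced:.0f}x fewer)")
```

Under these assumptions, halving each axis removes only about three quarters of the cells, while a factor 10 per axis removes about 99% of them.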

@jfernan2
Contributor

@apsallid yes, at least a factor of 10 is requested (or even 20, in line with the MEs for the X and Y coordinates, unless Z really is twice as sensitive). As @rovere pointed out, a factor of 2 is totally insufficient. Please bear in mind the usual statistics of a RelVal sample used for validation, and the statistics you expect to have to populate those MEs.
As an example, for a TT 5k event RelVal:
https://tinyurl.com/yyrf53tx
many of the *VsSimX/Y/Z plots appear empty, apart from the two identified above, which simply crash.
Please reduce hefdzVsZ and heedzVsZ too, to avoid similar problems in the future. Something greater than 1k bins is more than enough.
Thanks

@schneiml
Contributor Author

The current version of these plots is so big that DQMGUI refuses to render them, as I suspected (the limit is 8MB IIRC, so somewhere around 1M-2M bins).
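
The bin estimate follows from simple arithmetic. As a rough sketch, assuming the 8 MB limit applies to the raw bin contents and ignoring per-object overhead, the limit corresponds to about 2M bins at 4 bytes per bin (float contents, as in ROOT's TH2F) or about 1M bins at 8 bytes per bin (double contents, as in TH2D). The constant and function names are illustrative, not part of DQMGUI:

```python
# 8 MB, the limit quoted above from memory ("IIRC"); treating it as a limit
# on raw bin contents is an assumption of this sketch.
LIMIT_BYTES = 8 * 1024 * 1024


def max_bins(limit_bytes: int, bytes_per_bin: int) -> int:
    """Largest number of bins whose contents fit within limit_bytes."""
    return limit_bytes // bytes_per_bin


for label, size in (("float, 4 bytes per bin", 4), ("double, 8 bytes per bin", 8)):
    print(f"{label}: about {max_bins(LIMIT_BYTES, size):,} bins")
```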

In general, we don't really care about a few thousand bins here and there (and that gives plenty of room to add all sorts of plots), but once it gets to millions of bins, memory consumption really starts to matter (no matter if it is in one huge plot or many small ones).
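
One way to spot such outliers is to total the bins over all booked histograms and flag the ones above a threshold. The sketch below is only an illustration: the booking list, its bin counts, the 1M-bin threshold, and the helper name are all made up for the example and are not taken from the HGCAL validation code.

```python
from typing import Dict, List, Tuple


def audit(bookings: Dict[str, Tuple[int, int]],
          threshold: int = 1_000_000) -> Tuple[int, List[str]]:
    """Return the total bin count and the names of histograms above threshold."""
    total = 0
    flagged = []
    for name, (nx, ny) in bookings.items():
        bins = nx * ny
        total += bins
        if bins > threshold:
            flagged.append(name)
    return total, flagged


# Hypothetical bookings: (x bins, y bins) per histogram name.
bookings = {
    "smallPlotA": (100, 100),
    "smallPlotB": (200, 50),
    "hugePlot": (4000, 4000),
}
total, flagged = audit(bookings)
print(f"total bins: {total:,}; above threshold: {flagged}")
```

In this made-up list the single large histogram accounts for almost all of the total, which is the pattern described above.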

As Javi wrote, there are a bunch of other questionably large histograms, but those two really seriously stand out.

@jfernan2
Contributor

Plots are now rendered in Jenkins tests of #30879

@kpedro88
Contributor

kpedro88 commented Sep 3, 2020

Has this been improved sufficiently to close the issue?

@apsallid
Contributor

apsallid commented Sep 4, 2020

I see that the plots are shown now in the DQM GUI in the latest RelVals with the ongoing 11_2_0_pre5_phase2 campaign. I don't know if the reduction in memory is sufficient from DQM side.

@kpedro88 @jfernan2

@rovere
Contributor

rovere commented Sep 4, 2020

Ciao @kpedro88
I consider the issue as solved.

@jfernan2
Contributor

jfernan2 commented Sep 5, 2020

Hi, from DQM side the solution is sufficient

@kpedro88
Contributor

+upgrade

@jfernan2
Contributor

+1

@jfernan2
Contributor

This issue can be closed IMHO

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.

10 participants