
Memory usage in AlCaLumiPixelsCounts jobs for run 382300 #45306

Closed
davidlange6 opened this issue Jun 25, 2024 · 25 comments


@davidlange6
Contributor

Tier-0 reports several jobs with high memory usage in run 382300. One example that reproduces is

/afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024F/AlCaHarvest/job_863561/02ce5b03-cdf6-4215-95c4-e4b3ef3ed8c1-0-1-logArchive.tar.gz

which goes to 3+ GB of RSS very quickly (e.g., at the start of event processing) and peaks around 6 GB.

This job writes 3 output files with (if I understand correctly) a total of about 200 MB per lumi section and no event data.

@cmsbuild
Contributor

cmsbuild commented Jun 25, 2024

cms-bot internal usage

@cmsbuild
Contributor

A new Issue was created by @davidlange6.

@antoniovilela, @Dr15Jones, @sextonkennedy, @smuzaffar, @makortel, @rappoccio can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@Dr15Jones
Contributor

assign alca

@cmsbuild
Contributor

New categories assigned: alca

@saumyaphor4252, @perrotta, @consuegs you have been requested to review this Pull request/Issue and eventually sign? Thanks

@davidlange6
Contributor Author

I'm feeling confused - is this application doing more than copying parts of the lumiblock information into a new EDM file (e.g., removing the TriggerResults event products and some of the lumiblock products)?

E.g., the outputs appear to share common lumi products and are basically the same size as the input. For example, one output file copies out:

*Br    7 :recoPixelClusterCounts_alcaPCCIntegratorZeroBias_alcaPCCZeroBias_RECO.obj : *
*         | reco::PixelClusterCounts                                         *
*Entries :        3 : Total  Size= 1894371109 bytes  File Size  =  296267534 *
*Baskets :        3 : Basket Size=    4693387 bytes  Compression=   6.39     *
*............................................................................*
*Br    8 :recoPixelClusterCounts_alcaPCCIntegratorZeroBias_alcaPCCZeroBias_RECO.present : *
*         | Bool_t                                                           *
*Entries :        3 : Total  Size=       1248 bytes  File Size  =        471 *
*Baskets :        3 : Basket Size=       9386 bytes  Compression=   1.00     *
*............................................................................*

@Dr15Jones
Contributor

Just to reiterate what @davidlange6 found: when we read back the LuminosityBlock, the reco::PixelClusterCounts object stored in the lumi requires on average 1.9 GB / 3, i.e. more than 600 MB (dividing the total in-memory size reported by ROOT by the 3 lumis in the file). At a file boundary, the framework doesn't know whether the new file being read contains more of the same LuminosityBlock as the last file it read, so it holds all the LuminosityBlock products from the previous file in memory at the same time as it reads the LuminosityBlock products from the new file. That alone needs ~1.2 GB or so (see the worked arithmetic below).

If the reco::PixelClusterCounts for the different LuminosityBlocks are not roughly the same size (say one is 2x bigger than the others) then the memory requirements can get even worse.

It seems like reco::PixelClusterCounts is holding data PER EVENT, which scales poorly as the number of events in a LuminosityBlock increases.
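Spelling out that arithmetic (a rough estimate, using the total in-memory size from the ROOT dump above):

$$
\frac{1\,894\,371\,109\ \text{bytes}}{3\ \text{lumis}} \approx 630\ \text{MB per lumi},
\qquad
2 \times 630\ \text{MB} \approx 1.26\ \text{GB at a file boundary}
$$

per branch; the audit output further down shows two such branches (Random and ZeroBias), which roughly doubles that footprint.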

@davidlange6
Contributor Author

@Dr15Jones - I do not think there is any per-event data there. PixelClusterCounts is effectively holding two 2D histograms (hits per bx, per ROC and per module) and a 1D histogram (events per bx).
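A schematic sketch of that layout (illustrative only; the names and exact structure below are assumptions, not the actual DataFormats class):

```cpp
#include <cstdint>
#include <vector>

// Schematic sketch of what PixelClusterCounts effectively holds, as described
// above. All names here are hypothetical; the point is how the sizes scale.
constexpr std::size_t kNBX = 3564;  // bunch crossings per LHC orbit

struct PixelClusterCountsSketch {
  // "2D histogram": cluster counts per (module, bx) -> nModules * kNBX ints
  std::vector<int> countsPerModulePerBX;
  // "2D histogram": cluster counts per (ROC, bx) -> nROCs * kNBX ints;
  // with ~42k ROCs this is ~150M ints, i.e. ~600 MB, dominating the product
  std::vector<int> countsPerRocPerBX;
  // "1D histogram": number of events seen per bx -> kNBX ints
  std::vector<int> eventsPerBX;
  // bookkeeping: which module / ROC each row of the 2D vectors belongs to
  std::vector<uint32_t> moduleIDs;
  std::vector<uint32_t> rocIDs;
};
```

There is no per-event storage, but the per-(ROC, bx) vector still grows with the instrumented detector rather than with the number of events.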

@duff-ae
Contributor

duff-ae commented Jun 26, 2024

@Dr15Jones @davidlange6 Dear all, BRIL RC here. David is correct: we don't store per-event data, because for luminosity we are interested only in "effective" rates for every bx, which we can later rescale to the luminosity. The biggest change compared to the previous release is the per-ROC data, which increased the event and LS size. This data is extremely useful for the precision luminosity measurement. We can try to remove some modules or update the thresholds to decrease the event size, but it would be helpful if you could provide us with a realistic "target" that Tier-0 could tolerate.

@davidlange6
Contributor Author

Ok, so we've understood what is new and what is creating the problems.

Why is it useful to split the data from the input file into three pieces (e.g., a data-per-lumi product)? Or am I missing some other functionality happening in this process?

@Dr15Jones
Contributor

I made a trivial 'auditing' analyzer for PixelClusterCounts and had it dump information each lumi. For the files in question, the dumps were relatively consistent, with values like:

%MSG-s PixelClusterCountsAudit:  PixelClusterCountsAuditor:audit@beginLumi  26-Jun-2024 16:47:23 CEST Run: 382300 Lumi: 17
Branch: recoPixelClusterCounts_alcaPCCIntegratorRandom_alcaPCCRandom_RECO.
 readCounts: 6400944
 readRocCounts: 151398720
 readEvents: 3564
 readModID: 1796
 readRocID: 42480
%MSG
%MSG-s PixelClusterCountsAudit:  PixelClusterCountsAuditor:audit@beginLumi  26-Jun-2024 16:47:23 CEST Run: 382300 Lumi: 17
Branch: recoPixelClusterCounts_alcaPCCIntegratorZeroBias_alcaPCCZeroBias_RECO.
 readCounts: 6400944
 readRocCounts: 151423668
 readEvents: 3564
 readModID: 1796
 readRocID: 42487
%MSG

Given the values are ints, which are 4 bytes each, that is ~600 MB for each readRocCounts (151,398,720 × 4 bytes ≈ 605 MB; note that 42480 ROCs × 3564 bx = 151,398,720, consistent with per-(ROC, bx) storage).
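For reference, a minimal sketch of what such an auditing analyzer can look like in CMSSW (the accessor names on reco::PixelClusterCounts are assumed here, inferred from the log above; the actual auditor may differ):

```cpp
#include "DataFormats/Luminosity/interface/PixelClusterCounts.h"
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/LuminosityBlock.h"
#include "FWCore/Framework/interface/MakerMacros.h"
#include "FWCore/Framework/interface/one/EDAnalyzer.h"
#include "FWCore/MessageLogger/interface/MessageLogger.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"

// At each lumi boundary, read the PixelClusterCounts product and log the
// sizes of its internal vectors.
class PixelClusterCountsAuditor
    : public edm::one::EDAnalyzer<edm::one::WatchLuminosityBlocks> {
public:
  explicit PixelClusterCountsAuditor(edm::ParameterSet const& ps)
      : token_(consumes<reco::PixelClusterCounts, edm::InLumi>(
            ps.getParameter<edm::InputTag>("src"))) {}

  void beginLuminosityBlock(edm::LuminosityBlock const& lumi,
                            edm::EventSetup const&) override {
    auto const& counts = lumi.get(token_);
    edm::LogSystem("PixelClusterCountsAudit")
        << "Run: " << lumi.run() << " Lumi: " << lumi.luminosityBlock()
        << "\n readCounts: " << counts.readCounts().size()  // assumed accessors
        << "\n readRocCounts: " << counts.readRocCounts().size()
        << "\n readEvents: " << counts.readEvents().size();
  }
  void endLuminosityBlock(edm::LuminosityBlock const&,
                          edm::EventSetup const&) override {}
  void analyze(edm::Event const&, edm::EventSetup const&) override {}

private:
  edm::EDGetTokenT<reco::PixelClusterCounts> token_;
};

DEFINE_FWK_MODULE(PixelClusterCountsAuditor);
```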

@duff-ae
Contributor

duff-ae commented Jun 26, 2024

@davidlange6 David, maybe I am missing something, but what is the third file? I thought there were 2 files: one for Zero-Bias and one for Random data. I don't understand why they are the same.

@davidlange6
Contributor Author

Maybe the third one is different; I did not check. I mean:

process.ALCARECOStreamAlCaPCCRandomOutPath,
process.ALCARECOStreamAlCaPCCZeroBiasOutPath,
process.ALCARECOStreamRawPCCProducerOutPath

Ah - the output of ALCARECOStreamRawPCCProducerOutPath is indeed small (2% of the others)

@duff-ae
Contributor

duff-ae commented Jun 27, 2024

@davidlange6 We have identified a few possible solutions to reduce the number of entries and will try to implement them as soon as possible. However, I have two questions:

  1. What should be the target rate reduction factor to safely operate Tier0?

  2. How much time do we realistically have to implement this fix?

I understand the urgency of finding a solution, but we want to avoid making any physically unmotivated cuts. Sorry for any inconvenience caused.

@davidlange6
Contributor Author

What difference would a rate change make? These objects are presumably roughly the same size whether the rate is 0.1 Hz or 2000 Hz, no?

As I asked above, do we need the processing step at all? (Maybe something to discuss with all groups at Monday's joint ops meeting.)

@duff-ae
Contributor

duff-ae commented Jun 27, 2024

Apologies for the confusion, I didn't mean trigger rates. I meant we could mask, for instance, some of the innermost BPix layers, which might be less useful for us; that alone could decrease the object size severalfold. Or we could adjust the threshold to cut some potentially noisy pixels, and so on. But it would be really helpful if you had some estimate of the required reduction factor for the object (2 times? 10?).

@davidlange6
Contributor Author

Not so much for me to answer - but nominally this workflow should run in 2 GB and currently takes ~6 GB.

@Dr15Jones
Contributor

Personally I'd say this data product should take less than 100 MB (that would be 25M entries in the vector) and preferably closer to 10 MB.

@duff-ae
Contributor

duff-ae commented Jun 30, 2024

I've prepared a fix that should reduce readRocCounts by a factor of 3564 (effectively removing the per-bx granularity). readCounts will remain unchanged. PR: #45348
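Schematically, the kind of reduction such a fix implies (an illustrative sketch with hypothetical names, not the actual patch in #45348):

```cpp
#include <cstddef>
#include <vector>

// Collapse per-(ROC, bx) counts into per-ROC totals: the vector shrinks by a
// factor of nBX (3564 bunch crossings per orbit), at the cost of losing the
// per-bx granularity of the ROC-level information.
std::vector<int> sumOverBX(std::vector<int> const& rocCountsPerBX,
                           std::size_t nROCs, std::size_t nBX /* 3564 */) {
  std::vector<int> rocTotals(nROCs, 0);
  for (std::size_t roc = 0; roc < nROCs; ++roc) {
    for (std::size_t bx = 0; bx < nBX; ++bx) {
      rocTotals[roc] += rocCountsPerBX[roc * nBX + bx];
    }
  }
  return rocTotals;
}
```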

@germanfgv
Contributor

The unmerged files that were the input to the original job will be removed by the usual Tier0 workflow. I copied them to this location so they can be used for testing the fix:

/eos/user/c/cmst0/public/PausedJobs/Run2024F/AlCaHarvest/input

@duff-ae
Contributor

duff-ae commented Jul 9, 2024

Dear all, the patch went into the CMSSW_14_0_11 release. Once it is tested at T0, please let us know whether it resolves the issue.

@makortel
Contributor

makortel commented Aug 7, 2024

@cms-sw/alca-l2 Since #45348 and #45369 have been merged (long ago), I guess we could close this issue?

@srimanob
Contributor

Kindly ping @cms-sw/alca-l2 to sign and close the issue. Thanks.

@perrotta
Contributor

+alca

@cmsbuild
Contributor

This issue is fully signed and ready to be closed.

@makortel
Contributor

@cmsbuild, please close
