Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[14_0_X] Another backports to fix memory leak #46954

Open
wants to merge 6 commits into
base: CMSSW_14_0_X
Choose a base branch
from

Conversation

srimanob
Copy link
Contributor

@srimanob srimanob commented Dec 15, 2024

PR description:

To fix memory leak reported in #46901
I need to backport more PRs, to make it consistent with master.
This PR replaces #46942

This PR includes:

PR validation:

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @srimanob for CMSSW_14_0_X.

It involves the following packages:

  • DQM/SiStripMonitorClient (dqm)
  • DQMOffline/Trigger (dqm)

@antoniovagnerini, @cmsbuild, @rseidita can you please review it and eventually sign? Thanks.
@Fedespring, @HuguesBrun, @arossi83, @cericeci, @fioriNTU, @idebruyn, @jandrea, @jhgoh, @missirol, @mmusich, @mtosi, @rociovilar, @sroychow, @threus, @trocino, @venturia this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 15, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

Pull request #46954 was updated. @antoniovagnerini, @atpathak, @cmsbuild, @consuegs, @perrotta, @rseidita can you please check and sign again.

@srimanob
Copy link
Contributor Author

assign alca

@cmsbuild
Copy link
Contributor

Pull request #46954 was updated. @antoniovagnerini, @atpathak, @cmsbuild, @consuegs, @perrotta, @rseidita can you please check and sign again.

@srimanob
Copy link
Contributor Author

@cmsbuild please test

@srimanob
Copy link
Contributor Author

@cmsbuild abort

@srimanob
Copy link
Contributor Author

hold

@cmsbuild
Copy link
Contributor

Pull request has been put on hold by @srimanob
They need to issue an unhold command to remove the hold state or L1 can unhold it for all

This reverts commit 5711cc9.
@cmsbuild
Copy link
Contributor

Pull request #46954 was updated. @antoniovagnerini, @atpathak, @cmsbuild, @consuegs, @perrotta, @rseidita can you please check and sign again.

@cmsbuild
Copy link
Contributor

Pull request #46954 was updated. @antoniovagnerini, @atpathak, @cmsbuild, @consuegs, @jfernan2, @mandrenguyen, @perrotta, @rseidita can you please check and sign again.

@cmsbuild
Copy link
Contributor

Pull request #46954 was updated. @antoniovagnerini, @atpathak, @cmsbuild, @consuegs, @jfernan2, @mandrenguyen, @perrotta, @rseidita can you please check and sign again.

@srimanob
Copy link
Contributor Author

@cmsbuild please test

@srimanob srimanob changed the title [14_0] Another backports to fix memory leak [14_0_X] Another backports to fix memory leak Dec 15, 2024
@cmsbuild
Copy link
Contributor

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-852b27/43458/summary.html
COMMIT: 8abe08c
CMSSW: CMSSW_14_0_X_2024-12-15-0000/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46954/43458/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 89 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 46 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3349879
  • DQMHistoTests: Total failures: 391
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3349467
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.016 KiB( 47 files compared)
  • DQMHistoSizes: changed ( 1000.0 ): -0.008 KiB MessageLogger/Errors
  • DQMHistoSizes: changed ( 1000.0 ): -0.008 KiB MessageLogger/Warnings
  • Checked 202 log files, 165 edm output root files, 48 DQM output files
  • TriggerResults: found differences in 1 / 46 workflows

@jfernan2
Copy link
Contributor

+1

@perrotta
Copy link
Contributor

@srimanob as far as I can see #46574 was never backported to CMSSW_14_1_X. Would you mind preparing also that backport, in order to avoid holes in the release cycles?

@perrotta
Copy link
Contributor

+alca

@srimanob
Copy link
Contributor Author

Hi @perrotta
Here is the backport of 46574,
#46968

@srimanob
Copy link
Contributor Author

@mmusich Would you mind to re-check if I miss something, i.e. bug fixes that came later? Thanks very much.

@mmusich
Copy link
Contributor

mmusich commented Dec 17, 2024

@srimanob I have several problems with this PR:

  • is it really necessary? I am not sure all the fixes in master got tested / used yet
  • the commit history is kind of messy (there is a revert in it)
  • the original author of the commit doesn't appear in the git history

@mmusich
Copy link
Contributor

mmusich commented Dec 17, 2024

to be clear about my comments earlier today at the ORP meeting for this PR.

I think actually

could be useful to have -- but they have nothing to do with memory leaks or memory reduction.
The only point that I am wary about it is

#46942 (comment), or quoting Andrea:

on the other hand, 2024 MC production has already started with it, as far as I know, and it could be that some of the samples will finally add that info, and some (those processed with releases prior to the one in which this PR will get merged) will not. This is propably acceptable (either you have those ectra GEN.SIM infos or not, but nothing vhanges otherwise), but I'd let it to be decided by you (O&C) and PPD.

Could @cms-sw/ppd-l2 chime in about potentially inconsistent (ALCARECO) event contents in the same campaign?

What I am bit more cautious about is the backport of #46574. That offers some advantage in terms of overall memory usage, but I would feel more confident about it if it was passed first through a full release validation.

@srimanob
Copy link
Contributor Author

@antoniovilela @mandrenguyen
New PR with only backport of #46530 can be found at
#46978

@malbouis
Copy link
Contributor

Could @cms-sw/ppd-l2 chime in about potentially inconsistent (ALCARECO) event contents in the same campaign?

Hi @mmusich , I am not sure how this PR for MC AlCaReco would improve the memory consumption for the ReRECO. So in principle I'd say let's not include it, just because it relates to MC and it doesn't seem to relate to memory consumption. As for your concern, I think the best would be to not add a non-validated PR to a running MC production release indeed, although this one would only concern AlCaRecos which have a limited MC production, afaik.

What I am bit more cautious about is the backport of #46574. That offers some advantage in terms of overall memory usage, but I would feel more confident about it if it was passed first through a full release validation.

Was there a Replay to test it?

In any case, I see @srimanob already made a new PR with just the backport of 46530.

@mmusich
Copy link
Contributor

mmusich commented Dec 18, 2024

I am not sure how this PR for MC AlCaReco would improve the memory consumption for the ReRECO. So in principle I'd say let's not include it, just because it relates to MC and it doesn't seem to relate to memory consumption.

@malbouis I think there is a misunderstanding
#45357 + #45385 (as I wrote above) are not going to improve memory consumption at all (thus they don't belong to this PR).
On the other hand they are just adding some GEN level collections to the ALCARECO data-tier in MC and those added collections are in principle quite useful for both a Tracker Alignment point of view and also in view of precision electroweak physics analysis using early Run3 data - assuming MC samples are going to be created in 14.0.X for 2022, 2023, 2024 for the foreseeable future (adding them now would avoid to have user-specific campaigns to create samples to be used for that later on). So all in all I would be in favor of backporting them - in another PR. The question (to you as PPD) is if you think that's acceptable or not as some of the ALCARECO samples within the current campaign would have a slightly different event content.

Was there a Replay to test it?

I don't think so (as far as I know no replay in 14_2_X occurred yet).

@malbouis
Copy link
Contributor

malbouis commented Dec 19, 2024

Hi @mmusich, since this PR was about memory consumption, I assumed the questions were related to that only. But I see now there is this additional request of backporting this AlCaReco PR for other reasons.
In principle, as I mentioned above, since it only touches MC AlCaRecos, I wouldn't see a problem, and as far as I know, no AlCaRecos have been produced with official Summer24 production yet, so we'd be fine wrt to the content. It's a pity this backport wasn't requested with the original PR as it could have entered the new patch release just created. I'd say please go ahead, submit the backport when you have the chance and we will discuss in the PR. In principle, I see no big problem apart from not undergoing the validation campaign, but considering this touches only the MC alcarecos which have a very reduced usage in comparison to the standard MC production, I'd think it's ok. We could do a small validation within the TK group, with a couple of RelVal samples. What do you think?

@mmusich
Copy link
Contributor

mmusich commented Dec 20, 2024

Hi Helena, @malbouis,

in principle, as I mentioned above, since it only touches MC AlCaRecos, I wouldn't see a problem, and as far as I know, no AlCaRecos have been produced with official Summer24 production yet, so we'd be fine wrt to the content.

thanks for the confirmation: I opened #47016 to that effect.

It's a pity this backport wasn't requested with the original PR as it could have entered the new patch release just created.

I agree and I apologize for the oversight, I originally meant to backport it to 14.0.X long time ago (as in the original PR description #45357 (comment), but somehow I managed to forget about it).

We could do a small validation within the TK group, with a couple of RelVal samples. What do you think?

sounds perfect to me. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants