Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add early delete customisation for mkFit temporary data products #36478

Merged
merged 1 commit into from
Jan 5, 2022

Conversation

makortel
Copy link
Contributor

PR description:

Most of the event data products produced by mkFit EDProducers have temporary nature (i.e. not consumed outside of iterative tracking, other data products to not make any references to those products). This PR suggests to delete them early, i.e. as soon as their consuming EDModules have been run in order to save some memory.

PR validation:

Workflow 11824.21 step 3 runs. Here are IgProf MEM_LIVE profiles on one event of that job before
https://mkortela.web.cern.ch/mkortela/cgi-bin/navigator/mkfit_earlydelete_1220pre3/test_07.5_live/129
and after
https://mkortela.web.cern.ch/mkortela/cgi-bin/navigator/mkfit_earlydelete_1220pre3/test_27.5_live/405
(difference is ~40 MB for that event)

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-36478/27340

  • This PR adds an extra 16KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @makortel (Matti Kortelainen) for master.

It involves the following packages:

  • Configuration/StandardSequences (operations)
  • RecoTracker/Configuration (reconstruction)

@perrotta, @clacaputo, @cmsbuild, @slava77, @jpata, @qliphy, @fabiocos, @davidlange6 can you please review it and eventually sign? Thanks.
@mtosi, @fabiocos, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @Martin-Grunewald, @missirol, @lecriste, @gpetruc, @ebrondol, @mmusich, @dgulhan, @slomeo this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@makortel
Copy link
Contributor Author

@cmsbuild, please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f5a0c2/21235/summary.html
COMMIT: 1984de5
CMSSW: CMSSW_12_3_X_2021-12-13-1100/slc7_amd64_gcc900
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/36478/21235/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 42
  • DQMHistoTests: Total histograms compared: 3250704
  • DQMHistoTests: Total failures: 5
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3250676
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 41 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 177 log files, 37 edm output root files, 42 DQM output files
  • TriggerResults: no differences found

@jpata
Copy link
Contributor

jpata commented Dec 20, 2021

Thanks for this!
For my understanding, is this memory gain per process, or per thread?
Also, what would be the way to check or know if there are additional products that can be cleaned up early?

@makortel
Copy link
Contributor Author

For my understanding, is this memory gain per process, or per thread?

The gain is per stream, and of course depends on event complexity (the particular event shown in the IgProf profiles may or may not be representative). In addition, the gain in peak RSS/PSS will likely be smaller because the memory still needs to be allocated, even if for "short" amount of time.

Also, what would be the way to check or know if there are additional products that can be cleaned up early?

We have a long-standing issue #16481 for automating it at the framework level, but so far it has not been deemed important enough compared to other priorities (although I still hope to be able to address it at some point). For this manual version a good candidate products would have the following properties

  • take "significant amount" of memory
  • other products do not make edm::Ref/edm::Ptr/raw pointer/etc to it
    • although to limited extent the python code is able to deal with them, but one needs to now exactly how they propagate, and thus the setup is fragile
  • the producer and consumers are temporally localized
    • e.g. adding products consumed by DQM(/validation) might not be worth of the effort

I.e. in general one would have to do rather detailed analysis, and even then the setup would be somewhat fragile.

@jpata
Copy link
Contributor

jpata commented Jan 3, 2022

@cmsbuild please test

(I could have signed it before the holidays but it slipped through the cracks, and now the test results are gone, so let's rerun one more time to make sure)

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 3, 2022

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-f5a0c2/21515/summary.html
COMMIT: 1984de5
CMSSW: CMSSW_12_3_X_2022-01-02-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/36478/21515/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 43
  • DQMHistoTests: Total histograms compared: 3461692
  • DQMHistoTests: Total failures: 5
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3461664
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.004 KiB( 42 files compared)
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 181 log files, 42 edm output root files, 43 DQM output files
  • TriggerResults: no differences found

@jpata
Copy link
Contributor

jpata commented Jan 4, 2022

+reconstruction

  • technical, removes temporary/non-persisted mkfit objects from memory early
  • no reco differences expected/observed

@qliphy
Copy link
Contributor

qliphy commented Jan 5, 2022

+1

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 5, 2022

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.

@cmsbuild cmsbuild merged commit 174d797 into cms-sw:master Jan 5, 2022
@makortel makortel deleted the earlyDeleteMkFit branch January 5, 2022 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants