Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete unscheduled modules that are not consumed by any scheduled module #29553

Merged
merged 26 commits into from
Dec 2, 2020

Conversation

makortel
Copy link
Contributor

@makortel makortel commented Apr 24, 2020

PR description:

This PR makes the framework to delete unscheduled EDModules whose output products are not consumed by any scheduled EDModule. For motivation see #26438 (which this PR mostly resolves).

Here "unscheduled" means that such EDModules are only in Tasks, i.e. not directly in any Path or EndPath, and "scheduled" means an EDModule in a Path or EndPath. If the module data dependence graph has a subgraph of unscheduled modules whose output products are not consumed by any scheduled module, those unscheduled modules are deleted.

A possible data dependence from a scheduled module from a SubProcess is also enough to keep an otherwise non-consumed EDModule alive. For example, if the Process and its SubProcess both have a module foo, and only the SubProcess has a consumer with an InputTag("foo"), the module foo in the Process is kept as well for the case that the SubProcess's foo would not produce a product for some events (assuming the product types and instance names also match).

The data dependencies are checked also for Lumi and Run products in addition of Event products, for which the EDConsumerBase::modulesWhoseProductsAreConsumed() and PathsAndConsumesOfModules were extended. They were extended also to carry the dependencies from the SubProcesses (it turned out that passing pairs of module label and process name are sufficient, the product type and instance name get handled automatically).

The non-consumed modules are deleted after preallocate signal and before beginJob() signal. Two new signals, preModuleDestruction and postModuleDestruction are added for those deletions (the "normal" module deletion at the end of the cmsRun process do not emit these signals). I took a look of all modules watching either preModuleConstruction or postModuleConstruction signals, and added watch for the destruction signals if it seemed to make sense.

Note that even if the modules are deleted, the corresponding Workers are kept alive. The Workers are removed from the containers of unscheduled and accumulator modules, so they won't get any further framework transitions after the deletion.

Resolves #26438.

PR validation:

Framework unit tests run.

@cmsbuild
Copy link
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-29553/14842

  • This PR adds an extra 380KB to repository

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @makortel (Matti Kortelainen) for master.

It involves the following packages:

FWCore/Framework
FWCore/Integration
FWCore/MessageService
FWCore/ServiceRegistry
FWCore/Services
HeterogeneousCore/CUDAServices
IOMC/RandomEngine

@makortel, @smuzaffar, @cmsbuild, @Dr15Jones, @fwyzard can you please review it and eventually sign? Thanks.
@wddgit, @fabiocos, @rovere, @fwyzard this is something you requested to watch as well.
@silviodonato, @dpiparo you are the release manager for this.

cms-bot commands are listed here

@makortel
Copy link
Contributor Author

@cmsbuild, please test

@makortel
Copy link
Contributor Author

@Dr15Jones @wddgit Please review

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 24, 2020

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/5845/console Started: 2020/04/24 04:09

for (auto const& description : unusedModules) {
l << "\n " << description->moduleLabel();
}
});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking to keep this printout (and the corresponding one in the SubProcess.cc) enabled for a couple of IBs (to ease debugging in case problems occur), and then demote them to LogInfo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. In the future probably would want an easy way to make this list easily visible. Maybe via the Tracer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracer already prints out the deleted modules (between "ModuleConstruction" and "BeginJob"). Also with LogInfo the "DeleteModules" messages can be enabled purely in the configuration.

@makortel
Copy link
Contributor Author

@wddgit

The data dependencies are checked also for Lumi and Run products in addition of Event products, for which the EDConsumerBase::modulesWhoseProductsAreConsumed() and PathsAndConsumesOfModules were extended.

These checks need to be extended for ProcessBlock when we get there, but I'd expect it to be straightforward.

@cmsbuild
Copy link
Contributor

-1

Tested at: 69596df

CMSSW: CMSSW_11_1_X_2020-04-23-1100
SCRAM_ARCH: slc7_amd64_gcc820
You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d5d21d/5845/summary.html

I found follow errors while testing this PR

Failed tests: UnitTests RelVals AddOn

  • Unit Tests:

I found errors in the following unit tests:

---> test runtestPhysicsToolsPatAlgos had ERRORS
---> test runtestTqafExamples had ERRORS
---> test testJetMETCorrectionsType1MET had ERRORS
---> test runtestTqafTopEventSelection had ERRORS
---> test runtestTqafTopHitFit had ERRORS
---> test runtestTqafTopEventProducers had ERRORS
---> test runtestTqafTopTools had ERRORS
---> test runtestTqafTopJetCombination had ERRORS
---> test runtestTqafTopKinFitter had ERRORS
---> test runtestRecoTauTagRecoTau had ERRORS

  • RelVals:

The relvals timed out after 4 hours.
When I ran the RelVals I found an error in the following workflows:
4.22 step2

runTheMatrix-results/4.22_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC/step2_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC.log

8.0 step3
runTheMatrix-results/8.0_BeamHalo+BeamHalo+DIGICOS+RECOCOS+ALCABH+HARVESTCOS/step3_BeamHalo+BeamHalo+DIGICOS+RECOCOS+ALCABH+HARVESTCOS.log

140.56 step2
runTheMatrix-results/140.56_RunHI2018+RunHI2018+RECOHID18+HARVESTDHI18/step2_RunHI2018+RunHI2018+RECOHID18+HARVESTDHI18.log

136.731 step3
runTheMatrix-results/136.731_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT_skimSinglePh_HIPM+HARVESTDR2/step3_RunSinglePh2016B+RunSinglePh2016B+HLTDR2_2016+RECODR2_2016reHLT_skimSinglePh_HIPM+HARVESTDR2.log

136.793 step3
runTheMatrix-results/136.793_RunDoubleEG2017C+RunDoubleEG2017C+HLTDR2_2017+RECODR2_2017reHLT_skimDoubleEG_Prompt+HARVEST2017/step3_RunDoubleEG2017C+RunDoubleEG2017C+HLTDR2_2017+RECODR2_2017reHLT_skimDoubleEG_Prompt+HARVEST2017.log

136.874 step3
runTheMatrix-results/136.874_RunEGamma2018C+RunEGamma2018C+HLTDR2_2018+RECODR2_2018reHLT_skimEGamma_Offline_L1TEgDQM+HARVEST2018_L1TEgDQM/step3_RunEGamma2018C+RunEGamma2018C+HLTDR2_2018+RECODR2_2018reHLT_skimEGamma_Offline_L1TEgDQM+HARVEST2018_L1TEgDQM.log

1325.7 step2
runTheMatrix-results/1325.7_TTbar_13_94Xv2NanoAODINPUT+TTbar_13_94Xv2NanoAODINPUT+NANOEDMMC2017_94XMiniAODv2+HARVESTNANOAODMC2017_94XMiniAODv2/step2_TTbar_13_94Xv2NanoAODINPUT+TTbar_13_94Xv2NanoAODINPUT+NANOEDMMC2017_94XMiniAODv2+HARVESTNANOAODMC2017_94XMiniAODv2.log

1000.0 step2
runTheMatrix-results/1000.0_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT/step2_RunMinBias2011A+RunMinBias2011A+TIER0+SKIMD+HARVESTDfst2+ALCASPLIT.log

1001.0 step2
runTheMatrix-results/1001.0_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVDSIPIXELCALRUN1+ALCAHARVD1+ALCAHARVD2+ALCAHARVD3+ALCAHARVD4+ALCAHARVD5/step2_RunMinBias2011A+RunMinBias2011A+TIER0EXP+ALCAEXP+ALCAHARVDSIPIXELCALRUN1+ALCAHARVD1+ALCAHARVD2+ALCAHARVD3+ALCAHARVD4+ALCAHARVD5.log

1306.0 step3
runTheMatrix-results/1306.0_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15/step3_SingleMuPt1_UP15+SingleMuPt1_UP15+DIGIUP15+RECOUP15+HARVESTUP15.log

9.0 step4
runTheMatrix-results/9.0_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST/step4_Higgs200ChargedTaus+Higgs200ChargedTaus+DIGI+RECO+HARVEST.log

1330.0 step3
runTheMatrix-results/1330.0_ZMM_13+ZMM_13+DIGIUP15+RECOUP15_L1TMuDQM+HARVESTUP15_L1TMuDQM+NANOUP15/step3_ZMM_13+ZMM_13+DIGIUP15+RECOUP15_L1TMuDQM+HARVESTUP15_L1TMuDQM+NANOUP15.log

25.0 step4
runTheMatrix-results/25.0_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT/step4_TTbar+TTbar+DIGI+RECOAlCaCalo+HARVEST+ALCATT.log

158.0 step3
runTheMatrix-results/158.0_HydjetQ_B12_5020GeV_2018_ppReco+HydjetQ_B12_5020GeV_2018_ppReco+DIGIHI2018PPRECO+RECOHI2018PPRECO+ALCARECOHI2018PPRECO+HARVESTHI2018PPRECO/step3_HydjetQ_B12_5020GeV_2018_ppReco+HydjetQ_B12_5020GeV_2018_ppReco+DIGIHI2018PPRECO+RECOHI2018PPRECO+ALCARECOHI2018PPRECO+HARVESTHI2018PPRECO.log

4.53 step4
runTheMatrix-results/4.53_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT/step4_RunPhoton2012B+RunPhoton2012B+HLTD+RECODR1reHLT+HARVESTDR1reHLT.log

10024.0 step3
runTheMatrix-results/10024.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017+ALCAFull_2017+NanoFull_2017/step3_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017_GenSimFull+DigiFull_2017+RecoFull_2017+HARVESTFull_2017+ALCAFull_2017+NanoFull_2017.log

25202.0 step3
runTheMatrix-results/25202.0_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25+NANOUP15_PU25/step3_TTbar_13+TTbar_13+DIGIUP15_PU25+RECOUP15_PU25+HARVESTUP15_PU25+NANOUP15_PU25.log

10824.0 step3
runTheMatrix-results/10824.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2018_GenSimFull+DigiFull_2018+RecoFull_2018+HARVESTFull_2018+ALCAFull_2018+NanoFull_2018/step3_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2018_GenSimFull+DigiFull_2018+RecoFull_2018+HARVESTFull_2018+ALCAFull_2018+NanoFull_2018.log

11634.0 step3
runTheMatrix-results/11634.0_TTb/bin/sh:/step3_TTbar_14TeV+TTbar_14TeV_TuneCP5_2023_GenSimFull+DigiFull_2023+RecoFull_2023+HARVESTFull_2023+ALCAFull_2023.log

12434.0 step3
runTheMatrix-results/12434.0_TTbar_14TeV+TTbar_14TeV_TuneCP5_2023_GenSimFull+DigiFull_2023+RecoFull_2023+HARVESTFull_2023+ALCAFull_2023/step3_TTbar_14TeV+TTbar_14TeV_TuneCP5_2023_GenSimFull+DigiFull_2023+RecoFull_2023+HARVESTFull_2023+ALCAFull_2023.log

10224.0 step3
runTheMatrix-results/10224.0_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017PU_GenSimFull+DigiFullPU_2017PU+RecoFullPU_2017PU+HARVESTFullPU_2017PU+NanoFull_2017PU/step3_TTbar_13+TTbar_13TeV_TuneCUETP8M1_2017PU_GenSimFull+DigiFullPU_2017PU+RecoFullPU_2017PU+HARVESTFullPU_2017PU+NanoFull_2017PU.log

20034.0 step4
runTheMatrix-results/20034.0_TTbar_14TeV+TTbar_14TeV_TuneCP5_2026D35_GenSimHLBeamSpotFull14+DigiFullTrigger_2026D35+RecoFullGlobal_2026D35+HARVESTFullGlobal_2026D35/step4_TTbar_14TeV+TTbar_14TeV_TuneCP5_2026D35_GenSimHLBeamSpotFull14+DigiFullTrigger_2026D35+RecoFullGlobal_2026D35+HARVESTFullGlobal_2026D35.log

21234.0 step3
runTheMatrix-results/21234.0_TTbar_14TeV+TTbar_14TeV_TuneCP5_2026D44_GenSimHLBeamSpotFull14+DigiFullTrigger_2026D44+RecoFullGlobal_2026D44+HARVESTFullGlobal_2026D44/step3_TTbar_14TeV+TTbar_14TeV_TuneCP5_2026D44_GenSimHLBeamSpotFull14+DigiFullTrigger_2026D44+RecoFullGlobal_2026D44+HARVESTFullGlobal_2026D44.log

20434.0 step4
runTheMatrix-results/20434.0_TTbar_14TeV+TTbar_14TeV_TuneCP5_2026D41_GenSimHLBeamSpotFull14+DigiFullTrigger_2026D41+RecoFullGlobal_2026D41+HARVESTFullGlobal_2026D41/step4_TTbar_14TeV+TTbar_14TeV_TuneCP5_2026D41_GenSimHLBeamSpotFull14+DigiFullTrigger_2026D41+RecoFullGlobal_2026D41+HARVESTFullGlobal_2026D41.log

250202.181 step4
runTheMatrix-results/250202.181_TTbar_13UP18+TTbar_13UP18+PREMIXUP18_PU25+DIGIPRMXLOCALUP18_PU25+RECOPRMXUP18_PU25+HARVESTUP18_PU25/step4_TTbar_13UP18+TTbar_13UP18+PREMIXUP18_PU25+DIGIPRMXLOCALUP18_PU25+RECOPRMXUP18_PU25+HARVESTUP18_PU25.log

23234.0 N/A

  • AddOn:

I found errors in the following addon tests:

AddOnTest might have timed out: FAILED - secs

@cmsbuild
Copy link
Contributor

Comparison not run due to runTheMatrix errors (RelVals and Igprof tests were also skipped)

@fwyzard
Copy link
Contributor

fwyzard commented Apr 24, 2020

+heterogeneous

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 30, 2020

The tests are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

+1
Tested at: c38022d
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d5d21d/11156/summary.html
CMSSW: CMSSW_11_3_X_2020-11-30-1100
SCRAM_ARCH: slc7_amd64_gcc900

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d5d21d/11156/summary.html

Comparison Summary:

  • You potentially added 8674 lines to the logs
  • Reco comparison results: 9 differences found in the comparisons
  • DQMHistoTests: Total files compared: 35
  • DQMHistoTests: Total histograms compared: 2529593
  • DQMHistoTests: Total failures: 16
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 2529554
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 34 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 148 log files, 34 edm output root files, 35 DQM output files

@makortel
Copy link
Contributor Author

makortel commented Dec 1, 2020

+1

@cmsbuild
Copy link
Contributor

cmsbuild commented Dec 1, 2020

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

@silviodonato
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prune non-consumed EDProducers that are only in Tasks
7 participants