
PPS Full simulation integration #29916

Closed
wants to merge 16 commits into from

Conversation

mundim
Contributor

@mundim mundim commented May 19, 2020

PR description:

Integration of the PPS full simulation into CMSSW. This is the follow-up to the closed PR #28190.

PR validation:

scram b runtests, scram b code-checks and scram b code-format have been run again, with no issues.

As for runTheMatrix, a few observations are in order:

runTheMatrix --job-reports --ibeos -j 15 -l limited -i all produced the output:

tests with errors:
4.22, 5.1, 101.0, 135.4, 8.0, 136.7611, 136.8311, 136.88811, 140.53, 1001.0 and 1325.7

tests that failed on the first attempt, but ran OK when rerun with no change (some timeout?):
4.53, 136.731, 136.793, 136.874, 140.56 and 1000.0

tests that failed because the run1_mc GT (PPS transport not included) is used when PPS is not active:
9.0 and 25.0

The following tests failed due to the missing PPS condition CTPPSPixelGainCalibrationRcd in the GT (it has already been asked to include it in the GT) (#28190 (comment)):

11634.0 and 12434.0

Tests that failed because the PPS geometry from the DB was not loaded (or because step1.root was taken from DAS, a file produced without the PPS geometry). It has already been asked to include it in the GT (#28190 (comment)):

7.3, 158.0, 1306.0, 1330.0, 10024.0, 10042.0, 10224.0, 10824.0, 20034.0, 20434.0, 21234.0, 21234.0, 25202.0

@cmsbuild
Contributor

The code-checks are being triggered in jenkins.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-29916/15516

  • This PR adds an extra 84KB to the repository

  • Found files with invalid states:

@cmsbuild
Contributor

A new Pull Request was created by @mundim for master.

It involves the following packages:

Configuration/StandardSequences
EventFilter/CTPPSRawToDigi
EventFilter/RawDataCollector
IOMC/RandomEngine
SimGeneral/MixingModule
SimPPS/Configuration
SimPPS/PPSPixelDigiProducer
SimTransport/PPSProtonTransport

@perrotta, @smuzaffar, @civanch, @Dr15Jones, @makortel, @emeschi, @mdhildreth, @cmsbuild, @silviodonato, @franzoni, @slava77, @mommsen, @fabiocos, @davidlange6 can you please review it and eventually sign? Thanks.
@fabiocos, @makortel, @felicepantaleo, @forthommel, @jan-kaspar, @GiacomoSguazzoni, @rovere, @VinInn, @Martin-Grunewald, @wddgit, @lecriste, @dgulhan, @slomeo this is something you requested to watch as well.
@silviodonato, @dpiparo you are the release manager for this.

cms-bot commands are listed here

@mundim
Contributor Author

mundim commented May 19, 2020

Two more comments (@fabferro, @malbouis, @wpcarvalho):

  • concerning the missing PPS geometry: if I force loading a sqlite file from a local version of Configuration/Geometry/python/GeometrySimDB_cff.py, most of them work
  • as mentioned in the PR Remaining config files needed to implement PPS simulation into cmssw … #28190, even with the PPS geometry loaded, test 7.3 will fail because it uses a digi configuration for cosmics in which PPS digitization is not included, and I wonder why it should be.

Comment on lines +9 to +10
from IOMC.RandomEngine.IOMC_cff import *
RandomNumberGeneratorService.LHCTransport.engineName = cms.untracked.string('TRandom3')
Contributor

Please remove; the RandomNumberGeneratorService should be fully defined in IOMC.RandomEngine.IOMC_cff.

Suggested change
from IOMC.RandomEngine.IOMC_cff import *
RandomNumberGeneratorService.LHCTransport.engineName = cms.untracked.string('TRandom3')
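To illustrate the point, a hedged sketch (not the actual content of IOMC_cff): the LHCTransport engine choice would sit inside the central RandomNumberGeneratorService definition in IOMC/RandomEngine/python/IOMC_cff.py, so no per-package override is needed. The seed value below is a hypothetical placeholder.

import FWCore.ParameterSet.Config as cms

# Sketch: LHCTransport engine configured as part of the central service definition,
# rather than overridden from a PPS-specific cff.
RandomNumberGeneratorService = cms.Service("RandomNumberGeneratorService",
    LHCTransport = cms.PSet(
        initialSeed = cms.untracked.uint32(12345),    # hypothetical seed value
        engineName = cms.untracked.string('TRandom3')
    )
    # ... other per-module PSets as already defined in IOMC_cff ...
)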

@civanch
Contributor

civanch commented Jun 30, 2020

@mundim, please make a new PR for Run-3 - it will be much cleaner and involve far fewer files.

@cvuosalo
Contributor

@mundim For runTheMatrix, you can first run only one or a couple of the workflows that had the most errors previously:

runTheMatrix --job-reports --ibeos -j 15 -l 4.22,7.3,11634.0 -i all

as an example.
Make sure you have a voms-proxy before calling runTheMatrix.

@mundim
Contributor Author

mundim commented Jun 30, 2020

Hi @cvuosalo. When I saw your comment, I had already submitted the test with the "-l limited" option. All of the workflows except the ones below ran OK. The failed tests are (with something that caught my attention):
20034.0 -> uses geometry Extended2026D35
20434.0 -> uses geometry Extended2026D41
21234.0 -> uses geometry Extended2026D44
23234.0 -> uses geometry Extended2026D49
11634.0 -> SIM-GEN CMSSW_10_6_1-106X and missing source for CTPPSPixelGainCalibrationsRcd
12434.0 -> SIM-GEN CMSSW_10_6_1-106X and missing source for CTPPSPixelGainCalibrationsRcd

Are these observations relevant? I included the es_source by hand (tag CTPPSPixelGainCalibrations_v1_mc), and the missing es_source error disappeared. Since the source ROOT file was produced with a release in which PPS was not present, the failures in 11634 and 12434 should be expected, as pointed out by @christopheralanwest. Is there a way to run the GEN-SIM step locally? Does it make sense in this context?
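For reference, a minimal sketch of what adding that es_source by hand in the step2 config looks like. The record and tag names are the ones quoted in this thread, the frontier connection string is the standard production-conditions one, and the module label is an arbitrary placeholder; this is a sketch, not the exact change that was tried.

import FWCore.ParameterSet.Config as cms
from CondCore.CondDB.CondDB_cfi import CondDB

process = cms.Process("STEP2")  # placeholder; in practice the existing step2 process object

# Pull the gains payload directly from the conditions DB instead of the GT.
process.ppsPixelGains = cms.ESSource("PoolDBESSource",
    CondDB.clone(connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS')),
    toGet = cms.VPSet(
        cms.PSet(record = cms.string('CTPPSPixelGainCalibrationsRcd'),
                 tag = cms.string('CTPPSPixelGainCalibrations_v1_mc'))
    )
)
# Prefer this source over whatever (if anything) the GT provides for this record.
process.es_prefer_ppsPixelGains = cms.ESPrefer("PoolDBESSource", "ppsPixelGains")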

Running the tests as @cvuosalo suggested above, only the 11634.0 wf failed, for the reason above.

@cvuosalo
Contributor

@mundim You're making progress. Some of the workflows are now succeeding. You have to figure out how to get those six workflows to run without errors. I don't know what change in this PR causes wfs 11634.0 and 12434.0 to require the CTPPSPixelGainCalibrationsRcd, but that needs to be removed from this PR. For the 2026 wfs, you'll need to look through their logs to see why they had errors.

@mundim
Contributor Author

mundim commented Jun 30, 2020

@mundim You're making progress. Some of the workflows are now succeeding. You have to figure out how to get those six workflows to run without errors. I don't know what change in this PR causes wfs 11634.0 and 12434.0 to require the CTPPSPixelGainCalibrationsRcd, but that needs to be removed from this PR. For the 2026 wfs, you'll need to look through their logs to see why they had errors.

I dug into the code and could not find anything that could cause such an error. I ran a config for 2018 that should produce PPS hits if PPS were in the chain, but since I removed it, there is neither PPS geometry nor PPS hits, and the job ran to completion with no problem. But when running these wfs I get the error below:

An exception of category 'ProductNotFound' occurred while
[0] Processing Event run: 1 lumi: 13 event: 1203 stream: 0
[1] Running path 'HLTAnalyzerEndpath'
[2] Prefetching for module L1TRawToDigi/'hltGtStage2Digis'
[3] Prefetching for module RawDataCollectorByLabel/'rawDataCollector'
[4] Prefetching for module CTPPSPixelDigiToRaw/'ctppsPixelRawData'
[5] Calling method for module CTPPSPixelDigiProducer/'RPixDetDigitizer'
Exception Message:
Principal::getByToken: Found zero products matching all criteria
Looking for type: CrossingFrame<PSimHit>
Looking for module label: mix
Looking for productInstanceName: g4SimHitsCTPPSPixelHits
(and also, the missing gain tag as previously mentioned).
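For context, the productInstanceName "g4SimHitsCTPPSPixelHits" in the error is the CrossingFrame instance the MixingModule produces when the PPS pixel SimHit collection is included in its mixObjects configuration. A hedged sketch of that kind of customisation (not necessarily the exact change in this PR):

import FWCore.ParameterSet.Config as cms
from SimGeneral.MixingModule.mixObjects_cfi import theMixObjects

# Ask the MixingModule to mix the PPS pixel SimHits and to produce the corresponding
# CrossingFrame; if the GEN-SIM input was made without PPS, the collection is simply
# absent and the digitizer fails with ProductNotFound, as above.
theMixObjects.mixSH.input.append(cms.InputTag("g4SimHits", "CTPPSPixelHits"))
theMixObjects.mixSH.subdets.append("CTPPSPixelHits")
theMixObjects.mixSH.crossingFrames.append("CTPPSPixelHits")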

@mundim
Contributor Author

mundim commented Jul 1, 2020

Continuing... To solve the problem with tests 20034.0, 20434.0, 21234.0 and 23234.0 I see two possibilities: either we include PPS in the Extended2026DXX geometries, or I create a new modifier for 2026 and exclude PPS.

Do any of you have other suggestions? I would like to issue the new PR only after sorting this out, if you agree.
Thanks.

@kpedro88
Contributor

kpedro88 commented Jul 1, 2020

@mundim is PPS expected to be included in the Phase 2 CMS detector? if so, then it can be added to the Phase 2 geometries, but only if everything is functional. If more development will still be needed, better to exclude it for now using the phase2_common modifier.

@christopheralanwest
Contributor

@mundim You're making progress. Some of the workflows are now succeeding. You have to figure out how to get those six workflows to run without errors. I don't know what change in this PR causes wfs 11634.0 and 12434.0 to require the CTPPSPixelGainCalibrationsRcd, but that needs to be removed from this PR. For the 2026 wfs, you'll need to look through their logs to see why they had errors.

The Run-3 GTs (for example, 112X_mcRun3_2021_realistic_v2) do not include a CTPPSPixelGainCalibrationsRcd but the 2018 GTs (for example, 111X_upgrade2018_realistic_v1) do. This is a historical accident (the Run-3 queues were initially created with 2017 conditions).

Normally AlCa requires that physics validation has been performed using new conditions prior to putting them in a global tag. The problem here is that the combined code+conditions doesn't run without crashing so AlCa cannot provide an updated global tag with all of the updated conditions.

However, since the PPS simulation is not yet included in CMSSW, I think that the PPS-specific tags (all tags except the SIM geometry with the Extended label) could be included in a GT update and their content would simply be ignored. Is that correct? If so, AlCa could provide GT updates that include these tags in a separate PR that, once merged, would eliminate some of the spurious technical issues associated with missing tags.

@makortel
Contributor

makortel commented Jul 1, 2020

If more development will still be needed, better to exclude it for now using the phase2_common modifier.

Perhaps excluding the (relevant) PPS modifiers from Phase2 era

Phase2 = cms.ModifierChain(Run3.copyAndExclude([phase1Pixel,trackingPhase1]), phase2_common, phase2_tracker, trackingPhase2PU140, phase2_ecal, phase2_hcal, phase2_hgcal, phase2_muon, phase2_GEM, hcalHardcodeConditions, phase2_timing, phase2_timing_layer, phase2_trigger)

would be simpler.

@mundim
Contributor Author

mundim commented Jul 1, 2020

@mundim is PPS expected to be included in the Phase 2 CMS detector? if so, then it can be added to the Phase 2 geometries, but only if everything is functional. If more development will still be needed, better to exclude it for now using the phase2_common modifier.

Yes, but much more development will be needed. New stations will be installed, for which we don't have the full description yet, and any attempt to propagate the geometry to Phase 2 would delay the Run 3 integration even more. I prefer to exclude PPS from Phase 2 now and work on a more permanent solution afterwards.

@kpedro88
Contributor

kpedro88 commented Jul 1, 2020

@mundim I agree, prioritize the Run 3 integration. When you remove PPS from phase 2, include a comment to note that the removal is temporary.

@mundim
Contributor Author

mundim commented Jul 1, 2020

@mundim You're making progress. Some of the workflows are now succeeding. You have to figure out how to get those six workflows to run without errors. I don't know what change in this PR causes wfs 11634.0 and 12434.0 to require the CTPPSPixelGainCalibrationsRcd, but that needs to be removed from this PR. For the 2026 wfs, you'll need to look through their logs to see why they had errors.

The Run-3 GTs (for example, 112X_mcRun3_2021_realistic_v2) do not include a CTPPSPixelGainCalibrationsRcd but the 2018 GTs (for example, 111X_upgrade2018_realistic_v1) do. This is a historical accident (the Run-3 queues were initially created with 2017 conditions).

Normally AlCa requires that physics validation has been performed using new conditions prior to putting them in a global tag. The problem here is that the combined code+conditions doesn't run without crashing so AlCa cannot provide an updated global tag with all of the updated conditions.

However, since the PPS simulation is not yet included in CMSSW, I think that the PPS-specific tags (all tags except the SIM geometry with the Extended label) could be included in a GT update and their content would simply be ignored. Is that correct? If so, AlCa could provide GT updates that include these tags in a separate PR that, once merged, would eliminate some of the spurious technical issues associated with missing tags.

This looks like a solution, but just to be clear, the correct tags are CTPPSPixelGainCalibrations_v1_mc, CTPPSPixelAnalysisMask_v1_mc and CTPPSPixelDAQMapping_v1_mc, which are already in the DB, as announced by @malbouis.

@mundim
Contributor Author

mundim commented Jul 1, 2020

If more development will still be needed, better to exclude it for now using the phase2_common modifier.

Perhaps excluding the (relevant) PPS modifiers from Phase2 era

Phase2 = cms.ModifierChain(Run3.copyAndExclude([phase1Pixel,trackingPhase1]), phase2_common, phase2_tracker, trackingPhase2PU140, phase2_ecal, phase2_hcal, phase2_hgcal, phase2_muon, phase2_GEM, hcalHardcodeConditions, phase2_timing, phase2_timing_layer, phase2_trigger)

would be simpler.

I have included PPS in the Run3 modifier, not in run3_common. Have I done it properly, or do you have a better way? I mean, I modified Configuration.Eras.Era_Run3_cff.py with:

from Configuration.Eras.Modifier_ctpps_2021_cff import ctpps_2021

Run3 = cms.ModifierChain(Run2_2018.copyAndExclude([run2_GEM_2017]), run3_common, run3_GEM, run3_HB, stage2L1Trigger_2021,ctpps_2021)

@makortel
Contributor

makortel commented Jul 1, 2020

If more development will still be needed, better to exclude it for now using the phase2_common modifier.

Perhaps excluding the (relevant) PPS modifiers from Phase2 era

Phase2 = cms.ModifierChain(Run3.copyAndExclude([phase1Pixel,trackingPhase1]), phase2_common, phase2_tracker, trackingPhase2PU140, phase2_ecal, phase2_hcal, phase2_hgcal, phase2_muon, phase2_GEM, hcalHardcodeConditions, phase2_timing, phase2_timing_layer, phase2_trigger)

would be simpler.

I have included PPS in the Run3 modifier, not in run3_common. Have I done it properly, or do you have a better way? I mean, I modified Configuration.Eras.Era_Run3_cff.py with:

from Configuration.Eras.Modifier_ctpps_2021_cff import ctpps_2021

Run3 = cms.ModifierChain(Run2_2018.copyAndExclude([run2_GEM_2017]), run3_common, run3_GEM, run3_HB, stage2L1Trigger_2021,ctpps_2021)

That looks correct. I meant that, to remove PPS from Phase2, it could be simpler to add the ctpps_2021 modifier to the list of modifiers excluded when copying Run3 into Phase2 than to start adding phase2_common everywhere for exclusion purposes.
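Concretely, a sketch of that suggestion (assuming the ctpps_2021 modifier from this PR, and the other imports as in the existing Era_Phase2_cff), based on the Phase2 chain quoted above:

from Configuration.Eras.Modifier_ctpps_2021_cff import ctpps_2021

# Exclude the PPS modifier when building Phase2 from Run3, so PPS stays Run-3 only for now.
Phase2 = cms.ModifierChain(Run3.copyAndExclude([phase1Pixel,trackingPhase1,ctpps_2021]), phase2_common, phase2_tracker, trackingPhase2PU140, phase2_ecal, phase2_hcal, phase2_hgcal, phase2_muon, phase2_GEM, hcalHardcodeConditions, phase2_timing, phase2_timing_layer, phase2_trigger)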

@kpedro88
Contributor

kpedro88 commented Jul 1, 2020

@makortel indeed, that is probably simpler.

@mundim
Contributor Author

mundim commented Jul 1, 2020

So, I implemented what was discussed above, removing PPS from Phase 2. Now ALL tests run to completion except 11634.0 and 12434.0, because of the CTPPSPixelGainCalibrationsRcd issue as above. I tried to include an es_source by hand in the step2 config file, but then there is a crash due to a segmentation violation; the same occurred when using the GT candidate pointed out by @christopheralanwest.

@mundim
Contributor Author

mundim commented Jul 2, 2020

Hello. Can I close this PR and issue a new one, even with the problems I mentioned above with wf 11634.0 and 12434.0?

Thanks.

@mundim
Contributor Author

mundim commented Jul 2, 2020

Just one more piece of information: I tried both tests above in a clean area, with NO changes at all, using yesterday's 11:00 IB, but using the command:
runTheMatrix.py --job-reports -j 14 --ibeos -i all -l 11634.0,12434.0 --command="--conditions=111X_upgrade2018_realistic_v1"

As said by @christopheralanwest, this GT candidate is supposed to have the PPS gains, and the same crash happened (it was just a try; I'm not sure this makes sense).

@cvuosalo
Contributor

cvuosalo commented Jul 2, 2020

@christopheralanwest @mundim I think the next step is to update the GT for the PPS-specific tags mentioned here: #29916 (comment)
If I understand correctly, those tags should allow all the failing workflows to run successfully.
Then @mundim could create a new PR based upon this one for Run 3 only.

@mundim
Contributor Author

mundim commented Jul 3, 2020

@christopheralanwest @mundim I think the next step is to update the GT for the PPS-specific tags mentioned here: #29916 (comment)
If I understand correctly, those tags should allow all the failing workflows to run successfully.
Then @mundim could create a new PR based upon this one for Run 3 only.

I'm just waiting for a "go ahead" to create the new PR... Thanks.

@perrotta
Contributor

perrotta commented Jul 6, 2020

-1
(As far as I understand, a clean new PR will have to be created based on this one, or alternatively a rebase of this PR is in order. While waiting for things to proceed, let's take this PR out of those routinely monitored by reco.)

@mundim
Contributor Author

mundim commented Jul 6, 2020

-1
(As far as I understand, a clean new PR will have to be created based on this one, or alternatively a rebase of this PR is in order. While waiting for things to proceed, let's take this PR out of those routinely monitored by reco.)

Yes, I prefer to create a clean/new PR. I thought it would be a good idea to keep this one open until we have everything sorted out, but if you think it is better to close it, I have nothing against it.

@mundim
Contributor Author

mundim commented Jul 7, 2020

Dear all, I'll close this PR and issue the other one. Any problem with tags not yet in the GT we can discuss in the new PR, OK? Thanks.
