Long startup time for simplest workflows #43062

Open

felicepantaleo opened this issue Oct 19, 2023 · 42 comments

Comments

@felicepantaleo commented Oct 19, 2023

Developing reconstruction algorithms for Phase-2 is becoming more and more difficult and frustrating as CMSSW startup times have increased in the last few years.
This is what happens when we want to reconstruct 10 events of a single charged pion shot in front of HGCAL with a single thread.
The workflow is 24896.0_CloseByPGun_CE_E_Front_120um+2026D98, using CMSSW_13_2_0_pre3.
It takes 4.5 minutes to launch the job and 30 seconds to reconstruct the 10 events with a single thread.
Reducing the startup time would drastically reduce the number of coffees we have every day while increasing morale.

$ date && cmsRun step3_RAW2DIGI_RECO_RECOSIM_PAT_VALIDATION_DQM.py 
**Thu Oct 19 10:55:37 CEST 2023**
19-Oct-2023 10:57:51 CEST  Initiating request to open file file:step2_16065463_0.root
19-Oct-2023 10:58:06 CEST  Successfully opened file file:step2_16065463_0.root
2023-10-19 10:58:48.203903: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 19-Oct-2023 11:00:50.708 CEST
%MSG-w UndefinedPartonMode:  HadronAndPartonSelector:selectedHadronsAndPartons  19-Oct-2023 11:00:52 CEST Run: 1 Event: 1
Could not automatically determine the hadronizer type and set the correct parton selection mode. Parton-based jet flavour will not be defined.
%MSG
#--------------------------------------------------------------------------
#                         FastJet release 3.4.1
#                 M. Cacciari, G.P. Salam and G. Soyez                  
#     A software package for jet finding and analysis at colliders      
#                           http://fastjet.fr                           
#                                                                             
# Please cite EPJC72(2012)1896 [arXiv:1111.6097] if you use this package
# for scientific work and optionally PLB641(2006)57 [hep-ph/0512210].   
#                                                                       
# FastJet is provided without warranty under the GNU GPL v2 or higher.  
# It uses T. Chan's closest pair algorithm, S. Fortune's Voronoi code
# and 3rd party plugin jet algorithms. See COPYING file for details.
#--------------------------------------------------------------------------
%MSG-w UndefinedPartonMode:  HadronAndPartonSelector:patJetPartons  19-Oct-2023 11:00:52 CEST Run: 1 Event: 1
Could not automatically determine the hadronizer type and set the correct parton selection mode. Parton-based jet flavour will not be defined.
%MSG
%MSG-w UndefinedPartonMode:   HadronAndPartonSelector:selectedHadronsAndPartonsForGenJetsFlavourInfos  19-Oct-2023 11:01:05 CEST Run: 1 Event: 1
Could not automatically determine the hadronizer type and set the correct parton selection mode. Parton-based jet flavour will not be defined.
%MSG
Begin processing the 2nd record. Run 1, Event 2, LumiSection 1 on stream 0 at 19-Oct-2023 11:01:05.542 CEST
Begin processing the 3rd record. Run 1, Event 3, LumiSection 1 on stream 0 at 19-Oct-2023 11:01:08.168 CEST
Begin processing the 4th record. Run 1, Event 4, LumiSection 1 on stream 0 at 19-Oct-2023 11:01:10.733 CEST
Begin processing the 5th record. Run 1, Event 5, LumiSection 1 on stream 0 at 19-Oct-2023 11:01:13.286 CEST
Begin processing the 6th record. Run 1, Event 6, LumiSection 1 on stream 0 at 19-Oct-2023 11:01:15.435 CEST
Begin processing the 7th record. Run 1, Event 8, LumiSection 1 on stream 0 at 19-Oct-2023 11:01:17.857 CEST
Begin processing the 8th record. Run 1, Event 7, LumiSection 1 on stream 0 at 19-Oct-2023 11:01:20.059 CEST
Begin processing the 9th record. Run 1, Event 9, LumiSection 1 on stream 0 at 19-Oct-2023 11:01:22.279 CEST
Begin processing the 10th record. Run 1, Event 11, LumiSection 1 on stream 0 at 19-Oct-2023 11:01:25.115 CEST

@waredjeb @makortel fyi

@cmsbuild

A new Issue was created by @felicepantaleo Felice Pantaleo.

@antoniovilela, @makortel, @Dr15Jones, @rappoccio, @smuzaffar, @sextonkennedy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel

assign core,upgrade

@cmsbuild

New categories assigned: core,upgrade

@Dr15Jones,@AdrianoDee,@makortel,@smuzaffar,@srimanob you have been requested to review this Pull request/Issue and eventually sign? Thanks

@Dr15Jones

So I used my recent changes to the Tracer service to gather statistics on your job. What I found was the following breakdown:

 51s parsing python
  8.4s setting up source
 19.7s constructing the 2200 modules
  0.05s in begin job
  0.05s in begin stream
143s in CondDBESSource to find the conditions for the first Run
  0.0003s source reading begin run
 24s in global Begin Run
    1.8s in dqmCSCClient
    2.1s in XMLIdealGeometryESSource
    0.9s in hcalParameters
    1.6s in EcalBarrelGeometryEP
    0.9s in hgcalEEParametersInitialize
    1.1s in hgcalHESiParametersInitialize
    1.2s in gemGeometry
    2.7s in TrackerAlignmentRcd
    1.0s in ecalMonitorTask
    4.2s in VolumeBasedMagneticFieldESProducer
    1.0s in trackerGeometry

  0.2s in CondDBESSource to find conditions for first lumi

50s in stream begin Run
    48.3s in mtdNumberingGeometry
 0.002s source reading begin lumi
 0.02s in global begin lumi
 0.2s in stream begin lumi
 0.004s source reading first event

317s from start to start processing first event

So the obvious speed-ups would be to attack the largest contributors above: the 143s conditions lookup in CondDBESSource, the 51s of Python parsing, and the 48s in mtdNumberingGeometry.
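
For reference, a minimal sketch of how such a trace can be enabled in the job configuration (this assumes the standard FWCore Tracer service; the updated version used here may take additional parameters):

# hedged sketch: enable the framework Tracer service in the cmsRun configuration
process.add_(cms.Service("Tracer"))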

@srimanob

Thanks very much @Dr15Jones for the analysis.
FYI @cms-sw/mtd-dpg-l2 @fabiocos on mtdNumberingGeometry

@makortel

I investigated the first 3 steps of the workflow with VTune (as @felicepantaleo told me, it felt like all the steps have long startup times). On step 1 (GEN-SIM) the job divides clearly into three phases:
[image: VTune timeline of the step 1 GEN-SIM job]

The first phase is the overall job startup (shown in the top thread), taking about 102 seconds wall clock time, and 85 seconds of CPU time. The CPU time divides into

  • 2 seconds of reading the configuration
  • 19 seconds of constructing the EventProcessor (i.e. module construction and all that)
  • 63 seconds spent in EventSetup
    • 40 seconds spent in MuonOffsetESProducer
    • 5 seconds in HcalSimParametersESModule
    • 4 seconds in VolumeBasedMagneticFieldESProducerFromDB
    • 4 seconds in XMLIdealGeometryESSource
    • 4 seconds in HGCalParametersESModule
    • 3 seconds in EcalSimParametersESModule
    • rest is 1 second or below

Here is a flame graph of what happens in MuonOffsetESProducer:
[image: flame graph of MuonOffsetESProducer]

The question is then, does it really need to be this expensive? @cms-sw/geometry-l2


The second phase (middle thread in the timeline) is the Geant4 initialization in the OscarMTMasterThread, taking about 50 seconds. I don't know if there is anything that could easily be done there, but here is the flame graph of that anyway @cms-sw/simulation-l2
[image: flame graph of the Geant4 initialization in OscarMTMasterThread]


The third phase (mostly in the bottom thread of the timeline) is the actual event processing (most of the time is spent in Geant4, so it shows up in a separate non-TBB thread), taking about 45 seconds for 10 events.

@makortel

assign geometry

@cmsbuild

New categories assigned: geometry

@Dr15Jones,@civanch,@bsunanda,@makortel,@mdhildreth you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel

The step 2 (DIGI-HLT) took in total 240 seconds wall clock time and 220 seconds CPU time.

The configuration defines 901 EDModules, 208 ESModules, 57 (End)Paths, 116 Sequences, and 157 Tasks.

The startup phase (72 seconds CPU time) divides into

  • 5 seconds of reading configuration
  • 46 seconds in EventProcessor constructor
    • 27 seconds in StreamSchedule constructor, i.e. loading libraries, creating modules, registering data products (of which a lot of time is spent in cling), and all that
      • 12 seconds spent in Maker::makeModule, of which 8 seconds in creating edm::stream modules. This 8 seconds increases with the number of streams, but is run serially!
        • 3 seconds in L1TPFCaloProducer constructor
          • Most of the time seems to be spent in reading data from a ROOT file via l1tpf::corrector::init_()
        • 2 seconds in DTTrigPhase2Prod constructor
          • Most of the time seems to be spent in GlobalCoordsObtainer::generate_luts()
        • 2 seconds in l1tpf::PFClusterProducerFromHGC3DClusters constructor
          • Most of the time seems to be spent in creating cut parser, and creating expression parser in l1tpf::HGC3DClusterEgID constructor
        • 0.3 seconds in MixingModule constructor, rest is below that
    • 1 second constructing the source

@cms-sw/l1-l2 Is there anything that could be done to speed up the constructors of the aforementioned 3 L1T modules?
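
One possible mitigation, as a rough sketch with hypothetical class and cache names (not the actual modules): the expensive, immutable file reading could move into an edm::stream GlobalCache, so it runs once per job instead of once per stream:

// Hedged sketch (hypothetical names): expensive, immutable initialization
// moves from the per-stream constructor into a GlobalCache shared by all streams.
#include "FWCore/Framework/interface/stream/EDProducer.h"
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include <memory>

struct CorrectorCache {
  // parsed calibration data, e.g. the TGraphs read from the ROOT file
};

class L1TPFCaloProducerCached : public edm::stream::EDProducer<edm::GlobalCache<CorrectorCache>> {
public:
  L1TPFCaloProducerCached(edm::ParameterSet const& iConfig, CorrectorCache const* cache) {
    // the per-stream constructor now only stores a pointer to the shared cache
  }
  static std::unique_ptr<CorrectorCache> initializeGlobalCache(edm::ParameterSet const& iConfig) {
    // runs once per job: open the calibration ROOT file and parse it here
    return std::make_unique<CorrectorCache>();
  }
  static void globalEndJob(CorrectorCache*) {}
  void produce(edm::Event&, edm::EventSetup const&) override {}
};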


Next is beginRun transition (including most, if not all, of EventSetup), about 100 seconds of CPU time, divided into

  • EventSetup
    • 58 seconds in MTDGeometricTimingDetESModule::produce() (which @Dr15Jones also noted above)
    • 7.0 seconds in getting the CondDB data
      • notably 4 seconds for SiPixelGainCalibrationOffline, rest are below 1 second
    • 4.3 seconds in magneticfield::VolumeBasedMagneticFieldESProducerFromDB::produce()
    • 4.3 seconds in XMLIdealGeometryESSource::produceGeom()
    • 4.0 seconds in HGCalParametersESModule::produce()
    • 3.3 seconds in TrackerDigiGeometryESModule::produce()
    • 2.7 seconds in CaloGeometryEP<EcalBarrelGeometry, DDCompactView>::produceAligned()
    • 2.0 seconds in DTGeometryESModule::produce()
    • 1.7 seconds in CSCGeometryESModule::produce()
    • 1.7 seconds in GEMGeometryESModule::produce()
    • 1.5 seconds in HcalParametersESModule::produce()
    • 1.2 seconds in HGCalGeometryESProducer::produce()
    • 1.1 seconds in HcalTPGCoderULUT::produce()
    • 1.0 seconds in RPCGeometryESModule::produce()
    • 0.6 seconds in MuonGeometryConstantsESModule::produce()
    • rest is smaller
  • EDModule beginRun
    • 3 seconds in trklet::TrackletEventProcessor::init()

Then comes the event data processing, 63 seconds for 10 events

  • 4.6 seconds in OutputModule writing the data
  • 56 seconds in EDModules
    • 45 seconds in MixingModule
      • 41 seconds in HGCDigiProducer::finalizeEvent()
      • 2.7 seconds in HcalDigiProducer::finalizeEvent()
      • 0.5 seconds in EcalDigiProducer::finalizeEvent()
    • rest:
[image: profile of the remaining EDModules]

@makortel

assign l1

@cmsbuild

New categories assigned: l1

@epalencia,@aloeliger you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel

The step 3 (RECO+PAT+VALIDATION+DQM) took 290 seconds wall clock time with 260 seconds of CPU time.

The configuration defines 5431(!) EDModules, 483 ESModules, 72 (End)Paths, 1291 Sequences, and 758 Tasks.


The startup phase took about 170 seconds wall clock time (150 seconds CPU time), divided into (in CPU time)

  • 70 seconds in reading the configuration
    • 50 seconds is spent in miniAOD_customizeAllMC() customization function
      • 32 seconds is spent in toModify()
        • 19 seconds in _add_jetsPuppi()
          • 19 seconds in applyDeepBtagging()
            • 17 seconds in setupBTagging()
              • 16 seconds in addToProcessAndTask()
        • 12 seconds in _add_metPuppi()
      • 18 seconds is spent in runMetCorAndUncForMiniAODProduction()
      • Based on the profile, most of the time seems to be spent in assignments like process.foo = cms.EDProducer(...), where process.foo already exists. In this case the framework goes and replaces the module in all Tasks/Sequences/Schedules (see the sketch after this list)
        • 35 seconds spent in _replaceInTasks()
        • 12 seconds spent in _replaceInSequences()
        • 1 second is spent in _replaceInSchedule()
        • 1 second is spent in _delattrFromSetattr()
  • 79 seconds in constructing EventProcessor
    • 39 seconds in constructing Schedule
      • 25 seconds in edm::Factory::makeModule()
        • 15 seconds in constructing EDModules
          • 5 seconds in constructing edm::stream EDModules. This 5 seconds increases with the number of streams, but is run serially!
            • 3 seconds is spent in MuonIdProducer constructor, in reading ROOT file via MuonCaloCompatibility::configure()
        • 9 seconds in loading shared libraries
    • 15 seconds in creating EventSetup modules
      • 13 seconds in loading the shared libraries
    • 13 seconds in creating source
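
A minimal sketch of the configuration pattern that triggers this slow path (hypothetical module names; the real cost scales with how many Tasks/Sequences reference the module):

import FWCore.ParameterSet.Config as cms

process = cms.Process("DEMO")
process.foo = cms.EDProducer("FooProducer")  # first assignment: cheap
process.p = cms.Path(process.foo)
# re-assignment to an already-set attribute: the configuration walks every
# Task/Sequence/Schedule (_replaceInTasks() etc.) to swap the old module for the new
process.foo = cms.EDProducer("FooProducer", threshold = cms.double(1.0))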

@cms-sw/xpog-l2 I found that the miniAOD_customizeAllMC() being expensive was reported already in #20220; maybe it is finally time to speed it up?


The beginRun transition took about 93 seconds of wall clock time (87 seconds CPU time), divided into (in CPU time)

  • EventSetup
    • 52 seconds in MTDGeometricTimingDetESModule::produce() (also noted earlier)
    • 4.5 seconds in XMLIdealGeometryESSource::produceGeom()
    • 4.4 seconds in magneticfield::VolumeBasedMagneticFieldESProducerFromDB::produce()
    • 3.6 seconds in HGCalParametersESModule::produce()
    • 2.7 seconds in CaloGeometryEP<EcalBarrelGeometry, DDCompactView>::produceAligned()
    • 1.7 seconds in GEMGeometryESModule::produce()
    • 1.6 seconds in TrackerDigiGeometryESModule::produce()
    • 1.3 seconds in HcalParametersESModule::produce()
    • 1.2 seconds in HGCalGeometryESProducer::produce()
    • 1.0 seconds in DTGeometryESModule::produce()
    • rest are below 1 second
  • EDModules
    • 4.2 seconds in edm::one EDModules
      • 2.3 seconds in cscdqm::Dispatcher::book()
      • 1.5 seconds in EcalDQMonitorTask::bookHistograms()
    • 1.5 seconds in edm::stream EDModules
    • 0.7 seconds in edm::global EDModules

Event data processing and shutdown took about 23 seconds in wall clock time (22.5 seconds in CPU time), divided into (in CPU time)

  • EventSetup
    • 6.7 seconds in getting the CondDB data
      • 3.7 seconds for SiPixelGainCalibrationOffline in SiPixelGainCalibrationOfflineRcd
      • 0.9 seconds for GBRForestD in GBRDWrapperRcd
      • 0.7 seconds for GBRForest in GBRWrapperRcd
      • rest is less
  • 5.5 seconds in EDModules
    • 0.7 seconds in edm::one modules (looks like DQM)
    • 4.7 seconds in edm::global and edm::stream modules (i.e. reconstruction etc, nothing seems to really smell here)
  • 3.6 seconds in DelayedReader (i.e. reading input data)
  • 2.5 seconds in output module writing data
  • 2.5 seconds in closing the output file

@makortel

assign xpog

@cmsbuild

New categories assigned: xpog

@vlimant,@simonepigazzini you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel commented Oct 20, 2023

assign db

For the cost of

143s in CondDBESSource to find the conditions for the first Run

in #43062 (comment)

@cmsbuild

New categories assigned: db

@francescobrivio,@saumyaphor4252,@perrotta,@consuegs you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel

+core

I think our part (in the analysis) is done

@davidlange6

re MuonOffsetESProducer and mtdNumberingGeometry - are these not just the unlikely module that builds the entire geometry? [which might be much faster if it were in the db]

@Dr15Jones

As a test I did

> time python3 -c 'import step3_RAW2DIGI_RECO_RECOSIM_PAT_VALIDATION_DQM'
45.100u 1.567s 0:49.83 93.6%    0+0k 24+0io 0pf+0w

Then in python I did

> python3 -i step3_RAW2DIGI_RECO_RECOSIM_PAT_VALIDATION_DQM.py
>>> import pickle
>>> pickle.dump(process, open('step3.pkl', 'wb'))

and then

[cdj@cmslpc-el8-heavy01 24896.0_CloseByPGun_CE_E_Front_120um+2026D98]$ time python3 -c 'import pickle; process = pickle.load(open("step3.pkl", "rb"));'
1.833u 0.153s 0:02.16 91.6%     0+0k 0+0io 0pf+0w

so using a pickle file is 25x faster
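
Assuming cmsRun only needs a top-level process object from the configuration file it executes, a minimal wrapper (hypothetical file name) would let the pickled configuration be used directly:

# step3_pickled.py -- hedged sketch: cmsRun picks up the top-level `process`
# object defined by the configuration file it is given
import pickle
process = pickle.load(open('step3.pkl', 'rb'))

and then run with cmsRun step3_pickled.py.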

@Dr15Jones

@davidlange6

re MuonOffsetESProducer and mtdNumberingGeometry - are these not just the unlikely module that builds the entire geometry?

it also seems that Phase 2 is still using the old DD instead of DD4Hep.

@davidlange6

I believe that is true - (so startup will get slower with DD4Hep I guess)

@makortel

@Dr15Jones asked for the flame graph for MTDGeometricTimingDetESModule::produce()

[image: flame graph of MTDGeometricTimingDetESModule::produce()]

@davidlange6

Looking at some of the others: putting a

theSiPixelGainCalibrationOffline.reserve(detUnitDimensions.first * (detUnitDimensions.second + detUnitDimensions.first / 80));

here

std::vector<char> theSiPixelGainCalibrationOffline;

probably helps a lot. (I don't really understand the resize(...) followed by a memcpy going on, so there might be another factor to gain in this code.)
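
As a self-contained illustration of why the reserve() matters (hypothetical dimensions, not the actual payload-filling code): without it, repeated push_back calls reallocate and copy the vector many times as it grows.

#include <cstddef>
#include <vector>

int main() {
  const std::size_t nDet = 1000, nCols = 416;  // hypothetical dimensions
  std::vector<char> payload;
  payload.reserve(nDet * nCols);  // one allocation up front
  for (std::size_t i = 0; i < nDet * nCols; ++i)
    payload.push_back(static_cast<char>(i & 0xff));  // no reallocations now
}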

@srimanob

@davidlange6

re MuonOffsetESProducer and mtdNumberingGeometry - are these not just the unlikely module that builds the entire geometry?

it also seems that Phase 2 is still using the old DD instead of DD4Hep.

Do we really gain anything in performance with DD4hep? That was also a question I got at CHEP, but I have never seen a clear result showing a gain.

By the way, we are trying to migrate. The last validation still showed a few issues, and we plan to redo it with the upcoming 13_3_0_pre4.

@Dr15Jones

@srimanob

Do we really gain something in performance with dd4hep?

My take is we'd rather spend time optimizing code using DD4Hep than spend the time on the obsolete DD.

@mmusich commented Oct 20, 2023

here
cmssw/CalibTracker/SiPixelESProducers/plugins/SiPixelFakeGainOfflineESSource.cc

This should not even be executed in phase2

@makortel

here
cmssw/CalibTracker/SiPixelESProducers/plugins/SiPixelFakeGainOfflineESSource.cc

This should not even be executed in phase2

I confirm that in this workflow the SiPixelGainCalibrationOffline comes from the CondDB via cond::persistency::PayloadProxy<SiPixelGainCalibrationOffline>::loadPayload() etc.

@mmusich commented Oct 20, 2023

I confirm that in this workflow the SiPixelGainCalibrationOffline comes from the CondDB

I thought we cleaned that here #42794

@makortel

I confirm that in this workflow the SiPixelGainCalibrationOffline comes from the CondDB

I thought we cleaned that here #42794

The reported behavior was in CMSSW_13_2_0_pre3 (the job used the 131X_mcRun4_realistic_v6 Global Tag), i.e. before that cleanup PR. So one message to @felicepantaleo and @waredjeb would be to move to a more recent CMSSW.

@aloeliger

The step 2 (DIGI-HLT) took in total 240 seconds wall clock time and 220 seconds CPU time. […]

@cms-sw/l1-l2 Is there anything that could be done to speed up the constructors of the aforementioned 3 L1T modules?

@gpetruc

Yours is the name I recall in connection with the correlator team and Phase2L1ParticleFlow (that is correct, right? My apologies if not). From my look at the L1TPFCaloProducer code, 3 correctors get produced from a string, a number, and a debug bool:

emCorrector_(iConfig.getParameter<std::string>("emCorrector"), -1, debug_),
hcCorrector_(iConfig.getParameter<std::string>("hcCorrector"), -1, debug_),
hadCorrector_(iConfig.getParameter<std::string>("hadCorrector"),
              iConfig.getParameter<double>("hadCorrectorEmfMax"),
              debug_),

Which calls this constructor here:

l1tpf::corrector::corrector(const std::string &filename, float emfMax, bool debug, bool emulate)
    : emfMax_(emfMax), emulate_(emulate) {
  if (!filename.empty())
    init_(filename, "", debug, emulate);
}
This calls init_ with two strings and two bools here:

void l1tpf::corrector::init_(const std::string &filename, const std::string &directory, bool debug, bool emulate) {
  std::string resolvedFileName = filename;
  if (filename[0] != '/') {
#ifdef CMSSW_GIT_HASH
    resolvedFileName = edm::FileInPath(filename).fullPath();
#else
    resolvedFileName = std::filesystem::absolute(filename);
#endif
  }
  TFile *lFile = TFile::Open(resolvedFileName.c_str());
  if (!lFile || lFile->IsZombie()) {
#ifdef CMSSW_GIT_HASH
    throw cms::Exception("Configuration", "cannot read file " + filename);
#else
    throw std::runtime_error("Cannot read file " + filename);
#endif
  }
  TDirectory *dir = directory.empty() ? lFile : lFile->GetDirectory(directory.c_str());
  if (!dir) {
#ifdef CMSSW_GIT_HASH
    throw cms::Exception("Configuration", "cannot find directory '" + directory + "' in file " + filename);
#else
    throw std::runtime_error("Cannot find directory '" + directory + "' in file " + filename);
#endif
  }
  init_(dir, debug);
  if (emulate)
    initEmulation_(dir, debug);
  lFile->Close();
}

This is mostly file checks which eventually calls another init_ here:
void l1tpf::corrector::init_(TDirectory *lFile, bool debug) {
  TH1 *index = (TH1 *)lFile->Get("INDEX");
  if (!index) {
#ifdef CMSSW_GIT_HASH
    throw cms::Exception("Configuration")
        << "invalid input file " << lFile->GetPath() << ": INDEX histogram not found.\n";
#else
    std::stringstream ss;
    ss << "invalid input file " << lFile->GetPath() << ": INDEX histogram nit found.\n";
    throw std::runtime_error(ss.str());
#endif
  }
  index_.reset((TH1 *)index->Clone());
  index_->SetDirectory(nullptr);
  is2d_ = index_->InheritsFrom("TH2");
  std::unordered_map<std::string, TGraph *> graphs;
  TKey *key;
  TIter nextkey(lFile->GetListOfKeys());
  while ((key = (TKey *)nextkey())) {
    if (strncmp(key->GetName(), "eta_", 4) == 0) {
      TGraph *gr = (TGraph *)key->ReadObj();
      if (!gr->TestBit(TGraph::kIsSortedX))
        gr->Sort();
      graphs[key->GetName()] = gr;
    }
  }
  neta_ = index_->GetNbinsX();
  nemf_ = (is2d_ ? index_->GetNbinsY() : 1);
  corrections_.resize(neta_ * nemf_);
  std::fill(corrections_.begin(), corrections_.end(), nullptr);
  char buff[32];
  for (unsigned int ieta = 0; ieta < neta_; ++ieta) {
    for (unsigned int iemf = 0; iemf < nemf_; ++iemf) {
      if (is2d_) {
        snprintf(buff, 31, "eta_bin%d_emf_bin%d", ieta + 1, iemf + 1);
      } else {
        snprintf(buff, 31, "eta_bin%d", ieta + 1);
      }
      TGraph *graph = graphs[buff];
      if (debug) {
#ifdef CMSSW_GIT_HASH
        edm::LogPrint("corrector") << " eta bin " << ieta << " emf bin " << iemf << " graph " << buff
                                   << (graph ? " <valid>" : " <nil>") << "\n";
#else
        std::cout << " eta bin " << ieta << " emf bin " << iemf << " graph " << buff
                  << (graph ? " <valid>" : " <nil>") << "\n";
#endif
      }
      if (graph) {
        corrections_[ieta * nemf_ + iemf] = (TGraph *)graph->Clone();
      }
      if (std::abs(index_->GetXaxis()->GetBinCenter(ieta + 1)) > 3.0) {
        break;  // no EMF bins beyond eta = 3
      }
    }
  }
}
and if the emulation flag is set, it calls another initEmulation_ here:

void l1tpf::corrector::initEmulation_(TDirectory *lFile, bool debug) {
  std::unordered_map<std::string, TH1 *> hists;
  TKey *key;
  TIter nextkey(lFile->GetListOfKeys());
  while ((key = (TKey *)nextkey())) {
    if (strncmp(key->GetName(), "emul_eta", 8) == 0) {
      TH1 *hist = (TH1 *)key->ReadObj();
      hists[key->GetName()] = hist;
    }
  }
  neta_ = index_->GetNbinsX();
  correctionsEmulated_.resize(neta_);
  std::fill(correctionsEmulated_.begin(), correctionsEmulated_.end(), nullptr);
  char buff[32];
  for (unsigned int ieta = 0; ieta < neta_; ++ieta) {
    snprintf(buff, 31, "emul_eta_bin%d", ieta + 1);
    TH1 *hist = hists[buff];
    if (debug)
#ifdef CMSSW_GIT_HASH
      edm::LogPrint("corrector") << " eta bin " << ieta << " hist " << buff << (hist ? " <valid>" : " <nil>") << "\n";
#else
      std::cout << " eta bin " << ieta << " hist " << buff << (hist ? " <valid>" : " <nil>") << "\n";
#endif
    if (hist) {
      correctionsEmulated_[ieta] = (TH1 *)hist->Clone();
      correctionsEmulated_[ieta]->SetDirectory(nullptr);
    }
  }
}

Notably, each of the init functions does some loop through the TFile keys, which would be my first instinct to check for the slowdown in this constructor.

Would it be possible to ask you, or one of the people responsible for Phase2L1ParticleFlow, to take a look at this and see where the slowdown occurs and whether it could be reworked?

@NTrevisani

You are listed as having most recently added this line:

globalcoordsobtainer_->generate_luts();
to DTTrigPhase2Prod

Which in turn calls a for loop doing a long list of mathematical functions here (listed with the atan calculation):

void GlobalCoordsObtainer::generate_luts() {
  for (auto& global_constant : global_constants) {
    int sgn = 1;
    DTChamberId ChId(global_constant.chid);
    // typical hasPosRF function
    if (ChId.wheel() > 0 || (ChId.wheel() == 0 && ChId.sector() % 4 > 1)) {
      sgn = -1;
    }
    auto phi1 = calc_atan_lut(12, 6, (1. / 16) / (global_constant.sl1.perp * 10),
                              global_constant.sl1.x_phi0 / global_constant.sl1.perp,
                              1. / std::pow(2, 17), 10, 3, 12, 20, sgn);
    auto phi3 = calc_atan_lut(12, 6, (1. / 16) / (global_constant.sl3.perp * 10),
                              global_constant.sl3.x_phi0 / global_constant.sl3.perp,
                              1. / std::pow(2, 17), 10, 3, 12, 20, sgn);
    double max_x_phi0 = global_constant.sl1.x_phi0;
    if (global_constant.sl3.x_phi0 > max_x_phi0) {
      max_x_phi0 = global_constant.sl3.x_phi0;
    }
    auto phic = calc_atan_lut(12, 6, (1. / 16) / ((global_constant.sl1.perp + global_constant.sl3.perp) / .2),
                              max_x_phi0 / ((global_constant.sl1.perp + global_constant.sl3.perp) / 2),
                              1. / std::pow(2, 17), 10, 3, 12, 20, sgn);
    auto phib = calc_atan_lut(9, 6, 1. / 4096, 0., 4. / std::pow(2, 13), 10, 3, 10, 16, sgn);
    luts[global_constant.chid] = {phic, phi1, phi3, phib};
  }
}
std::map<int, lut_value> GlobalCoordsObtainer::calc_atan_lut(int msb_num, int lsb_num, double in_res,
                                                             double abscissa_0, double out_res, int a_extra_bits,
                                                             int b_extra_bits, int a_size, int b_size, int sgn) {
  // Calculates the coefficients needed to calculate the arc-tan function in fw
  // by doing a piece-wise linear approximation.
  // In fw, input (x) and output (y) are integers, these conversions are needed
  // t = x*in_res - abscissa_0
  // phi = arctan(t)
  // y = phi/out_res
  // => y = arctan(x*in_res - abcissa_0)/out_res
  // The linear function is approximated as
  // y = a*x_lsb + b
  // Where a, b = func(x_msb) are the coefficients stored in the lut
  // a is stored as unsigned, b as signed, with their respective sizes a_size, b_size,
  // previously shifted left by a_extra_bits and b_extra_bits, respectively
  long int a_min = -std::pow(2, (a_size - 1));
  long int a_max = std::pow(2, (a_size - 1)) - 1;
  long int b_min = -std::pow(2, (b_size - 1));
  long int b_max = std::pow(2, (b_size - 1)) - 1;
  std::map<int, lut_value> lut;
  for (long int x_msb = -(long int)std::pow(2, msb_num - 1); x_msb < (long int)std::pow(2, msb_num - 1); x_msb++) {
    int x1 = ((x_msb) << lsb_num);
    int x2 = ((x_msb + 1) << lsb_num) - 1;
    double t1 = x1 * in_res - abscissa_0;
    double t2 = x2 * in_res - abscissa_0;
    double phi1 = sgn * atan(t1);
    double phi2 = sgn * atan(t2);
    double y1 = phi1 / out_res;
    double y2 = phi2 / out_res;
    // we want to find a, b so that the error in the extremes is the same as the error in the center
    // so the error in the extremes will be the same, so the "a" is determined by those points
    double a = (y2 - y1) / (x2 - x1);
    // by derivating the error function and equaling to 0, you get this is the point
    // towards the interval center with the highest error
    // err_f = y - (a*x+b) = sgn*arctan(x*in_res - abcissa_0)/out_res - (a*x+b)
    // d(err_f)/dx = sgn*(1/(1+(x*in_res - abcissa_0)^2))*in_res/out_res - a
    // d(err_f)/dx = 0 => x_max_err = (sqrt(in_res/out_res/a-1) + abscissa_0)/in_res
    // There is sign ambiguity in the sqrt operation. The sqrt has units of t (adimensional).
    // It is resolved by setting the sqrt to have the same sign as (t1+t2)/2
    double t_max_err = sqrt(sgn * in_res / out_res / a - 1);
    if ((t1 + t2) < 0) {
      t_max_err *= -1;
    }
    double x_max_err = (t_max_err + abscissa_0) / in_res;
    double phi_max_err = sgn * atan(t_max_err);
    double y_max_err = phi_max_err / out_res;
    // once you have the point of max error, the "b" parameter is chosen as the average between
    // those two numbers, which makes the error at the center be equal in absolute value
    // to the error in the extremes
    // units: rad
    double b = (y1 + y_max_err - a * (x_max_err - x1)) / 2;
    // increase b in 1/2 of y_lsb, so that fw truncate operation on the of the result
    // is equivalent to a round function instead of a floor function
    b += 0.5;
    // shift left and round
    long int a_int = (long int)(round(a * (pow(2, a_extra_bits))));
    long int b_int = (long int)(round(b * (pow(2, b_extra_bits))));
    // tuck a, b constants into the bit size of the output (un)signed integer
    std::vector<long int> as = {a_min, a_int, a_max};
    std::vector<long int> bs = {b_min, b_int, b_max};
    std::sort(as.begin(), as.end());
    std::sort(bs.begin(), bs.end());
    a_int = as.at(1);
    b_int = bs.at(1);
    // // convert a, b to two's complement
    // auto a_signed = a_int % (long int) (pow(2, a_size));
    // auto b_signed = b_int % (long int) (pow(2, b_size));
    // convert x_msb to two's complement signed
    int index = to_two_comp(x_msb, msb_num);
    lut[index] = {a_int, b_int};
  }
  return lut;
}

which seems a primary candidate for the slowdown in this function. Could you, or someone responsible for this code, look into its performance and see if it could be reworked? If not, would it maybe be possible to cache or look up the results of these math functions/loops? At a quick pass they seem to be calculated against some constants; a rough sketch of the caching idea follows.
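
// Hedged sketch (hypothetical helper, not existing CMSSW code): memoize
// calc_atan_lut results so identical argument sets are computed only once;
// lut_value and calc_atan_lut are the ones from the code above.
#include <map>
#include <tuple>

std::map<int, lut_value> GlobalCoordsObtainer::calc_atan_lut_cached(
    int msb_num, int lsb_num, double in_res, double abscissa_0, double out_res,
    int a_extra_bits, int b_extra_bits, int a_size, int b_size, int sgn) {
  using Key = std::tuple<int, int, double, double, double, int, int, int, int, int>;
  // a static cache is tolerable here only because module construction is
  // serialized (as noted above); concurrent use would need a mutex
  static std::map<Key, std::map<int, lut_value>> cache;
  Key key{msb_num, lsb_num, in_res, abscissa_0, out_res,
          a_extra_bits, b_extra_bits, a_size, b_size, sgn};
  auto it = cache.find(key);
  if (it == cache.end())
    it = cache.emplace(key, calc_atan_lut(msb_num, lsb_num, in_res, abscissa_0, out_res,
                                          a_extra_bits, b_extra_bits, a_size, b_size, sgn)).first;
  return it->second;
}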

@rekovic & @cerminar

You are listed as authors on PFClusterProducerFromHGC3DClusters and HGC3DClusterEgID. For this one I cannot find a likely candidate for the slow-down, other than perhaps the instantiation of the TMVA and the creation of another corrector (see the first section).

Could you, or someone responsible for this code, look into the performance of this and see if it could be reworked?

@gpetruc commented Oct 20, 2023 via email

@vlimant commented Oct 24, 2023

it is on our to-do list to review how PAT is constructed and simplify it together with dropping miniAOD_customizeAllMC

@Dr15Jones

As an additional test, I dumped the data associated to the global tag used by the job into an HDF5 file by doing

conddb2hdf5.py --compressor zlib --output 133X_mcRun4_realistic_v1.h5conf 133X_mcRun4_realistic_v1

As the tool only handles full global tags and not individual tags, I changed the cmsRun configuration to read the global tag from the HDF5 file but had the individual tags still come from CondDBESSource. The change to the configuration was the addition of the following lines

process.GlobalTag.globaltag = ""
process.H5Tag = cms.ESSource("CondHDF5ESSource", filename = cms.untracked.string("133X_mcRun4_realistic_v1.h5conf"), globalTag = cms.string("133X_mcRun4_realistic_v1"),
                             excludeRecords = cms.vstring("SiPixelGenErrorDBObjectRcd", "SiPixelLorentzAngleRcd",
                                                                    "SiPixelLorentzAngleSimRcd","SiPixelTemplateDBObjectRcd",
                                                                    "TrackerAlignmentErrorExtendedRcd",
                                                                    "TrackerAlignmentRcd",
                                                                    "TrackerSurfaceDeformationRcd"))

Running the original job at FNAL, it took 140s for the job to retrieve all the IOVs during the begin Run phase. With this change, it now takes 1.5s.

@VinInn commented Oct 25, 2023

So for local tests one can first dump the global tag and then run.
Since which version is H5Tag supported?

@Dr15Jones

The CondCore/CondHDF5ESSource package was first introduced in CMSSW_13_3_0_pre2. See #42431.
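
So a local two-step flow (hedged sketch, assuming a release with that package and a configuration modified as in the earlier comment) would be:

# dump the global tag to HDF5 once
conddb2hdf5.py --compressor zlib --output 133X_mcRun4_realistic_v1.h5conf 133X_mcRun4_realistic_v1
# then run jobs with the CondHDF5ESSource lines added to the configuration
cmsRun step3_RAW2DIGI_RECO_RECOSIM_PAT_VALIDATION_DQM.py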

@davidlange6

@Dr15Jones - what is the role of excludeRecords here? Just the ability to skip records known not to be needed? Is that responsible for much of the (very interesting..) speedup?

@Dr15Jones

what is the role of excludeRecords here? Just an ability to skip known to be not needed records? Is that responsible for much of the (very interesting..) speedup

For the configuration given, the CondDBESSource overrides the GlobalTag for those records and gives explicit tags to be used (see below). The CondHDF5ESSource (and its accompanying scripts) only handles using a GlobalTag (it could be extended to handle explicit tags as well), so I needed to still get these explicit tags (and their associated Records) from CondDBESSource. In order to avoid having HDF5 and the DB fight each other over which one would deliver the Record (one could use an ESAlias to explicitly set which one wins), I decided to just have the CondHDF5ESSource not deliver those Records at all.

    toGet = cms.VPSet(
        cms.PSet(
            connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
            record = cms.string('SiPixelGenErrorDBObjectRcd'),
            snapshotTime = cms.string('2021-04-17 20:00:00'),
            tag = cms.string('SiPixelGenErrorDBObject_phase2_IT_v7.0.2_25x100_v2_mc')
        ),
        cms.PSet(
            connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
            record = cms.string('SiPixelLorentzAngleRcd'),
            snapshotTime = cms.string('2021-03-16 20:00:00.000'),
            tag = cms.string('SiPixelLorentzAngle_phase2_T25_v0_mc')
        ),
        cms.PSet(
            connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
            label = cms.untracked.string('forWidth'),
            record = cms.string('SiPixelLorentzAngleRcd'),
            snapshotTime = cms.string('2020-02-23 14:00:00.000'),
            tag = cms.string('SiPixelLorentzAngle_phase2_T19_mc_forWidthEmpty')
        ),
        cms.PSet(
            connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
            label = cms.untracked.string('fromAlignment'),
            record = cms.string('SiPixelLorentzAngleRcd'),
            snapshotTime = cms.string('2020-02-23 14:00:00.000'),
            tag = cms.string('SiPixelLorentzAngle_phase2_T19_mc_forWidthEmpty')
        ),
        cms.PSet(
            connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
            record = cms.string('SiPixelLorentzAngleSimRcd'),
            snapshotTime = cms.string('2021-03-16 20:00:00.000'),
            tag = cms.string('SiPixelSimLorentzAngle_phase2_T25_v0_mc')
        ),
        cms.PSet(
            connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
            record = cms.string('SiPixelTemplateDBObjectRcd'),
            snapshotTime = cms.string('2021-04-17 20:00:00'),
            tag = cms.string('SiPixelTemplateDBObject_phase2_IT_v7.0.2_25x100_v2_mc')
        ),
        cms.PSet(
            connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
            record = cms.string('TrackerAlignmentErrorExtendedRcd'),
            snapshotTime = cms.string('2023-03-16 15:30:00'),
            tag = cms.string('TrackerAlignmentErrorsExtended_Upgrade2026_T25_design_v0')
        ),
        cms.PSet(
            connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
            record = cms.string('TrackerAlignmentRcd'),
            snapshotTime = cms.string('2023-03-16 15:30:00'),
            tag = cms.string('TrackerAlignment_Upgrade2026_T25_design_v0')
        ),
        cms.PSet(
            connect = cms.string('frontier://FrontierProd/CMS_CONDITIONS'),
            record = cms.string('TrackerSurfaceDeformationRcd'),
            snapshotTime = cms.string('2023-03-16 15:30:00'),
            tag = cms.string('TrackerSurfaceDeformations_Upgrade2026_Zero')
        )
    )

@fabiocos

for the MTD geometry construction part, please have a look at #43124. The DD4hep version is definitely less problematic as far as MTD is concerned, but it is not yet the production default. Profiling 24807.911 with IgProf:

    5.7      20.92  MTDGeometricTimingDetESModule::produce(IdealGeometryRecord const&) [122]
    5.7      20.92  DDCmsMTDConstruction::construct(cms::DDCompactView const&) [123]

@fwyzard commented Oct 26, 2023

so using a pickle file is 25x faster

@Dr15Jones I wasn't able to understand... is the pickled file faster than the original configuration built by cmsDriver.py, or faster than the fully expanded one from edmConfigDump?

@Dr15Jones

I wasn't able to understand... is the pickled file faster than the original configuration built by cmsDriver.py, or faster than the fully expanded one from edmConfigDump?

Faster than the original one from cmsDriver.py.

@civanch commented Nov 6, 2023

+geometry

It seems that the main geometry problem, connected with MTD, is fixed.

@Dr15Jones commented Nov 8, 2023

If I use CMSSW_13_3_X_2023-11-08-1100, a pickled configuration file, and the HDF5 conditions code, then for an 8-thread job (which can do some initialization in parallel) it takes 68 seconds to go from starting the job to the first event. This job processed 100 events.

[image: job timing view]

[The image is from the new tracer log viewer application]
