Add HLT-Scouting collections to MINIAOD event content (follow-up of #42863) [13_3_X] #43331

Merged


missirol

backport of #43327

PR description:

From the description of #43327:

#42863 added the HLT-Scouting collections to the MINIAODSIM event content (i.e. MINIAOD content of standard MC samples). This PR suggests two improvements on top of #42863.

  1. Add the HLT-Scouting collections to the MINIAOD event content (not only to MINIAODSIM), as per request of the Scouting group.

    • Pro: for Primary Datasets (PDs, real data) whose RAW event content includes HLT-Scouting objects, those objects will also be included in MINIAOD. Right now, only one such PD exists, named ScoutingPFMonitor. It is used for offline studies related to Scouting, and the Scouting group currently uses a workflow with a "two-file solution" to access offline objects from MINIAOD and HLT-Scouting objects from AOD. Having both in MINIAOD will simplify this workflow significantly. Note that this change (adding HLT-Scouting to MINIAOD) has no impact on any other PD, to my knowledge, as those PDs do not retain HLT-Scouting objects in RAW in the first place.

    • Con 1: the size of the MINIAOD samples of the ScoutingPFMonitor PD will increase. This size increase has not been quantified. It is assumed to be at most 10% based on the checks done in #42863 (Adding the scouting event content to MINIAODSIM). Since this only applies to a single PD with a relatively low rate (below ~40 Hz during normal pp data-taking in 2023), I dare say this cost is rather small. For example, if I check the total size of all the Run2023 MINIAOD samples on DAS, I get 1.65 PB. If I restrict that to the ScoutingPFMonitor PDs, I get 2.8 TB (0.16% of the total).

      # Total size of all VALID Run2023 MINIAOD datasets.
      rm -f tmp.txt
      for ddd in $(dasgoclient -query "dataset dataset=/*/*Run2023*/*MINIAOD* status=VALID"); do
        dasgoclient -query "file dataset=$ddd | sum(file.size)" >> tmp.txt
      done
      # The per-dataset size is taken from field 2 of each output line.
      awk '{sum += $2} END {print sum}' tmp.txt

      # Same sum, restricted to the Scouting PDs.
      rm -f tmp.txt
      for ddd in $(dasgoclient -query "dataset dataset=/*Scouting*/*Run2023*/*MINIAOD* status=VALID"); do
        dasgoclient -query "file dataset=$ddd | sum(file.size)" >> tmp.txt
      done
      awk '{sum += $2} END {print sum}' tmp.txt
    • Con 2: the size of MINIAOD samples derived from data tiers such as FEVTDEBUGHLT will also increase (again by a guesstimated ~10% or less). I do not know this kind of use case in detail. I see this happens, for example, in wfs such as 141.001, where there is a reHLT step on data with --eventcontent FEVTDEBUGHLT (followed by a 2nd step with RECO, MINI, NANO, etc). Here too, I would guess this use case is limited, and the overall cost of this increase could be considered small.

  2. Integrate this better in the way HLT currently provides collections to the 'central' event contents in CMSSW. This PR defines a PSet HLTriggerMINIAOD in HLTrigger/Configuration (HLTriggerMINIAOD in this PR includes only the HLT Scouting event content), similarly to the way HLTriggerAOD and others are defined. This part of the PR is purely technical: it is only meant to homogenise how extra HLT-related collections are inserted in different data tiers.
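The size argument in Con 1 can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the figures quoted above; it assumes decimal units (1 PB = 1000 TB) and treats the 10% increase as the assumed upper bound, not as a measurement. The variable names are illustrative.

```python
# Figures quoted in the PR description (decimal units assumed: 1 PB = 1000 TB).
TOTAL_MINIAOD_TB = 1650.0     # 1.65 PB, all Run2023 MINIAOD datasets
SCOUTING_MINIAOD_TB = 2.8     # Scouting MINIAOD datasets
ASSUMED_MAX_INCREASE = 0.10   # upper bound assumed in the PR, not measured


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Current share of the Scouting datasets in the Run2023 MINIAOD total.
scouting_share = percent(SCOUTING_MINIAOD_TB, TOTAL_MINIAOD_TB)

# Extra volume if those datasets grow by the assumed upper bound.
extra_tb = ASSUMED_MAX_INCREASE * SCOUTING_MINIAOD_TB
extra_share = percent(extra_tb, TOTAL_MINIAOD_TB)

print(f"Scouting share of total: {scouting_share:.3f}%")
print(f"Extra volume at +10%:    {extra_tb:.2f} TB ({extra_share:.4f}% of total)")
```

With these inputs the Scouting share comes out at roughly 0.17%, close to the 0.16% quoted above (the exact value depends on the unit convention and rounding), and a 10% growth would add about 0.28 TB, i.e. under 0.02% of the total.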

HLTrigger_EventContent_cff.py was not modified directly, but recreated by running an updated version of HLTrigger/Configuration/test/getEventContent.py.
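To illustrate what a PSet such as HLTriggerMINIAOD contributes, the sketch below models an event content as an ordered list of keep/drop statements, where the last matching statement decides whether a product is written. It is a simplified plain-Python model, not the CMSSW implementation: the real PSets are `cms.PSet` objects, the framework matches the four branch-name fields separately, and the module labels and branch names used here are illustrative assumptions, not the contents of this PR.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-ins for the output commands of two event contents.
# The real lists live in CMSSW configuration files and are not reproduced here.
MINIAOD_COMMANDS = [
    "drop *",
    "keep *_slimmedMuons_*_*",
]
HLT_TRIGGER_MINIAOD_COMMANDS = [
    "keep *_hltScoutingPFPacker_*_*",
    "keep *_hltScoutingMuonPacker_*_*",
]


def is_kept(branch, commands):
    """Apply keep/drop statements in order; the last matching one wins."""
    kept = False
    for command in commands:
        action, pattern = command.split(None, 1)
        if fnmatchcase(branch, pattern):
            kept = action == "keep"
    return kept


def extended(base, extra):
    """Return a new command list with the extra statements appended."""
    return list(base) + list(extra)


# Illustrative branch name in the form type_label_instance_process.
scouting_branch = "Run3ScoutingParticles_hltScoutingPFPacker__HLT"

print(is_kept(scouting_branch, MINIAOD_COMMANDS))
print(is_kept(scouting_branch, extended(MINIAOD_COMMANDS, HLT_TRIGGER_MINIAOD_COMMANDS)))
```

Keeping the HLT-provided statements in their own list, and appending them to each data tier that needs them, mirrors the homogenisation described in item 2: every tier gets its HLT collections through the same mechanism.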

If approved, I would suggest backporting this PR to CMSSW_13_3_X to keep HLTrigger/Configuration as similar as possible in 13_3_X (currently used for HLT-menu development) and later cycles (and to cover the unlikely scenario of taking data relevant to Scouting in 2024 with 13_3_X).

Attn: @elfontan @kelmorab (TSG/Scouting conveners)

PR validation:

None beyond the checks done for #43327.

If this PR is a backport, please specify the original PR and why you need to backport that PR. If this PR will be backported, please specify to which release cycle the backport is meant for:

#43327

Update of MINIAOD event content for the purpose of HLT-Scouting studies.

@cmsbuild

cmsbuild commented Nov 20, 2023

A new Pull Request was created by @missirol (Marino Missiroli) for CMSSW_13_3_X.

It involves the following packages:

  • Configuration/EventContent (operations)
  • HLTrigger/Configuration (hlt)

@davidlange6, @rappoccio, @fabiocos, @antoniovilela, @Martin-Grunewald, @cmsbuild, @mmusich can you please review it and eventually sign? Thanks.
@fabiocos, @Martin-Grunewald, @silviodonato this is something you requested to watch as well.
@antoniovilela, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@mmusich

mmusich commented Nov 20, 2023

please test

@cmsbuild

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fd1e80/35953/summary.html
COMMIT: d71d760
CMSSW: CMSSW_13_3_X_2023-11-20-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43331/35953/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 75 lines from the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 140 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3362836
  • DQMHistoTests: Total failures: 2407
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3360407
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 214 log files, 167 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@mmusich

mmusich commented Nov 20, 2023

@rappoccio

+1

@cmsbuild

This pull request is fully signed and it will be integrated in one of the next CMSSW_13_3_X IBs (tests are also fine) and once validation in the development release cycle CMSSW_14_0_X is complete. This pull request will be automatically merged.

@cmsbuild cmsbuild merged commit 1593699 into cms-sw:CMSSW_13_3_X Nov 21, 2023
@missirol missirol deleted the devel_hltScoutingInMINIAOD_133X branch September 13, 2024 07:33
4 participants