Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement an EDFilter to select events based on their bunch crossing number (10.3.x) #25019

Closed
wants to merge 2 commits into from

Conversation

fwyzard
Copy link
Contributor

@fwyzard fwyzard commented Oct 26, 2018

Implement a global EDFilter BunchCrossingFilter to select events based on their bunch crossing number (between 1 and 3564 inclusive).
Include a test configuration based on 2018 data.

@fwyzard
Copy link
Contributor Author

fwyzard commented Oct 26, 2018

backport #25018

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 26, 2018

A new Pull Request was created by @fwyzard (Andrea Bocci) for CMSSW_10_3_X.

It involves the following packages:

FWCore/Modules

@cmsbuild, @smuzaffar, @Dr15Jones can you please review it and eventually sign? Thanks.
@makortel, @Martin-Grunewald, @wddgit this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@fwyzard
Copy link
Contributor Author

fwyzard commented Oct 26, 2018

type new-feature

@fwyzard
Copy link
Contributor Author

fwyzard commented Oct 26, 2018

@cmsbuild, please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 26, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/31297/console Started: 2018/10/26 17:17

@cmsbuild
Copy link
Contributor

Pull request #25019 was updated. @cmsbuild, @smuzaffar, @Dr15Jones can you please check and sign again.

2 similar comments
@cmsbuild
Copy link
Contributor

Pull request #25019 was updated. @cmsbuild, @smuzaffar, @Dr15Jones can you please check and sign again.

@cmsbuild
Copy link
Contributor

Pull request #25019 was updated. @cmsbuild, @smuzaffar, @Dr15Jones can you please check and sign again.

Implement a global EDFilter "BunchCrossingFilter" to select events based
on their bunch cfrossing number (between 1 and 3564 inclusive).

Include a test configuration based on 2018 data.
@fwyzard fwyzard force-pushed the BunchCrossingFilter_103x branch from 63e9663 to 618cd1f Compare October 26, 2018 16:16
@cmsbuild
Copy link
Contributor

Pull request #25019 was updated. @cmsbuild, @smuzaffar, @Dr15Jones can you please check and sign again.

@fwyzard
Copy link
Contributor Author

fwyzard commented Oct 26, 2018

@cmsbuild, please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 26, 2018

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/31309/console Started: 2018/10/26 19:00

@fwyzard fwyzard changed the title Filter events based on their bunch crossing number (10.3.x) Implement an EDFilter to select events based on their bunch crossing number (10.3.x) Oct 26, 2018
@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25019/31309/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 32
  • DQMHistoTests: Total histograms compared: 3163099
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3162901
  • DQMHistoTests: Total skipped: 197
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 31 files compared)
  • Checked 134 log files, 14 edm output root files, 32 DQM output files


# input data
process.source = cms.Source('PoolSource',
fileNames = cms.untracked.vstring('/store/data/Run2018B/ZeroBias/RAW/v1/000/317/435/00000/10FA8B3F-5A68-E811-A1D9-FA163E481185.root')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to avoid getting files from storage for framework tests since I do run these test on my laptop without internet connections. We normally either create new input files for the test or we store tiny ROOT files into the repository. Would any of the files in IOPool/Input/testdata work for your test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the files under IOPool/Input/testdata have at most one event, so it is hard to show the various ways how to use the filter.
Of course if this is a requirement, I can simplify the test case and adapt it to that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could stick a tiny file in IOPool/Input/testdata if you want, or one in FWCore/Modules/testdata

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I create a tiny file ?
Even if I drop all collections ("drop *"), a file with 1 event is ~1.3 MB.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the following

import FWCore.ParameterSet.Config as cms

process = cms.Process("SLIM")

process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring("file:10FA8B3F-5A68-E811-A1D9-FA163E481185.root"))

process.out = cms.OutputModule("PoolOutputModule", fileName = cms.untracked.string("slim.root"),
                               dropMetaData = cms.untracked.string("ALL"),
                               outputCommands = cms.untracked.vstring("drop *")
                               )

process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(10))

process.o = cms.EndPath(process.out)

the generated file was 1.5MB in size. Looking at the internals, the ParameterSets branch is 1.47MB in size. I'll see if I can come up with a simple way to fake that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you checkout IOPool/Provenance and edit IOPool/Provenance/src/CommonProvenanceFiller.cc and comment out the loop

https://github.com/cms-sw/cmssw/blob/master/IOPool/Provenance/src/CommonProvenanceFiller.cc#L24

and rebuild IOPool/Output. At that point, using the configuration I tested, you can create a slim.root file which is only 40KB in length. I was successfully able to read back that tiny file from a simple cmsRun job.

@Dr15Jones
Copy link
Contributor

Dr15Jones commented Oct 30, 2018

An alternative testing strategy could be to use
https://github.com/cms-sw/cmssw/tree/master/FWCore/TestProcessor

However, it would require some new ability in that code to manipulate the EventPrincipal.

@Dr15Jones
Copy link
Contributor

The change in the TestProcessor would be right here
https://github.com/cms-sw/cmssw/blob/master/FWCore/TestProcessor/src/TestProcessor.cc#L407

since the EventAuxilliary holds the bunchCrossing information. All that would be needed would be fore a new setBunchCrossing method to be added to TestProcessor and an associated member variable. Then you could explicitly test the EDFilter.

@Dr15Jones
Copy link
Contributor

@fwyzard how do you want to proceed with the test you added? The more I think about it, the better I like the TestProcessor approach since it avoids the neef for an auxilliary file and gives more control over the testing.

@fwyzard
Copy link
Contributor Author

fwyzard commented Oct 31, 2018

Hi @Dr15Jones,
for me the filter makes sense only in the context f real data, of course, so I would like to have a meaningful test that people can use as some kind of reference for the module.
If you would rather have something self-contained that does not need any actual inputs, I do not mind of course, but I would suggest to have both, not just the latter.
If the main issue is that you would rather not have test files under FWCore/Modules depend on external inputs, I can move the module somewhere else.

Let me know,
.Andrea

@Dr15Jones
Copy link
Contributor

@fwyzard It seems to me like there are actually two tests being proposed

  1. Test that BunchCrossingFilter properly handles the values it gets from edm::Event::bunchCrossing()

  2. Test that cmsRun properly reads bunchCrossing information from the PoolSource.

The TestProcessor allows very precise control over the first test. Running a cmsRun job would be a convolution of the two tests.

Even with that in mind, the configuration you have made doesn't seem to have a testable failure condition and does not have a launching script so it would not help in catching problems.

@Dr15Jones
Copy link
Contributor

@fwyzard was the configuration you wrote maybe intended to be an example of use? If so, then instead of a configuration in the test directory you could

  • give the commands as doxygen comments in the module's file
  • create a .md file with the documentation and put it in the FWCore/Modules directory.

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 11, 2018

@Dr15Jones, if I understand correctly, the tests you are proposing would serve to test the underlying framework features related to the handling of the bunch crossing number in data (and MC).

Of course I have nothing against that, but that goes beyond the scope of adding a test that shows how the new filter is supposed to work.

then instead of a configuration in the test directory you could

  • give the commands as doxygen comments in the module's file
  • create a .md file with the documentation and put it in the FWCore/Modules directory.

I appreciate that both solutions would still include the documentation in the code base. However, I do not know who would go looking for it there (in the doxygen docs or in a .md file).

If having a meaningful and realistic example in the test directory is the only showstopper, I can move the filter under a different package (e.g. HLTrigger/HLTfilters).

@Dr15Jones
Copy link
Contributor

The showstopper is requiring reading a file in a test that is not part of the repository. That would cause problems for testing the framework stand alone (which is something I do regularly from my laptop). There was the alternative of generating a 'tiny' file, putting it in IOPool/Input/testdata and having your test configuration read that one.

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 11, 2018

I'll take @fabiocos suggestion on how to proceed.

@fabiocos
Copy link
Contributor

@fwyzard @Dr15Jones sorry, which is finally the size of this file? 1.5 MB or 40 kB? In general we are trying to push whatever non trivial data file in external packages, but checking the present status of the release I see:

13:49 cmsdev15 2948> find -L /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/ -name "*.root" -exec du -k {} ; | sort -n | less
15 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/empty_old_format_CMSSW_1_6_12.root
15 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/empty_old_format_CMSSW_3_9_9.root
16 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/empty_old_format_CMSSW_1_8_4.root
16 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/old_format_CMSSW_1_6_12.root
17 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/empty_old_format_CMSSW_4_3_0.root
17 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/old_format_CMSSW_1_8_4.root
18 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/empty_old_format_CMSSW_2_2_13.root
18 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/empty_old_format_CMSSW_3_1_6.root
18 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/empty_old_format_CMSSW_3_8_1.root
18 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/old_format_CMSSW_3_9_9.root
19 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/empty_old_format_CMSSW_2_0_11.root
19 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/empty_old_format_CMSSW_3_6_1.root
19 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/old_format_CMSSW_2_1_12.root
19 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/old_format_CMSSW_2_2_13.root
19 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/old_format_CMSSW_3_1_6.root
20 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/old_format_CMSSW_2_0_11.root
20 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/old_format_CMSSW_3_6_1.root
20 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/old_format_CMSSW_3_8_1.root
20 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/old_format_CMSSW_4_3_0.root
28 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/complex_old_format_CMSSW_3_9_9.root
29 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/complex_old_format_CMSSW_2_2_13.root
29 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/complex_old_format_CMSSW_3_5_0.root
30 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/complex_old_format_CMSSW_3_7_0.root
30 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/complex_old_format_CMSSW_3_8_1.root
35 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/complex_old_format_CMSSW_4_3_0.root
38 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/complex_old_format_CMSSW_4_2_7.root
38 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/complex_old_format_CMSSW_4_2_8.root
746 /cvmfs/cms-ib.cern.ch/week0/slc6_amd64_gcc700/cms/cmssw/CMSSW_10_4_X_2018-12-13-2300/src/IOPool/Input/testdata/rawData_old_format_CMSSW_4_2_8.root

Another 40 kB file is not something unmanageable, even if I wonder whether we should not put all these files into an external data package (still part of the code distribution, but in a different package/git repository). In that case we could even add 1.5 MB.

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 14, 2018

The original approach proposed by Chris to create a slim input file should be OK, I only have to go through recalling the input file and following the recipe.

@cmsbuild
Copy link
Contributor

Pull request #25019 was updated. @cmsbuild, @smuzaffar, @Dr15Jones can you please check and sign again.

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 14, 2018

It turned out I need a slightly bigger file (~120kB) to exercise all the various cases.

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 14, 2018

By the way, we can probably close this specific PR, unless this functionality is deemed useful for 10.3.x.
I would like to merge the one for master (#25018) of course, and the 10.2.x backport (#25020).
The use case is to skim the high pileup data and have the resulting skims still readable with a 10.2.x release, for HLT studies.

@fabiocos
Copy link
Contributor

@fwyzard fine for me to close this PR and follow up with the others

@fwyzard
Copy link
Contributor Author

fwyzard commented Dec 17, 2018

Moved to 10.2.x and 10.4.x only.

@fwyzard fwyzard closed this Dec 17, 2018
@fwyzard fwyzard deleted the BunchCrossingFilter_103x branch January 26, 2019 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants