Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests or workflows that read existing scouting data #41040

Closed
makortel opened this issue Mar 13, 2023 · 15 comments
Closed

Add tests or workflows that read existing scouting data #41040

makortel opened this issue Mar 13, 2023 · 15 comments

Comments

@makortel
Copy link
Contributor

In order to make sure we retain the ability to read old scouting data (which is part of RAW backwards compatibility guarantees), it would be good to add workflows (or other tests) that read in old scouting data.

Following up a comment in #41025 (comment).

@makortel
Copy link
Contributor Author

assign hlt, pdmv

@cmsbuild
Copy link
Contributor

New categories assigned: pdmv,hlt

@bbilin,@missirol,@sunilUIET,@kskovpen,@Martin-Grunewald you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@missirol
Copy link
Contributor

@makortel, thanks for opening this issue.

My first naive thought was to add unit test(s) in DataFormats/Scouting/test/.

This is an example for the Run-3 scouting data formats. In my understanding, this simple cfg should fail if a non-backward-compatible change is introduced.

I wonder if you had something different in mind, since you included PdmV.

Is such a unit test worth adding ?

@makortel
Copy link
Contributor Author

I thought a straightforward way would be to add a workflow in runTheMatrix, but I'd be fine with a unit test as well.

@missirol
Copy link
Contributor

I'm trying to take the unit test of #41040 (comment) one test further, writing a test with 2 steps:

  • step 1: read the Scouting collections from an input file, and write them to an EDM output file;
  • step 2: dump the content of the Scouting collections in the output of step-1 to a text file, and compare that text file to a reference.

Step-1 should fail if non-bckwd-compatible changes are introduced.

Step-2 is meant to check that the values of the products remain the same. The 'con' of step-2 is that (1) one needs to add this reference file, and (2) this reference file might have to be updated in the future (for example, if a certain variable is renamed). Also, this reference file needs to be 'small' if it is to stay in the test/ folder (as opposed to cms-data).

Does this go in the right direction? Any suggestions?

This attempt is in
https://github.com/missirol/cmssw/tree/ad597a6493b98b5e192aa628e149031d6627e24b/DataFormats/Scouting/test

(caveat: the test currently fails, because printing of Run3ScoutingHitPatternPOD needs to improved)

@Dr15Jones
Copy link
Contributor

Dr15Jones commented Mar 15, 2023

@missirol instead of doing it in 2 steps, I'd suggest creating a EDAnalyzer which takes as parameters the values you expect and then have the EDAnalyzer read the scouting object from the event and compare the values in the scouting object to the values given in the parameter. If the values do not match, have the EDAnalyzer throw an exception.

@wddgit
Copy link
Contributor

wddgit commented May 11, 2023

I'm working on the new unit tests now. I'm planning on following the pattern in #41631. The only difference will be that I will make one test for Run 3 formats and another test for the rest of them. So I will group them and not have one test per class type. If you have any comments to that pattern, then it would be easier if I knew now and I could just do it that way in the first draft.

Is this a complete list of Scouting data classes? I'm just listing all the files in the only directory that I know about. Are there any of them that were never used and never will be used to store data? Are there any still under rapid development where I should just wait and not waste time on them yet?

Singularity> ls DataFormats/Scouting/interface/
Run3ScoutingCaloJet.h
Run3ScoutingElectron.h
Run3ScoutingHitPatternPOD.h
Run3ScoutingMuon.h
Run3ScoutingPFJet.h
Run3ScoutingParticle.h
Run3ScoutingPhoton.h
Run3ScoutingTrack.h
Run3ScoutingVertex.h
ScoutingCaloJet.h
ScoutingElectron.h
ScoutingMuon.h
ScoutingPFJet.h
ScoutingParticle.h
ScoutingPhoton.h
ScoutingTrack.h
ScoutingVertex.h

@wddgit
Copy link
Contributor

wddgit commented May 11, 2023

One thing I noticed that is mildly odd. Some of these classes have class version 2 in the classes_def.xml file. I'm not sure if this is still true, but versions 0, 1, 2 are special values to ROOT. I think it causes extra byte count information to be included in the persistent format or something like that and different behavior... Was this intentional? We usually start the version at 3. I don't know whether this is a real problem or not, but it is not the usual way I have seen these before.

Some of these classes have multiple versions. Do we need test files with all versions or can the older ones be ignored and just start out with the current latest version? I think we only need the ones where there is actually data or monte carlo in long term storage somewhere.

@makortel
Copy link
Contributor Author

Is this a complete list of Scouting data classes?

This is the complete list. The Scouting* classes without the prefix can be called "run 2 scouting"

If you have any comments to that pattern

I'd include the class version numbers explicitly in the file name (in an order corresponding the alphabetical order of the scouting classes).

Some of these classes have class version 2 in the classes_def.xml file. Was this intentional?

I'd guess it was accidental.

Do we need test files with all versions or can the older ones be ignored and just start out with the current latest version? I think we only need the ones where there is actually data or monte carlo in long term storage somewhere.

I would restrict to the versions that were actually used for data. Those would (ideally) be

  • For "run 2 scouting" classes
    • The versions in 8_0_7 (2016 data)
    • The versions in 9_4_0 (2017 data)
    • The versions in 10_2_0 (2018 data)
  • For "run 3 scouting" classes
    • The versions in 12_4_0 (2022 data)
    • The versions in 13_0_3 (2023 data)

@wddgit
Copy link
Contributor

wddgit commented May 11, 2023

I guess given that all the version 2 classes are run 2 classes, there is nothing that can be done about it at this point. I'll just ignore that. Probably didn't matter much since no one noticed before.

@makortel
Copy link
Contributor Author

assign core

@cmsbuild
Copy link
Contributor

New categories assigned: core

@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Copy link
Contributor Author

Tests were added in #41834 and in #41913 (with mock data).

@makortel
Copy link
Contributor Author

+core

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants