Don't handle column types redundantly anymore #401

guitargeek · 2019-09-24T10:14:38Z

Hi NanoAOD devs!

I hope this is the right cmssw fork and branch for this PR.

Yesterday I wanted to introduce some new column types in my private nanoAOD productions (int16_t for example, to save a bit of space) and use them in the flat table producers. However, I realized that many parts of the NanoAOD code have to be tweaked to do this, because the way column types are handled is not entirely trivial.

One source of complication is that when you add a column to a flat table with addColumn(), you have to pass the type as a template argument as well as an enum value in the function parameters. After working a bit with the code, I understood that this is redundant, because the check_type function makes sure you always use the right enum value with the right template parameter. Therefore, we could just drop this enum parameter and deduce it from the template argument. In this situation, we would also not need check_type anymore.
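To illustrate the idea, here is a minimal sketch of deducing the enum value from the template argument. This is not the actual FlatTable code: ToyTable, ColumnInfo and ColumnTypeOf are hypothetical names made up for this example, and only three types are covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical enum, standing in for the column type tag.
enum class ColumnType { Float, Int, UInt8 };

// Compile-time map from a C++ type to its enum tag.
// The primary template is left undefined, so an unsupported
// type is a compile error rather than a runtime surprise.
template <typename T>
struct ColumnTypeOf;
template <>
struct ColumnTypeOf<float> {
  static constexpr ColumnType value = ColumnType::Float;
};
template <>
struct ColumnTypeOf<int> {
  static constexpr ColumnType value = ColumnType::Int;
};
template <>
struct ColumnTypeOf<std::uint8_t> {
  static constexpr ColumnType value = ColumnType::UInt8;
};

// Bookkeeping entry for one column (illustration only).
struct ColumnInfo {
  std::string name;
  ColumnType type;
  std::size_t size;
};

class ToyTable {
public:
  // No enum parameter: the tag follows from T alone,
  // so tag and template argument cannot disagree.
  template <typename T>
  void addColumn(const std::string& name, const std::vector<T>& values) {
    assert(!name.empty());
    columns_.push_back(ColumnInfo{name, ColumnTypeOf<T>::value, values.size()});
  }

  const std::vector<ColumnInfo>& columns() const { return columns_; }

private:
  std::vector<ColumnInfo> columns_;
};
```

With something like this, a call such as table.addColumn("pt", ptValues) picks the tag from the vector's element type, and a separate runtime check of tag against type is no longer needed.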

The only tricky part is bool columns, which should actually be represented by a uint8_t vector. So far, the logic to take care of this had to be implemented in the plugins that use the FlatTable class, but I think I found a way to have this logic directly in the FlatTable class, so one can just use addColumn<bool> to create bool columns and they will be stored internally in the uint8_t vector.
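A rough sketch of how the bool case could be handled in one place, assuming a small trait that maps each column type to its storage type. StorageOf and toStorage are hypothetical names for this example and do not come from the FlatTable code.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <vector>

// Storage type for a column of T: every type is stored as itself...
template <typename T>
struct StorageOf {
  using type = T;
};
// ...except bool, which is stored as uint8_t.
template <>
struct StorageOf<bool> {
  using type = std::uint8_t;
};

static_assert(std::is_same<StorageOf<bool>::type, std::uint8_t>::value,
              "bool columns are stored as uint8_t");
static_assert(std::is_same<StorageOf<float>::type, float>::value,
              "other columns keep their own type");

// Copy the user-facing values into the storage representation.
// For bool this converts element by element to 0 or 1.
template <typename T>
std::vector<typename StorageOf<T>::type> toStorage(const std::vector<T>& values) {
  std::vector<typename StorageOf<T>::type> stored(values.begin(), values.end());
  assert(stored.size() == values.size());
  return stored;
}
```

The point of the design is that callers keep writing bool, and the conversion to uint8_t happens inside the table class instead of being repeated in every plugin.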

What do you think? This already simplifies the type handling quite a bit, and I think it's a good path towards a FlatTable class that supports all basic types that you can also store in TTrees.

So far I have tested this with the local matrix tests. Can the nano-bot tests still be run here? That would be very cool!

Thanks for considering this and cheers,
Jonas

guitargeek · 2019-09-24T11:27:06Z

Since there is no bot here to mention the conveners: hi @peruzzim and @fgolf

arizzi · 2019-09-24T12:07:11Z

hi,
without looking at the code, did you check that it also still works with the nanoedm + merge step as performed in production?

triggering the bot meanwhile

gpetruc-bot · 2019-09-24T12:10:46Z

Automatic test started, see https://gitlab.cern.ch/cms-nanoAOD/nanoAOD-integration/pipelines/1112208/builds

guitargeek · 2019-09-24T12:17:22Z

Hi @arizzi, yes, I think that's what I did. I tested by stripping the nano configs down to just the GenPart table for a quick check, and at the end of my cfg I did:

process.out = cms.OutputModule("NanoAODOutputModule",
    fileName = cms.untracked.string('out.root'),
    outputCommands = process.NanoAODEDMEventContent.outputCommands,
    compressionLevel = cms.untracked.int32(9),
    compressionAlgorithm = cms.untracked.string("LZMA"),
    dataset = cms.untracked.PSet(
        dataTier = cms.untracked.string('NANOAODSIM'),
        filterName = cms.untracked.string('')
    ),
)

Is that what you mean?

gpetruc-bot · 2019-09-24T12:24:58Z

Please update PhysicsTools/NanoAOD/python/nanoDQM_cfi.py: take this patch or run prepareDQM.py -d -u nano_file_mc.root, and then if needed adjust the plot range using some human common sense.

arizzi · 2019-09-24T12:27:10Z

nope, I mean using PoolOutputModule rather than NanoAODOutputModule and then converting to actual (flat) nano in the merge step.

guitargeek · 2019-09-24T12:44:04Z

Sorry for not being experienced with this yet. Alright, I swapped NanoAODOutputModule for PoolOutputModule in my config and now have a ROOT file that contains the flat table object as expected. What exactly do you mean by merge step? Is that something in CMSSW? Just one link pointing to some short example or explanation would be great, thanks in advance!

arizzi · 2019-09-24T13:01:14Z

https://github.com/cms-sw/cmssw/blob/master/Configuration/DataProcessing/python/Merge.py

gpetruc-bot · 2019-09-24T13:22:42Z

The "long" bot tests for data do run the EDM output + merging and conversion to NANO.

Giovanni

guitargeek · 2019-09-24T13:23:40Z

Okay I used the mergeProcess function to dump a little config:

from Configuration.DataProcessing.Merge import mergeProcess

process = mergeProcess(
    ["file:out1.root", "file:out2.root", "file:out3.root"],
    process_name = "Merge",
    output_file = "Merged.root",
    output_lfn = None,
    newDQMIO = False,
    mergeNANO = True,
    bypassVersionCheck = False)

print(process.dumpPython())

It worked fine, and in the end I got once more a "flat ntuple".

gpetruc-bot

Automatic test report for 1112208

gitlab pipeline at https://gitlab.cern.ch/cms-nanoAOD/nanoAOD-integration/pipelines/1112208/builds
outputs at https://cms-nanoaod-integration.web.cern.ch/integration/test_pr_401/

Code integration

Code checks passed for this PR

Please update PhysicsTools/NanoAOD/python/nanoDQM_cfi.py: take this patch or run prepareDQM.py -d -u nano_file_mc.root, and then if needed adjust the plot range using some human common sense.

Tests

Long test data102X (10000 events): passed, no significant changes; dqm plots: all, diff
Long test data106X (9000 events): passed, no significant changes; dqm plots: all, diff
Long test data80X (10000 events): passed, no significant changes; dqm plots: all, diff
Long test data80Xhip (3000 events): passed, no significant changes; dqm plots: all, diff
Long test data94X (10000 events): passed, no significant changes; dqm plots: all, diff
Long test data94X2016 (10000 events): passed, no significant changes; dqm plots: all, diff
Long test data94Xv2 (10000 events): passed, no significant changes; dqm plots: all, diff
Long test mc102X (9000 events): passed, no significant changes; dqm plots: all, diff
Long test mc106X (9000 events): passed, no significant changes; dqm plots: all, diff
Long test mc80X (10000 events): passed, no significant changes; dqm plots: all, diff
Long test mc94X (10000 events): passed, no significant changes; dqm plots: all, diff
Long test mc94X2016 (9000 events): passed, no significant changes; dqm plots: all, diff
Long test mc94Xv2 (9000 events): passed, no significant changes; dqm plots: all, diff
Test mc_94Xv2: passed
Test mc_102X: passed
Test data_94X: passed
Test data_102X: passed

Disk size report

Sample	kb/event	ref kb/event	diff
TTbar MC 102X	1.831	1.831	0.000 ( +0.0% )
TTbar MC 94Xv1	1.924	1.924	0.000 ( +0.0% )
TTbar MC 94Xv2	1.956	1.956	0.000 ( +0.0% )
TTbar MC 94X2016	1.745	1.746	-0.000 ( -0.0% )
TTbar MC 80X	1.902	1.900	0.001 ( +0.1% )
Data 102X	0.963	0.963	0.000 ( +0.0% )
Data 94Xv1	0.913	0.913	-0.000 ( -0.0% )
Data 80X	0.793	0.793	-0.000 ( -0.0% )
Data 80X, Mu Run2016E	0.775	0.775	0.000 ( +0.1% )

gpetruc-bot · 2019-11-05T09:41:03Z

Automatic test started, see https://gitlab.cern.ch/cms-nanoAOD/nanoAOD-integration/pipelines/1202133/builds

guitargeek · 2020-07-21T12:43:58Z

Hi @mariadalfonso, the commits here are not exactly the same as in the cms-sw PR (cms-sw#30436). There is just one additional commit from cms-sw#30273 on which my developments depended. I think we should also backport this boost commit in cms-sw together with the nano-types PR to really keep the nano-types changes nicely in sync.

gpetruc-bot · 2020-07-21T13:06:04Z

Automatic test started, see https://gitlab.cern.ch/cms-nanoAOD/nanoAOD-integration/pipelines/1812488/builds

gpetruc-bot

Automatic test report for 1812488

gitlab pipeline at https://gitlab.cern.ch/cms-nanoAOD/nanoAOD-integration/pipelines/1812488/builds
outputs at https://cms-nanoaod-integration.web.cern.ch/integration/test_pr_401/

Code integration

Code checks passed for this PR

Code format passed for this PR

Tests

Long test data102X (10000 events): passed, no significant changes; dqm plots: all, diff
Long test data106Xul17 (9000 events): passed, no significant changes; dqm plots: all, diff
Long test data106Xul18 (9000 events): passed, no significant changes; dqm plots: all, diff
Long test data80X (10000 events): passed, no significant changes; dqm plots: all, diff
Long test data94X (10000 events): passed, no significant changes; dqm plots: all, diff
Long test data94X2016 (10000 events): passed, no significant changes; dqm plots: all, diff
Long test data94Xv2 (10000 events): passed, no significant changes; dqm plots: all, diff
Long test mc102X (9000 events): passed, no significant changes; dqm plots: all, diff
Long test mc106Xul16 (9000 events): passed, no significant changes; dqm plots: all, diff
Long test mc106Xul17 (9000 events): passed, with differences; dqm plots: all, diff
Long test mc106Xul18 (9000 events): passed, no significant changes; dqm plots: all, diff
Long test mc80X (10000 events): passed, no significant changes; dqm plots: all, diff
Long test mc94X (10000 events): passed, no significant changes; dqm plots: all, diff
Long test mc94X2016 (9000 events): passed, no significant changes; dqm plots: all, diff
Long test mc94Xv2 (9000 events): passed, no significant changes; dqm plots: all, diff
Test mc_94Xv2: passed
Test mc_102X: passed
Test data_94X: passed
Test data_102X: passed

Disk size report

Sample	kb/event	ref kb/event	diff
TTbar MC 102X	1.998	1.997	0.001 ( +0.0% )
TTbar MC 94Xv1	2.054	2.054	0.000 ( +0.0% )
TTbar MC 94Xv2	2.095	2.095	-0.000 ( -0.0% )
TTbar MC 94X2016	1.889	1.886	0.002 ( +0.1% )
TTbar MC 80X	2.006	2.007	-0.001 ( -0.1% )
Data 102X	1.068	1.067	0.000 ( +0.0% )
Data 94Xv1	1.019	1.018	0.000 ( +0.0% )
Data 80X	0.870	0.870	0.000 ( +0.0% )

mariadalfonso · 2020-07-22T11:44:18Z

tests are successful,
the cms-sw#30436 can be merged

guitargeek · 2020-07-22T11:52:38Z

Thanks, so I can close this PR I guess.

arizzi added the to be tested by bot label Sep 24, 2019

gpetruc-bot added test scheduled (bot) test started (bot) and removed to be tested by bot test scheduled (bot) labels Sep 24, 2019

gpetruc-bot added the code checks ok (bot) label Sep 24, 2019

gpetruc-bot added the dqm config update (bot) label Sep 24, 2019

gpetruc-bot reviewed Sep 24, 2019

gpetruc-bot added test ok (bot) and removed test started (bot) labels Sep 24, 2019

peruzzim added to be tested by bot and removed code checks ok (bot) dqm config update (bot) test ok (bot) labels Nov 5, 2019

gpetruc-bot added test scheduled (bot) test started (bot) and removed to be tested by bot test scheduled (bot) labels Nov 5, 2019

gpetruc-bot added the code checks ok (bot) label Nov 5, 2019

camolezi and others added 9 commits July 21, 2020 14:37

Replaced boost::ptr_vector for std::vector

66aed4f

don't handle column types redundantly anymore

ec6e81a

make ColumnType and enum class for type safety

88a35e0

fix clang warning in SimpleFlatTableProducer.h

ffc5cd6

reorganize more redundant checks in FlatTable

12bf18d

throw exceptions in NanoAOD if column type not supported

aec5fec

Avoid use of const_cast with helper function

b6f40d2

introduce edm::Span

43b9b40

change unneeded universal reference to const&

4dbd8d1

mariadalfonso added test started (bot) test scheduled (bot) to be tested by bot and removed test started (bot) test scheduled (bot) labels Jul 21, 2020

gpetruc-bot added test scheduled (bot) and removed to be tested by bot test scheduled (bot) labels Jul 21, 2020

gpetruc-bot added test started (bot) code checks ok (bot) code format ok (bot) and removed code checks ok (bot) code format ok (bot) test started (bot) labels Jul 21, 2020

gpetruc-bot reviewed Jul 21, 2020

guitargeek closed this Jul 22, 2020

guitargeek deleted the master-cmsswmaster_nano-types branch July 24, 2020 12:16
