Some EcalClusterTools maintainance and energy matrix in Egamma MVA Ntuplizers #25633

guitargeek · 2019-01-12T01:20:44Z

Hi,

I have always felt that the potential of machine learning applied to the ECAL is not fully tapped, while its two-dimensional nature provides a very good playground for testing modern neural network architectures designed for image processing.

To enable quick studies in this direction for myself and everybody else, I always wanted to (optionally) add the raw rec hit energies in an N×N matrix around the seed crystal to the Electron/PhotonMVANtuplizers. This is now possible, with N being a configurable parameter.

For this purpose, I wrote a new function EcalClusterLazyTools<T>::getEnergies and generally went over the EcalClusterTools/EcalClusterLazyTools duo, trying to make it a bit more slick. This includes:

new helper class to easily iterate over rectangular ranges around seed crystals
don't separate member function implementation from declaration for template classes because it adds a lot of distracting template lines (see the lazy tools)
detabify, removal of trailing whitespaces, occasionally better line breaks
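To illustrate the idea of an N×N energy matrix filled by iterating over a rectangular range around the seed, here is a minimal, self-contained sketch. It does not use the CMSSW classes: `CrystalId`, `RecHitMap`, `SquareOffsets` and `energyMatrix` are hypothetical stand-ins invented for this example, and the actual `EcalClusterLazyTools<T>::getEnergies` may well differ in signature and in the ordering of the returned values.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a crystal address: integer (ix, iy) grid coordinates.
using CrystalId = std::pair<int, int>;

// Hypothetical stand-in for a rec-hit collection: crystal -> energy in GeV.
using RecHitMap = std::map<CrystalId, float>;

// Iterates over all (dx, dy) offsets in [-half, +half] x [-half, +half].
// Only loosely modelled on the idea of a rectangular range around a seed
// crystal; this is NOT the CMSSW helper class.
class SquareOffsets {
public:
  explicit SquareOffsets(int half) : half_(half) {}

  class Iterator {
  public:
    Iterator(int dx, int dy, int half) : dx_(dx), dy_(dy), half_(half) {}
    std::pair<int, int> operator*() const { return std::make_pair(dx_, dy_); }
    Iterator& operator++() {
      // Advance dy first; wrap around and advance dx at the end of a row.
      if (++dy_ > half_) {
        dy_ = -half_;
        ++dx_;
      }
      return *this;
    }
    bool operator!=(const Iterator& other) const { return dx_ != other.dx_ || dy_ != other.dy_; }

  private:
    int dx_, dy_, half_;
  };

  Iterator begin() const { return Iterator(-half_, -half_, half_); }
  Iterator end() const { return Iterator(half_ + 1, -half_, half_); }

private:
  int half_;
};

// Returns the n*n energies around `seed`, with dx as the outer and dy as the
// inner index. Crystals without a rec hit contribute 0.
std::vector<float> energyMatrix(const RecHitMap& hits, const CrystalId& seed, int n) {
  assert(n > 0 && n % 2 == 1);  // odd n, so that the seed sits in the centre
  std::vector<float> out;
  out.reserve(static_cast<std::size_t>(n) * static_cast<std::size_t>(n));
  for (const auto offset : SquareOffsets(n / 2)) {
    const auto it = hits.find(CrystalId(seed.first + offset.first, seed.second + offset.second));
    out.push_back(it != hits.end() ? it->second : 0.0f);
  }
  return out;
}
```

With this layout, the entry for offset (dx, dy) sits at index `(dx + n/2) * n + (dy + n/2)`, so the seed energy is the middle element of the returned vector.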

cmsbuild · 2019-01-12T01:23:07Z

The code-checks are being triggered in jenkins.

cmsbuild · 2019-01-12T01:29:30Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-25633/7950

This PR adds an extra 56KB to repository

cmsbuild · 2019-01-12T01:29:57Z

A new Pull Request was created by @guitargeek (Jonas Rembser) for master.

It involves the following packages:

RecoCaloTools/Navigation
RecoEcal/EgammaClusterProducers
RecoEcal/EgammaCoreTools
RecoEgamma/ElectronIdentification
RecoEgamma/PhotonIdentification

@perrotta, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@Sam-Harper, @jainshilpi, @rovere, @argiro, @lgray, @varuns23 this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

guitargeek · 2019-01-12T01:39:30Z

RecoEcal/EgammaCoreTools/interface/EcalClusterTools.h

    }
-    // slow elegant version


I was a bit surprised by this comment here: why was that solution so slow? Is looping over the DetIDs twice and keeping a vector of them for the second loop so expensive? That statement prompted me to do this rectangle range solution where you only loop over the detIDs once. I hope it combines the speed of the previous solution with the "elegance" of the commented-out solution.

Sam-Harper · 2019-01-13

So this solution is slow because a cluster can own a very large number of hits due to how PF clustering works. As you know, PF clustering shares the energy of a crystal between local energy maxima. It does this fairly globally and (initially at least) did not clean up well, so you would end up with half the ECAL assigned as "rec hits" of the cluster, but with fractions of 1E-8 or similar. This was later cleaned up to require fractions above 1E-4 or 1E-5 (and may have been changed further), but at the time of writing you were running over a huge number of hits, hence the "slow" comment. It's not clear to me whether this was actually measured to be slow or just assumed to be from the above argument; from dimly remembered conversations, I think it was actually slow.

One thing I want to achieve during the shutdown is to review the minimum fraction (or fraction*energy) for a hit to be part of a PFCluster, to further reduce this problem, although with the raised minimum fraction it may not be a problem anymore.
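The effect of a minimum-fraction cut can be sketched as follows. This is only an illustration under stated assumptions: `HitAndFraction`, `ClusterSum` and `sumAboveFraction` are hypothetical names made up for this example, not PF clustering or EcalClusterTools code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of one hit assigned to a cluster.
struct HitAndFraction {
  float energy;    // full rec-hit energy in GeV
  float fraction;  // share of that energy attributed to this cluster, in [0, 1]
};

struct ClusterSum {
  float energy;          // sum of energy * fraction over the hits that were kept
  std::size_t hitsUsed;  // number of hits that passed the threshold
};

// Sums energy * fraction over the hits, skipping those whose fraction is
// below minFraction. A threshold of 0 keeps every hit.
ClusterSum sumAboveFraction(const std::vector<HitAndFraction>& hits, float minFraction) {
  assert(minFraction >= 0.0f);
  ClusterSum result{0.0f, 0};
  for (const auto& hit : hits) {
    if (hit.fraction < minFraction)
      continue;
    result.energy += hit.energy * hit.fraction;
    ++result.hitsUsed;
  }
  return result;
}
```

For a cluster with a handful of real hits plus many hits at fractions around 1E-8, raising the threshold removes almost all loop iterations while leaving the summed energy practically unchanged.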

guitargeek · 2019-01-13

That is interesting, thanks for explaining! But I was talking about a different thing. You explained why recHitEnergy() can be slow, right? I was talking more about the fact that the comment labels this as slow (if you "unwrap" the functions and remove the frac-related stuff):

    CaloNavigator<DetId> cursor = CaloNavigator<DetId>( id, topology->getSubdetectorTopology( id ) );
    std::vector<DetId> v;
    for ( int i = ixMin; i <= ixMax; ++i ) {
        for ( int j = iyMin; j <= iyMax; ++j ) {
            cursor.home();
            cursor.offsetBy( i, j );
            if ( *cursor != DetId(0) ) v.push_back( *cursor );
        }
    }
    float energy = 0;
    for ( std::vector<DetId>::const_iterator it = v.begin(); it != v.end(); ++it ) {
        energy += recHitEnergy( *it, recHits );
    }

and this as fast:

    float energy = 0;
    CaloNavigator<DetId> cursor = CaloNavigator<DetId>( id, topology->getSubdetectorTopology( id ) );
    for ( int i = ixMin; i <= ixMax; ++i ) {
        for ( int j = iyMin; j <= iyMax; ++j ) {
            cursor.home();
            cursor.offsetBy( i, j );
            energy += recHitEnergy( *cursor, recHits );
        }
    }

In the "slow" solution, the DetIds in the rectangle are precomputed and stored, whereas in the "fast" solution the energies are added in the same loop in which the DetIds are obtained (and the DetIds are not stored). If one believes that this really makes a difference, I thought it made sense to implement this CaloRectangleRange.
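The two variants discussed above can be put side by side in a small, self-contained sketch. It deliberately avoids the CMSSW types: `CrystalId`, `RecHitMap`, `hitEnergy`, `sumTwoPass` and `sumSinglePass` are hypothetical names for this example, and the check for invalid DetIds from the original code is left out because the toy grid has no invalid cells.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for DetId and the rec-hit collection.
using CrystalId = std::pair<int, int>;
using RecHitMap = std::map<CrystalId, float>;

// Energy of one crystal, or 0 if there is no rec hit for it.
float hitEnergy(const RecHitMap& hits, const CrystalId& id) {
  const auto it = hits.find(id);
  return it != hits.end() ? it->second : 0.0f;
}

// "Slow" variant: first collect all ids in the rectangle, then sum in a second loop.
float sumTwoPass(const RecHitMap& hits, const CrystalId& seed, int half) {
  assert(half >= 0);
  std::vector<CrystalId> ids;
  for (int dx = -half; dx <= half; ++dx) {
    for (int dy = -half; dy <= half; ++dy) {
      ids.push_back(CrystalId(seed.first + dx, seed.second + dy));
    }
  }
  float energy = 0.0f;
  for (const auto& id : ids) {
    energy += hitEnergy(hits, id);
  }
  return energy;
}

// "Fast" variant: sum while walking the rectangle, without storing the ids.
float sumSinglePass(const RecHitMap& hits, const CrystalId& seed, int half) {
  assert(half >= 0);
  float energy = 0.0f;
  for (int dx = -half; dx <= half; ++dx) {
    for (int dy = -half; dy <= half; ++dy) {
      energy += hitEnergy(hits, CrystalId(seed.first + dx, seed.second + dy));
    }
  }
  return energy;
}
```

Both functions visit the crystals in the same order and therefore return the same sum; the only difference is the temporary vector and the second loop in the two-pass version. Whether that difference is measurable is exactly the open question in this thread.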

Sam-Harper · 2019-01-13

Ah, you're right, I misread the code. Okay, I have no idea :)

slava77 · 2019-01-12T02:06:14Z

@cmsbuild please test

cmsbuild · 2019-01-12T02:06:41Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/32547/console Started: 2019/01/12 07:04

cmsbuild · 2019-01-12T09:06:19Z

+1
Tested at: a1a713e
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25633/32547/summary.html

cmsbuild · 2019-01-12T09:06:24Z

Comparison job queued.

cmsbuild · 2019-01-12T10:50:28Z

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25633/32547/summary.html

Comparison Summary:

No significant changes to the logs found
Reco comparison results: 11968 differences found in the comparisons
DQMHistoTests: Total files compared: 33
DQMHistoTests: Total histograms compared: 3153717
DQMHistoTests: Total failures: 2390
DQMHistoTests: Total nulls: 7
DQMHistoTests: Total successes: 3151116
DQMHistoTests: Total skipped: 204
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.027 KiB( 32 files compared)
DQMHistoSizes: changed ( 136.731 ): 0.027 KiB JetMET/SUSYDQM
Checked 137 log files, 14 edm output root files, 33 DQM output files

guitargeek · 2019-01-12T16:58:13Z

Sorry, I have no idea where these differences come from! I saw them neither in comparisons of the egamma MVA ntuplizer outputs (comparing to a more minimalist implementation of getting the energy matrix) nor in the outputs of RecoEcal/EgammaCoreTools/test/testEcalCluster(Lazy)Tools.py. I will just update this PR with the revival of these tests, but otherwise I don't know how to proceed here. Anyway, maybe it's not so important.

perrotta · 2019-02-27T14:51:48Z

RecoEgamma/ElectronIdentification/plugins/ElectronMVANtuplizer.cc

 class ElectronMVANtuplizer : public edm::one::EDAnalyzer<edm::one::SharedResources>  {
   public:
      explicit ElectronMVANtuplizer(const edm::ParameterSet&);
-      ~ElectronMVANtuplizer() override;


Even if it is a "one" module, why remove the destructor?

Dr15Jones · 2019-02-27

A counterargument: why declare it if the compiler can write the default?

perrotta · 2019-02-27

I would naively say that otherwise the non-overridden destructor of the base class is used instead, and this would fail to delete the specific members of the derived one: isn't it so? Or is a virtual method always overridden by a default one in the derived class?

Dr15Jones · 2019-02-27

The compiler will always write a destructor for a class if you do not explicitly declare one yourself. This is also true when inheriting from a base class with a virtual destructor. I.e., the compiler will write the correct destructor for ElectronMVANtuplizer, which will call the destructors for all member data of ElectronMVANtuplizer.
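A small standalone example of this behaviour, independent of CMSSW: `Resource`, `Base`, `Derived`, `destroyedResources` and `destroyViaBasePointer` are hypothetical names for illustration only. `Derived` declares no destructor, yet deleting it through a `Base` pointer still destroys both of its members.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Counts destroyed Resource objects (for demonstration only).
int destroyedResources = 0;

struct Resource {
  ~Resource() { ++destroyedResources; }
};

class Base {
public:
  virtual ~Base() = default;
};

// No destructor is declared here: the compiler generates one. Because the
// base class destructor is virtual, the generated one is virtual as well.
class Derived : public Base {
  Resource a_;
  Resource b_;
};

static_assert(std::has_virtual_destructor<Derived>::value,
              "the implicitly generated destructor of Derived is virtual");

// Destroys a Derived object through a Base pointer and returns how many
// Resource members were destroyed along the way.
int destroyViaBasePointer() {
  const int before = destroyedResources;
  std::unique_ptr<Base> p(new Derived());
  assert(destroyedResources == before);  // nothing has been destroyed yet
  p.reset();                             // runs ~Derived(), then ~Base()
  return destroyedResources - before;
}
```

Calling `destroyViaBasePointer()` returns 2, one for each `Resource` member, even though `Derived` has no user-written destructor.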

perrotta · 2019-02-27

Thank you @Dr15Jones! Comment retracted, then.

perrotta · 2019-02-28T09:52:43Z

+1

Code changes are in line with the PR description and the following comments received during the review
The restructuring of the code does not affect the final outputs; I also verified that the computing time is not modified in any significant way when using either real data or TTbar PU samples
Jenkins tests pass

cmsbuild · 2019-02-28T09:53:13Z

This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

fabiocos · 2019-03-01T13:48:48Z

please test

the random unit test errors should be gone

cmsbuild · 2019-03-01T13:49:21Z

The tests are being triggered in jenkins.
https://cmssdt.cern.ch/jenkins/job/ib-any-integration/33352/console Started: 2019/03/01 14:49

cmsbuild · 2019-03-01T17:11:31Z

+1
Tested at: 634f84e
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25633/33352/summary.html

cmsbuild · 2019-03-01T17:11:36Z

Comparison job queued.

cmsbuild · 2019-03-01T19:08:53Z

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-25633/33352/summary.html

Comparison Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 32
DQMHistoTests: Total histograms compared: 3114826
DQMHistoTests: Total failures: 1
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3114628
DQMHistoTests: Total skipped: 197
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 31 files compared)
Checked 133 log files, 14 edm output root files, 32 DQM output files

fabiocos · 2019-03-03T17:53:34Z

+1

the updated test configurations run smoothly

cmsbuild added this to the CMSSW_10_5_X milestone Jan 12, 2019

cmsbuild added code-checks-pending comparison-pending orp-pending pending-signatures reconstruction-pending tests-pending labels Jan 12, 2019

cmsbuild added code-checks-approved and removed code-checks-pending labels Jan 12, 2019

guitargeek commented Jan 12, 2019

cmsbuild added tests-started and removed tests-pending labels Jan 12, 2019

cmsbuild added tests-approved and removed tests-started labels Jan 12, 2019

cmsbuild added comparison-available and removed comparison-pending labels Jan 12, 2019

cmsbuild added code-checks-pending comparison-pending and removed code-checks-approved comparison-available tests-approved labels Jan 12, 2019

cmsbuild added comparison-available and removed comparison-pending labels Feb 26, 2019

perrotta reviewed Feb 27, 2019

cmsbuild added fully-signed reconstruction-approved and removed pending-signatures reconstruction-pending labels Feb 28, 2019

cmsbuild added comparison-pending tests-pending and removed comparison-available tests-rejected labels Mar 1, 2019

cmsbuild added tests-started and removed tests-pending labels Mar 1, 2019

cmsbuild added tests-approved and removed tests-started labels Mar 1, 2019

cmsbuild added comparison-available and removed comparison-pending labels Mar 1, 2019

cmsbuild added orp-approved and removed orp-pending labels Mar 3, 2019

cmsbuild merged commit 268ea31 into cms-sw:master Mar 3, 2019

guitargeek deleted the Egamma_5x5_shapes branch March 4, 2019 08:16

slava77 mentioned this pull request Sep 7, 2021

[UBSAN] Undefined behavior in Reco* and TrackingTools reco packages #35036

Closed
