Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add energy regression and particle identification to RecoHGCal/TICL (reopened) #27917

Merged
merged 5 commits into from
Sep 25, 2019

Conversation

riga
Copy link
Contributor

@riga riga commented Sep 2, 2019

PR description

This PR adds energy regression and particle identification for trackster objects to the TrackstersProducer of the RecoHGCal/TICL package. Both the regressed energy and ID probabilities for different types of particles are the outputs of a single neural network, stored as a TensorFlow model. The first version of this model is proposed in

cms-data/RecoHGCal-TICL#1.

This PR is a clean version of #27634. The changes since the last review comments can be found here.

Performance

Tested with workflow 20634.0 (TTbar_14TeV_TuneCUETP8M1_2026D41PU with 200 PU).

The CPU runtime for inference scales linearly with the batch size, i.e., the number of tracksters, as expected for the no_threads TensorFlow session. The average runtime per trackster is ~0.6 ms (see slope below). The time includes trackster selection, preprocessing and filling of input data, the actual inference, and handling of results (== PatternRecognitionbyCA::energyRegressionAndID).

runtime

The memory consumption, obtained using the SimpleMemoryCheck with and without running the regression and ID, is 178 MB for a maximum batch size of ~750 tracksters.

To get an estimate of the memory used per event, I queried VmRSS of /proc/self/status (link) at the beginning and at the end of PatternRecognitionbyCA::energyRegressionAndID, and also before and after loading the graph and setting up the session.

  • Load graph: + 3.6 MB
  • Load session: +7.2 MB
  • energyRegressionAndID:
    memory

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2019

The code-checks are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2019

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-27917/11736

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2019

A new Pull Request was created by @riga (Marcel R.) for master.

It involves the following packages:

DataFormats/HGCalReco
RecoHGCal/TICL

@perrotta, @cmsbuild, @kpedro88, @slava77 can you please review it and eventually sign? Thanks.
@felicepantaleo, @apsallid, @rovere, @clelange, @hatakeyamak this is something you requested to watch as well.
@davidlange6, @slava77, @fabiocos you are the release manager for this.

cms-bot commands are listed here

@perrotta
Copy link
Contributor

perrotta commented Sep 2, 2019

please test workflow 20493.52 with cms-sw/cmsdist#5120, cms-data/RecoHGCal-TICL#1

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2019

The tests are being triggered in jenkins.
Tested with other pull request(s) cms-sw/cmsdist#5120,cms-data/RecoHGCal-TICL#1
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/2316/console Started: 2019/09/02 18:01

std::array<float, 8> id_probabilities;

// convenience method to return the ID probability for a certain particle type
inline float id_probability(ParticleType type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add const

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔︎

}

void TrackstersProducer::globalEndJob(TrackstersCache* cache) {
if (cache->eidGraphDef != nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to protect a delete

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔︎

const TrackstersCache *trackstersCache = dynamic_cast<const TrackstersCache *>(cache);
if (trackstersCache->eidGraphDef != nullptr) {
eidSession_ = tensorflow::createSession(trackstersCache->eidGraphDef);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else ???

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the eidGraphDef in the cache is not set (which happens when no graph path was set) the eidSession_ should not be set as well and the inference is not performed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this distinguish a bug upstream from a desired silent do nothing behavior?
The code should not be allowed to silently skip when it was expected to do something.


// mount the tensorflow graph onto the session when set
const TrackstersCache *trackstersCache = dynamic_cast<const TrackstersCache *>(cache);
if (trackstersCache->eidGraphDef != nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually one has to test if the dynamic_cast failed...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔︎

theGraph_ = std::make_unique<HGCGraph>();
min_cos_theta_ = conf.getParameter<double>("min_cos_theta");
min_cos_pointing_ = conf.getParameter<double>("min_cos_pointing");
missing_layers_ = conf.getParameter<int>("missing_layers");
min_clusters_per_ntuplet_ = conf.getParameter<int>("min_clusters_per_ntuplet");
max_delta_time_ = conf.getParameter<double>("max_delta_time");
eidInputName_ = conf.getParameter<std::string>("eid_input_name");
Copy link
Contributor

@VinInn VinInn Sep 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all configurable data members should be declared const and initialized as
theGraph_(std::make_unique<HGCGraph>()), etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make const all all configurable data members

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔︎

for (int k = seenClusters[j]; k < eidNClusters_; k++) {
float *features = &input.tensor<float, 4>()(i, j, k, 0);
for (int l = 0; l < eidNFeatures_; l++, features++) {
*features = 0.f;
Copy link
Contributor

@VinInn VinInn Sep 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not
*(features++) as everywhere else?
just for consistency...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔︎

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2019

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 2, 2019

Comparison job queued.

@cmsbuild
Copy link
Contributor

Pull request #27917 was updated. @perrotta, @cmsbuild, @kpedro88, @slava77 can you please check and sign again.

@riga
Copy link
Contributor Author

riga commented Sep 23, 2019

@perrotta I made all members const. Also, I ran igprof and browsed through the results. This is the output for the added energyRegressionAndID method:

igprof

Fyi: Running igprof-navigator on an lxplus machine (in a CMSSW env) returned an empty page in my browser and the process complains about a "SQLite header and source version mismatch". In the end, I used

docker run --rm -ti -p 8080:8080 -v `pwd`/thefile.sql3:/thefile.sql3 igprof/igprof igprof-navigator /thefile.sql3

Maybe this can go on the Twiki page you pointed me to in case other people have the same issue.

@perrotta
Copy link
Contributor

please test workflow 20493.52 with cms-sw/cmsdist#5120, cms-data/RecoHGCal-TICL#1

@cmsbuild
Copy link
Contributor

cmsbuild commented Sep 23, 2019

The tests are being triggered in jenkins.
Tested with other pull request(s) cms-sw/cmsdist#5120,cms-data/RecoHGCal-TICL#1
https://cmssdt.cern.ch/jenkins/job/ib-run-pr-tests/2621/console Started: 2019/09/23 14:12

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Comparison job queued.

@cmsbuild
Copy link
Contributor

Comparison is ready
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b9a975/2621/summary.html

Comparison Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 34
  • DQMHistoTests: Total histograms compared: 2958092
  • DQMHistoTests: Total failures: 1
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 2957750
  • DQMHistoTests: Total skipped: 341
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 33 files compared)
  • Checked 145 log files, 15 edm output root files, 34 DQM output files

@perrotta
Copy link
Contributor

+1

  • Energy regression and particle id implemented for HGCal trackster objects
  • When enabled, cpu and memory consumption shown to be acceptable even at 200 PU
  • TrackstersProducer left configured to skip regression and id: therefore no change is expected in the default config: jenkins tests pass and show no differences

@fabiocos
Copy link
Contributor

@kpedro88 could you please check this PR?

@kpedro88
Copy link
Contributor

+upgrade

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @davidlange6, @slava77, @smuzaffar, @fabiocos (and backports should be raised in the release meeting by the corresponding L2)

@fabiocos
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 81891b0 into cms-sw:master Sep 25, 2019
@fwyzard fwyzard mentioned this pull request May 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants