Revert decrease of precision introduced in ea3f693d #43253

Merged
merged 1 commit into from
Nov 16, 2023

Conversation

maxgalli
Contributor

@maxgalli maxgalli commented Nov 13, 2023

The decrease in precision for the shower shape variables, introduced in ea3f693d, seems to create unwanted features in their distributions.
A comparison of Run2 samples with precision 8 and 10 can be found at the following links:

  • precision 10, no unwanted features (S4 plot)
  • precision 8, unwanted features in the tail (S4 plot)
  • precision 10, no unwanted features (R9 plot)
  • precision 8, unwanted features in the tail (R9 plot)
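To illustrate what the change does to the stored values, here is a minimal sketch. It assumes that "precision" means the number of float32 mantissa bits kept when the variable is written out; `reduce_mantissa` is a hypothetical helper written for this illustration, not the NanoAOD implementation, and it does not treat NaN or infinity specially. It counts how many distinct values survive in [0.5, 1) at precision 8 and 10.

```python
import numpy as np

def reduce_mantissa(values, bits):
    """Round float32 values to `bits` explicit mantissa bits (hypothetical helper)."""
    x = np.ascontiguousarray(values, dtype=np.float32)
    shift = 23 - bits  # float32 stores 23 explicit mantissa bits
    raw = x.view(np.uint32).astype(np.uint64)  # widen so the addition cannot overflow
    half = np.uint64(1 << (shift - 1))         # half of the last kept bit, for rounding
    mask = np.uint64(0xFFFFFFFF & ~((1 << shift) - 1))  # clears the dropped bits
    rounded = ((raw + half) & mask).astype(np.uint32)
    return rounded.view(np.float32)

# Evenly spaced stand-in for a shower shape variable in [0.5, 1).
x = np.linspace(0.5, 1.0, 100_001, dtype=np.float32)[:-1]
n8 = np.unique(reduce_mantissa(x, 8)).size
n10 = np.unique(reduce_mantissa(x, 10)).size
print(n8, n10)
```

Under this assumption, going from 10 to 8 bits leaves roughly four times fewer distinct values in the same interval, so any histogram whose bin width is not a multiple of the grid step picks up a periodic pattern.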

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43253/37632

  • This PR adds an extra 20KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Contributor

A new Pull Request was created by @maxgalli (Massimiliano Galli) for master.

It involves the following packages:

  • PhysicsTools/NanoAOD (xpog)

@cmsbuild, @simonepigazzini, @vlimant can you please review it and eventually sign? Thanks.
@AnnikaStein, @gpetruc this is something you requested to watch as well.
@antoniovilela, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@simonepigazzini
Contributor

enable nano

@simonepigazzini
Contributor

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1e7569/35766/summary.html
COMMIT: 0fe35c7
CMSSW: CMSSW_14_0_X_2023-11-13-1100/el8_amd64_gcc12
Additional Tests: NANO
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/43253/35766/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 224 lines to the logs
  • Reco comparison results: 7 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3363028
  • DQMHistoTests: Total failures: 41
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3362965
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 214 log files, 167 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

NANO Comparison Summary

Summary:

  • You potentially added 2 lines to the logs
  • Reco comparison results: 90 differences found in the comparisons
  • DQMHistoTests: Total files compared: 15
  • DQMHistoTests: Total histograms compared: 16335
  • DQMHistoTests: Total failures: 147
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 16188
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 14 files compared)
  • Checked 34 log files, 16 edm output root files, 15 DQM output files

Nano size comparison Summary:

| Sample | kb/ev | ref kb/ev | diff kb/ev | ev/s/thd | ref ev/s/thd | diff rate | mem/thd | ref mem/thd |
|---|---|---|---|---|---|---|---|---|
| 2500.0 | 2.539 | 2.535 | 0.004 (+0.2%) | 5.30 | 5.27 | +0.5% | 2.109 | 2.123 |
| 2500.001 | 2.685 | 2.680 | 0.005 (+0.2%) | 4.78 | 4.82 | -0.7% | 2.127 | 2.559 |
| 2500.002 | 2.622 | 2.616 | 0.006 (+0.2%) | 4.96 | 4.97 | -0.2% | 2.134 | 2.540 |
| 2500.01 | 1.311 | 1.307 | 0.003 (+0.2%) | 9.92 | 9.79 | +1.3% | 2.210 | 2.235 |
| 2500.011 | 1.726 | 1.721 | 0.005 (+0.3%) | 5.36 | 5.34 | +0.4% | 2.012 | 2.433 |
| 2500.012 | 1.572 | 1.568 | 0.004 (+0.3%) | 7.66 | 7.76 | -1.4% | 2.028 | 2.347 |
| 2500.1 | 2.190 | 2.187 | 0.003 (+0.1%) | 5.34 | 5.38 | -0.7% | 1.958 | 1.982 |
| 2500.2 | 2.301 | 2.298 | 0.003 (+0.1%) | 6.20 | 6.19 | +0.0% | 1.845 | 1.899 |
| 2500.21 | 1.177 | 1.174 | 0.003 (+0.3%) | 4.38 | 4.43 | -1.2% | 1.868 | 2.200 |
| 2500.211 | 1.539 | 1.535 | 0.004 (+0.3%) | 3.91 | 3.89 | +0.4% | 1.972 | 2.277 |
| 2500.3 | 2.057 | 2.053 | 0.004 (+0.2%) | 12.97 | 13.05 | -0.6% | 1.883 | 1.887 |
| 2500.31 | 1.252 | 1.248 | 0.004 (+0.3%) | 20.30 | 20.63 | -1.6% | 2.276 | 2.288 |
| 2500.311 | 1.638 | 1.635 | 0.004 (+0.2%) | 14.14 | 14.48 | -2.3% | 2.363 | 2.367 |
| 2500.4 | 2.057 | 2.053 | 0.004 (+0.2%) | 12.92 | 13.09 | -1.3% | 1.884 | 1.885 |
| 2500.5 | 19.556 | 19.556 | 0.000 (+0.0%) | 1.21 | 1.26 | -4.2% | 1.302 | 1.290 |

@simonepigazzini
Contributor

+1

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @antoniovilela, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@antoniovilela
Contributor

+1

@nsmith-
Contributor

nsmith- commented Dec 6, 2023

Wouldn't another solution here be to bin the relevant variables in powers of two? e.g. if s4 were binned with 128 bins between 0-1 I would expect the bumps to go away.
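A rough sketch of why power-of-two binning would hide the bumps, under simplifying assumptions: the data are evenly spaced in [0.5, 1) rather than a real s4 distribution, and 8-bit mantissa storage is modelled in that interval as rounding to a grid of step 2**-9. The function name `relative_spread` is made up for this illustration. It compares 64 bins on [0.5, 1] (the same width as 128 bins on [0, 1]) with 50 bins (the same width as 100 bins on [0, 1]).

```python
import numpy as np

# Evenly spaced stand-in for a smooth distribution on [0.5, 1).
x = np.linspace(0.5, 1.0, 640_001)[:-1]
# In [0.5, 1), keeping 8 mantissa bits corresponds to a grid of step 2**-9.
q = np.round(x * 512) / 512

def relative_spread(values, bins):
    """(max - min) / mean of the interior bin counts."""
    counts, _ = np.histogram(values, bins=bins, range=(0.5, 1.0))
    inner = counts[1:-1]  # edge bins are distorted by rounding at the range limits
    return (inner.max() - inner.min()) / inner.mean()

spread_pow2 = relative_spread(q, 64)  # bin width 2**-7: exactly 4 grid values per bin
spread_dec = relative_spread(q, 50)   # bin width 0.01: 5 or 6 grid values per bin
print(spread_pow2, spread_dec)
```

With power-of-two bins every bin contains the same number of grid values, so the histogram stays flat; with decimal bins the number alternates between 5 and 6, which shows up as periodic bumps. The stored values are identical in both cases, only the plot changes.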

@JaLuka98
Contributor

JaLuka98 commented Dec 6, 2023

> Wouldn't another solution here be to bin the relevant variables in powers of two? e.g. if s4 were binned with 128 bins between 0-1 I would expect the bumps to go away.

Hi @nsmith-, nice idea, although I am not completely following. To what extent is this a "solution"? People would like to use the variables as inputs to MVA algorithms, which usually benefit from smooth (continuous, differentiable) input distributions. We had cases of almost catastrophic failure, as density estimation methods in particular seem to struggle with such "pathological" cases. Rebinning might make the bumps go away, but it does not help when training a model, right? Or did I misunderstand your suggestion?

@davidlange6
Contributor

In either case, variables have finite precision, so nothing is continuous if you bin finely enough (e.g., use more bins and you will see the same effect in the "correct" plots).
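The point about finite precision can be checked with a small sketch, again under simplifying assumptions: evenly spaced values in [0.5, 1), with 8 and 10 mantissa bits modelled as grids of step 2**-9 and 2**-11 in that interval. The helper `interior_spread` is hypothetical.

```python
import numpy as np

# Evenly spaced stand-in for a smooth distribution on [0.5, 1).
x = np.linspace(0.5, 1.0, 640_001)[:-1]

def interior_spread(step, bins):
    """Quantize x to a grid of the given step, then return (max - min) / mean of interior bin counts."""
    q = np.round(x / step) * step
    counts, _ = np.histogram(q, bins=bins, range=(0.5, 1.0))
    inner = counts[1:-1]  # skip the edge bins, which are distorted by rounding
    return (inner.max() - inner.min()) / inner.mean()

coarse = interior_spread(2.0**-9, 50)   # 8 mantissa bits, 50 bins
fine = interior_spread(2.0**-11, 200)   # 10 mantissa bits, 4x finer bins
print(coarse, fine)
```

Because the ratio of bin width to grid step is the same in both cases, the relative bin-to-bin variation comes out essentially the same: the precision-10 data show the pattern too once the binning is four times finer.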

@maxgalli
Contributor Author

maxgalli commented Dec 7, 2023

> Wouldn't another solution here be to bin the relevant variables in powers of two? e.g. if s4 were binned with 128 bins between 0-1 I would expect the bumps to go away.

Hi Nick, thanks. Adding to what @JaLuka98 already correctly said: the binning used in the plots is just a way to show the difference in behaviour between Run2 (precision 10, where we did not see any problem with the training) and Run3 (where the precision was decreased to 8). We already saw that if we choose a finer binning the bumps "go away" (as one would expect), but that does not solve the underlying problem unfortunately.
Concerning the reason why this is a problem for the flows, I have hypotheses but no definitive answers. I saw in other trainings that the flows are not particularly good at simulating "deltas" (see for instance the peak at 0 in the isolation variables, which has to be smeared in preprocessing for the flow to be able to learn it) and by decreasing the precision we are sort of "making more deltas" in the distribution, so maybe that's the reason why the performance decreases.
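For illustration only, one generic way to smear such "deltas" before training is uniform dequantization: add noise of one grid step to each stored value. This is a sketch under stated assumptions, not necessarily the preprocessing used for the flows discussed here. `smear_quantized` is a hypothetical helper, "precision" is assumed to mean the number of mantissa bits kept, and exact zeros receive an arbitrary small step, so a peak at 0 would need its own treatment.

```python
import numpy as np

def smear_quantized(values, bits, rng):
    """Add uniform noise spanning one grid step to values stored with `bits` mantissa bits."""
    q = np.asarray(values, dtype=np.float64)
    # frexp writes q = m * 2**e with 0.5 <= |m| < 1, so the grid step is 2**(e - 1 - bits).
    _, e = np.frexp(q)
    step = np.ldexp(1.0, e - 1 - bits)
    return q + rng.uniform(-0.5, 0.5, size=q.shape) * step

rng = np.random.default_rng(0)
x = np.linspace(0.5, 1.0, 100_001)[:-1]
q = np.round(x * 512) / 512  # stand-in for 8-bit mantissa storage in [0.5, 1)
smeared = smear_quantized(q, 8, rng)
print(np.unique(q).size, np.unique(smeared).size)
```

The smeared values fill the gaps between grid points, so the distribution no longer consists of a few hundred spikes. The information lost by the reduced precision is not recovered; the noise only hides the grid.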

7 participants