🩹 Fix Performance Regressions #740

Merged

Conversation

@joernweissenborn (Member) commented Jul 8, 2021

This PR addresses performance regressions introduced in the staging branch.

  • Adds a pytest benchmark for the analysis.Problem class
  • Adds numerous performance tweaks
  • Regressions: some internal objects are no longer available via the public API, namely the reduced clps.

Checklist

  • ✔️ Passing the tests (mandatory for all PRs)
  • 👌 Closes issue (mandatory for ✨ feature and 🩹 bug fix PRs)

Closes issues

None

@joernweissenborn joernweissenborn requested a review from jsnel as a code owner July 8, 2021 13:36
github-actions bot (Contributor) commented Jul 8, 2021

Binder 👈 Launch a binder notebook on branch joernweissenborn/pyglotaran/performance/fixregressions

codecov bot commented Jul 8, 2021

Codecov Report

Merging #740 (3287157) into staging (cf16572) will increase coverage by 0.0%.
The diff coverage is 96.0%.


@@           Coverage Diff           @@
##           staging    #740   +/-   ##
=======================================
  Coverage     80.1%   80.2%           
=======================================
  Files           70      70           
  Lines         3904    3950   +46     
  Branches       676     692   +16     
=======================================
+ Hits          3130    3169   +39     
- Misses         653     660    +7     
  Partials       121     121           
| Impacted Files | Coverage Δ |
|----------------|------------|
| glotaran/examples/sequential.py | 0.0% <0.0%> (ø) |
| glotaran/analysis/util.py | 90.9% <91.6%> (-0.6%) ⬇️ |
| glotaran/analysis/problem_grouped.py | 95.9% <97.7%> (+0.5%) ⬆️ |
| glotaran/analysis/problem.py | 90.0% <100.0%> (-1.8%) ⬇️ |
| glotaran/analysis/problem_ungrouped.py | 98.0% <100.0%> (-0.3%) ⬇️ |
| glotaran/analysis/simulation.py | 82.9% <100.0%> (+0.4%) ⬆️ |
| ...tin/megacomplexes/baseline/baseline_megacomplex.py | 88.2% <100.0%> (-0.7%) ⬇️ |
| ...coherent_artifact/coherent_artifact_megacomplex.py | 73.4% <100.0%> (-0.6%) ⬇️ |
| ...n/builtin/megacomplexes/decay/decay_megacomplex.py | 82.2% <100.0%> (-0.3%) ⬇️ |
| ...tin/megacomplexes/spectral/spectral_megacomplex.py | 81.5% <100.0%> (-0.5%) ⬇️ |
| ... and 6 more | |

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf16572...3287157.

@s-weigand (Member) commented Jul 8, 2021

I think those 4 functions could be one, since benchmark is explicitly called.
Never mind: the benchmark fixture can only be used once per test function, or you will get a pytest_benchmark.fixture.FixtureAlreadyUsed error.
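A minimal sketch of that constraint (the benchmarked workload is a hypothetical stand-in, not pyglotaran code; only the fixture behavior is from pytest-benchmark):

```python
import pytest
from pytest_benchmark.fixture import FixtureAlreadyUsed


def test_benchmark_once(benchmark):
    # First (and only allowed) use of the fixture in this test: fine.
    result = benchmark(sum, range(10_000))
    assert result == sum(range(10_000))


def test_benchmark_twice_raises(benchmark):
    benchmark(sum, range(10_000))
    # A second use in the same test function raises, which is why the
    # four benchmark functions cannot simply be merged into one.
    with pytest.raises(FixtureAlreadyUsed):
        benchmark(sum, range(10_000))
```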

joernweissenborn and others added 4 commits July 9, 2021 22:35
@s-weigand s-weigand force-pushed the performance/fixregressions branch from acc295e to bd05752 July 9, 2021 20:35
@joernweissenborn joernweissenborn changed the title from "Added benchmark for Problem class" to "Fix Perfomance Regrssions" Jul 10, 2021
@s-weigand s-weigand force-pushed the performance/fixregressions branch from 3876441 to 20a021a July 10, 2021 13:07
@s-weigand s-weigand force-pushed the performance/fixregressions branch from 20a021a to bc4e951 July 10, 2021 13:14
@s-weigand s-weigand changed the title from "Fix Perfomance Regrssions" to "🩹 Fix Performance Regressions" Jul 10, 2021
@jsnel (Member) commented Jul 14, 2021

A note on the spectral model validation

Results for paramGUI case 05

Compare the estimated parameters from pyglotaran

  * __spectral__:
    | _Label_   |      _Value_ |    _StdErr_ |
    |-----------|--------------|-------------|
    | amp1      |     1        | 0           |
    | loc1      | 14282        | 0.0193176   |
    | width1    |   808.004    | 0.0343336   |
    | skew1     |     0.400081 | 8.92553e-05 |
    | amp2      |     1        | 0           |
    | loc2      | 13707        | 0.0331115   |
    | width2    |   654.987    | 0.0556831   |
    | skew2     |    -0.299748 | 0.000158642 |

with those from paramGUI as recorded in specFitSummary.txt:
Parameters:

     Estimate Std. Error t value Pr(>|t|)    
t1  1.428e+04  1.971e-02  724710   <2e-16 ***
t2  8.080e+02  3.503e-02   23069   <2e-16 ***
t3  4.001e-01  9.106e-05    4394   <2e-16 ***
t4  1.371e+04  3.378e-02  405740   <2e-16 ***
t5  6.550e+02  5.681e-02   11529   <2e-16 ***
t6 -2.997e-01  1.619e-04   -1852   <2e-16 ***
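For reference, a minimal sketch of a skewed-Gaussian band shape consistent with the parameter names above (amp, loc, width, skew); this is an assumption for illustration, the authoritative definition lives in pyglotaran's spectral megacomplex:

```python
import numpy as np


def skewed_gaussian(
    x: np.ndarray, amp: float, loc: float, width: float, skew: float
) -> np.ndarray:
    """Skewed Gaussian over spectral axis x; reduces to a plain Gaussian as skew -> 0."""
    if abs(skew) < 1e-10:
        return amp * np.exp(-np.log(2) * (2 * (x - loc) / width) ** 2)
    arg = 1 + 2 * skew * (x - loc) / width
    result = np.zeros_like(x, dtype=float)
    valid = arg > 0  # the log term is only defined where arg > 0
    result[valid] = amp * np.exp(-np.log(2) * (np.log(arg[valid]) / skew) ** 2)
    return result
```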

Also note the similarity in the plotted results: paramGUI vs pyglotaran.
So you see the estimated values are essentially the same (when units are compatible).

By compatible I mean that the units of the spectral model parameters are the same as those of the spectral axis of the dataset (in this case both cm^-1); this is not possible in all cases (even with the option energy_spectrum: True).
Some mechanism is needed that not only allows one to specify whether a megacomplex is 'energy-spectrum' but also whether the dataset is 'energy-spectrum', so the conversion can be 'cancelled out' when the two are already compatible and applied only when needed. In the future, some kind of type annotation of units (nm, cm^-1, eV, etc.) is needed for all data types: dataset axis, model axis, parameters.
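A minimal sketch (an assumption, not code from this PR) of the nm <-> cm^-1 conversion such a mechanism would have to apply, using wavenumber [cm^-1] = 1e7 / wavelength [nm]:

```python
import numpy as np


def nm_to_wavenumber(wavelength_nm: np.ndarray) -> np.ndarray:
    """Convert a wavelength axis in nm to a wavenumber axis in cm^-1."""
    return 1e7 / np.asarray(wavelength_nm, dtype=float)


def wavenumber_to_nm(wavenumber_cm1: np.ndarray) -> np.ndarray:
    """Inverse conversion; the relation is its own inverse."""
    return 1e7 / np.asarray(wavenumber_cm1, dtype=float)
```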

For now the results are good enough to merge into staging, at least from this perspective.

@jsnel (Member) left a comment

Taking note of my comment w.r.t. the spectral model validation, this PR is good enough to merge to staging, pending possibly some more detailed general comments by @s-weigand.

@s-weigand (Member) left a comment

Additional issues I found.

indices: dict[str, int] | None,

indices should have type `dict[str, int]`, not `dict[str, int] | None`, since the builtin megacomplexes have that signature and don't defend against None.
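A minimal sketch of the suggested change (the surrounding signature is hypothetical; only the `indices` annotation is the point):

```python
from __future__ import annotations

import numpy as np


def calculate_matrix(
    dataset_model,
    # Was `indices: dict[str, int] | None`; the builtin megacomplexes never
    # defend against None, so the annotation should not allow it.
    indices: dict[str, int],
    **kwargs,
) -> np.ndarray:
    ...
```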

@joernweissenborn (Member, Author) replied

> indices should have type `dict[str, int]`, not `dict[str, int] | None`, since the builtin megacomplexes have that signature and don't defend against None.

Right, I will fix that.

joernweissenborn and others added 2 commits July 14, 2021 21:08
sourcery-ai bot (Contributor) commented Jul 14, 2021

Sourcery Code Quality Report

✅  Merging this PR will increase code quality in the affected files by 0.14%.

| Quality metrics | Before | After | Change |
|-----------------|--------|-------|--------|
| Complexity | 4.20 ⭐ | 4.70 ⭐ | 0.50 👎 |
| Method Length | 82.52 🙂 | 81.56 🙂 | -0.96 👍 |
| Working memory | 9.06 🙂 | 8.99 🙂 | -0.07 👍 |
| Quality | 63.93% 🙂 | 64.07% 🙂 | 0.14% 👍 |

| Other metrics | Before | After | Change |
|---------------|--------|-------|--------|
| Lines | 2571 | 2544 | -27 |

| Changed files | Quality Before | Quality After | Quality Change |
|---------------|----------------|---------------|----------------|
| glotaran/analysis/problem.py | 81.72% ⭐ | 80.73% ⭐ | -0.99% 👎 |
| glotaran/analysis/problem_grouped.py | 55.74% 🙂 | 57.43% 🙂 | 1.69% 👍 |
| glotaran/analysis/problem_ungrouped.py | 62.94% 🙂 | 66.84% 🙂 | 3.90% 👍 |
| glotaran/analysis/simulation.py | 64.72% 🙂 | 63.93% 🙂 | -0.79% 👎 |
| glotaran/analysis/util.py | 63.96% 🙂 | 62.39% 🙂 | -1.57% 👎 |
| glotaran/analysis/test/models.py | 65.96% 🙂 | 66.66% 🙂 | 0.70% 👍 |
| glotaran/analysis/test/test_optimization.py | 25.03% 😞 | 25.08% 😞 | 0.05% 👍 |
| glotaran/analysis/test/test_problem.py | 61.90% 🙂 | 62.58% 🙂 | 0.68% 👍 |
| glotaran/examples/sequential.py | 45.29% 😞 | 46.88% 😞 | 1.59% 👍 |
| glotaran/model/test/test_model.py | 69.28% 🙂 | 69.21% 🙂 | -0.07% 👎 |

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
|------|----------|------------|--------|----------------|---------|----------------|
| glotaran/analysis/test/test_optimization.py | test_optimization | 19 😞 | 520 ⛔ | 15 😞 | 25.08% 😞 | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |
| glotaran/analysis/problem_grouped.py | GroupedProblem._append_to_grouped_bag | 12 🙂 | 288 ⛔ | 23 ⛔ | 28.33% 😞 | Try splitting into smaller methods. Extract out complex expressions |
| glotaran/analysis/util.py | calculate_matrix | 19 😞 | 152 😞 | 13 😞 | 40.33% 😞 | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |
| glotaran/analysis/problem_ungrouped.py | UngroupedProblem._calculate_residual | 13 🙂 | 228 ⛔ | 12 😞 | 40.50% 😞 | Try splitting into smaller methods. Extract out complex expressions |
| glotaran/analysis/problem_grouped.py | GroupedProblem.init_bag | 10 🙂 | 176 😞 | 16 ⛔ | 41.68% 😞 | Try splitting into smaller methods. Extract out complex expressions |

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see the Sourcery documentation for details on how these metrics are calculated.

sonarqubecloud bot commented

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)

No coverage information
0.0% duplication

@s-weigand (Member) commented

Last benchmark before the review suggestions got applied.

All benchmarks:

       before           after         ratio
     [dc00e6da]       [8cc6e6e0]
     <v0.4.0>                   
-      58.2±0.5ms       39.7±0.9ms     0.68  BenchmarkOptimize.time_optimize(False, False, False)
-         275±2ms         128±40ms     0.46  BenchmarkOptimize.time_optimize(False, False, True)
         82.9±3ms         67.2±2ms    ~0.81  BenchmarkOptimize.time_optimize(False, True, False)
+      82.4±0.9ms         128±20ms     1.55  BenchmarkOptimize.time_optimize(False, True, True)
-      58.6±0.9ms         48.5±1ms     0.83  BenchmarkOptimize.time_optimize(True, False, False)
-         267±5ms         109±30ms     0.41  BenchmarkOptimize.time_optimize(True, False, True)
         83.1±1ms         78.3±2ms     0.94  BenchmarkOptimize.time_optimize(True, True, False)
+        84.2±2ms         144±10ms     1.71  BenchmarkOptimize.time_optimize(True, True, True)
             183M             180M     0.98  IntegrationTwoDatasets.peakmem_create_result
             202M             198M     0.98  IntegrationTwoDatasets.peakmem_optimize
-         255±3ms          200±4ms     0.78  IntegrationTwoDatasets.time_create_result
-      5.43±0.02s       1.50±0.08s     0.28  IntegrationTwoDatasets.time_optimize

github-actions bot (Contributor) commented

Benchmark is done. Check out the benchmark result page.
Benchmark differences below 5% might be due to CI noise.

Benchmark diff

Parametrized benchmark signatures:

BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)

All benchmarks:

       before           after         ratio
     [dc00e6da]       [dc14f88a]
     <v0.4.0>                   
-        61.6±2ms         39.3±1ms     0.64  BenchmarkOptimize.time_optimize(False, False, False)
-         281±2ms         124±20ms     0.44  BenchmarkOptimize.time_optimize(False, False, True)
-        82.1±5ms         66.0±4ms     0.80  BenchmarkOptimize.time_optimize(False, True, False)
         87.1±5ms         127±20ms    ~1.45  BenchmarkOptimize.time_optimize(False, True, True)
         61.1±2ms         52.1±3ms    ~0.85  BenchmarkOptimize.time_optimize(True, False, False)
-         299±4ms         123±40ms     0.41  BenchmarkOptimize.time_optimize(True, False, True)
         89.7±2ms         78.9±2ms    ~0.88  BenchmarkOptimize.time_optimize(True, True, False)
         87.4±4ms         146±20ms    ~1.67  BenchmarkOptimize.time_optimize(True, True, True)
             181M             184M     1.02  IntegrationTwoDatasets.peakmem_create_result
             201M             202M     1.01  IntegrationTwoDatasets.peakmem_optimize
-         255±4ms          200±3ms     0.78  IntegrationTwoDatasets.time_create_result
       5.70±0.05s       1.34±0.08s    ~0.23  IntegrationTwoDatasets.time_optimize
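For context, a minimal sketch of an asv benchmark with this parametrized shape; the class body is a hypothetical stand-in with a toy workload, not the real BenchmarkOptimize from the pyglotaran benchmark suite:

```python
import numpy as np


class BenchmarkOptimize:
    # asv expands `params` into the (index_dependent, grouped, weight)
    # signatures shown in the report above.
    params = ([True, False], [True, False], [True, False])
    param_names = ["index_dependent", "grouped", "weight"]

    def setup(self, index_dependent, grouped, weight):
        # Toy workload standing in for building a model and scheme.
        rng = np.random.default_rng(0)
        self.matrix = rng.random((500, 50))
        self.data = rng.random(500)

    def time_optimize(self, index_dependent, grouped, weight):
        # asv times the body of time_* methods.
        np.linalg.lstsq(self.matrix, self.data, rcond=None)
```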

@s-weigand s-weigand merged commit f792747 into glotaran:staging Jul 15, 2021
jsnel pushed a commit that referenced this pull request Aug 14, 2021
* Added benchmark for Problem class

* removed print

* ♻️ 'Refactored by Sourcery'

* 🧹 Moved glotaran/analysis/test/test_relations.py

to benchmark/pytest/analysis/test_problem.py

Use 'pytest benchmark/pytest/' to run the benchmarks

* Numerous performance tweaks

* Don't weight data with ones if no weight supplied

* Fix duplicate call to create result_dataset

* Removed dead code

* Switched back to pure numpy in problem

* Fixed example

* Cleanup

* Update glotaran/analysis/problem_ungrouped.py

Co-authored-by: Sebastian Weigand <[email protected]>

* Update glotaran/analysis/problem_ungrouped.py

Co-authored-by: Sebastian Weigand <[email protected]>

* Update glotaran/analysis/util.py

Co-authored-by: Sebastian Weigand <[email protected]>

Co-authored-by: Sourcery AI <>
Co-authored-by: s-weigand <[email protected]>
jsnel pushed a commit to jsnel/pyglotaran that referenced this pull request Aug 14, 2021

jsnel pushed a commit that referenced this pull request Sep 16, 2021