
πŸ§ͺπŸš‡ Add integration test result validation #754

Merged
merged 29 commits into from
Sep 5, 2021

Conversation

s-weigand
Member

@s-weigand s-weigand commented Jul 17, 2021

Right now the integration tests only show breaking API changes, but checking validity is still a manual task.
This PR adds a way to compare the results to a "gold standard" (tag, branch, commit) set by this workflow on the pyglotaran-examples repo.

Change summary

  • Adds numerical comparison to "gold standard" results defined by pyglotaran-examples

Checklist

  • βœ”οΈ Passing the tests (mandatory for all PR's)
  • πŸ§ͺ Adds new tests for the feature (mandatory for ✨ feature and 🩹 bug fix PR's)

Closes issues

closes #753

@s-weigand s-weigand requested a review from a team as a code owner July 17, 2021 18:15
@github-actions
Contributor

Binder πŸ‘ˆ Launch a binder notebook on branch s-weigand/pyglotaran/compare-results

@s-weigand s-weigand added the Type: Tooling Tools used for the project (CI, CD, docs etc.) label Jul 17, 2021
@codecov

codecov bot commented Jul 17, 2021

Codecov Report

Merging #754 (dd6ae0c) into staging (2a6b3e6) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           staging    #754   +/-   ##
=======================================
  Coverage     84.5%   84.5%           
=======================================
  Files           75      75           
  Lines         4200    4200           
  Branches       756     756           
=======================================
  Hits          3549    3549           
  Misses         518     518           
  Partials       133     133           


@s-weigand
Member Author

For this workflow to pass we might need to fix how problem.save_parameters_for_history is implemented in optimize and make a fix release 0.4.1, since, according to @joernweissenborn, it is implemented incorrectly πŸ€·β€β™€οΈ.
At least we now have a way to compare the results across versions.

@s-weigand s-weigand marked this pull request as draft July 17, 2021 19:22
@s-weigand s-weigand force-pushed the compare-results branch 2 times, most recently from 12a381c to 17c55bb on July 20, 2021 21:45
Member

@jsnel jsnel left a comment


LGTM

@s-weigand s-weigand marked this pull request as ready for review August 8, 2021 18:00
@jsnel jsnel linked an issue Aug 13, 2021 that may be closed by this pull request
@s-weigand
Member Author

The clp_label swapping for ex_two_datasets in this CI run
might be related to the implementation of involved_compartments

compartments = list(set(compartments))

Since set does not preserve order.

Using frozenset might solve this, or, better, using the user-provided order from the defined model (not sure if we still have access to that at this point).
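An order-preserving deduplication can be done with a plain dict, since dicts keep insertion order in Python 3.7+. A minimal sketch (the helper name is hypothetical, not pyglotaran API):

```python
def unique_preserving_order(items):
    """Deduplicate while keeping the first-seen (user-defined) order."""
    # dict preserves insertion order; duplicate keys collapse to the first occurrence.
    return list(dict.fromkeys(items))


compartments = ["s1", "s3", "s1", "s2", "s3"]
print(unique_preserving_order(compartments))  # ['s1', 's3', 's2']
# list(set(...)) gives no ordering guarantee; for strings the order can even
# vary between interpreter runs due to hash randomization.
```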

@s-weigand s-weigand force-pushed the compare-results branch 2 times, most recently from dc6bfb9 to bb7c55e on August 15, 2021 17:37
@s-weigand s-weigand marked this pull request as draft August 16, 2021 00:52
@s-weigand
Member Author

s-weigand commented Aug 16, 2021

@jsnel and I did a long debugging session today where we found the following:

  • equal area penalties are applied differently (compartment order) between 0.4.1 and staging (failing ex_two_datasets and possibly simultaneous_analysis_6d_disp); because clps and clp_labels are not aligned, the areas of the wrong spectra are calculated (in cases where e.g. s1,s2,s3 -> s1,s3,s2 or similar)
  • the missing coordinate spectral is due to different code paths when auto-determining index dependency (UngroupedProblem.create_index_dependent_result_dataset vs. UngroupedProblem.create_index_independent_result_dataset); this also seems somewhat random, or maybe I messed the tests up (Missing coordinate: 'spectral' in 'dataset1.nc', data_var 'matrix' vs. spectral apparently being present)
  • problems in simultaneous_analysis_6d_disp might be due to missing noise and/or the differences in how weights are applied
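The misalignment described in the first bullet can be sketched as follows: if the clp columns come out in a different label order than the model defines, penalty areas get computed for the wrong spectra unless the columns are reordered first. A minimal illustration (the helper name and array shapes are hypothetical, not pyglotaran API):

```python
import numpy as np


def align_clps(clps, clp_labels, reference_labels):
    """Reorder clp columns to match a reference label order (hypothetical helper)."""
    index = [clp_labels.index(label) for label in reference_labels]
    return clps[:, index]


clps = np.array([[1.0, 2.0, 3.0]])  # columns correspond to clp_labels below
clp_labels = ["s1", "s3", "s2"]     # order as produced by e.g. list(set(...))
reference = ["s1", "s2", "s3"]      # user-defined model order
aligned = align_clps(clps, clp_labels, reference)
print(aligned)  # [[1. 3. 2.]] -- without this step, s2/s3 areas would be swapped
```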

@s-weigand
Member Author

Updated the "gold standard".

Error summary:

  • optimized_parameters (simultaneous_analysis_6d_disp)
  • matrix (simultaneous_analysis_6d_disp)
  • weighted_residual (simultaneous_analysis_3d_weight) ~1e-8
  • weighted_residual (ex_two_datasets) ~1e-8
  • weighted_residual (simultaneous_analysis_3d_nodisp) ~1e-8

s-weigand and others added 19 commits September 4, 2021 02:57
Testing 0.4.1 against 0.4.1: results should actually be identical.
For further quantification of difference use:
```python
import numpy as np

diff = current_data.data - expected_var_value.data
diff_abs_sum = np.sum(np.abs(diff))
if diff_abs_sum > 0:
    print(f"{expected_var_name:<42s}: {diff_abs_sum:.4g}")
```
Previously, the cwd of the subprocess was changed, which might have led to problems when using the debugger; this method should be safer and not mess with the pyglotaran repository itself.
…_dataset'

Currently this example is under review for changes and shows numerical instability.
Ref:
glotaran/pyglotaran-examples PR42
This example needs to be reviewed since it shows numerical instability.
weighted_data were always calculated and now will only be calculated when weights are applied
This makes more sensible use of error tolerances for floating-point inaccuracy and takes the scaling of the SVD vectors with their singular values into account.
This way we can see failing tests for all data_vars, instead of a single failing test aborting the loop and hiding other failing tests.
Using the environment variable COMPARE_RESULTS_LOCAL, which can be set in tox.ini, a local path can be specified in which to look for the results folder to use as a reference, in lieu of the (gold standard) comparison-results branch in the pyglotaran-examples repository.
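The selection logic described above can be sketched roughly as follows (function name and fallback path are hypothetical; only the COMPARE_RESULTS_LOCAL variable name comes from the commit message):

```python
import os
from pathlib import Path


def reference_results_path(fallback_clone_dir="comparison-results"):
    """Pick the reference results folder: a local override or the cloned branch."""
    local = os.environ.get("COMPARE_RESULTS_LOCAL")
    if local is not None:
        path = Path(local)
        if not path.exists():
            raise FileNotFoundError(
                f"COMPARE_RESULTS_LOCAL points to a missing path: {path}"
            )
        return path
    # Fall back to the checkout of the comparison-results branch.
    return Path(fallback_clone_dir)
```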

Added some comments about which tolerances are used and their origin.
The spectral coordinate is (logically) missing from some data variables for the (right) singular vectors if they are transposed, *but* this now throws a warning.
@sonarcloud

sonarcloud bot commented Sep 5, 2021

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information
0.0% Duplication

@github-actions
Contributor

github-actions bot commented Sep 5, 2021

Benchmark is done. Check out the benchmark result page.
Benchmark differences below 5% might be due to CI noise.

Benchmark diff

Parametrized benchmark signatures:

BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)

All benchmarks:

       before           after         ratio
     [dc00e6da]       [dd6ae0c4]
     <v0.4.0>                   
-        68.9Β±1ms       49.2Β±0.7ms     0.71  BenchmarkOptimize.time_optimize(False, False, False)
-         375Β±4ms         62.9Β±2ms     0.17  BenchmarkOptimize.time_optimize(False, False, True)
         97.1Β±3ms         81.5Β±2ms    ~0.84  BenchmarkOptimize.time_optimize(False, True, False)
         97.4Β±2ms         92.1Β±6ms     0.95  BenchmarkOptimize.time_optimize(False, True, True)
         68.1Β±1ms       67.3Β±0.7ms     0.99  BenchmarkOptimize.time_optimize(True, False, False)
-         372Β±3ms        78.7Β±50ms     0.21  BenchmarkOptimize.time_optimize(True, False, True)
         96.9Β±3ms          105Β±2ms     1.08  BenchmarkOptimize.time_optimize(True, True, False)
         99.2Β±2ms         123Β±40ms    ~1.24  BenchmarkOptimize.time_optimize(True, True, True)
             182M             179M     0.98  IntegrationTwoDatasets.peakmem_create_result
             197M             197M     1.00  IntegrationTwoDatasets.peakmem_optimize
-         304Β±5ms         261Β±10ms     0.86  IntegrationTwoDatasets.time_create_result
        6.80Β±0.1s       2.21Β±0.07s    ~0.32  IntegrationTwoDatasets.time_optimize

@jsnel jsnel marked this pull request as ready for review September 5, 2021 15:01
@jsnel jsnel merged commit 5238409 into glotaran:staging Sep 5, 2021
@jsnel jsnel deleted the compare-results branch September 5, 2021 18:59
jsnel added a commit that referenced this pull request Sep 16, 2021
* πŸ§ͺ Added result consistency test script

* πŸš‡ Added result consistency testing to the integration test workflow

* πŸ‘Œ Propagate git interaction errors to inform users

* πŸ”§ Add `pytest-allclose` for more useful error reporting

* πŸ‘Œ Use colored pytest output for better readability

* πŸ‘Œ Show the commits used to create the results

* πŸ‘Œ Added tests for optimized parameters

* 🧹 Renamed variables used for comparing to expected and current

* πŸ‘Œ Print up to 20 different values if "allclose" fails

* Adjust tolerances to pass on CI if reference and current are the same

* πŸ‘Œ Make git interactions safer by passing the folder to use to git

* πŸ‘Œ Implemented 'EXAMPLE_BLOCKLIST' and added 'transient_absorption_two_dataset'

* πŸ‘Œ Only lower absolute tolerance for SVD comparison

* πŸ‘Œ Added 'ex_spectral_guidance' to EXAMPLE_BLOCKLIST

* πŸ“š Added instructions to locally run result consistency test

* πŸ‘Œ Added dataset file name to report on failing test

* πŸ‘Œ Added data_var name to error reporting

* πŸ‘Œ Special cased missing weighted_data

* πŸ‘Œ Added label displaying on test_result_parameter_consistency fail

* 🩹 Fixed line length issue

* πŸ‘Œ Use value difference to check data_vars

* πŸ‘Œ Swap abs_diff and float_resolution, that way rtol has some effect

* πŸ‘Œ Use float32 precision as absolute tolerance

* πŸ‘Œ Improved error reporting on failure by showing mean difference

* ♻️ Refactored data_var tests not to run in a loop but using fixtures

* πŸ‘Œ Made epsilon for residual scale with original data

* πŸ‘Œ Add option to specify path for local comparison

* πŸ‘Œ Allow missing coords in some variables

* 🩹 Reorder dimensions before comparison

The same PR was merged to the v0.4.1 maintenance branch for comparison, see #760 
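The tolerance-related commits above ("Use float32 precision as absolute tolerance", "Swap abs_diff and float_resolution, that way rtol has some effect") can be illustrated with a minimal sketch; the function name and the rtol value are assumptions, and the actual PR uses pytest-allclose rather than this hand-rolled check:

```python
import numpy as np

# float32 machine epsilon as the absolute tolerance floor (per the commit messages);
# the relative term scales with the expected values so large values get
# proportionally more slack.
ATOL = float(np.finfo(np.float32).eps)  # ~1.19e-07


def assert_data_var_close(current, expected, rtol=1e-5):
    """Hypothetical comparison helper: fail if any element exceeds atol + rtol*|expected|."""
    diff = np.abs(current - expected)
    tol = ATOL + rtol * np.abs(expected)
    if not np.all(diff <= tol):
        worst = float(np.max(diff - tol))
        raise AssertionError(f"data_var differs; worst excess over tolerance: {worst:.4g}")


# Differences around 1e-8 (as in the weighted_residual errors above) pass:
assert_data_var_close(np.array([1.0, 2.0]), np.array([1.0, 2.0 + 1e-8]))
```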



Co-authored-by: Joris Snellenburg <[email protected]>
Labels
Type: Tooling Tools used for the project (CI, CD, docs etc.)
Successfully merging this pull request may close these issues.

πŸš‡Add result validation step to CI