Add integration test result validation #754
Codecov Report
@@ Coverage Diff @@
## staging #754 +/- ##
=======================================
Coverage 84.5% 84.5%
=======================================
Files 75 75
Lines 4200 4200
Branches 756 756
=======================================
Hits 3549 3549
Misses 518 518
Partials 133 133
For this workflow to pass we might need to fix how |
12a381c
to
17c55bb
Compare
LGTM
17c55bb
to
fc4f5cf
Compare
a839137
to
9467b39
Compare
9467b39
to
647ab30
Compare
The
Since set does not preserve order. Using |
dc6bfb9
to
bb7c55e
Compare
@jsnel and I did a long debugging session today where we found the following:
40b5a03
to
8509de0
Compare
Error summary:
3148781
to
06cf708
Compare
Testing 0.4.1 against 0.4.1: results should actually be identical.
For further quantification of difference use:
```python
import numpy as np

# Sum of absolute differences between current and expected data variable
diff = current_data.data - expected_var_value.data
diff_abs_sum = np.sum(np.abs(diff))
if diff_abs_sum > 0:
    print(f"{expected_var_name:<42s}: {diff_abs_sum:.4g}")
```
Previously the cwd of the subprocess was changed, which might have led to problems when using the debugger. This method should be safer and should not mess with the pyglotaran repository itself.
…_dataset' Currently this example is under review for changes and shows numerical instability. Ref: glotaran/pyglotaran-examples PR42
See also in this PR: 8804a48
This example needs to be reviewed since it shows numerical instability.
Previously weighted_data was always calculated; now it is only calculated when weights are applied.
This makes the handling of errors caused by floating-point inaccuracy more sensible and takes the scaling of SVD vectors with their singular values (SV) into account.
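One possible reading of the tolerance notes in this thread (float32 precision as the absolute tolerance, singular vectors weighted by their singular values) is sketched below. The helper name, the default `rtol`, and the exact scaling rule are assumptions for illustration and are not taken from the code in this PR.

```python
import numpy as np

# Resolution of float32 (1e-06), used here as the base absolute tolerance.
FLOAT32_RESOLUTION = float(np.finfo(np.float32).resolution)


def weighted_singular_vectors_close(u_current, u_expected, s_expected, rtol=1e-5):
    """Hypothetical helper: compare singular vectors weighted by singular values.

    ``u_*`` have shape (n, k) with one vector per column, ``s_expected`` has
    shape (k,). Weighting by the singular values means that vectors belonging
    to negligible singular values (mostly numerical noise) barely contribute.
    """
    u_current = np.asarray(u_current, dtype=float)
    u_expected = np.asarray(u_expected, dtype=float)
    s_expected = np.asarray(s_expected, dtype=float)
    # Assumption: absolute tolerance scales with the largest singular value.
    atol = FLOAT32_RESOLUTION * float(np.max(s_expected))
    return bool(
        np.allclose(u_current * s_expected, u_expected * s_expected, rtol=rtol, atol=atol)
    )
```

With this weighting, noise in a vector whose singular value is close to zero does not fail the comparison, while the same noise in a dominant vector does.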
This way we can see failing tests for all data_vars instead of a failing test aborting the loop, hiding other failing tests.
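The PR does this with pytest fixtures. The underlying idea, reporting every failing data variable instead of stopping at the first one, can be sketched without pytest as follows. This is a swapped-in, standard-library illustration; `find_mismatches` and its arguments are hypothetical names and plain floats stand in for the real data variables.

```python
import math


def find_mismatches(current, expected, rel_tol=1e-9, abs_tol=0.0):
    """Hypothetical helper: list all differing variables, not just the first.

    ``current`` and ``expected`` map variable names to float values.
    """
    mismatches = []
    for name, expected_value in expected.items():
        if name not in current:
            mismatches.append(f"{name}: missing")
        elif not math.isclose(
            current[name], expected_value, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            mismatches.append(f"{name}: {current[name]!r} != {expected_value!r}")
    return mismatches
```

An empty list means every variable matched; otherwise each entry names one failing variable, so a single run shows the full picture.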
Using the environment variable COMPARE_RESULTS_LOCAL, which can be set in tox.ini, a local path can be specified to look for the results folder to use as a reference in lieu of the (gold standard) comparison-results branch in the pyglotaran-examples repository.
Added some comments about which tolerances are used and their origin.
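A minimal sketch of how such an override could be resolved is shown below. Only the variable name COMPARE_RESULTS_LOCAL comes from this PR; the function name, its parameters, and the fallback behaviour are assumptions for illustration.

```python
import os
from pathlib import Path


def resolve_reference_folder(default_folder, env=None):
    """Hypothetical helper: pick the folder holding the reference results.

    If COMPARE_RESULTS_LOCAL is set and non-empty it wins, otherwise the
    default folder (for example a checkout of the comparison-results branch)
    is used.
    """
    env = os.environ if env is None else env
    local_path = env.get("COMPARE_RESULTS_LOCAL")
    if local_path:
        return Path(local_path)
    return Path(default_folder)
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.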
92dc411
to
7522d3a
Compare
The spectral coordinate is (logically) missing from some data variables
for the (right) singular vectors, if they are transposed, *but* throw a warning.
Kudos, SonarCloud Quality Gate passed!
0 Bugs
No Coverage information
Benchmark is done. Checkout the benchmark result page.
Benchmark diff
Parametrized benchmark signatures:
BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)
* Added result consistency test script
* Added result consistency testing to the integration test workflow
* Propagate git interaction errors to inform users
* Add `pytest-allclose` for more useful error reporting
* Use colored pytest output for better readability
* Show the commits used to create the results
* Added tests for optimized parameters
* Renamed variables used for comparing to expected and current
* Print up to 20 different values if "allclose" fails
* Adjust tolerances to pass on CI if reference and current are the same
* Make git interactions safer by passing the folder to use to git
* Implemented 'EXAMPLE_BLOCKLIST' and added 'transient_absorption_two_dataset'
* Only lower absolute tolerance for SVD comparison
* Added 'ex_spectral_guidance' to EXAMPLE_BLOCKLIST
* Added instructions to locally run result consistency test
* Added dataset file name to report on failing test
* Added data_var name to error reporting
* Special cased missing weighted_data
* Added label displaying on test_result_parameter_consistency fail
* Fixed line length issue
* Use value difference to check data_vars
* Swap abs_diff and float_resolution, that way rtol has some effect
* Use float32 precision as absolute tolerance
* Improved error reporting on failure by showing mean difference
* Refactored data_var tests not to run in a loop but using fixtures
* Made epsilon for residual scale with original data
* Add option to specify path for local comparison
* Allow missing coords in some variables
* Reorder dimensions before comparison

The same PR was merged to the v0.4.1 maintenance branch for comparison, see #760

Co-authored-by: Joris Snellenburg <[email protected]>
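The error-reporting changes (print up to 20 differing values when "allclose" fails) could look roughly like the sketch below. The function name, the message format, and the default tolerances are assumptions for illustration; the PR itself relies on `pytest-allclose` for its reporting.

```python
import numpy as np


def describe_differences(current, expected, rtol=1e-5, atol=1e-8, max_values=20):
    """Hypothetical helper: describe at most ``max_values`` mismatching entries."""
    current = np.asarray(current, dtype=float)
    expected = np.asarray(expected, dtype=float)
    # Boolean mask of entries that are NOT close within the tolerances.
    mismatch = ~np.isclose(current, expected, rtol=rtol, atol=atol)
    lines = []
    for idx in np.argwhere(mismatch)[:max_values]:
        position = tuple(int(i) for i in idx)
        lines.append(
            f"{position}: current={current[position]:.6g} "
            f"expected={expected[position]:.6g}"
        )
    return lines
```

Capping the output keeps the report readable when a whole array differs, while still showing where the first mismatches occur.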
Right now the integration tests only show breaking API changes, but checking validity is still a manual task.
This PR adds a way to compare the results to a "gold standard" (tag, branch, commit) set by this workflow on the pyglotaran-examples repo.
Change summary
pyglotaran-examples
Checklist
Closes issues
closes #753