✨ Add parameter IO support for more formats supported by pandas #896

patrickhaetti · 2021-11-10T18:05:44Z

Change summary

added xlsx & ods parameter:
- added xlsx.py in builtin/io/pandas. Replace inf / -inf with NaN when saving and replace NaN with inf in 'maximum' and -inf in 'minimum 'column of Dataframe ✨ Use ∞ instead of inf in excel parameter files #888
- setup.cfg: add xlsx = glotaran.builtin.io.pandas.xlsx
- added test_pandas_parameters.py and add 'data' subfolder which includes parameter.xlsx and parameter.yaml
- added tests with parameter files *.xlsx and *.ods
- added openpyxl>=3.0.9 to setup.cfg / install_requires
- added odfpy>=1.4.1 to setup.cfg / install_requires
- add openpyxl==3.0.9 to requirements_dev.txt to read/write xlsx

Added test files to pandas/tests/data with parameter.yaml and create csv, tsv, xlsx parameter files including one type of each parameter definition.
added tsv parameter:
- add tsv.py in builtin/io/pandas for tab-separated-value parameter
- setup.cfg: add tsv = glotaran.builtin.io.pandas.tsv
- added tsv.py to builtin/io/pandas.
- added parameter.tsv in test/data

Checklist

✔️ Passing the tests (mandatory for all PR's)
👌 Closes issue (mandatory for ✨ feature and 🩹 bug fix PR's)
🧪 Adds new tests for the feature (mandatory for ✨ feature and 🩹 bug fix PR's)
📚 Adds documentation of the feature

github-actions · 2021-11-10T18:06:00Z

👈 Launch a binder notebook on branch patrickhaetti/pyglotaran/create-xlsx-parameter

codecov · 2021-11-11T18:31:11Z

Codecov Report

Merging #896 (9602f53) into main (491d5f2) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main    #896     +/-   ##
=======================================
+ Coverage   85.6%   85.8%   +0.1%     
=======================================
  Files         90      92      +2     
  Lines       4909    4959     +50     
  Branches     927     933      +6     
=======================================
+ Hits        4207    4258     +51     
  Misses       552     552             
+ Partials     150     149      -1

Impacted Files	Coverage Δ
glotaran/builtin/io/pandas/csv.py	`100.0% <100.0%> (ø)`
glotaran/builtin/io/pandas/tsv.py	`100.0% <100.0%> (ø)`
glotaran/builtin/io/pandas/xlsx.py	`100.0% <100.0%> (ø)`
glotaran/parameter/parameter_group.py	`88.1% <100.0%> (ø)`
glotaran/utils/io.py	`100.0% <100.0%> (ø)`
glotaran/parameter/parameter.py	`95.9% <0.0%> (+0.3%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 491d5f2...9602f53. Read the comment docs.

s-weigand

Thanks for the contribution, but there are some changes needed (see review).

In addition:

It would also be useful to have a *.csv file for testing that it looks like it should.
Having an option for writing to csv e.g. strip_default=True would be useful when csv files are edited in excel (to reduce code duplication best factor the replacing out in a separate function which then can be unit tested).
Please also add a plugin for tsv (tab separated values) the only difference to csv should be sep=+\t in the pandas method.

@jsnel @joernweissenborn

What do you think about adding openpyxl as a runtime dependency?
What do you think about setting all default parameter values to empty strings, this would reduce cognitive load and make parameter files easier to read.

glotaran/builtin/io/pandas/test/data/parameter.yaml

glotaran/builtin/io/pandas/test/test_pandas_parameters.py

requirements_dev.txt

joernweissenborn · 2021-11-11T21:35:25Z

Thanks for the contribution, but there are some changes needed (see review).

In addition:

It would also be useful to have a *.csv file for testing that it looks like it should.

Having an option for writing to csv e.g. strip_default=True would be useful when csv files are edited in excel (to reduce code duplication best factor the replacing out in a separate function which then can be unit tested).

Please also add a plugin for tsv (tab separated values) the only difference to csv should be sep=+\t in the pandas method.

@jsnel @joernweissenborn

What do you think about adding openpyxl as a runtime dependency?

What do you think about setting all default parameter values to empty strings, this would reduce cognitive load and make parameter files easier to read.

I am ok with both.

jsnel · 2021-11-17T19:18:30Z

What do you think about adding openpyxl as a runtime dependency?

What do you think about setting all default parameter values to empty strings, this would reduce cognitive load and make parameter files easier to read.

Yes to both.

joernweissenborn · 2021-11-19T01:00:42Z

@patrickhaetti Looking at the commit mess I suggest you close this PR and and open a new one, with clean commit history. I think @glotaran/maintainers would also prefer it.

s-weigand · 2021-11-19T10:50:21Z

@patrickhaetti Looking at the commit mess I suggest you close this PR and and open a new one, with clean commit history. I think @glotaran/maintainers would also prefer it.

Let's keep the PR but I will remove the mess 😉

github-actions · 2021-11-19T11:08:59Z

Benchmark is done. Checkout the benchmark result page.
Benchmark differences below 5% might be due to CI noise.

Benchmark diff v0.5.1 vs. main

Parametrized benchmark signatures:

BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)


All benchmarks:

       before           after         ratio
     [96b42630]       [9602f531]
     <v0.5.1>                   
       51.9±0.3ms       52.1±0.4ms     1.01  BenchmarkOptimize.time_optimize(False, False, False)
         56.7±6ms       56.5±0.4ms     1.00  BenchmarkOptimize.time_optimize(False, False, True)
       52.1±0.3ms       52.5±0.3ms     1.01  BenchmarkOptimize.time_optimize(False, True, False)
       55.5±0.3ms        59.0±10ms     1.06  BenchmarkOptimize.time_optimize(False, True, True)
       62.8±0.4ms       63.9±0.7ms     1.02  BenchmarkOptimize.time_optimize(True, False, False)
        66.5±30ms        71.4±10ms     1.07  BenchmarkOptimize.time_optimize(True, False, True)
       63.0±0.4ms       64.1±0.4ms     1.02  BenchmarkOptimize.time_optimize(True, True, False)
        68.3±30ms        67.4±10ms     0.99  BenchmarkOptimize.time_optimize(True, True, True)
             200M             203M     1.01  IntegrationTwoDatasets.peakmem_optimize
       1.75±0.05s       1.65±0.08s     0.94  IntegrationTwoDatasets.time_optimize

Benchmark diff main vs. PR

Parametrized benchmark signatures:

BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)


All benchmarks:

       before           after         ratio
     [491d5f24]       [9602f531]
        54.0±20ms       52.1±0.4ms     0.97  BenchmarkOptimize.time_optimize(False, False, False)
       55.3±0.5ms       56.5±0.4ms     1.02  BenchmarkOptimize.time_optimize(False, False, True)
       51.5±0.3ms       52.5±0.3ms     1.02  BenchmarkOptimize.time_optimize(False, True, False)
         56.0±2ms        59.0±10ms     1.05  BenchmarkOptimize.time_optimize(False, True, True)
       62.7±0.5ms       63.9±0.7ms     1.02  BenchmarkOptimize.time_optimize(True, False, False)
        69.7±40ms        71.4±10ms     1.02  BenchmarkOptimize.time_optimize(True, False, True)
       63.0±0.2ms       64.1±0.4ms     1.02  BenchmarkOptimize.time_optimize(True, True, False)
        67.3±30ms        67.4±10ms     1.00  BenchmarkOptimize.time_optimize(True, True, True)
             200M             203M     1.01  IntegrationTwoDatasets.peakmem_optimize
       1.75±0.08s       1.65±0.08s     0.94  IntegrationTwoDatasets.time_optimize

s-weigand · 2021-11-19T11:19:11Z

@patrickhaetti I cleaned up the commit history please make sure to reset your local branch to the remote state.

E.g. if your remote is called origin

git reset --hard orgin/create-xlsx-parameter

If you have uncommitted changes just stash them before (git stash) and pop the stash afterward (git stash pop).
If you have local commits you didn't push yet just create a branch at the current state(git branch save-xlsx-branch) and cherry-pick the commits over after you reset to the remote branch.

patrickhaetti · 2021-11-19T15:23:14Z

@s-weigand thank you for cleaning up, looks way more better now

s-weigand · 2021-11-25T01:22:58Z

I will implement the strip_default=True functionality on the parameter level, since we will also need this for yaml round trips.

jsnel · 2021-12-12T02:01:36Z

In this PR result comparison was failing on a singular point in the residual. This is not a problem with this PR but reason to further improve the github actions that does the results comparison, see issue #937 which will be fixed in addressed in #937.

FAILED .github/test_result_consistency.py::test_result_data_var_consistency[study_transient_absorption_target_analysis-residual]

E AssertionError: Result data_var data mismatch: 'residual' in 'dataset1.nc'.
E With sum of absolute difference: 2.1183739743033033e-07 and shape: (335, 346)
E Mean difference: 1.827602428007336e-12
E
E assert False
E + where False = <function allclose.._allclose at 0x7efc69aaa280>(<xarray.DataArray 'residual' (time: 335, spectral: 346)>\narray([[-4.030000e-03, 1.100000e-04, -6.014408e-15, ..., 2....95 0.02343 0.04457 ... 869.9 889.9 909.9\n * spectral (spectral) float64 378.0 379.4 380.9 382.3 ... 851.8 853.1 854.4, <xarray.DataArray 'residual' (time: 335, spectral: 346)>\narray([[-4.030000e-03, 1.100000e-04, -6.017322e-15, ..., 2....95 0.02343 0.04457 ... 869.9 889.9 909.9\n * spectral (spectral) float64 378.0 379.4 380.9 382.3 ... 851.8 853.1 854.4, atol=array([[1.934e-10, 1.934e-10, 1.934e-10, ..., 1.934e-10, 1.934e-10,\n 1.934e-10],\n [1.934e-10, 1.934e-10, ...1.934e-10,\n 1.934e-10],\n [1.934e-10, 1.934e-10, 1.934e-10, ..., 1.934e-10, 1.934e-10,\n 1.934e-10]]), rtol=1e-05, print_fail=20)

.github/test_result_consistency.py:221: AssertionError
----------------------------- Captured stdout call -----------------------------
allclose first 1 failures:
(87, 344): 4.579833638334098e-06 4.579591726559538e-06
=========================== short test summary info ============================
FAILED .github/test_result_consistency.py::test_result_data_var_consistency[study_transient_absorption_target_analysis-residual]
======================== 1 failed, 296 passed in 5.59s =========================
Error: Process completed with exit code 1.

Refine calculation of epsilon for residual check Should resolve issue signaled in glotaran#896

* 🚇Update test_result_consistency Update float_resolution for comparison of spectral values. Should resolve issues with result comparison signaled in #927 * 🚇👌 Further refinement of test_result_consistency Refine calculation of epsilon for residual check Should resolve issues with result comparison signaled in #896 * 🚇👌 Failing tests now also show the tolerances used for the comparison

Functions added it utils/io.py and applied in csv.py. Function safe_parameters_fillna for loading and safe_parameters_replace for saving parameters. Now it will be checked if columns exist before replacing inf/-inf and empty strings.

Take out 'parameter_diff_param.yaml' from builtin/io/pandas/test/ which had varied values of 'parameter.yaml'. Taken out test functions from 'test_pandas_parameters.py' which were based on 'parameter_diff_param.yaml'.

Added ods parameter to xlsx.py Added 'ods' to /glotaran/builtin/io/pandas/xlsx.py. tests added to 'test_pandas_parameters.py' add odfpy>=1.4.1 to setup.cfg Append 'ods' to docstrings.

…x.py Use functions from glotaran.utils.io. Corrected spelling mistake in csv.py

For functions safe_parameters_fillna and safe_parameters_replace

Improved readability (`parameter` -> `parameters`, 'testparameters' -> 'test_parameters') More distinctive filenames from just 'parameter.xlsx' to 'reference_parameter.xlsx' so it doesn't pollute the parameter* file namespace when searching through the project.

glotaran/deprecation/modules/test/test_parameter_parameter_group.py

joernweissenborn

@s-weigand Only one Q for before my lgtm, it is in the review.

joernweissenborn

lgtm

✨ Added 'replace_infinfinity' optional argument to keep inf values in csv and tsv files

sourcery-ai · 2022-01-30T19:13:40Z

Sourcery Code Quality Report

✅ Merging this PR will increase code quality in the affected files by 0.04%.

Quality metrics	Before	After	Change
Complexity	2.29 ⭐	2.24 ⭐	-0.05 👍
Method Length	49.87 ⭐	51.65 ⭐	1.78 👎
Working memory	6.12 ⭐	6.09 ⭐	-0.03 👍
Quality	78.38% ⭐	78.42% ⭐	0.04% 👍

Other metrics	Before	After	Change
Lines	1690	1780	90

Changed files	Quality Before	Quality After	Quality Change
glotaran/parameter/parameter_group.py	75.47% ⭐	75.47% ⭐	0.00%
glotaran/parameter/test/test_parameter_group.py	74.28% 🙂	74.28% 🙂	0.00%
glotaran/plugin_system/test/test_project_io_registration.py	90.16% ⭐	90.16% ⭐	0.00%
glotaran/utils/io.py	76.31% ⭐	77.49% ⭐	1.18% 👍
glotaran/utils/test/test_io.py	77.70% ⭐	78.03% ⭐	0.33% 👍

Here are some functions in these files that still need a tune-up:

File	Function	Complexity	Length	Working Memory	Quality	Recommendation
glotaran/utils/io.py	_load_datasets	15 🙂	156 😞	12 😞	44.60% 😞	Try splitting into smaller methods. Extract out complex expressions
glotaran/parameter/parameter_group.py	ParameterGroup.from_dataframe	14 🙂	134 😞	10 😞	51.47% 🙂	Try splitting into smaller methods. Extract out complex expressions
glotaran/parameter/parameter_group.py	ParameterGroup.markdown	3 ⭐	128 😞	11 😞	60.51% 🙂	Try splitting into smaller methods. Extract out complex expressions
glotaran/parameter/parameter_group.py	ParameterGroup.__repr__	6 ⭐	72 🙂	12 😞	63.49% 🙂	Extract out complex expressions
glotaran/utils/test/test_io.py	test_load_datasets_single_dataset	0 ⭐	208 ⛔	7 🙂	64.44% 🙂	Try splitting into smaller methods

Legend and Explanation

The emojis denote the absolute quality of the code:

⭐ excellent
🙂 good
😞 poor
⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.

Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!

sonarqubecloud · 2022-01-30T19:15:55Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
2.0% Duplication

patrickhaetti · 2022-01-31T09:41:06Z

Hi everybody, I am happy to hear the merge happened!

patrickhaetti requested review from s-weigand and a team as code owners November 10, 2021 18:05

s-weigand mentioned this pull request Nov 10, 2021

renaming parameter package from "csv" to "pandas" #844 #895

Closed

4 tasks

s-weigand changed the title ~~Create xlsx parameter #844~~ Add parameter IO support for more formats supported by pandas Nov 10, 2021

s-weigand requested changes Nov 11, 2021

View reviewed changes

s-weigand mentioned this pull request Nov 16, 2021

🐛 Optimized parameters are missing StdErr when saved to file #902

Closed

s-weigand force-pushed the create-xlsx-parameter branch from f2da31e to 631dca8 Compare November 19, 2021 11:04

patrickhaetti requested review from joernweissenborn and jsnel as code owners November 23, 2021 14:34

s-weigand added this to the v0.6.0 milestone Nov 23, 2021

patrickhaetti force-pushed the create-xlsx-parameter branch from 997883c to c6906df Compare November 24, 2021 20:43

patrickhaetti force-pushed the create-xlsx-parameter branch from ddf8d49 to 62e3ddc Compare November 26, 2021 17:20

s-weigand force-pushed the create-xlsx-parameter branch 3 times, most recently from 0467af6 to a44cb82 Compare December 1, 2021 08:38

jsnel force-pushed the create-xlsx-parameter branch from a44cb82 to 04b5a0e Compare December 1, 2021 23:17

jsnel added a commit to jsnel/pyglotaran that referenced this pull request Dec 12, 2021

🚇👌 Further refinement of test_result_consistency

a115606

Refine calculation of epsilon for residual check Should resolve issue signaled in glotaran#896

jsnel added a commit to jsnel/pyglotaran that referenced this pull request Dec 12, 2021

🚇👌 Further refinement of test_result_consistency

5044316

Refine calculation of epsilon for residual check Should resolve issue signaled in glotaran#896

patrickhaetti and others added 14 commits January 30, 2022 19:12

Updated Docstrings in pandas parameter package

1404967

Take out additional parameter file

93d00cb

Take out 'parameter_diff_param.yaml' from builtin/io/pandas/test/ which had varied values of 'parameter.yaml'. Taken out test functions from 'test_pandas_parameters.py' which were based on 'parameter_diff_param.yaml'.

Added ods parameter to xlsx.py

71de262

Added ods parameter to xlsx.py Added 'ods' to /glotaran/builtin/io/pandas/xlsx.py. tests added to 'test_pandas_parameters.py' add odfpy>=1.4.1 to setup.cfg Append 'ods' to docstrings.

Use functions safe_parameters_fillna + safe_parameters_replace in xls…

294d200

…x.py Use functions from glotaran.utils.io. Corrected spelling mistake in csv.py

Added tests in utils/tests/test_io.py

417a2c3

For functions safe_parameters_fillna and safe_parameters_replace

♻️🧪 Renamed functions in utils.io and raised test coverage to 100%

1087038

♻️🧪 Refactored tests to reduce code duplication

5f70690

🧪 Adjusted reference files to pass direct comparison test

d50b73b

👌 Added as_optimized argument to save_parameters methods

501e110

📚 Updated module descriptions

e09eaf5

🚧 Added change to changelog

fc9479f

👌 Made as_optimized keyword only

6203cfe

s-weigand dismissed their stale review via 6203cfe January 30, 2022 18:12

s-weigand force-pushed the create-xlsx-parameter branch from f7058cf to 6203cfe Compare January 30, 2022 18:12

joernweissenborn reviewed Jan 30, 2022

View reviewed changes

glotaran/deprecation/modules/test/test_parameter_parameter_group.py Outdated Show resolved Hide resolved

joernweissenborn reviewed Jan 30, 2022

View reviewed changes

joernweissenborn previously approved these changes Jan 30, 2022

View reviewed changes

👌 Made csv and tsv drop inf values consistently by default

37bb74a

✨ Added 'replace_infinfinity' optional argument to keep inf values in csv and tsv files

s-weigand dismissed joernweissenborn’s stale review via 37bb74a January 30, 2022 19:04

🧪🗑️ Restored original behaviour of deprecated ParameterGroup.to_csv

9602f53

s-weigand approved these changes Jan 30, 2022

View reviewed changes

s-weigand merged commit f59b92c into glotaran:main Jan 30, 2022

This was referenced Feb 2, 2022

✨ Use ∞ instead of inf in excel parameter files #888

Closed

✨ Support all everything what pandas dataframe supports for parameter io #844

Closed

s-weigand mentioned this pull request Oct 27, 2022

♻️ Refactor: Model and Parameters #1135

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

✨ Add parameter IO support for more formats supported by pandas #896

✨ Add parameter IO support for more formats supported by pandas #896

patrickhaetti commented Nov 10, 2021 •

edited

Loading

github-actions bot commented Nov 10, 2021

codecov bot commented Nov 11, 2021 •

edited

Loading

s-weigand left a comment •

edited

Loading

joernweissenborn commented Nov 11, 2021

jsnel commented Nov 17, 2021

joernweissenborn commented Nov 19, 2021

s-weigand commented Nov 19, 2021

github-actions bot commented Nov 19, 2021 •

edited

Loading

s-weigand commented Nov 19, 2021

patrickhaetti commented Nov 19, 2021

s-weigand commented Nov 25, 2021

jsnel commented Dec 12, 2021 •

edited

Loading

joernweissenborn left a comment

joernweissenborn left a comment

sourcery-ai bot commented Jan 30, 2022

sonarqubecloud bot commented Jan 30, 2022

patrickhaetti commented Jan 31, 2022 •

edited

Loading

✨ Add parameter IO support for more formats supported by pandas #896

✨ Add parameter IO support for more formats supported by pandas #896

Conversation

patrickhaetti commented Nov 10, 2021 • edited Loading

Change summary

Checklist

github-actions bot commented Nov 10, 2021

codecov bot commented Nov 11, 2021 • edited Loading

Codecov Report

s-weigand left a comment • edited Loading

Choose a reason for hiding this comment

joernweissenborn commented Nov 11, 2021

jsnel commented Nov 17, 2021

joernweissenborn commented Nov 19, 2021

s-weigand commented Nov 19, 2021

github-actions bot commented Nov 19, 2021 • edited Loading

s-weigand commented Nov 19, 2021

patrickhaetti commented Nov 19, 2021

s-weigand commented Nov 25, 2021

jsnel commented Dec 12, 2021 • edited Loading

joernweissenborn left a comment

Choose a reason for hiding this comment

joernweissenborn left a comment

Choose a reason for hiding this comment

sourcery-ai bot commented Jan 30, 2022

Sourcery Code Quality Report

Legend and Explanation

sonarqubecloud bot commented Jan 30, 2022

patrickhaetti commented Jan 31, 2022 • edited Loading

patrickhaetti commented Nov 10, 2021 •

edited

Loading

codecov bot commented Nov 11, 2021 •

edited

Loading

s-weigand left a comment •

edited

Loading

github-actions bot commented Nov 19, 2021 •

edited

Loading

jsnel commented Dec 12, 2021 •

edited

Loading

patrickhaetti commented Jan 31, 2022 •

edited

Loading