Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

👌 Allow more natural column names in pandas parameters file reading #1174

Merged
merged 6 commits into from
Nov 12, 2022

Conversation

s-weigand
Copy link
Member

@s-weigand s-weigand commented Nov 10, 2022

PR #1135 changed the column names from 'non-negative' -> 'non_negative' and 'standard-error' -> 'standard_error' which causes crashes in load_parameters when reading parameters files that were created before the merge (e.g. 'optimized_parameters.csv').

In addition to reenabling the reading of old parameter files saved with the pandas plugins (csv, tsv, xlsx, ods), this PR allows the same more natural/shorter column names which are allowed in yaml files for the pandas plugins and casts the column names to lower strings to counter capitalization problems caused by auto-capitalization from excel and the likes.

OPTION_NAMES_SERIALIZED = {
"expression": "expr",
"maximum": "max",
"minimum": "min",
"non_negative": "non-negative",
"standard_error": "standard-error",
}

And it removed the unintended requirement to define the non_negative and vary columns, which are optional and can be filled with default values.

Change summary

Checklist

  • ✔️ Passing the tests (mandatory for all PR's)
  • 🚧 Added changes to changelog (mandatory for all PR's)
  • 🧪 Adds new tests for the feature (mandatory for ✨ feature and 🩹 bug fix PR's)

@s-weigand s-weigand requested review from jsnel, joernweissenborn and a team as code owners November 10, 2022 22:21
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Nov 10, 2022

Sourcery Code Quality Report

✅  Merging this PR will increase code quality in the affected files by 0.85%.

Quality metrics Before After Change
Complexity 3.32 ⭐ 2.96 ⭐ -0.36 👍
Method Length 63.08 🙂 62.89 🙂 -0.19 👍
Working memory 6.83 🙂 6.66 🙂 -0.17 👍
Quality 74.79% 🙂 75.64% 0.85% 👍
Other metrics Before After Change
Lines 918 971 53
Changed files Quality Before Quality After Quality Change
glotaran/builtin/io/pandas/csv.py 79.27% ⭐ 78.41% ⭐ -0.86% 👎
glotaran/builtin/io/pandas/xlsx.py 91.00% ⭐ 87.20% ⭐ -3.80% 👎
glotaran/builtin/io/pandas/test/test_pandas_parameters.py 74.11% 🙂 78.35% ⭐ 4.24% 👍
glotaran/parameter/parameters.py 71.42% 🙂 71.26% 🙂 -0.16% 👎
glotaran/parameter/test/test_parameters.py 78.01% ⭐ 78.77% ⭐ 0.76% 👍

Here are some functions in these files that still need a tune-up:

File Function Complexity Length Working Memory Quality Recommendation
glotaran/parameter/parameters.py Parameters.from_dataframe 14 🙂 177 😞 11 😞 45.17% 😞 Try splitting into smaller methods. Extract out complex expressions
glotaran/parameter/parameters.py param_dict_to_markdown 6 ⭐ 168 😞 13 😞 49.98% 😞 Try splitting into smaller methods. Extract out complex expressions
glotaran/parameter/test/test_parameters.py test_parameters_array_conversion 1 ⭐ 228 ⛔ 9 🙂 57.56% 🙂 Try splitting into smaller methods
glotaran/builtin/io/pandas/test/test_pandas_parameters.py test_roundtrips 4 ⭐ 145 😞 8 🙂 63.75% 🙂 Try splitting into smaller methods
glotaran/parameter/parameters.py Parameters.get_label_value_and_bounds_arrays 4 ⭐ 94 🙂 11 😞 63.96% 🙂 Extract out complex expressions

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!

@github-actions
Copy link
Contributor

Binder 👈 Launch a binder notebook on branch s-weigand/pyglotaran/fix-parameters-io

@codecov
Copy link

codecov bot commented Nov 10, 2022

Codecov Report

Base: 87.6% // Head: 87.7% // Increases project coverage by +0.1% 🎉

Coverage data is based on head (2397bf2) compared to base (e70f760).
Patch coverage: 100.0% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1174     +/-   ##
=======================================
+ Coverage   87.6%   87.7%   +0.1%     
=======================================
  Files        103     103             
  Lines       4898    4904      +6     
  Branches     806     808      +2     
=======================================
+ Hits        4291    4304     +13     
+ Misses       489     486      -3     
+ Partials     118     114      -4     
Impacted Files Coverage Δ
glotaran/builtin/io/pandas/csv.py 100.0% <100.0%> (ø)
glotaran/builtin/io/pandas/xlsx.py 100.0% <100.0%> (ø)
glotaran/parameter/parameters.py 96.3% <100.0%> (+4.2%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

glotaran/parameter/parameters.py Outdated Show resolved Hide resolved
Copy link
Member

@jsnel jsnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, let's move the column renaming out of the parameters.py, validity of column names should be checked at parsing time (it's part of the user-facing API and should be validated, and errors (e.g. typos) reported back to the user.
In this case, csv.py and xlsx.py can handle the renaming.

Some other remarks while reviewing related code:

  • from_dataframe implicitly introduces required column names (e.g. minimum, maximum, non_negative, vary) which are not required in other input formats (e.g. yaml) because they have default values. This should not behave differently in that regard.
  • please allow for shorthand notation as well (expr | expression, min | minimum, etc)

@jsnel
Copy link
Member

jsnel commented Nov 11, 2022

Also please, if possible, check for (accidental) capitalization and provide feedback to the user (error in yaml, warning in csv/xlsx). However, in the case of xlsx format just ignore it (lowercase it) (e.g. LibreOffice does auto capitalization of string labels).

PR glotaran#1135 changed the colmn names from 'non-negative' -> 'non_negative' and 'standard-error' -> 'standard_error' which causes crashes in load_parameters when reading parameters files that were created before the merge (e.g. 'optimized_parameters.csv').

This change allows reading of those old files, while saving still uses the new version
👌 Allow alternate notation used in yaml for pandas plugins
👌 Make column names case insensitive for pandas plugins
@s-weigand s-weigand marked this pull request as draft November 12, 2022 14:15
@s-weigand s-weigand force-pushed the fix-parameters-io branch 2 times, most recently from bb9696c to 1d196c3 Compare November 12, 2022 15:51
@s-weigand s-weigand changed the title 🩹 Fix reading of old pandas saved parameters files 👌 Allow more natural column names in pandas parameters file reading Nov 12, 2022
@s-weigand s-weigand marked this pull request as ready for review November 12, 2022 16:16
@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@jsnel jsnel dismissed joernweissenborn’s stale review November 12, 2022 21:26

The requested changes has been implemented. parameters.py no longer does df.rename, its has moved to csv.py and xlsx.py

@jsnel jsnel merged commit 8116af0 into glotaran:main Nov 12, 2022
@jsnel jsnel deleted the fix-parameters-io branch November 12, 2022 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants