♻️ Refactor Result and Scheme loading to use 'file' fields #903

Merged
merged 9 commits into from
Nov 18, 2021

@s-weigand s-weigand commented Nov 15, 2021

This PR removes the file representation fields from the augmented dataclasses completely and thus simplifies the API
from:

    scheme = Scheme(
        model,
        parameter,
        {"dataset_1": dataset},
        model_file="m.yml",
        parameters_file="p.csv",
        data_files={"dataset_1": "d.nc"},
    )

to

    scheme = Scheme(
        model,
        parameter,
        {"dataset_1": dataset},
    )

Additional side effects and improvements:

  • There is now a glotaran.typing module
  • FileLoadable classes (Model, ParameterGroup, Scheme, Result, DatasetMapping) know their own file origin
  • There is a new convenience io function, load_datasets, which loads datasets in bulk; the result can then be consumed by Scheme
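
The bulk-loading idea can be sketched roughly as follows. This is a stdlib-only illustration, not the glotaran implementation: `load_datasets_sketch`, `fake_loader`, and the dict used as a stand-in dataset are all hypothetical names made up for this example.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable, Mapping


def load_datasets_sketch(
    sources: Mapping[str, Any],
    loader: Callable[[Path], Any],
) -> dict[str, Any]:
    """Load every path-like value with ``loader``; pass other values through."""
    loaded: dict[str, Any] = {}
    for name, source in sources.items():
        if isinstance(source, (str, Path)):
            # Path-like value: load it from file.
            loaded[name] = loader(Path(source))
        else:
            # Already an in-memory dataset: keep it as is.
            loaded[name] = source
    return loaded


def fake_loader(path: Path) -> dict[str, str]:
    """Hypothetical stand-in loader; a real one would read the file."""
    return {"source_path": path.as_posix()}


datasets = load_datasets_sketch(
    {"dataset_1": "d.nc", "dataset_2": {"source_path": "in-memory"}},
    fake_loader,
)
```

The resulting mapping uses the same keys as the input, so it could be handed on wherever a name-to-dataset mapping is expected.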

Change summary

  • ♻️👌 Removed file fields in ProjectIo like classes and used unified field
  • ♻️🔌 Refactored load_dataset to always return xr.Dataset
  • ♻️ Added type 'StrOrPath' and refactored io plugins with new type
  • ✨ Implemented convenience function 'load_datasets'
  • ♻️👌 Made 'DatasetMapping.source_path' a property accessing the dataset
  • ♻️👌 Replaced all file_representation_field with file_loadable_field
  • ♻️✨ Factored making paths relative and posix style out and added support for Sequence like FileLoadable classes
  • ♻️ Refactored bool_str_repr after sourcery suggested a different change
  • ♻️🩹 Changed implementation of relative_posix_path to use os.path.relpath

Checklist

  • ✔️ Passing the tests (mandatory for all PR's)
  • 👌 Closes issue (mandatory for ✨ feature and 🩹 bug fix PR's)
  • 🧪 Adds new tests for the feature (mandatory for ✨ feature and 🩹 bug fix PR's)

Closes issues

closes #858

@s-weigand s-weigand requested review from joernweissenborn, jsnel and a team as code owners November 15, 2021 22:24
@github-actions

Binder 👈 Launch a binder notebook on branch s-weigand/pyglotaran/remove-file-fields

codecov bot commented Nov 15, 2021

Codecov Report

Merging #903 (ecdb930) into main (9865243) will increase coverage by 0.3%.
The diff coverage is 95.1%.

@@           Coverage Diff           @@
##            main    #903     +/-   ##
=======================================
+ Coverage   84.8%   85.1%   +0.3%     
=======================================
  Files         81      85      +4     
  Lines       4610    4761    +151     
  Branches     851     880     +29     
=======================================
+ Hits        3910    4053    +143     
- Misses       558     561      +3     
- Partials     142     147      +5     
Impacted Files                                     Coverage Δ
glotaran/builtin/io/yml/yml.py                     90.5% <80.0%> (-0.2%) ⬇️
glotaran/project/dataclass_helpers.py              83.8% <81.4%> (+3.2%) ⬆️
glotaran/plugin_system/data_io_registration.py     97.2% <88.8%> (-2.8%) ⬇️
glotaran/builtin/io/folder/folder_plugin.py        97.6% <100.0%> (ø)
glotaran/io/__init__.py                            100.0% <100.0%> (ø)
glotaran/model/model.py                            85.6% <100.0%> (+0.2%) ⬆️
glotaran/parameter/parameter_group.py              89.3% <100.0%> (+0.2%) ⬆️
glotaran/parameter/parameter_history.py            79.5% <100.0%> (+2.8%) ⬆️
glotaran/plugin_system/io_plugin_utils.py          100.0% <100.0%> (ø)
glotaran/plugin_system/project_io_registration.py  100.0% <100.0%> (ø)
... and 7 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jsnel previously approved these changes Nov 16, 2021

@jsnel jsnel left a comment

Impressive piece of refactoring ♻️. LGTM.

If a field's value isn't an instance of the target class, it will try to load the class instance from file. This also allows classes like Scheme to be initialized with file paths directly.
When dataclasses with file-loadable fields are serialized, the objects are replaced with their source paths.
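
To illustrate the file-loadable field behaviour described above, here is a minimal stdlib-only sketch. The simplified `Model` and `Scheme` classes, `Model.load`, and `to_dict` are stand-ins invented for this example and are not the glotaran implementation; only the behaviour (load from a path when the value is not already an instance, serialize back to the source path) is taken from the description.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from pathlib import Path
from typing import Any, Optional


@dataclass
class Model:
    """Hypothetical file-loadable class that remembers its file origin."""

    name: str
    source_path: Optional[str] = None

    @classmethod
    def load(cls, path: str) -> "Model":
        # A real loader would parse the file; this one only records the path.
        return cls(name=Path(path).stem, source_path=Path(path).as_posix())


@dataclass
class Scheme:
    """Hypothetical container with one file-loadable field."""

    model: Any  # either a Model instance or a path to a model file

    def __post_init__(self) -> None:
        # Load from file when the value is not already a Model instance.
        if not isinstance(self.model, Model):
            self.model = Model.load(self.model)


def to_dict(scheme: Scheme) -> dict[str, Any]:
    """Serialize, replacing file-loadable objects with their source path."""
    result: dict[str, Any] = {}
    for field in fields(scheme):
        value = getattr(scheme, field.name)
        result[field.name] = value.source_path if isinstance(value, Model) else value
    return result


scheme = Scheme("models/m.yml")
serialized = to_dict(scheme)
```

Passing a path and passing an already loaded instance end up in the same state, which is what lets the explicit `*_file` arguments go away.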

In addition, the ProjectIo load and save functions set the ``source_path`` attribute of the class instances.
Also ensures that paths are passed as POSIX-formatted paths.
Added the file-loadable wrapper class 'DatasetMapping'.

This also allows loading all datasets needed for an optimization directly from file paths, by passing a dict with the dataset names as keys and the paths as values.
Because 'DatasetMapping.source_path' is a property accessing the dataset, it updates when the source_path of the dataset is updated, e.g. by calling 'save_dataset'.
Also removed file_representation_field completely.
Added support for Sequence-like FileLoadable classes.
relative_posix_path now uses os.path.relpath instead of pathlib.Path.relative_to. This prevents crashes as long as the files are on the same drive.
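
The difference between the two approaches can be shown with a small stdlib-only sketch. `relative_posix_path_sketch` is a hypothetical helper written for this example and is not the function from the PR; the folder names are made up.

```python
import os
from pathlib import Path


def relative_posix_path_sketch(source: str, base: str) -> str:
    """Return ``source`` relative to ``base``, using forward slashes."""
    return Path(os.path.relpath(source, base)).as_posix()


# A sibling folder is not located inside the base folder.
base = os.path.join("project", "results")
source = os.path.join("project", "data", "d.nc")

# os.path.relpath can walk up the tree with "..".
via_relpath = relative_posix_path_sketch(source, base)

# Path.relative_to raises ValueError when source is not below base.
try:
    Path(source).relative_to(base)
    relative_to_failed = False
except ValueError:
    relative_to_failed = True
```

On Windows, `os.path.relpath` itself raises `ValueError` when the two paths are on different drives, which matches the "same drive" caveat above.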

sourcery-ai bot commented Nov 17, 2021

Sourcery Code Quality Report

❌  Merging this PR will decrease code quality in the affected files by 0.73%.

Quality metrics  Before    After     Change
Complexity       3.96 ⭐    4.11 ⭐    0.15 👎
Method Length    41.47 ⭐   43.76 ⭐   2.29 👎
Working memory   6.55 🙂    6.65 🙂    0.10 👎
Quality          78.30%    77.57%    -0.73% 👎

Other metrics    Before    After     Change
Lines            4035      4266      231

Changed files                                                Quality Before  Quality After  Quality Change
glotaran/builtin/io/folder/folder_plugin.py                  55.31% 🙂        59.69% 🙂       4.38% 👍
glotaran/builtin/io/yml/yml.py                               79.57% ⭐        78.43% ⭐       -1.14% 👎
glotaran/builtin/io/yml/test/test_save_result.py             88.79% ⭐        83.82% ⭐       -4.97% 👎
glotaran/builtin/io/yml/test/test_save_scheme.py             78.73% ⭐        86.35% ⭐       7.62% 👍
glotaran/deprecation/modules/test/test_project_scheme.py     75.33% ⭐        75.33% ⭐       0.00%
glotaran/io/__init__.py                                      87.99% ⭐        87.99% ⭐       0.00%
glotaran/model/model.py                                      70.82% 🙂        70.82% 🙂       0.00%
glotaran/parameter/parameter_group.py                        69.41% 🙂        69.42% 🙂       0.01% 👍
glotaran/parameter/parameter_history.py                      94.86% ⭐        94.62% ⭐       -0.24% 👎
glotaran/plugin_system/data_io_registration.py               92.43% ⭐        84.42% ⭐       -8.01% 👎
glotaran/plugin_system/io_plugin_utils.py                    84.60% ⭐        85.22% ⭐       0.62% 👍
glotaran/plugin_system/project_io_registration.py            87.08% ⭐        83.86% ⭐       -3.22% 👎
glotaran/plugin_system/test/test_data_io_registration.py     92.24% ⭐        90.24% ⭐       -2.00% 👎
glotaran/plugin_system/test/test_project_io_registration.py  91.22% ⭐        90.24% ⭐       -0.98% 👎
glotaran/project/dataclass_helpers.py                        61.09% 🙂        59.78% 🙂       -1.31% 👎
glotaran/project/result.py                                   76.37% ⭐        77.52% ⭐       1.15% 👍
glotaran/project/scheme.py                                   72.83% 🙂        72.07% 🙂       -0.76% 👎
glotaran/project/test/test_dataclass_helpers.py              84.03% ⭐        83.83% ⭐       -0.20% 👎
glotaran/project/test/test_result.py                         80.28% ⭐        80.10% ⭐       -0.18% 👎
glotaran/project/test/test_scheme.py                         80.83% ⭐        80.83% ⭐       0.00%

Here are some functions in these files that still need a tune-up:

File                                   Function                       Complexity  Length  Working Memory  Quality   Recommendation
glotaran/parameter/parameter_group.py  ParameterGroup.from_dataframe  28 😞        267 ⛔   13 😞            26.41% 😞  Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions
glotaran/project/dataclass_helpers.py  asdict                         18 🙂        122 😞   13 😞            44.50% 😞  Try splitting into smaller methods. Extract out complex expressions
glotaran/model/model.py                Model.markdown                 13 🙂        168 😞   10 😞            48.74% 😞  Try splitting into smaller methods. Extract out complex expressions
glotaran/project/scheme.py             Scheme.__post_init__           16 🙂        127 😞   11 😞            48.75% 😞  Try splitting into smaller methods. Extract out complex expressions
glotaran/builtin/io/yml/yml.py         YmlProjectIo.save_model        15 🙂        105 🙂   13 😞            49.06% 😞  Extract out complex expressions

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 11 (rating A)
  • Coverage: no coverage information
  • Duplication: 1.8%

@github-actions

Benchmark is done. Check out the benchmark result page.
Benchmark differences below 5% might be due to CI noise.

Benchmark diff v0.5.0rc1 vs. main

Parametrized benchmark signatures:

BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)

All benchmarks:

       before           after         ratio
     [d05c042a]       [ecdb9304]
     <v0.5.0rc1>                 
         72.7±1ms         75.0±2ms     1.03  BenchmarkOptimize.time_optimize(False, False, False)
        99.8±30ms         154±20ms    ~1.54  BenchmarkOptimize.time_optimize(False, False, True)
       71.5±0.9ms         74.9±2ms     1.05  BenchmarkOptimize.time_optimize(False, True, False)
        86.6±30ms         147±40ms    ~1.70  BenchmarkOptimize.time_optimize(False, True, True)
         90.1±2ms         93.2±1ms     1.03  BenchmarkOptimize.time_optimize(True, False, False)
         97.8±4ms        99.9±30ms     1.02  BenchmarkOptimize.time_optimize(True, False, True)
         89.0±3ms         91.8±2ms     1.03  BenchmarkOptimize.time_optimize(True, True, False)
         101±20ms        99.2±30ms     0.98  BenchmarkOptimize.time_optimize(True, True, True)
             192M             196M     1.02  IntegrationTwoDatasets.peakmem_optimize
        1.93±0.1s       2.20±0.05s    ~1.14  IntegrationTwoDatasets.time_optimize

Benchmark diff main vs. PR

Parametrized benchmark signatures:

BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)

All benchmarks:

       before           after         ratio
     [98652436]       [ecdb9304]
         73.9±1ms         75.0±2ms     1.02  BenchmarkOptimize.time_optimize(False, False, False)
         124±40ms         154±20ms    ~1.25  BenchmarkOptimize.time_optimize(False, False, True)
         73.8±1ms         74.9±2ms     1.01  BenchmarkOptimize.time_optimize(False, True, False)
         119±30ms         147±40ms    ~1.24  BenchmarkOptimize.time_optimize(False, True, True)
         92.0±1ms         93.2±1ms     1.01  BenchmarkOptimize.time_optimize(True, False, False)
        98.1±20ms        99.9±30ms     1.02  BenchmarkOptimize.time_optimize(True, False, True)
         91.2±1ms         91.8±2ms     1.01  BenchmarkOptimize.time_optimize(True, True, False)
         99.3±3ms        99.2±30ms     1.00  BenchmarkOptimize.time_optimize(True, True, True)
             197M             196M     1.00  IntegrationTwoDatasets.peakmem_optimize
       2.11±0.08s       2.20±0.05s     1.05  IntegrationTwoDatasets.time_optimize

@jsnel jsnel left a comment

Changes (ecdb930) after last review (Changed implementation of relative_posix_path to use os.path.relpath), reviewed as ok.

@joernweissenborn joernweissenborn left a comment

Lgtm

@jsnel jsnel merged commit 2d44c75 into glotaran:main Nov 18, 2021
@jsnel jsnel deleted the remove-file-fields branch November 18, 2021 04:11