👌 Improve Project API data handling #1257

s-weigand · 2023-02-24T14:02:42Z

This PR improves the data handling of the Project API in notebook workflows.

Since notebooks are often executed as a whole (Run All), the default of ignore_existing=False in project.import_data means that an actual case study ends up passing ignore_existing=True on every call. This makes the code less readable and adds a lot of redundancy, so a default of ignore_existing=True is more sensible (IMHO ignore_existing=False only makes sense in a CLI or GUI context).
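To illustrate why the default matters for "Run All", here is a minimal sketch of the skip-if-present behaviour. The helper `import_file` and the file names are hypothetical and only mimic the idea; this is not the pyglotaran implementation.

```python
from pathlib import Path
import tempfile


def import_file(source: Path, target: Path, ignore_existing: bool = True) -> bool:
    """Copy ``source`` to ``target`` and return True if a copy was made.

    Hypothetical helper for illustration only, not pyglotaran code.
    """
    if target.exists():
        if ignore_existing:
            # Keep the file written by a previous notebook run.
            return False
        raise FileExistsError(f"{target} already exists")
    target.write_bytes(source.read_bytes())
    return True


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "measured.ascii"
    source.write_text("1 2 3\n")
    target = Path(tmp) / "TA.dat"

    first_run = import_file(source, target)   # first execution: file is copied
    second_run = import_file(source, target)  # "Run All" again: no error, no copy
```

With `ignore_existing=False` the second call would raise instead, which is the behaviour a notebook user would otherwise have to switch off on every call.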

Having to import each dataset one by one also clutters the code quite a lot.

project.import_data("measured_data/Npq2_220219_800target3fasea.ascii", dataset_name="TA")
project.import_data(guide_s1, dataset_name="guide_s1")
project.import_data(guide_s2, dataset_name="guide_s2")
project.import_data(guide_s3, dataset_name="guide_s3")
project.import_data(guide_s4, dataset_name="guide_s4")
project.import_data(guide_s5, dataset_name="guide_s5")
project.import_data(guide_s6, dataset_name="guide_s6")
project.import_data(guide_s7, dataset_name="guide_s7")
project.import_data(guide_s8, dataset_name="guide_s8")

This is why it is much more convenient to allow passing a mapping, especially since that mapping could have been defined and used for plotting before glotaran is even imported:

my_datasets = {
    "TA": "measured_data/Npq2_220219_800target3fasea.ascii",
    "guide_s1":guide_s1,
    "guide_s2":guide_s2,
    "guide_s3":guide_s3,
    "guide_s4":guide_s4,
    "guide_s5":guide_s5,
    "guide_s6":guide_s6,
    "guide_s7":guide_s7,
    "guide_s8":guide_s8,
}
project.import_data(my_datasets)
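One way a function can accept either a single dataset or a mapping is to normalize both call styles to a `{name: dataset}` dict first. The sketch below assumes a hypothetical helper `normalize_datasets`; the actual argument handling in `import_data` may differ.

```python
from collections.abc import Mapping


def normalize_datasets(dataset, dataset_name=None):
    """Return a ``{name: dataset}`` dict for a single dataset or a mapping.

    Hypothetical helper for illustration only, not pyglotaran code.
    """
    if isinstance(dataset, Mapping):
        if dataset_name is not None:
            raise ValueError("dataset_name must not be combined with a mapping")
        return dict(dataset)
    if dataset_name is None:
        raise ValueError("dataset_name is required for a single dataset")
    return {dataset_name: dataset}


# Both call styles end up in the same shape, so one import loop can serve both.
single = normalize_datasets("measured_data/example.ascii", dataset_name="TA")
many = normalize_datasets({"TA": "measured_data/example.ascii", "guide_s1": [1.0, 2.0]})
```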

Lastly, users might want to switch out datasets in the optimization without touching the model definition, for example to use an averaged dataset for a quicker feedback loop, or to apply some other kind of preprocessing or correction to the data and compare results with the exact same model.

project.optimize("my_model", "my_parameters", data_lookup_overwrite={"TA": averaged_data})
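The idea of swapping a dataset for one run without changing what is stored can be sketched as a plain dictionary merge. The names `resolve_data`, `registry` and `overrides` are hypothetical; this only shows the lookup principle, not how the project API implements it.

```python
from __future__ import annotations


def resolve_data(registry: dict, overrides: dict | None = None) -> dict:
    """Return the datasets to optimize with; entries in ``overrides`` win.

    Hypothetical helper for illustration only. The registry itself is
    left untouched, so nothing stored is overwritten.
    """
    return {**registry, **(overrides or {})}


registry = {"TA": "full_resolution_data", "guide_s1": "guide_data"}
used = resolve_data(registry, {"TA": "averaged_data"})
```

Because the merge builds a new dict, the next optimization without overrides sees the original "TA" dataset again.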

Change summary

Checklist

✔️ Passing the tests (mandatory for all PRs)
🚧 Added changes to changelog (mandatory for all PRs)
🧪 Adds new tests for the feature (mandatory for ✨ feature and 🩹 bug fix PRs)

sourcery-ai · 2023-02-24T14:02:55Z

Sourcery Code Quality Report

❌ Merging this PR will decrease code quality in the affected files by 0.04%.

| Quality metrics | Before | After | Change |
| --- | --- | --- | --- |
| Complexity | 1.01 ⭐ | 1.01 ⭐ | 0.00 |
| Method Length | 72.29 🙂 | 74.77 🙂 | 2.48 👎 |
| Working memory | 6.41 🙂 | 6.44 🙂 | 0.03 👎 |
| Quality | 75.28% ⭐ | 75.24% ⭐ | -0.04% 👎 |

| Other metrics | Before | After | Change |
| --- | --- | --- | --- |
| Lines | 1510 | 1597 | 87 |

| Changed files | Quality Before | Quality After | Quality Change |
| --- | --- | --- | --- |
| glotaran/project/project.py | 81.67% ⭐ | 80.48% ⭐ | -1.19% 👎 |
| glotaran/project/project_data_registry.py | 61.94% 🙂 | 61.62% 🙂 | -0.32% 👎 |
| glotaran/project/test/test_project.py | 73.86% 🙂 | 74.18% 🙂 | 0.32% 👍 |

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
| --- | --- | --- | --- | --- | --- | --- |
| glotaran/project/test/test_project.py | test_missing_file_errors | 0 ⭐ | 377 ⛔ | 8 🙂 | 55.66% 🙂 | Try splitting into smaller methods |
| glotaran/project/project_data_registry.py | ProjectDataRegistry.import_data | 9 🙂 | 126 😞 | 10 😞 | 56.86% 🙂 | Try splitting into smaller methods. Extract out complex expressions |
| glotaran/project/test/test_project.py | test_generate_parameters | 3 ⭐ | 226 ⛔ | 8 🙂 | 57.90% 🙂 | Try splitting into smaller methods |
| glotaran/project/test/test_project.py | test_generators_allow_overwrite | 0 ⭐ | 154 😞 | 9 🙂 | 64.56% 🙂 | Try splitting into smaller methods |
| glotaran/project/project.py | Project.markdown | 0 ⭐ | 74 🙂 | 14 😞 | 66.14% 🙂 | Extract out complex expressions |

Legend and Explanation

The emojis denote the absolute quality of the code:

⭐ excellent
🙂 good
😞 poor
⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.

github-actions · 2023-02-24T14:02:58Z

👈 Launch a binder notebook on branch s-weigand/pyglotaran/improve-project-data-handeling

github-actions · 2023-02-24T14:05:29Z

Benchmark is done. Check out the benchmark result page.
Benchmark differences below 5% might be due to CI noise.

Benchmark diff v0.6.0 vs. main

Parametrized benchmark signatures:

BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)


All benchmarks:

       before           after         ratio
     [6c3c390e]       [44e9e2ac]
     <v0.6.0>                   
!      41.9±0.1ms           failed      n/a  BenchmarkOptimize.time_optimize(False, False, False)
!      45.0±0.2ms           failed      n/a  BenchmarkOptimize.time_optimize(False, False, True)
!      42.1±0.1ms           failed      n/a  BenchmarkOptimize.time_optimize(False, True, False)
!      45.0±0.3ms           failed      n/a  BenchmarkOptimize.time_optimize(False, True, True)
!      52.1±0.3ms           failed      n/a  BenchmarkOptimize.time_optimize(True, False, False)
!       78.0±30ms           failed      n/a  BenchmarkOptimize.time_optimize(True, False, True)
!        52.8±2ms           failed      n/a  BenchmarkOptimize.time_optimize(True, True, False)
!       61.0±20ms           failed      n/a  BenchmarkOptimize.time_optimize(True, True, True)
             203M             205M     1.01  IntegrationTwoDatasets.peakmem_optimize
-      1.92±0.05s       1.00±0.02s     0.52  IntegrationTwoDatasets.time_optimize

Benchmark diff main vs. PR

Parametrized benchmark signatures:

BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)


All benchmarks:

       before           after         ratio
     [44e9e2ac]       [0aaac2ed]
           failed           failed      n/a  BenchmarkOptimize.time_optimize(False, False, False)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(False, False, True)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(False, True, False)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(False, True, True)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(True, False, False)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(True, False, True)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(True, True, False)
           failed           failed      n/a  BenchmarkOptimize.time_optimize(True, True, True)
             205M             206M     1.00  IntegrationTwoDatasets.peakmem_optimize
       1.00±0.02s         987±30ms     0.99  IntegrationTwoDatasets.time_optimize
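As a reading aid for the tables above, the sketch below recomputes the ratio column for the two `IntegrationTwoDatasets.time_optimize` rows and applies the 5% noise threshold mentioned in the comment. It assumes the ratio is "after" divided by "before"; the helper names are hypothetical.

```python
def ratio(before_s: float, after_s: float) -> float:
    """Ratio of the 'after' timing to the 'before' timing (assumed definition)."""
    return after_s / before_s


def exceeds_noise(r: float, threshold: float = 0.05) -> bool:
    """True if the change is larger than the 5% CI-noise threshold."""
    return abs(r - 1.0) > threshold


# v0.6.0 vs. main: 1.92 s -> 1.00 s
v060_vs_main = ratio(1.92, 1.00)
# main vs. PR: 1.00 s -> 987 ms
main_vs_pr = ratio(1.00, 0.987)

print(f"{v060_vs_main:.2f}", exceeds_noise(v060_vs_main))
print(f"{main_vs_pr:.2f}", exceeds_noise(main_vs_pr))
```

Under that assumption, only the v0.6.0 vs. main difference exceeds the noise threshold; the change introduced by this PR does not.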

codecov · 2023-02-24T14:09:12Z

Codecov Report

Base: 88.3% // Head: 88.3% // Increases project coverage by +0.0% 🎉

Coverage data is based on head (0aaac2e) compared to base (44e9e2a).
Patch coverage: 100.0% of modified lines in pull request are covered.

Additional details and impacted files

@@          Coverage Diff          @@
##            main   #1257   +/-   ##
=====================================
  Coverage   88.3%   88.3%           
=====================================
  Files        104     104           
  Lines       5094    5100    +6     
  Branches     848     852    +4     
=====================================
+ Hits        4499    4505    +6     
  Misses       478     478           
  Partials     117     117

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| glotaran/project/project.py | `98.7% <100.0%> (+<0.1%)` | ⬆️ |
| glotaran/project/project_data_registry.py | `100.0% <100.0%> (ø)` | |

jsnel

Some suggestions for variable name improvements and improved help string.

glotaran/project/project.py

glotaran/project/test/test_project.py

glotaran/project/project.py

jsnel

Found some missing verbs

glotaran/project/project.py

Co-authored-by: Joris Snellenburg <[email protected]>

sonarqubecloud · 2023-02-24T22:00:12Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

jsnel

LGTM now!

This change refactors the case studies to use the project API and led to some [improvements in the project API itself](glotaran/pyglotaran#1257). In addition, it cleans up the model and parameter files as well as notebooks.

Further improvements like using glotaran generated clp guides and explaining the smoothing due to `clp_relations` will follow in a different PR.

### Change summary

- [🧹 Add project anchor files to git ignore](75fd1ce)
- [♻️ Moved original data files from data subfolder to measured_data](2ec5a1c)
- [🧹 Added data folder to gitignore since it gets populated by import_data](467fd91)
- [♻️ Converted target_rc to project API](0701661)
- [♻️ Converted target_rcg_compare to project API](01ba768)
- [♻️ Converted target_rcg_gcrcg_rcgcr_refine to project API](49b92c9)
- [♻️ Moved original data 4TT files from data subfolder to measured_data](bd06310)
- [👌 Added clp_relations for smoothing of generated guidance spectra](cc60ac0)
- [🧹 Cleaned rcg_refine models and parameters from unused code](524af94)
- [🧹 Removed dead code comments in target_rc model](a88cd8c)
- [👌 Use result create_clp_guide_dataset method instead of function](84fa1ed)
- [👌 Show cleaned up model and parameter definition](7bdda07)
- [♻️ Converted 4TT to project API](89ac3a0)
- [♻️ Moved original data dPSI files from data subfolder to measured_data](85ce095)
- [🧹 Ignore data folders in general since they get populated by import_data](953dfa5)
- [♻️ Converted dPSI to project API](73816e0)
- [🧹 Make use of improved project API](b434331)
- [🧹 Cleaned up rc model and parameters](079057b)
- [🧹 Cleaned up target_rcg_gcrcg_rcgcr_refine model and parameters](621113c)
- [🧹 Cleaned up 4TT models and parameters](f93712f)
- [🧹 Cleaned up dPSI model](f0deea3)
- [🧹 Cleaned up rc notebook](3ad4b6c)
- [🔧 Ignore pdfs folder when running pytest](2a08eee)

---------

Co-authored-by: Ivo van Stokkum <[email protected]>

s-weigand added 4 commits February 24, 2023 00:14

👌 Change default of ignore_existing in import_data to True

7b2157d

This allows using a notebook workflow without cluttering it with 'ignore_existing=True' all over the place

👌 Allow importing data from a mapping

d4293f7

👌 Add data_overwrite option to Result.optimize and Result.create_scheme

6b6a35d

👌 Changed data_overwrite to data_lookup_overwrite to clarify that data are not overwritten

154afa8

s-weigand added this to the v0.7.0 milestone Feb 24, 2023

s-weigand requested a review from jsnel as a code owner February 24, 2023 14:02

🚧📚 Added change to changelog

d29333e

s-weigand marked this pull request as draft February 24, 2023 19:21

🩹 Fix allow_overwrite=True not having any effect if ignore_existing=True

16df399

s-weigand marked this pull request as ready for review February 24, 2023 20:28

🧹 Renamed data_lookup_overwrite to data_lookup_override

5e7a47d

jsnel requested changes Feb 24, 2023

👌 Apply suggestions from code review

0aaac2e

Co-authored-by: Joris Snellenburg <[email protected]>

s-weigand force-pushed the improve-project-data-handeling branch from 3e2b870 to 0aaac2e Compare February 24, 2023 21:59

jsnel approved these changes Feb 24, 2023

s-weigand merged commit ad8f53e into glotaran:main Feb 24, 2023

s-weigand deleted the improve-project-data-handeling branch February 24, 2023 23:00

s-weigand mentioned this pull request Feb 27, 2023

♻️ Refactor with project API glotaran/pyglotaran-release-paper-supplementary-information#4

Merged
