
Add Dataset Groups #851

Merged (17 commits) on Oct 18, 2021

Conversation

@joernweissenborn (Member) commented Oct 8, 2021

Gist

This PR adds the feature to have multiple dataset groups in a model. This allows, for example, using non-negative least squares (nnls) on one (group of) dataset(s) and variable projection (varpro) on another (group of) dataset(s) within the same optimization. It is also now possible to only link the CLP of certain datasets instead of having to link all datasets.

Dataset Groups

This is achieved by introducing a new level of indirection called dataset groups. There are two new objects:

glotaran.model.DatasetGroupModel basically represents an entry in the model spec. It specifies whether to use nnls and whether to link the clp. Datasets refer to a group model via the new group property. This defaults to default, a group which is always added if it is not otherwise specified/overridden.

glotaran.model.DatasetGroup contains a DatasetGroupModel and all DatasetModels belonging to that group. These groups are generated by the model via Model.get_dataset_groups and are fed into the optimization; see below for details.
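
To make this concrete, here is a minimal, hypothetical usage sketch. Only DatasetGroupModel, DatasetGroup, the group property, and Model.get_dataset_groups are named in this PR; the attribute names in the sketch are assumptions, not the actual API.

# Hypothetical sketch, not code from this PR.
# `model` is assumed to be a glotaran Model parsed from a spec like
# the YAML example further below.
groups = model.get_dataset_groups()

# One DatasetGroup per entry in the spec's `dataset_groups` section;
# a group named "default" always exists unless overridden.
default_group = groups["default"]

# Each DatasetGroup pairs the group's settings (use nnls? link the clp?)
# with the DatasetModels whose `group` property points at it.
# These attribute names are assumptions:
print(default_group.model)           # the DatasetGroupModel
print(default_group.dataset_models)  # the DatasetModels in this group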

Changes to glotaran.analysis

To adapt the analysis package to work with dataset groups, a major refactoring of the Problem class was necessary.

Summary:

  • Problem has been renamed to OptimizationGroup
  • A new class glotaran.analysis.OptimizationGroupCalculator has been added to separate out the actual calculation of the common structures (matrices, residuals, clp, ...) for the linked/unlinked cases
  • The optimization now works on N optimization groups instead of a single problem, so various things needed to be adapted (e.g., the parameter history is now a variable instead of a member of the optimization group); a conceptual sketch follows below
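
For intuition, here is a highly simplified sketch of the new control flow. Apart from OptimizationGroup and Model.get_dataset_groups, the signatures and method names below are made up for illustration and do not mirror the real code in glotaran.analysis.optimize.

import numpy as np

from glotaran.analysis.optimization_group import OptimizationGroup

def optimize_sketch(scheme):
    # N optimization groups instead of a single Problem:
    optimization_groups = [
        OptimizationGroup(scheme, group)  # hypothetical signature
        for group in scheme.model.get_dataset_groups().values()
    ]

    # The parameter history is now a plain variable in the optimization,
    # no longer a member of the (former) Problem:
    parameter_history = []

    def objective(parameters):
        parameter_history.append(parameters)
        # Each group's calculator (linked or unlinked) computes that
        # group's share of the common structures (matrices, residuals,
        # clp, ...); `calculate_residual` is a hypothetical method name.
        residuals = [
            group.calculate_residual(parameters)
            for group in optimization_groups
        ]
        # All N residual vectors are stacked into the single vector
        # that the least-squares solver minimizes.
        return np.concatenate(residuals)

    return objective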

Example

Imagine a model like this:

...
dataset:
  d1:
    ... 
  d2:
    ...  
  d3:
    ...  
  d4:
    ...

Currently, these 4 datasets are linked in a way that you either use NNLS on all of them or on none. The same goes for linking the CLP: either all get linked or none.

With this PR one can do the following:

...
dataset_groups:
  default:
    residual_function: non_negative_least_squares
  mygroup:
    link_clp: true
dataset:
  d1:
    group: mygroup
    ... 
  d2:
    group: mygroup
    ...  
  d3:
    group: mygroup
    ...  
  d4:
    ...

This means that d1 to d3 will be linked and their residual will be calculated with varpro (the default). d4 doesn't specify a group and is thus part of default, which we override to use nnls instead of varpro.

Deprecation summary

  • Scheme
    -- group -> deprecated; now part of the model's dataset groups (a migration sketch follows after this list)
    -- group_tolerance -> renamed to clp_link_tolerance
  • Problem -> replaced by OptimizationGroup
    -- This was never public API
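
For users of the Scheme API, the migration looks roughly like this. A hedged sketch: only the group_tolerance -> clp_link_tolerance rename and the move of grouping into the model spec come from this PR; the remaining arguments and the tolerance value are illustrative.

from glotaran.project import Scheme

# `model`, `parameters`, and `data` stand in for the usual Scheme inputs.

# Before (deprecated): grouping and tolerance were configured on the Scheme.
# scheme = Scheme(model, parameters, data, group_tolerance=0.1)

# After: the tolerance argument is renamed, and grouping itself moves into
# the model spec (the `dataset_groups` section shown in the example above).
scheme = Scheme(model, parameters, data, clp_link_tolerance=0.1)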

Checklist

  • ✔️ Passing the tests (mandatory for all PRs)
  • 👌 Closes issue (mandatory for ✨ feature and 🩹 bug fix PRs)
  • 🧪 Adds new tests for the feature (mandatory for ✨ feature and 🩹 bug fix PRs)
  • 📚 Adds documentation of the feature

Closes issues

closes #XXXX

@joernweissenborn joernweissenborn requested review from jsnel, s-weigand and a team as code owners October 8, 2021 19:36
github-actions bot (Contributor) commented Oct 8, 2021

Binder 👈 Launch a binder notebook on branch joernweissenborn/pyglotaran/feature/datasetgroups

github-actions bot (Contributor) commented Oct 8, 2021

Benchmark is done. Check out the benchmark result page.
Benchmark differences below 5% might be due to CI noise.

Benchmark diff v0.4.1 vs. main

Parametrized benchmark signatures:

BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)

All benchmarks:

       before           after         ratio
     [21ba272a]       [c24d1438]
     <v0.4.1>                   
         67.7±1ms         73.1±3ms     1.08  BenchmarkOptimize.time_optimize(False, False, False)
-         419±2ms         92.4±7ms     0.22  BenchmarkOptimize.time_optimize(False, False, True)
-        93.3±1ms         76.2±1ms     0.82  BenchmarkOptimize.time_optimize(False, True, False)
         95.5±1ms        94.9±20ms     0.99  BenchmarkOptimize.time_optimize(False, True, True)
+        69.8±2ms         92.6±2ms     1.33  BenchmarkOptimize.time_optimize(True, False, False)
-         421±3ms         99.9±3ms     0.24  BenchmarkOptimize.time_optimize(True, False, True)
       93.9±0.8ms         93.0±3ms     0.99  BenchmarkOptimize.time_optimize(True, True, False)
         92.5±1ms          105±4ms    ~1.14  BenchmarkOptimize.time_optimize(True, True, True)
             195M             192M     0.98  IntegrationTwoDatasets.peakmem_optimize
        5.62±0.2s        2.01±0.1s    ~0.36  IntegrationTwoDatasets.time_optimize

Benchmark diff main vs. PR

Parametrized benchmark signatures:

BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)

All benchmarks:

       before           after         ratio
     [d1e36a93]       [c24d1438]
+      41.3±0.6ms         73.1±3ms     1.77  BenchmarkOptimize.time_optimize(False, False, False)
         53.2±1ms         92.4±7ms    ~1.74  BenchmarkOptimize.time_optimize(False, False, True)
         66.4±2ms         76.2±1ms    ~1.15  BenchmarkOptimize.time_optimize(False, True, False)
         79.5±9ms        94.9±20ms    ~1.19  BenchmarkOptimize.time_optimize(False, True, True)
+        53.9±2ms         92.6±2ms     1.72  BenchmarkOptimize.time_optimize(True, False, False)
       59.6±0.6ms         99.9±3ms    ~1.68  BenchmarkOptimize.time_optimize(True, False, True)
         83.1±4ms         93.0±3ms    ~1.12  BenchmarkOptimize.time_optimize(True, True, False)
        95.2±40ms          105±4ms    ~1.11  BenchmarkOptimize.time_optimize(True, True, True)
             192M             192M     1.00  IntegrationTwoDatasets.peakmem_optimize
       1.86±0.05s        2.01±0.1s     1.08  IntegrationTwoDatasets.time_optimize

codecov bot commented Oct 9, 2021

Codecov Report

Merging #851 (c24d143) into main (d1e36a9) will increase coverage by 0.1%.
The diff coverage is 93.1%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main    #851     +/-   ##
=======================================
+ Coverage   84.5%   84.7%   +0.1%     
=======================================
  Files         79      81      +2     
  Lines       4522    4581     +59     
  Branches     826     848     +22     
=======================================
+ Hits        3824    3882     +58     
- Misses       556     558      +2     
+ Partials     142     141      -1     
Impacted Files Coverage Δ
glotaran/analysis/optimization_group.py 87.6% <78.2%> (ø)
glotaran/model/model.py 84.9% <92.3%> (+1.7%) ⬆️
glotaran/analysis/optimize.py 86.4% <92.8%> (+1.4%) ⬆️
glotaran/project/scheme.py 92.5% <93.1%> (+1.7%) ⬆️
...analysis/optimization_group_calculator_unlinked.py 94.3% <96.5%> (ø)
...n/analysis/optimization_group_calculator_linked.py 96.5% <96.7%> (ø)
glotaran/analysis/optimization_group_calculator.py 100.0% <100.0%> (ø)
glotaran/model/__init__.py 100.0% <100.0%> (ø)
glotaran/model/dataset_group.py 100.0% <100.0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d1e36a9...c24d143. Read the comment docs.

@s-weigand (Member) commented:

🗑️ Removed arguments:

  • Scheme.non_negative_least_squares
  • Scheme.group_tolerance

Those need to be properly deprecated!

This is mainly a reminder that we need to properly deprecate the missing attributes.
s-weigand and others added 3 commits October 15, 2021 01:11
This bug led to result creation crashing because of missing labels.

Co-authored-by: Jörn Weißenborn <[email protected]>
It was nice to see how much time and memory the result creation needed compared to the whole optimization, but loading a pickled OptimizeResult wasn't nice from the start.
With the changes in this PR and the overhead of keeping benchmarks working across versions, IMHO the extra information about result creation details isn't worth it.
@s-weigand (Member) commented:

@joernweissenborn please add tests for cc4beb9 yourself (#clean-up-your-own-code-booboo 😛 ).

@s-weigand (Member) left a comment:

Some changes are required for this PR to be merged, others can be postponed to additional cleanup PRs.

Mandatory changes are:

  • The deprecations and their tests need to be fixed.
  • The DatasetGroupIndexModel naming issue
  • More helpful error message for wrong residual_function

Files with review comments:
  • glotaran/analysis/optimization_group.py
  • glotaran/analysis/optimization_group_calculator_linked.py
  • glotaran/analysis/optimize.py
  • glotaran/analysis/test/test_optimization_group.py
  • glotaran/analysis/test/test_penalties.py
  • glotaran/project/scheme.py
  • glotaran/project/test/test_scheme.py
sourcery-ai bot (Contributor) commented Oct 18, 2021

Sourcery Code Quality Report

❌  Merging this PR will decrease code quality in the affected files by 3.43%.

Quality metrics Before After Change
Complexity 3.95 ⭐ 4.15 ⭐ 0.20 👎
Method Length 74.56 🙂 87.99 🙂 13.43 👎
Working memory 9.10 🙂 9.53 🙂 0.43 👎
Quality 64.90% 🙂 61.47% 🙂 -3.43% 👎
Other metrics Before After Change
Lines 4009 2642 -1367
Changed files Quality Before Quality After Quality Change
benchmark/benchmarks/integration/ex_two_datasets/benchmark.py 85.49% ⭐ 89.66% ⭐ 4.17% 👍
glotaran/analysis/optimize.py 46.86% 😞 39.79% 😞 -7.07% 👎
glotaran/analysis/test/test_constraints.py 57.15% 🙂 57.48% 🙂 0.33% 👍
glotaran/analysis/test/test_grouping.py 50.01% 🙂 49.67% 😞 -0.34% 👎
glotaran/analysis/test/test_optimization.py 34.60% 😞 34.53% 😞 -0.07% 👎
glotaran/analysis/test/test_penalties.py 64.87% 🙂 67.27% 🙂 2.40% 👍
glotaran/analysis/test/test_relations.py 57.85% 🙂 58.25% 🙂 0.40% 👍
glotaran/builtin/io/yml/test/test_save_model.py 94.31% ⭐ 94.31% ⭐ 0.00%
glotaran/builtin/io/yml/test/test_save_scheme.py 78.73% ⭐ 78.73% ⭐ 0.00%
glotaran/builtin/megacomplexes/decay/test/test_decay_megacomplex.py 59.16% 🙂 59.54% 🙂 0.38% 👍
glotaran/deprecation/modules/test/__init__.py 80.32% ⭐ 75.41% ⭐ -4.91% 👎
glotaran/deprecation/modules/test/test_project_scheme.py 71.23% 🙂 75.33% ⭐ 4.10% 👍
glotaran/model/__init__.py % % %
glotaran/model/model.py 73.26% 🙂 72.40% 🙂 -0.86% 👎
glotaran/model/test/test_model.py 70.89% 🙂 72.38% 🙂 1.49% 👍
glotaran/project/scheme.py 89.88% ⭐ 72.83% 🙂 -17.05% 👎
glotaran/project/test/test_scheme.py 80.82% ⭐ 80.83% ⭐ 0.01% 👍
glotaran/test/test_spectral_decay.py 51.45% 🙂 53.88% 🙂 2.43% 👍
glotaran/test/test_spectral_decay_full_model.py 55.21% 🙂 55.92% 🙂 0.71% 👍
glotaran/test/test_spectral_penalties.py 46.60% 😞 45.01% 😞 -1.59% 👎

Here are some functions in these files that still need a tune-up:

File Function Complexity Length Working Memory Quality Recommendation
glotaran/analysis/optimize.py _create_result 17 🙂 279 ⛔ 26 ⛔ 23.55% ⛔ Try splitting into smaller methods. Extract out complex expressions
glotaran/analysis/test/test_optimization.py test_optimization 19 😞 514 ⛔ 15 😞 25.11% 😞 Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions
glotaran/test/test_spectral_penalties.py test_equal_area_penalties 5 ⭐ 783 ⛔ 17 ⛔ 34.81% 😞 Try splitting into smaller methods. Extract out complex expressions
glotaran/analysis/optimize.py optimize 5 ⭐ 163 😞 17 ⛔ 46.65% 😞 Try splitting into smaller methods. Extract out complex expressions
glotaran/analysis/test/test_grouping.py test_multi_dataset_overlap 0 ⭐ 485 ⛔ 12 😞 46.67% 😞 Try splitting into smaller methods. Extract out complex expressions

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!

sonarcloud bot commented Oct 18, 2021

Kudos, SonarCloud Quality Gate passed!

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 5 (rating A)

No coverage information
1.5% duplication

@s-weigand (Member) left a comment:

I added some fix commits; from my side, this is ok to merge.
@joernweissenborn and @jsnel Please have a look at my changes.

@joernweissenborn (Member, Author) commented:

lgtm, hit that merge button :)

@jsnel (Member) left a comment:

Approved. Some minor issues to be picked up in a subsequent cleanup / re-naming things PR.
