Add Dataset Groups #851
Benchmark is done. Check out the benchmark result page.
Benchmark diff v0.4.1 vs. main
Parametrized benchmark signatures: BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)
Benchmark diff main vs. PR
Parametrized benchmark signatures: BenchmarkOptimize.time_optimize(index_dependent, grouped, weight)
Codecov Report
@@ Coverage Diff @@
## main #851 +/- ##
=======================================
+ Coverage 84.5% 84.7% +0.1%
=======================================
Files 79 81 +2
Lines 4522 4581 +59
Branches 826 848 +22
=======================================
+ Hits 3824 3882 +58
- Misses 556 558 +2
+ Partials 142 141 -1
Continue to review full report at Codecov.
7c352a1
to
bf3e413
Compare
🗑️ Removed arguments:
Those need to be properly deprecated!
This is mainly a reminder that we need to properly deprecate the missing attributes.
56a2caf
to
158c822
Compare
in Scheme initialization
This bug led to result creation crashing, because of missing labels. Co-authored-by: Jörn Weißenborn <[email protected]>
It was nice to see how much time and memory the result creation needed compared to the whole optimization, but loading a pickled OptimizeResult wasn't nice from the start. With the changes in this PR and the overhead of keeping the benchmarks working across versions, IMHO the extra information about result creation details isn't worth it.
@joernweissenborn please add tests for cc4beb9 yourself ( |
Some changes are required for this PR to be merged, others can be postponed to additional cleanup PRs.
Mandatory changes are:
- The deprecations and their tests need to be fixed.
- The `DatasetGroupIndexModel` naming issue
- More helpful error message for wrong `residual_function`
Co-authored-by: Sebastian Weigand <[email protected]>
Sourcery Code Quality Report❌ Merging this PR will decrease code quality in the affected files by 3.43%.
Here are some functions in these files that still need a tune-up:
Legend and Explanation
The emojis denote the absolute quality of the code:
The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request. Please see our documentation here for details on how these metrics are calculated. We are actively working on this report - lots more documentation and extra metrics to come! Help us improve this quality report!
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information
I added some fix commits and from my side, this is ok to merge.
@joernweissenborn and @jsnel Please have a look at my changes.
lgtm, hit that merge button :)
Approved. Some minor issues to be picked up in a subsequent cleanup / re-naming things PR.
Gist
This PR adds support for multiple dataset groups in a model. This allows, for example, using non-negative least squares (nnls) on one (group of) dataset(s) and variable projection (varpro) on another (group of) dataset(s) within the same optimization. It is now also possible to link the CLP of only certain datasets instead of having to link all datasets.
Dataset Groups
This is achieved by introducing a new level of indirection called dataset groups. There are 2 new objects:
- `glotaran.model.DatasetGroupModel` basically represents an entry in the model spec. It tells whether to use nnls and whether to link the clp. Datasets refer to a group model with the new `group` property. This defaults to `default`, a group which will always be added if it is not specified/overridden.
- `glotaran.model.DatasetGroup` contains a `DatasetGroupModel` and all `DatasetModels` belonging to that group. These groups are generated by the model via `Model.get_dataset_groups` and are fed into the optimization, see below for details.
Changes to `glotaran.analysis`
To adapt the analysis package to work with dataset groups, a major refactor of the `Problem` class was necessary.
Summary:
- `Problem` has been renamed to `OptimizationGroup`
- `glotaran.analysis.OptimizationGroupCalculator` has been added to separate the actual linked/unlinked calculation from the common structure (matrices, residuals, clp...)
Example
Imagine a model like so
These 4 datasets are currently linked in a way that you either use NNLS on all of them or on none. The same goes for linking CLP: either all get linked or none.
With this PR one can do the following:
This means that d1 to d3 will be linked and the residual will be calculated with varpro (default). d4 doesn't specify a group, thus is part of 'default' which we override to use nnls instead of varpro.
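The grouping behaviour described above can be sketched in plain Python. This is only an illustration of the idea and not glotaran's implementation: `GroupSpec`, `DatasetSpec`, `collect_groups`, the group name `linked` and the residual function strings are all hypothetical stand-ins chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class GroupSpec:
    """Hypothetical stand-in for a dataset group entry in the model spec."""

    residual_function: str = "variable_projection"  # assumed name for varpro
    link_clp: bool = False


@dataclass
class DatasetSpec:
    """Hypothetical stand-in for a dataset entry; `group` falls back to 'default'."""

    label: str
    group: str = "default"


def collect_groups(datasets, group_specs):
    """Sort dataset labels into their groups, always providing a 'default' group."""
    # 'default' exists even if the spec does not mention it; a user-supplied
    # 'default' entry overrides the built-in one.
    specs = {"default": GroupSpec(), **group_specs}
    groups = {name: [] for name in specs}
    for dataset in datasets:
        if dataset.group not in specs:
            raise ValueError(
                f"Unknown dataset group {dataset.group!r} for dataset {dataset.label!r}"
            )
        groups[dataset.group].append(dataset.label)
    return specs, groups


# d1 to d3 share a group with linked CLP, d4 does not specify a group.
datasets = [
    DatasetSpec("d1", group="linked"),
    DatasetSpec("d2", group="linked"),
    DatasetSpec("d3", group="linked"),
    DatasetSpec("d4"),
]
group_specs = {
    "linked": GroupSpec(link_clp=True),
    # Override 'default' to use nnls instead of varpro (assumed name).
    "default": GroupSpec(residual_function="non_negative_least_squares"),
}

specs, groups = collect_groups(datasets, group_specs)
for name, members in groups.items():
    print(name, specs[name].residual_function, specs[name].link_clp, members)
```

Each group carries its own residual function and CLP linking flag, so the optimization can treat the groups independently.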
Deprecation summary
- `Scheme`
  - `group` deprecated -> now part of the model's dataset groups
  - `group_tolerance` -> renamed to `clp_link_tolerance`
- `Problem` -> replaced by `OptimizationGroup`
  - This was never public API
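As a rough illustration of what properly deprecating a renamed argument can look like, here is a generic sketch using only the standard library `warnings` module. It is not the project's own deprecation tooling, and `make_scheme_kwargs` is a hypothetical helper that exists only for this example.

```python
import warnings


def make_scheme_kwargs(clp_link_tolerance=0.0, group_tolerance=None):
    """Hypothetical helper: accept the old name, warn, and forward to the new one."""
    if group_tolerance is not None:
        warnings.warn(
            "'group_tolerance' is deprecated, use 'clp_link_tolerance' instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this helper
        )
        clp_link_tolerance = group_tolerance
    return {"clp_link_tolerance": clp_link_tolerance}


# Record warnings so the deprecation can be checked, e.g. in a test.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kwargs = make_scheme_kwargs(group_tolerance=0.1)

print(kwargs)
print([str(w.message) for w in caught])
```

Old call sites keep working and get a warning that names the replacement, and a test can assert that exactly one `DeprecationWarning` was raised.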
Checklist
Closes issues
closes #XXXX