Development November 2022 #64

Merged — merged 16 commits into sacdallago:main from dev-11-22 on Jan 2, 2023
Conversation

@SebieF (Collaborator) commented Dec 5, 2022

05.12.2022 - Version 0.2.1

Bug fixes

Features

  • The device in use is now logged
  • Added a sanity_checker.py that checks the test results for obvious problems (such as only predicting a single class) (WIP; see the sketch after this list)
  • Added a limited_sample_size flag to train the model on a subset of all training ids. This makes it easy to check whether the model architecture is able to overfit on the training data
  • Added the metrics from the best training iteration to the out.yml file (to compare with the test set performance)
  • Applied _validate_targets to all protocols in TargetManager
  • Added a Changelog file
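As a rough illustration of what such a sanity check can look like, here is a minimal sketch; the function name, signature, and warning format are illustrative assumptions, not the actual sanity_checker.py API:

```python
# Minimal sketch of the kind of check sanity_checker.py performs; the function
# name and the warning format are assumptions, not the actual API.
from typing import Any, Dict, List


def check_test_predictions(predictions: List[Any]) -> Dict[str, str]:
    """Flag obvious problems, e.g. the model only ever predicting one class."""
    warnings = {}
    unique_predictions = set(predictions)
    if len(unique_predictions) <= 1:
        warnings["single_class"] = (
            f"Model predicted only a single class: {unique_predictions}"
        )
    return warnings


# A degenerate model that always predicts class 0 gets flagged:
print(check_test_predictions([0, 0, 0, 0]))
```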

Maintenance

  • Moved the dataset -> torch.tensor conversion to embeddings.py
  • Replaced storing the training/validation/test ids with the number of samples in the respective sets
  • Stored start and end times in a reproducible, readable format (see the sketch after this list)
  • Exported ConfigurationException via the __init__.py file for consistency
  • Removed the unnecessary double-loading of the checkpoint for test evaluation
  • Added typing to the split lists in TargetManager
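For the start and end times, one reproducible and readable choice is ISO 8601 in UTC; this is a sketch of the idea, not necessarily the exact format the PR adopts:

```python
# Sketch: storing start and end times in a reproducible, readable format.
# ISO 8601 in UTC is one such format; whether the PR uses exactly this
# representation is an assumption.
from datetime import datetime, timezone

start_time = datetime.now(timezone.utc)
# ... training happens here ...
end_time = datetime.now(timezone.utc)

output_vars = {
    "start_time": start_time.isoformat(),  # e.g. "2022-12-05T11:34:00+00:00"
    "end_time": end_time.isoformat(),
}
print(output_vars)
```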

@SebieF added the bug (Something isn't working), enhancement (New feature or request), and refactoring (Code or standardization refactorings) labels Dec 5, 2022
@SebieF requested a review from sacdallago December 5, 2022 11:34
@SebieF self-assigned this Dec 5, 2022
Improves dataset creation time (likely) and makes embedding handling easier
Numbers are easy to review for sanity checks; ids are not, and they are currently contained in the fasta file anyway.
Fixup from interaction branch (29.11.22)
Check output_vars after a run for obvious problems
This limits the training data to a user-defined number of samples, enabling a quick check of the architecture and of whether the model is able to overfit (see the sketch below).
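A possible shape for such a flag, where the names (limit_training_ids, limited_sample_size) and the -1 sentinel are illustrative assumptions rather than the PR's actual implementation:

```python
# Hypothetical sketch of a limited_sample_size flag: draw a fixed-size random
# subset of the training ids so overfitting checks run quickly.
import random
from typing import List


def limit_training_ids(training_ids: List[str],
                       limited_sample_size: int = -1,
                       seed: int = 42) -> List[str]:
    """Return a random subset of the training ids if a positive limit is set."""
    if limited_sample_size <= 0 or limited_sample_size >= len(training_ids):
        return training_ids
    random.seed(seed)
    return random.sample(training_ids, limited_sample_size)


ids = [f"seq_{i}" for i in range(1000)]
print(len(limit_training_ids(ids, limited_sample_size=50)))  # -> 50
```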
The best checkpoint was loaded in both the trainer and the Solver; keeping it in the trainer removes the side effect and improves the logging order.
Makes it easier to see the difference between training, validation, and test for the best epoch.
The file inconsistencies apply in almost the same way to all protocols (missing pre-computed embeddings, for example). The length check is, of course, only done for residue_to_x protocols (see the sketch below). Also fixes an old flag name.
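A hedged sketch of the kind of validation described here: every id must have a pre-computed embedding for all protocols, while the per-residue length check only applies to residue_to_x protocols. All names, signatures, and the protocol-string check are illustrative assumptions, not the actual TargetManager._validate_targets:

```python
# Illustrative sketch of a target validation step; names and signatures are
# assumptions, not the actual TargetManager._validate_targets implementation.
from typing import Dict, Sequence


def validate_targets(ids: Sequence[str],
                     embeddings: Dict[str, Sequence],
                     targets: Dict[str, Sequence],
                     protocol: str) -> None:
    for seq_id in ids:
        # Applies to all protocols, e.g. a missing pre-computed embedding:
        if seq_id not in embeddings:
            raise ValueError(f"Missing pre-computed embedding for id: {seq_id}")
        # The length check only makes sense for residue_to_x protocols, where
        # one target is expected per residue:
        if protocol.startswith("residue_to") and \
                len(targets[seq_id]) != len(embeddings[seq_id]):
            raise ValueError(f"Target/sequence length mismatch for id: {seq_id}")
```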
@sacdallago merged commit 8a9ded5 into sacdallago:main Jan 2, 2023
@SebieF deleted the dev-11-22 branch January 2, 2023 12:34