Dev #60

tlarcher · 2024-09-06T20:34:50Z

📝 Changelog

Major

Added GLC24 pre_extracted habitat dataset and example (see PR 58 in the Links section)
Changed how checkpoints are loaded: the state_dict is now loaded into the LightningModule instead of the model object. This is a breaking change: examples had to be updated to stop replacing the "model." string in the keys of the loaded state_dict.
Added the possibility to download model weights for any Malpolon model given a URL and a few file paths
Updated the way checkpoint_path is passed on to models. Added a checkpoint_path attribute to all Malpolon models
- Updated all examples accordingly
Added Malpolon as a (local) model provider.
- Created new module malpolon.models.custom_models which will host custom models proposed by Malpolon
- Split classes from geolifeclef2024_multimodal_ensemble.py to glc2024_multimodal_ensemble_model.py and glc2024_pre_extracted_prediction_system.py in custom_models to prevent circular import from malpolon.models.model_builder after adding Malpolon as (local) provider
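
To illustrate the checkpoint loading change described above: a checkpoint saved from a LightningModule stores the wrapped network's weights under keys that start with the attribute name (here "model."), next to entries from other submodules such as loss.pos_weight. The sketch below uses plain dictionaries as stand-ins for real state_dicts; the key names and the helper strip_model_prefix are hypothetical and only mimic the kind of key rewriting that examples no longer need.

```python
from collections import OrderedDict

# Stand-in for the "state_dict" entry of a checkpoint saved from a LightningModule.
# Keys and values are made up for illustration.
checkpoint_state_dict = OrderedDict([
    ("model.fc.weight", [[0.1, 0.2]]),
    ("model.fc.bias", [0.0]),
    ("loss.pos_weight", [1.0]),
])


def strip_model_prefix(state_dict, prefix="model."):
    """Hypothetical old pattern: drop the leading prefix so that keys
    match the bare model object instead of the LightningModule."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


# Old style: keys rewritten before loading into the model object.
old_style = strip_model_prefix(checkpoint_state_dict)

# New style: keys are used unchanged, as the LightningModule expects them.
new_style = checkpoint_state_dict
```

With the new behavior, the unchanged keys would be passed to the LightningModule's load_state_dict, so the rewriting step disappears from the example scripts.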

Minor

Updated malpolon.data.data_module.export_predict_csv to enable more flexibility when outputting the prediction CSV for a single data point.

Examples

Added GLC24 pre-extracted examples (habitat and species) using the MultiModalEnsemble (MME) model
- Automatic download of the dataset from Kaggle (depending on the value of boolean config parameter data.download_data)
- Automatic download of the model weights from Seafile if not already on disk, via a new model.model_kwargs.pretrained key in the config file. The weights enable users to directly run our MME model on our GLC24_pre_extracted test set and reach ~30% micro F1-score with ~26% micro precision and ~36% micro recall, as well as ~96% micro AUC.
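
The "download only if not already on disk" behavior can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not Malpolon's implementation: the function name fetch_weights_if_missing and its parameters are hypothetical, and the real code is driven by the config keys mentioned above.

```python
import urllib.request
from pathlib import Path


def fetch_weights_if_missing(url: str, destination: Path) -> bool:
    """Download `url` to `destination` unless the file already exists.

    Returns True when a download happened and False when the file
    already on disk was reused. (Hypothetical helper for illustration.)
    """
    destination = Path(destination)
    if destination.is_file():
        return False
    # Create the target folder first so the download has somewhere to land.
    destination.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(destination))
    return True
```

Calling the helper twice with the same destination downloads once and then reuses the cached file, which is the behavior the changelog entry describes for the pretrained weights.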

Tests

Added and updated unit tests for GLC24 pre-extracted examples (habitat and species)

🔗 Links

🔗 Glc24 habitat modeling #58
🔗 Model weights download #56

✅ Checklist

Lint and tests pass locally with my changes
I've added necessary documentation

* Changed custom models location to the new module 'malpolon.models.custom_models'. This includes the glc24 pre_extracted MME model and multi_modal.py. For MME: the classification system and nn module have been split into 2 files to allow calling MME from model_builder without triggering a circular import through check_model. Updated examples accordingly.
* Fix: state_dict altered during training.
- state_dict contains a loss parameter pos_weight as key loss.pos_weight. This key is created when the loss is instantiated by GenericPredictionSystem. However, this loss parameter was accessed and modified during the _step() process, which also alters the state_dict. Consequently, when loading the model from its checkpoint, there would be a value mismatch and the model would not load to resume training. This has been fixed by restoring the initial value of the loss parameter within the _step() function before the return statement.
- 'positive_weigh_factor' model hyperparameter has been deleted and replaced by loss parameter 'pos_weight', which achieves the same purpose. In the config file, the 'positive_weigh_factor' model key has been replaced by the subkey 'pos_weight' nested under 'loss_kwargs' in the optimizer section.
* Cleaned remnants of previous commit testing
* Added download weight option for all classification systems and updated checkpoint_path call for MME example
* Fixed wrong checkpoint_path path initialization behavior.
- glc24_cnn_multimodal_ensemble: updated example config file and main script to new checkpoint_path behavior, in both training and inference runs
- standard_prediction_systems.py: Fixed wrong checkpoint_path path initialization behavior
- glc2024_pre_extracted_prediction_system.py: added missing checkpoint_path argument and removed checkpoint_path setter as it is carried out by GenericPredictionSystem
* Updated example cnn_on_rgbnir_torchgeo following checkpoint_path update
* Updated example cnn_on_rgbnir_concat following checkpoint_path update
* Updated example cnn_on_rgbnir_glc23_patches following checkpoint_path update
* Reset yaml file glc23 example
* Fixed wrong variable assignment in examples micro_geolifeclef2022/cnn_on_rgb_nir_patches and micro_geolifeclef2022/cnn_on_rgb_patches
* Added predict run part in example geolifeclef2022/cnn_on_rgb_patches and updated main script following checkpoint_path update.
- data_module: Added more flexibility for predictions without targets
- geolifeclef2022 dataset: Added default -1 value for targets in predict mode to comply with standard_prediction_system predict() method
* Updated glc22 and microglc22 examples following checkpoint_path update, and added an inference part in the run section for those which didn't have one. To that end, added an input argument in custom GLC22 datamodules and a model output in prediction mode.
* Updated CIFAR-10 example following checkpoint_path update
* Updated all inference examples following checkpoint_path update
* Removed duplicate import
* Updated code docstrings
* Fixed task value from binary to multilabel (doesn't change behavior)
* Added 'malpolon' as model provider.
- model_builder: Added provider method and created new dictionary with model names as keys, and local imports of models as values
- data_module: Added possibility of applying no activation function when running inference, so as to output the model's logits. Enhanced CSV export method's info prints.
- glc2024_multimodal_ensemble_model: Added new init argument and class attribute 'pretrained' which the datamodule uses to determine whether to download pretrained weights (formerly: a standalone 'weights_download' variable was used by the datamodule). Added docstrings.
- glc2024_pre_extracted_prediction_system: Changed handling behavior of the model's loss during '_step()' to prevent overwriting the loss parameter during training, which resulted in a de-synchronization of the state_dict() before and after running the model (since loss parameters are automatically added as learnable parameters)
- glc24_cnn_multimodal_ensemble.yaml: Updated config file accordingly. Cleaned config file with correct values.
- glc24_cnn_multimodal_ensemble.py: Updated MME main script accordingly. Changed activation function of inference run from softmax() to sigmoid()
* Updated glc22 tests following class getter changes
* Removed commented dict
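
The pos_weight fix described in this commit message follows a save, modify, restore pattern. The framework-free sketch below only illustrates that pattern: ToyLoss, ToySystem, and the per-batch weighting rule are hypothetical stand-ins and are not Malpolon or PyTorch code.

```python
import copy


class ToyLoss:
    """Stand-in for a loss object whose pos_weight is exposed in the state_dict."""

    def __init__(self, pos_weight):
        self.pos_weight = list(pos_weight)

    def state_dict(self):
        return {"pos_weight": list(self.pos_weight)}


class ToySystem:
    """Stand-in for a prediction system that owns a loss."""

    def __init__(self, pos_weight):
        self.loss = ToyLoss(pos_weight)

    def state_dict(self):
        # Loss entries appear under the "loss." prefix, e.g. "loss.pos_weight".
        return {f"loss.{k}": v for k, v in self.loss.state_dict().items()}

    def _step(self, targets):
        # Keep a copy of the initial value before touching the loss parameter.
        initial = copy.deepcopy(self.loss.pos_weight)
        try:
            # Temporary per-batch weighting (hypothetical rule).
            self.loss.pos_weight = [t * w for t in targets for w in initial]
            return sum(self.loss.pos_weight)  # placeholder for the real loss value
        finally:
            # Restore the initial value so state_dict() is unchanged after the step.
            self.loss.pos_weight = initial


system = ToySystem([2.0])
before = system.state_dict()
step_value = system._step([1, 0, 1])
after = system.state_dict()
```

Because the initial value is restored before _step() returns, the state_dict taken after the step equals the one taken before it, which is the property a checkpoint needs in order to reload cleanly.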

* Corrected docstring
* Improved the script splitting CSV obs by species frequency by adding callable arguments, reducing computation time, adding comments, and making it more generic. Renamed the script to split_obs_per_column_frequency.py
* Fixed unwanted behavior and further improved split_obs_per_column_frequency.py
* Fixed output test name syntax being different from the other splits
* Renamed or deleted files
* Added inference metrics evaluation scripts and output files for GLC24 MME model. Added the top25 predictions files as they are not heavy.
* Added specific .gitignore for MME inference and evaluation folder to opt out heavy files only
* Updated values of previously created .gitignore
* Moved and updated previous specific .gitignore
* Added entries to root .gitignore
* Added task selection (multilabel or other) in malpolon.data.datasets.geolifeclef2024_pre_extracted.GLC24Datamodule
* WiP: glc24 mme habitat integration
* Corrected typos
* Changed multiclass prediction filtering to keep all predictions and probabilities out of predict_logits_to_class()
* Fixed GLC24 mme habitat download method
* Reset glc24 mme habitat config file
* Added GLC24 MME habitat model dataset as new Malpolon dataset within malpolon.data.datasets.geolifeclef2024_pre_extracted
* Renamed inference evaluation script for GLC24 pre-extracted examples. Added docstrings to said examples.
* Fixed habitat dataset folder not being created before calling symbolic links
* Added docstrings and linting
* Removed unnecessary files
* Added glc24 pre-extracted species unit test
* Added glc24_pre_extracted examples
* Updated test_examples pytest run skips and cleaned file.
* Linting
* Docstrings glc24_pre_extracted


tlarcher and others added 14 commits August 13, 2024 10:15

Updated setup.py for v1.3.0

16b61cc

Merge branch 'dev' of github.com:plantnet/malpolon into dev

1599743

Updated sklearn version

44c74d4

Added GLC24 pre_extracted inference example

5126ea2

Added more flexibility in constructing output point prediction CSV

405cc30

'malpolon.models.standard_prediction_systems.GenericPredictionSystem.…

0caee77

…predict_point': Changed checkpoint state_dict loading from model to LightningModule (breaking changes). Added iterable data type compatibility.

Added loss pos_weight copy security

ed3619d

Added GLC24 pre_extracted example config file

a62731d

Added glc24 pre_extracted habitat example in inference version

9fcb92f

Completed test_examples with glc24 pre_extracted inference and transf…

d2dbba8

…er learning examples

Fixed typo in readme

9df3cdd

Updated examples following state_dict update

44cabda

tlarcher self-assigned this Sep 6, 2024

tlarcher merged commit 93fd2c3 into main Sep 6, 2024
1 check passed

tlarcher deleted the dev branch September 6, 2024 20:45
