Optimizers / Schedulers flexibility rework (#67)
* Model weights download (#56)

* Moved custom models to the new module 'malpolon.models.custom_models'. This includes the glc24 pre_extracted MME model and multi_modal.py. For MME, the classification system and the nn module have been split into 2 files so that MME can be called from model_builder without triggering a circular import through check_model. Updated examples accordingly.

* Fix: state_dict altered during training.
- state_dict contains a loss parameter pos_weight as key loss.pos_weight. This key is created when the loss is instantiated by GenericPredictionSystem. However, this loss parameter was accessed and modified during the _step() process, which also alters the state_dict. Consequently, when loading the model by its checkpoint, there would be a value mismatch and the model would not load to resume training. This has been fixed by restoring the initial value of the loss parameter within the _step() function before the return statement.
- 'positive_weigh_factor' model hyperparameter has been deleted and replaced by loss parameter 'pos_weight', which achieves the same purpose. In the config file, the 'positive_weigh_factor' model key has been replaced by the subkey 'pos_weight', nested under 'loss_kwargs' in the optimizer section.
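
The restore-before-return fix described above can be sketched in plain Python. This is only an illustration with stand-in classes: `ToyLoss`, `ToySystem`, `step_buggy`, `step_fixed` and `batch_factor` are hypothetical names, not Malpolon code. It shows why overwriting a loss parameter inside a step changes what ends up in the saved state, and how restoring the initial value keeps that state stable.

```python
class ToyLoss:
    """Stand-in for a loss module whose parameter ends up in the state dict."""

    def __init__(self, pos_weight):
        self.pos_weight = pos_weight


class ToySystem:
    """Stand-in for a prediction system that owns a loss (hypothetical)."""

    def __init__(self, pos_weight):
        self.loss = ToyLoss(pos_weight)

    def state_dict(self):
        # Mirrors the 'loss.pos_weight' key mentioned in the commit message.
        return {"loss.pos_weight": self.loss.pos_weight}

    def step_buggy(self, batch_factor):
        # The parameter is overwritten and never restored.
        self.loss.pos_weight = self.loss.pos_weight * batch_factor
        return self.loss.pos_weight

    def step_fixed(self, batch_factor):
        initial = self.loss.pos_weight
        try:
            self.loss.pos_weight = initial * batch_factor
            return self.loss.pos_weight
        finally:
            # Restore the initial value before control leaves the step.
            self.loss.pos_weight = initial


buggy = ToySystem(pos_weight=2.0)
buggy.step_buggy(batch_factor=3.0)

fixed = ToySystem(pos_weight=2.0)
fixed.step_fixed(batch_factor=3.0)

print(buggy.state_dict())  # value drifted away from the one used at creation
print(fixed.state_dict())  # value unchanged, so a checkpoint reloads cleanly
```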

* Cleaned up remnants of previous commit testing

* Added download weight option for all classification systems and updated checkpoint_path call for MME example

* Fixed wrong checkpoint_path path initialization behavior.
- glc24_cnn_multimodal_ensemble: updated example config file and main script to new checkpoint_path behavior, in both training and inference runs
- standard_prediction_systems.py: Fixed wrong checkpoint_path path initialization behavior
- glc2024_pre_extracted_prediction_system.py: added missing checkpoint_path argument and removed checkpoint_path setter as it is carried out by GenericPredictionSystem

* Updated example cnn_on_rgbnir_torchgeo following checkpoint_path update

* Updated example cnn_on_rgbnir_concat following checkpoint_path update

* Updated example cnn_on_rgbnir_glc23_patches following checkpoint_path update

* Reset yaml file glc23 example

* Fixed wrong variable assignment in examples micro_geolifeclef2022/cnn_on_rgb_nir_patches and micro_geolifeclef2022/cnn_on_rgb_patches

* Added predict run part in example geolifeclef2022/cnn_on_rgb_patches and updated main script following checkpoint_path update.
- data_module: Added more flexibility for predictions without targets
- geolifeclef2022 dataset: Added default -1 value for targets in predict mode to comply with standard_prediction_system predict() method

* Updated glc22 and microglc22 examples following checkpoint_path update, and added an inference part in the run section for those which didn't have one. To that end, added an input argument in custom GLC22 datamodules and a model output in prediction mode.

* Updated CIFAR-10 example following checkpoint_path update

* Updated all inference examples following checkpoint_path update

* Removed duplicate import

* Updated code docstrings

* Fixed task value from binary to multilabel (doesn't change behavior)

* Added 'malpolon' as model providers.
- model_builder: Added provider method and created new dictionary with model names as keys, and local imports of models as values

- data_module: Added possibility of applying no activation function when running inference, so as to output the model's logits. Enhanced CSV export method's info prints.

- glc2024_multimodal_ensemble_model: Added new init argument and class attribute 'pretrained' which the datamodule uses to determine whether to download pretrained weights (formerly: a standalone 'weights_download' variable was used by the datamodule). Added docstrings.

- glc2024_pre_extracted_prediction_system: Changed how the model's loss is handled during '_step()' to prevent overwriting the loss parameter during training, which resulted in a de-synchronization of the state_dict() before and after running the model (since loss parameters are automatically added as learnable parameters)
- glc24_cnn_multimodal_ensemble.yaml: Updated config file accordingly. Cleaned config file with correct values.
- glc24_cnn_multimodal_ensemble.py: Updated MME main script accordingly. Changed activation function of inference run from softmax() to sigmoid()

* Updated glc22 tests following class getter changes

* Removed commented dict

* Updated setup.py for v1.3.0

* Updated sklearn version

* Added optimizer and scheduler selection via config file. Applied changes to sentinel-2a-rgbnir_bioclim example yaml config file

* Docstrings

* Optimizer / scheduler rework [backward compatible].
- malpolon.models.utils: Changed behavior of check_optimizer() and added check_scheduler() to allow users to input one or several optimizers (and optionally 1 scheduler per optimizer, possibly with a lr_scheduler_config descriptor) via their config files.
- malpolon.models.standard_prediction_systems: changed instantiation of optimizer(s) and scheduler(s) in class GenericPredictionSystem. The class attributes are now lists of instantiated optimizers (respectively, of lr_scheduler_config dictionaries). Updated behavior of method configure_optimizers() to return a dictionary containing all the optimizers and schedulers (cf. https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.core.LightningModule.html#lightning.pytorch.core.LightningModule.configure_optimizers).
- malpolon.tests.test_models.utils: Added all corresponding unit tests, testing both valid scenarios and edge cases of incorrect user inputs in the config file.
- sentinel-2a-rgbnir_bioclim example: updated the config file to fit previously described changes.
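
To make the idea of "one or several optimizers, each with at most one scheduler" more concrete, here is a minimal pure-Python sketch of how such a config section could be normalized into a list of descriptors. The config layout, the key names `kwargs` and `scheduler`, and the function `normalize_optimizers` are assumptions made for illustration; the actual validation is done by `check_optimizer()` and `check_scheduler()` and may differ. Only `lr_scheduler_config` is a name taken from the commit message.

```python
from typing import Any, Dict, List, Mapping


def normalize_optimizers(optim_cfg: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Turn {optimizer_name: options} into a list of plain descriptors.

    Hypothetical layout: each optimizer maps to optional 'kwargs' and an
    optional 'scheduler' entry holding exactly one scheduler name.
    """
    if not optim_cfg:
        raise ValueError("at least one optimizer is required")
    entries = []
    for name, options in optim_cfg.items():
        options = options or {}
        entry = {"optimizer": name, "kwargs": dict(options.get("kwargs", {}))}
        scheduler = options.get("scheduler")
        if scheduler:
            if len(scheduler) != 1:
                raise ValueError(f"{name}: expected at most one scheduler")
            # Unpack the single (name, options) pair of the scheduler mapping.
            (sched_name, sched_opts), = scheduler.items()
            sched_opts = sched_opts or {}
            entry["scheduler"] = {
                "name": sched_name,
                "kwargs": dict(sched_opts.get("kwargs", {})),
                "lr_scheduler_config": dict(
                    sched_opts.get("lr_scheduler_config", {})
                ),
            }
        entries.append(entry)
    return entries


# Example config with two optimizers, only the first one has a scheduler.
config = {
    "sgd": {
        "kwargs": {"lr": 0.01, "momentum": 0.9},
        "scheduler": {
            "step_lr": {
                "kwargs": {"step_size": 10},
                "lr_scheduler_config": {"interval": "epoch"},
            }
        },
    },
    "adam": {"kwargs": {"lr": 0.001}},
}

entries = normalize_optimizers(config)
for entry in entries:
    print(entry)
```

Keeping the result as a flat list of descriptors means a single-optimizer config and a multi-optimizer config go through the same code path, which is one plausible way to stay backward compatible.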

* Updated test_examples skip rules.

* WiP: Updated MME example config file and ClassificationSystemGLC24() class because the recent rework of optimizer(s) and scheduler(s) is not compatible with the classification system calls.

* WiP: updated glc24_pre_extracted example following optimizers and schedulers update. Updated test_examples consequently

* Cleaned files and updated docstrings

* restored default test_examples pytest skips values

* linting

* WiP: updating documentation and creating a new example-wide README with instructions on how to create a custom example

* Added new README in _malpolon/examples_ explaining how to create and run examples in a generic way, for each scenario (WiP)

* Updated examples/ README

* Updated examples/ README

* Updated root readme to link to new examples/ readme

* Updated examples readme with hyperparameters and updated sentinel-2a-rgbnir_bioclim examples accordingly with new 'optim' key

* Updated examples/ readme with info about config parameters

* Changed Conda source following licensing changes which can make its use non free

* Updated Readmes

* Fixed examples/Readme info

* Updated sentinel-2a-rgbnir & sentinel-2a-rgbnir_bioclim examples config files and scripts following optimizers update

* Updated all examples following optimizers update

* Updated test_examples config files following optimizers update. All test_examples ran: all passed.

* Updated READMEs with Troubleshooting and Contribution section

* Added linting section to root README, and added bash script to run linters and tests

* Added instructions relative to the checking script in root README

* Linting
tlarcher authored Oct 29, 2024
1 parent 1a7eb0d commit c964faf
Showing 60 changed files with 1,484 additions and 259 deletions.
1 change: 0 additions & 1 deletion .gitignore
@@ -1,7 +1,6 @@
\#*#
*~
.*
checkMyCode.sh
*.pyc
*.egg-info

78 changes: 64 additions & 14 deletions README.md
@@ -26,7 +26,7 @@ If you're not a deep learning or PyTorch expert but nonetheless want to use visu

## 🧭 Usage

Malpolon is intended for a variety of user profiles, some more experienced than others. To this end, we provide several examples of usage of the framework, organized by use case or _scenarios_. These examples can be found in the `examples` folder of the repository, each with a README file for more details on how to use the scripts.
Malpolon is intended for a variety of user profiles, some more experienced than others. To this end, we provide several examples of usage of the framework, organized by use case or _scenarios_. These examples can be found in the `examples` folder of the repository, each with a README file for more details on how to use the scripts. Additionally, check out our guide "[**Getting started with examples**](examples/)".

Here is a list of the currently available scenarios:

@@ -109,6 +109,9 @@ pip install -r requirements_python3.10.txt

- **Via `conda`**

⚠️ Be aware that conda recently changed its licensing and you may be subject to fees, or be limited in downloads. Sources: [anaconda website](https://www.anaconda.com/blog/update-on-anacondas-terms-of-service-for-academia-and-research),
[datacamp blog recap](https://www.datacamp.com/blog/navigating-anaconda-licensing) ⚠️

You can also use `conda` to install your packages.

```script
@@ -172,6 +175,66 @@ make -C docs html

The result can be found in `docs/_build/html`.


## ⚒️ Troubleshooting
Commonly encountered errors when using the framework are compiled [here](examples/README.md#⚒️-troubleshooting).

## 🚀 Contributing
### **Guidelines**

Issues and PR templates are provided to help you start a contribution to the project.

A checking script is also provided. It runs the checks described in the next two sections with the following command:

```bash
./checkMyCode.sh all
```

### **Unit tests**
<details>
<summary><i><u>Click here to expand instructions</u></i></summary>

When submitting, make sure the unit tests all pass without errors. These tests are located at `malpolon/tests/` and can be run all at once, with a code coverage estimation, via command line:

```bash
./checkMyCode.sh t # or `pytest malpolon/tests/`
```
Specify a file path as argument to run a single test file:

```bash
./checkMyCode.sh malpolon/tests/<TEST_FILE>.py # or `pytest malpolon/tests/<TEST_FILE>.py`
```

Run individual test functions via `python malpolon/tests/test_<module>.py` by modifying the files beforehand to call the functions you want to test with:

```python
if __name__ == '__main__':
    test_my_function()
```

**This is especially useful for `malpolon/tests/test_examples.py` which tests all the provided examples**, ensuring they do not crash. However, these **require having all the datasets and take a while to run**. Some data you might not have local access to.\
To skip a test function, add a decorator `@pytest.mark.skip()` above the function definition.

</details>

### **Linting**

<details>
<summary><i><u>Click here to expand instructions</u></i></summary>

Likewise, take care to write clean code. The project uses `flake8`, `Pylint` and `Pydocstyle` to check the formatting and documentation of your code. To lint your code, you can either run each of these tools independently or use the checking script:

```bash
./checkMyCode.sh l
```

Run linters on non-test file(s):

```bash
./checkMyCode.sh <FILE_PATH_1> <FILE_PATH_2>
```
</details>

## 🚆 Roadmap

This roadmap outlines the planned features and milestones for the project. Please note that the roadmap is subject to change and may be updated as the project progresses.
@@ -220,19 +283,6 @@ Here is an overview of the main Python libraries used in this project.
* [![Hydra](https://img.shields.io/badge/Hydra-%23729DB1.svg?logo=hydra&logoColor=white)](https://hydra.cc/docs/intro/) - To handle models' hyperparameters
* [![Cartopy](https://img.shields.io/badge/Cartopy-%2300A1D9.svg?logo=cartopy&logoColor=white)](https://scitools.org.uk/cartopy/docs/latest/) - To handle geographical data


## ⚒️ Troubleshooting
### `ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256, 1, 1])`

This error might occur when your model is trying to perform a forward pass on a layer which encounters division by 0 because of how small the data is.

Typically, a ResNet block cannot run a `batch_norm` operation on a tensor of size `[1, 256, 1, 1]` because for each of the 256 channels, there is only 1 value to normalize. Since the operation is `(value - mean) / std`, the std is 0 and the operation is impossible.

To solve this issue, you can either:
- **Increase the batch size** of your dataloader. A small batch size can lead to the last one containing only 1 element _e.g.: a dataset of 99 elements with batch size of 2. Increasing the batch size to 4 would leave a remainder of 3 elements in the last batch [3, 256, 1, 1]_.
- **Increase the input size of your data** so that the encoding layers don't reduce the size too much _e.g.: a patch size of 64 leads to [1, 256, 4, 4]_
- **Change the model architecture** by removing the `batch_norm` layers (can lead to further issues).
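
The batch-size arithmetic from the first bullet can be checked with a few lines of Python. This is a small sketch with hypothetical helper names (`last_batch_size`, `values_per_channel`), assuming the last incomplete batch is kept and that batch norm computes its statistics per channel over the batch and spatial dimensions of a `[N, C, H, W]` tensor.

```python
def last_batch_size(n_samples: int, batch_size: int) -> int:
    """Size of the final batch when the last partial batch is kept."""
    remainder = n_samples % batch_size
    return remainder if remainder else batch_size


def values_per_channel(shape: tuple) -> int:
    """Number of values averaged per channel for a [N, C, H, W] tensor."""
    n, _channels, height, width = shape
    return n * height * width


# Dataset of 99 elements: batch size 2 leaves a single-sample last batch,
# batch size 4 leaves three samples.
print(last_batch_size(99, 2))
print(last_batch_size(99, 4))

# With one value per channel there is nothing to normalize against.
print(values_per_channel((1, 256, 1, 1)))
print(values_per_channel((1, 256, 4, 4)))
```

Another common way to avoid a single-sample batch is to drop the last incomplete batch, for example with the `drop_last=True` option of PyTorch's `DataLoader`.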

## Acknowledgments

This work is made possible through public financing by the [European Commission](https://commission.europa.eu/index_en) on european projects [MAMBO](https://www.mambo-project.eu/) and [GUARDEN](https://guarden.org/).
45 changes: 45 additions & 0 deletions checkMyCode.sh
@@ -0,0 +1,45 @@
#!/bin/bash
# Runs the static analysers and tests

case $1 in
    all)
        list_dir=("malpolon/" "malpolon/tests/")
        ;;
    l)
        list_dir=("malpolon/data" "malpolon/models" "malpolon/plot")
        ;;
    t)
        list_dir=("malpolon/tests/")
        ;;
    "")
        list_dir=("malpolon/")
        ;;
    *)
        list_dir=("$@")
        ;;
esac

for dir in "${list_dir[@]}";
do
    echo -e "\n\e[1m++++++++++++++++++++++++++++++++++++++"
    echo -e " Working in \e[92m $dir \e[0m... "
    echo -e "\e[1m++++++++++++++++++++++++++++++++++++++\e[0m"
    liste=$(find "$dir" -type f -iname '*.py' -not -path '*ipynb_checkpoints/*')
    for fichier in $liste;
    do
        if [[ $fichier != *"__init__.py"* ]] ; then
            if [[ $fichier == *"tests/"* ]] ; then
                echo -e "Running \e[95m\e[1m pytest\e[0m,\e[95m\e[1m coverage \e[0m on\e[92m $fichier \e[39m..."
                coverage run -m pytest "$fichier"
                coverage report
            else
                echo -e "Running \e[95m\e[1m Flake8 \e[0m on\e[92m $fichier \e[39m..."
                flake8 "$fichier"
                echo -e "Running \e[95m\e[1m Pylint \e[0m on\e[92m $fichier \e[39m..."
                pylint "$fichier"
                echo -e "Running \e[95m\e[1m Pydocstyle \e[0m on\e[92m $fichier \e[39m..."
                pydocstyle "$fichier" -v
            fi
        fi
    done
done
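
The per-file dispatch rule of the script above can be summarized in a short Python sketch. It is only an illustration of the branching logic (the function name `tools_for` is hypothetical) and does not run any tool: `__init__.py` files are skipped, files under a `tests/` path go to pytest with coverage, and everything else goes to the three linters.

```python
from typing import List


def tools_for(path: str) -> List[str]:
    """Return the commands the script would run on a given Python file."""
    if "__init__.py" in path:
        return []  # skipped entirely
    if "tests/" in path:
        return ["coverage run -m pytest", "coverage report"]
    return ["flake8", "pylint", "pydocstyle -v"]


for candidate in (
    "malpolon/models/utils.py",
    "malpolon/tests/test_examples.py",
    "malpolon/data/__init__.py",
):
    print(candidate, "->", tools_for(candidate))
```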
Binary file added docs/resources/malpolon_macro_view.png
Binary file added docs/resources/malpolon_main_components.png
2 changes: 1 addition & 1 deletion environment_python3.10.yml
@@ -1,7 +1,7 @@
name: malpolon_3.10
channels:
- defaults
- conda-forge
- nodefaults
dependencies:
- python==3.10.12
- cmake==3.26.3