Modeling - Score v0 #22

miltminz · 2020-12-08T15:34:28Z

This PR adds new features for modelling the pyro-score:

A new module in datasets, model_dataset.py: it handles the aggregation (by day and department) and merging of the ERA5, FWI and VIIRS datasets to create a new dataset for the modelling phase
A new folder models with score_v0.py: it adds functions to process the dataset for modelling (train/test split, lags, filtering features by their correlation with the target, random forest and XGBoost classifiers)
A script, example_scorev0.py, showing how to use the new features
Unit tests for the new features
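
As a rough illustration of the aggregation, merge and lag steps listed above, here is a minimal pandas sketch. It is not the code from this PR: the column names (`time`, `department`, `t2m`, `fire`), the helper names and the toy values are all assumptions made for the example, and the real ERA5, FWI and VIIRS schemas may differ.

```python
from typing import List

import pandas as pd

# Hypothetical join keys; the real datasets may name these differently.
KEYS = ["day", "department"]


def aggregate_daily(df: pd.DataFrame, value_cols: List[str], how: str = "mean") -> pd.DataFrame:
    """Collapse raw records to one row per (day, department)."""
    out = df.copy()
    # Truncate timestamps to midnight so all records of a day share one key.
    out["day"] = pd.to_datetime(out["time"]).dt.normalize()
    return out.groupby(KEYS, as_index=False)[value_cols].agg(how)


def add_lags(df: pd.DataFrame, cols: List[str], lags=(1, 3)) -> pd.DataFrame:
    """Add lagged copies of cols within each department.

    The shift is by rows, so it only equals a shift in days when there is
    exactly one row per day and no missing days.
    """
    out = df.sort_values(KEYS).copy()
    for col in cols:
        for lag in lags:
            out[f"{col}_lag{lag}"] = out.groupby("department")[col].shift(lag)
    return out


# Toy inputs standing in for a weather source and a fire-detection source.
weather = pd.DataFrame({
    "time": ["2020-07-01 00:00", "2020-07-01 12:00", "2020-07-02 00:00", "2020-07-02 12:00"],
    "department": ["Aude"] * 4,
    "t2m": [290.0, 300.0, 292.0, 302.0],
})
fires = pd.DataFrame({
    "time": ["2020-07-02 15:30"],
    "department": ["Aude"],
    "fire": [1],
})

daily_weather = aggregate_daily(weather, ["t2m"], how="mean")
daily_fires = aggregate_daily(fires, ["fire"], how="max")

# A left merge keeps every weather day; days without a detection get fire = 0.
dataset = daily_weather.merge(daily_fires, on=KEYS, how="left")
dataset["fire"] = dataset["fire"].fillna(0).astype(int)
dataset = add_lags(dataset, ["t2m"], lags=(1,))
```

Aggregating each source to the same (day, department) grain before merging keeps the join one-to-one, which is what makes the lagged columns meaningful.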

frgfm

Thanks for the PR!
I added a few comments, but it looks good to me! The unit test seems to be failing because of RAM, just like the earlier unit test with the full DF. Not sure what the best move would be here 🤷‍♂️

pyro_risks/datasets/model_dataset.py

test/test_models.py

chloeskt · 2020-12-08T16:56:17Z

To add to what @frgfm said regarding the failed tests: it is indeed due to RAM. I believe we need to add "mock" datasets to the release (as I did for ERA5, FIRMS and VIIRS here: https://github.com/pyronear/pyro-risks/releases/tag/v0.1.0-data). There are two dataset tests for which this has yet to be implemented:

FWI (the largest one)
NOAA

Do you think it is a good idea to proceed this way? @miltminz @frgfm

In that case, I believe you'll need to add mock test datasets for FWI, Milton, and for NOAA if you can (otherwise I'll do it in another PR).

If you have better ideas, I'm all ears!

frgfm · 2020-12-08T17:40:04Z

@chloeskt I guess the problem will come back every time we add a big dataset.
Perhaps we could run the test only locally? It cannot actually be run on GitHub, and making a mock dataset for each set we want to upload sounds heavy :/

I'm fine either way, whichever you think is best
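
For reference, the "local only" option could be sketched with the standard library's `unittest.skipIf`. This assumes the CI is GitHub Actions, which sets the `GITHUB_ACTIONS` environment variable to `true`; `load_full_dataset` and `FullDatasetTester` are hypothetical names used only for illustration, not code from this repository.

```python
import os
import unittest

# Assumption: CI runs on GitHub Actions, which exports GITHUB_ACTIONS=true.
ON_CI = os.environ.get("GITHUB_ACTIONS") == "true"


def load_full_dataset():
    """Hypothetical stand-in for the memory-hungry loader."""
    return [{"day": "2020-07-01", "department": "Aude", "fwi": 12.3}]


class FullDatasetTester(unittest.TestCase):
    @unittest.skipIf(ON_CI, "full dataset needs more RAM than the CI runner offers")
    def test_full_dataset(self):
        # Runs on a developer machine, reported as skipped on CI.
        rows = load_full_dataset()
        self.assertGreater(len(rows), 0)
```

Running `python -m unittest` locally would execute the test, while on CI it would show up as skipped rather than failed. The trade-off is that the skipped code path no longer counts towards CI coverage.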

chloeskt · 2020-12-08T17:58:39Z

I don't think running the tests only locally is a good idea, since we would face the same issue: if your computer has only 4 GB or 8 GB of RAM, the tests might still fail. So my guess is that we need to add mock datasets, even if it's heavy and uncomfortable. But I have less experience than you, so if it's easier/better for everyone to just run the tests locally, that's fine by me!
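
A mock dataset does not have to be large to exercise the loading code. The sketch below generates a small synthetic CSV with the standard library only; the column names, department names and value range are assumptions for illustration, and the values are random placeholders rather than real FWI data.

```python
import csv
import io
import random
from datetime import date, timedelta

# Hypothetical schema; the real FWI file may use different columns.
COLUMNS = ["day", "department", "fwi"]


def make_mock_rows(n_days=5, departments=("Aude", "Var"), seed=0):
    """Build a small, reproducible table with one row per (day, department)."""
    rng = random.Random(seed)  # fixed seed so the mock file is identical on every run
    start = date(2020, 7, 1)
    rows = []
    for offset in range(n_days):
        day = (start + timedelta(days=offset)).isoformat()
        for dep in departments:
            rows.append({"day": day, "department": dep, "fwi": round(rng.uniform(0, 50), 2)})
    return rows


def write_csv(rows, handle):
    """Write rows to any text file-like object in CSV form."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Written to an in-memory buffer here; a real mock would be saved to a file
# and uploaded next to the other release assets.
buffer = io.StringIO()
write_csv(make_mock_rows(), buffer)
```

A file like this stays at a few hundred bytes, so the test that reads it should fit in memory on both CI and small laptops.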

codecov · 2020-12-08T19:22:45Z

Codecov Report

Merging #22 (48dc220) into master (d92e82c) will decrease coverage by 1.25%.
The diff coverage is 90.29%.

@@            Coverage Diff             @@
##           master      #22      +/-   ##
==========================================
- Coverage   96.65%   95.40%   -1.26%     
==========================================
  Files          11       13       +2     
  Lines         419      522     +103     
==========================================
+ Hits          405      498      +93     
- Misses         14       24      +10

Flag	Coverage Δ
unittests	`95.40% <90.29%> (-1.26%)`	⬇️

Impacted Files	Coverage Δ
pyro_risks/datasets/era_fwi_viirs.py	`90.00% <90.00%> (ø)`
pyro_risks/models/score_v0.py	`90.00% <90.00%> (ø)`
pyro_risks/config.py	`100.00% <100.00%> (ø)`
pyro_risks/datasets/__init__.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

frgfm

Thanks for the edits!

test/test_models.py

miltminz added 6 commits December 8, 2020 16:15

add class for creation of modeling dataset

9931da6

add test for modeldf class

2132571

changed init for recognizing modeldf

abd9b3c

add modeling functions for score computations

3d31acf

add tests for models section

8d9f59e

add example for score_v0 new features

7644d64

miltminz added the enhancement, module: models and module: test labels Dec 8, 2020

miltminz added this to the 0.1.0 milestone Dec 8, 2020

miltminz requested review from frgfm, GHCamille, jsakv and chloeskt December 8, 2020 15:34

miltminz self-assigned this Dec 8, 2020

frgfm reviewed Dec 8, 2020

miltminz added 6 commits December 8, 2020 19:46

modified config adding new data sources fallbacks

e5e86dc

modified datasets init with new module name

ae2daa1

modified module and class names

92d843c

modified script example with new class name

20eac77

modified unit tests

13a0882

fixed failing unit test

48dc220

frgfm approved these changes Dec 8, 2020

chloeskt approved these changes Dec 8, 2020

miltminz merged commit 1bd777b into master Dec 8, 2020

miltminz deleted the model-milton branch December 8, 2020 20:22
