Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modeling - Score v0 #22

Merged
merged 12 commits into from
Dec 8, 2020
Merged

Modeling - Score v0 #22

merged 12 commits into from
Dec 8, 2020

Conversation

miltminz
Copy link
Collaborator

@miltminz miltminz commented Dec 8, 2020

This PR aims at adding new features for the modelling of the pyro-score

  • A new module in datasets called model_dataset.py has been added: this deals with the aggregations (by day and department) and merge of ERA5, FWI and VIIRS datasets for creating a new dataset to be used in the modelling phase
  • New folder models with score_v0.py: here we add functions to process the dataset for modelling (train/test split, lags, filtering correlated features to target, random forest and XGBoost classifier)
  • A script example_scorev0.py serving as an example of how to use the new features
  • Unit tests for the new features

@miltminz miltminz added this to the 0.1.0 milestone Dec 8, 2020
@miltminz miltminz self-assigned this Dec 8, 2020
Copy link
Member

@frgfm frgfm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!
I added a few comments but it seems good to me! The unittest seems to be failing because of RAM just like earlier unittest with the full DF. Not sure what would be the best move here 🤷‍♂️

pyro_risks/datasets/model_dataset.py Outdated Show resolved Hide resolved
test/test_models.py Outdated Show resolved Hide resolved
test/test_models.py Show resolved Hide resolved
@chloeskt
Copy link
Contributor

chloeskt commented Dec 8, 2020

To add on what @frgfm said regarding the failed tests. It is indeed due to the RAM, I believe that we need to add "mock" datasets in the release (such as what I did for ERA5, FIRMS and VIIRS here https://github.com/pyronear/pyro-risks/releases/tag/v0.1.0-data). There are two datasets tests for which this has yet to be implemented:

  • FWI (the largest one)
  • NOAA

Do you guys think that it is a good idea to proceed in such way ? @miltminz @frgfm

In that case, I believe that you'll need to add mock test datasets for FWI Milton, and for NOAA if you can (otherwise I'll do it in another PR)

If you have better ideas, I'm all ears !

@frgfm
Copy link
Member

frgfm commented Dec 8, 2020

@chloeskt I guess the problem will come back every time we get a big dataset.
Perhaps we could make the test only locally? It cannot actually be done on github and it sounds heavy to make a mock dataset for each set we want to upload :/

I'm fine with either way, whichever you think is best

@chloeskt
Copy link
Contributor

chloeskt commented Dec 8, 2020

I don't think that doing the tests locally is a good idea, plus we are going to face the same issue: for instance if your computer has only 4GB or 8GB of RAM, tests might fail. So my guess is that we need to add mock datasets, even if it's heavy and uncomfortable. But I have less experience than you guys so if it's easier/better for everyone to just run the tests locally, then fine by me !

@codecov
Copy link

codecov bot commented Dec 8, 2020

Codecov Report

Merging #22 (48dc220) into master (d92e82c) will decrease coverage by 1.25%.
The diff coverage is 90.29%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #22      +/-   ##
==========================================
- Coverage   96.65%   95.40%   -1.26%     
==========================================
  Files          11       13       +2     
  Lines         419      522     +103     
==========================================
+ Hits          405      498      +93     
- Misses         14       24      +10     
Flag Coverage Δ
unittests 95.40% <90.29%> (-1.26%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pyro_risks/datasets/era_fwi_viirs.py 90.00% <90.00%> (ø)
pyro_risks/models/score_v0.py 90.00% <90.00%> (ø)
pyro_risks/config.py 100.00% <100.00%> (ø)
pyro_risks/datasets/__init__.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d92e82c...48dc220. Read the comment docs.

Copy link
Member

@frgfm frgfm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the edits!

test/test_models.py Show resolved Hide resolved
@miltminz miltminz merged commit 1bd777b into master Dec 8, 2020
@miltminz miltminz deleted the model-milton branch December 8, 2020 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants