Modeling - Score v0 #22
Thanks for the PR!
I added a few comments but it seems good to me! The unit test seems to be failing because of RAM, just like the earlier unit test with the full DF. Not sure what the best move would be here 🤷‍♂️
To add to what @frgfm said regarding the failed tests: it is indeed due to RAM. I believe we need to add "mock" datasets to the release (as I did for ERA5, FIRMS and VIIRS here: https://github.com/pyronear/pyro-risks/releases/tag/v0.1.0-data). There are two dataset tests for which this has yet to be implemented:
Do you guys think it is a good idea to proceed this way? @miltminz @frgfm In that case, I believe you'll need to add mock test datasets for FWI, Milton, and for NOAA if you can (otherwise I'll do it in another PR). If you have better ideas, I'm all ears!
@chloeskt I guess the problem will come back every time we get a big dataset. I'm fine with either way, whichever you think is best.
I don't think that running the tests locally is a good idea, and we would face the same issue anyway: for instance, if your computer has only 4GB or 8GB of RAM, tests might fail. So my guess is that we need to add mock datasets, even if it's heavy and uncomfortable. But I have less experience than you guys, so if it's easier/better for everyone to just run the tests locally, then fine by me!
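For reference, one way to build such a mock dataset is to keep the header and a small random sample of rows from the full file, then attach the result to the release. This is only a sketch under assumptions: the function name `make_mock_csv`, the default sample size and the CSV layout are hypothetical and not taken from the repository.

```python
import csv
import io
import random


def make_mock_csv(source, n_rows=100, seed=42):
    """Return CSV text holding the header and at most n_rows sampled rows.

    `source` is any iterable of CSV lines (an open file, for instance), so the
    full dataset is streamed and never loaded into memory at once.
    """
    reader = csv.reader(source)
    header = next(reader)
    rng = random.Random(seed)  # fixed seed so the mock file is reproducible
    sample = []
    # Reservoir sampling: every row has the same chance of being kept.
    for i, row in enumerate(reader):
        if i < n_rows:
            sample.append(row)
        else:
            j = rng.randint(0, i)
            if j < n_rows:
                sample[j] = row
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(sample)
    return out.getvalue()
```

Calling `make_mock_csv(open(path), n_rows=100)` on a large CSV would give a file small enough for CI, at the cost of the tests no longer exercising the full data.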
Codecov Report
@@ Coverage Diff @@
## master #22 +/- ##
==========================================
- Coverage 96.65% 95.40% -1.26%
==========================================
Files 11 13 +2
Lines 419 522 +103
==========================================
+ Hits 405 498 +93
- Misses 14 24 +10
Thanks for the edits!
This PR aims at adding new features for the modelling of the pyro-score:
- `datasets`: a module called `model_dataset.py` has been added. It handles the aggregation (by day and department) and merging of the ERA5, FWI and VIIRS datasets to create a new dataset for the modelling phase
- `models`, with `score_v0.py`: here we add functions to process the dataset for modelling (train/test split, lags, filtering features by their correlation with the target, random forest and XGBoost classifiers)
- `example_scorev0.py`, serving as an example of how to use the new features
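
The aggregation, merge and lag steps described above can be sketched with pandas as follows. This is an illustration on toy data, not the code from `model_dataset.py` or `score_v0.py`: the column names (`time`, `day`, `department`, `temperature`, `fire`) and the choice of a left merge with a 0/1 target are assumptions.

```python
import pandas as pd

# Hypothetical toy inputs; the column names are assumptions for illustration.
weather = pd.DataFrame({
    "time": pd.to_datetime(
        ["2020-07-01 00:00", "2020-07-01 12:00", "2020-07-02 00:00"]
    ),
    "department": ["Gard", "Gard", "Gard"],
    "temperature": [20.0, 30.0, 24.0],
})
fires = pd.DataFrame({
    "day": pd.to_datetime(["2020-07-02"]),
    "department": ["Gard"],
    "fire": [1],
})

# Aggregate sub-daily weather records by day and department.
weather["day"] = weather["time"].dt.normalize()
daily = weather.groupby(["day", "department"], as_index=False).agg(
    temperature_mean=("temperature", "mean"),
    temperature_max=("temperature", "max"),
)

# Left-merge so that days without a recorded fire are kept, with target 0.
dataset = daily.merge(fires, on=["day", "department"], how="left")
dataset["fire"] = dataset["fire"].fillna(0).astype(int)

# One-day lag of a feature, computed within each department.
dataset = dataset.sort_values(["department", "day"])
dataset["temperature_mean_lag1"] = (
    dataset.groupby("department")["temperature_mean"].shift(1)
)
```

The first day of each department has no lagged value, so rows with a missing lag would need to be dropped or imputed before fitting a classifier.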