
Refactored dataloader #1040

Merged
merged 4 commits into from
Dec 7, 2022

karl-richter
Collaborator

🔬 Background

  • The dataloader currently uses a for loop in the __getitem__ method to assemble each sample from the dataset. For a dataset with 10,000 samples trained for 100 epochs, this method is called 1 million times, so pre-computing the samples once should speed up model training.
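To make the cost concrete, here is a minimal, hypothetical sketch of the pattern described above. It is not NeuralProphet's actual `TimeDataset` code: the class name `LoopingTimeDataset`, the `inputs` dict of per-sample feature groups, and the `getitem_calls` counter are illustrative assumptions. The point is that the loop over feature groups runs inside `__getitem__`, so its cost is paid once per sample per epoch.

```python
from collections import OrderedDict


class LoopingTimeDataset:
    """Hypothetical sketch of the pre-refactor pattern: every __getitem__
    call loops over all input groups to assemble one sample."""

    def __init__(self, inputs, targets):
        # inputs: dict mapping a feature-group name to a per-sample sequence
        # (names and layout are illustrative assumptions)
        self.inputs = inputs
        self.targets = targets
        self.getitem_calls = 0  # counts how often a sample is assembled

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, index):
        self.getitem_calls += 1
        sample = OrderedDict()
        for key, data in self.inputs.items():  # this loop runs on every call
            sample[key] = data[index]
        return sample, self.targets[index]


def count_calls(n_samples, n_epochs):
    """Number of __getitem__ calls when every sample is drawn once per epoch."""
    return n_samples * n_epochs


print(count_calls(10_000, 100))  # 1000000
```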

🔮 Key changes

  • Pre-compute the samples once so that retrieval only requires indexing.
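The idea behind this change is to move the per-sample assembly out of `__getitem__`. A minimal sketch, with hypothetical names (`PrecomputedTimeDataset`, `inputs`, `samples`) rather than the code merged in this PR: samples are assembled once when the dataset is built, and retrieval becomes a plain list lookup.

```python
from collections import OrderedDict


class PrecomputedTimeDataset:
    """Hypothetical sketch of the refactored pattern: samples are assembled
    once in __init__, and __getitem__ only indexes into a list."""

    def __init__(self, inputs, targets):
        # inputs: dict mapping a feature-group name to a per-sample sequence
        # (names and layout are illustrative assumptions)
        self.targets = targets
        # Build every sample up front; this is the loop that would otherwise
        # run inside __getitem__ on each call.
        self.samples = [
            OrderedDict((key, data[index]) for key, data in inputs.items())
            for index in range(len(targets))
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        # Plain list lookup; no per-call loop over feature groups
        return self.samples[index], self.targets[index]
```

Because it implements `__len__` and `__getitem__`, an object like this should be usable as a map-style dataset with `torch.utils.data.DataLoader`. The trade-off is some extra memory and a longer constructor in exchange for cheaper per-sample retrieval; if the underlying inputs were modified after construction, the precomputed samples would no longer reflect them.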

📋 Review Checklist

  • I have performed a self-review of my own code.
  • I have commented my code, added docstrings and data types to function definitions.
  • I have added pytests to check whether my feature / fix works.

Please make sure to follow our best practices in the Contributing guidelines.

github-actions bot commented Dec 5, 2022

036460b

Model Benchmark

| Benchmark | Metric | main | current | diff |
|---|---|---|---|---|
| AirPassengers | MAE_val | 15.2698 | 15.2698 | 0.0% |
| AirPassengers | RMSE_val | 19.4209 | 19.4209 | 0.0% |
| AirPassengers | Loss_val | 0.00195 | 0.00195 | 0.0% |
| AirPassengers | RegLoss_val | 0 | 0 | 0.0% |
| AirPassengers | epoch | 89 | 89 | 0.0% |
| AirPassengers | MAE | 9.82902 | 9.82902 | 0.0% |
| AirPassengers | RMSE | 11.7005 | 11.7005 | 0.0% |
| AirPassengers | Loss | 0.00056 | 0.00056 | 0.0% |
| AirPassengers | RegLoss | 0 | 0 | 0.0% |
| AirPassengers | time | 4.49 | 4.51 | 0.45% |
| AirPassengers | system_performance | 0.7978 | 0.8004 | 0.33% |
| AirPassengers | system_std | 0.00248 | 0.0008 | -67.74% |
| PeytonManning | MAE_val | 0.64636 | 0.64636 | 0.0% |
| PeytonManning | RMSE_val | 0.79276 | 0.79276 | 0.0% |
| PeytonManning | Loss_val | 0.01494 | 0.01494 | 0.0% |
| PeytonManning | RegLoss_val | 0 | 0 | 0.0% |
| PeytonManning | epoch | 37 | 37 | 0.0% |
| PeytonManning | MAE | 0.42701 | 0.42701 | 0.0% |
| PeytonManning | RMSE | 0.57032 | 0.57032 | 0.0% |
| PeytonManning | Loss | 0.00635 | 0.00635 | 0.0% |
| PeytonManning | RegLoss | 0 | 0 | 0.0% |
| PeytonManning | time | 11.74 | 11.81 | 0.6% |
| PeytonManning | system_performance | 0.7874 | 0.7942 | 0.86% |
| PeytonManning | system_std | 0.00049 | 0.0004 | -18.37% |
| YosemiteTemps | MAE_val | 1.72949 | 1.72949 | 0.0% |
| YosemiteTemps | RMSE_val | 2.27386 | 2.27386 | 0.0% |
| YosemiteTemps | Loss_val | 0.00096 | 0.00096 | 0.0% |
| YosemiteTemps | RegLoss_val | 0 | 0 | 0.0% |
| YosemiteTemps | epoch | 84 | 84 | 0.0% |
| YosemiteTemps | MAE | 1.45189 | 1.45189 | 0.0% |
| YosemiteTemps | RMSE | 2.16631 | 2.16631 | 0.0% |
| YosemiteTemps | Loss | 0.00066 | 0.00066 | 0.0% |
| YosemiteTemps | RegLoss | 0 | 0 | 0.0% |
| YosemiteTemps | time | 93.12 | 94.18 | 1.14% |
| YosemiteTemps | system_performance | 0.7964 | 0.8008 | 0.55% |
| YosemiteTemps | system_std | 0.00196 | 0.00117 | -40.31% |
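The bot does not state how `diff` is computed, but the values in the benchmark table are consistent with the relative change `(current - main) / main`, expressed in percent and rounded to two decimals. A minimal Python sketch under that assumption, checked against a few rows of the table; the zero-baseline handling is an assumption, since only the 0 to 0 case appears in the data:

```python
import math


def relative_diff_percent(main: float, current: float) -> float:
    """Relative change of `current` versus `main` in percent, rounded to 2 decimals."""
    if main == 0:
        # Assumption: 0 -> 0 reports 0.0 (as in the RegLoss rows); the bot's
        # behaviour for a zero baseline with a nonzero value is unknown.
        return 0.0 if current == 0 else math.nan
    return round((current - main) / main * 100, 2)


# A few rows from the benchmark table: (benchmark and metric, main, current)
rows = [
    ("AirPassengers time", 4.49, 4.51),
    ("AirPassengers system_std", 0.00248, 0.0008),
    ("PeytonManning time", 11.74, 11.81),
    ("YosemiteTemps time", 93.12, 94.18),
    ("AirPassengers RegLoss", 0, 0),
]
for name, main, current in rows:
    print(f"{name}: {relative_diff_percent(main, current)}%")
```

For these rows the sketch reproduces the reported values (0.45%, -67.74%, 0.6%, 1.14% and 0.0%).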
Model training plots: PeytonManning, YosemiteTemps, AirPassengers

codecov-commenter commented Dec 5, 2022

Codecov Report

Merging #1040 (92506d3) into main (66021de) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #1040   +/-   ##
=======================================
  Coverage   90.26%   90.27%           
=======================================
  Files          21       21           
  Lines        4736     4740    +4     
=======================================
+ Hits         4275     4279    +4     
  Misses        461      461           
| Impacted Files | Coverage Δ |
|---|---|
| neuralprophet/time_dataset.py | 94.52% <100.00%> (+0.08%) ⬆️ |
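The Codecov figures can be checked by hand: coverage is hits divided by lines. Assuming Codecov's default of two decimal places with rounding down, a small Python sketch (exact fractions, to avoid floating-point surprises near the cut-off) reproduces both totals and the reported 0.00% increase:

```python
import math
from fractions import Fraction


def truncate_percent(ratio: Fraction) -> str:
    """Format a non-negative ratio as a percent, rounded down to 2 decimals."""
    if ratio < 0:
        raise ValueError("this sketch only handles non-negative ratios")
    hundredths = math.floor(ratio * 10_000)  # hundredths of a percent
    return f"{hundredths // 100}.{hundredths % 100:02d}"


def coverage(hits: int, lines: int) -> str:
    """Coverage = hits / lines, formatted as a truncated percentage."""
    return truncate_percent(Fraction(hits, lines))


before = coverage(4275, 4736)  # main
after = coverage(4279, 4740)   # PR #1040
delta = truncate_percent(Fraction(4279, 4740) - Fraction(4275, 4736))
print(before, after, delta)  # 90.26 90.27 0.00
```

The exact change is roughly 0.008 percentage points, which rounds down to 0.00% even though the displayed totals move from 90.26% to 90.27%.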


Collaborator

@noxan noxan left a comment

Great improvement to have it once and not with every iteration 👍

@noxan noxan added the status: ready PR is ready to be merged label Dec 5, 2022
@karl-richter karl-richter self-assigned this Dec 7, 2022
@noxan noxan merged commit 036460b into main Dec 7, 2022
@noxan noxan deleted the refactor/dataloader_samples branch December 7, 2022 17:06
Labels
status: ready PR is ready to be merged

3 participants