DataModules: skip prepare_data #967

Closed
adamjstewart opened this issue Dec 21, 2022 · 1 comment · Fixed by #974
Labels
datamodules PyTorch Lightning datamodules
Comments

@adamjstewart
Collaborator

Summary

We should skip the prepare_data step in our data modules unless downloading is actually required.

Rationale

Instantiating the dataset is slow. Currently, all of our datamodules instantiate the dataset at least twice on the off chance that someone is requesting the dataset to be downloaded. This isn't needed most of the time.

Implementation

A better solution would be to replace:

def prepare_data(self) -> None:
    Dataset(**self.kwargs)

with this:

def prepare_data(self) -> None:
    if self.kwargs.get("download", False):
        Dataset(**self.kwargs)

in all of our data modules.
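
To illustrate the effect, here is a minimal sketch that counts how many times a dataset is built with the guarded `prepare_data`. It assumes nothing about the real code: `CountingDataset`, `SketchDataModule`, and `instantiations` are hypothetical names, and the class only mimics the `prepare_data`/`setup` hook names rather than subclassing an actual Lightning datamodule.

```python
from typing import Any


class CountingDataset:
    """Hypothetical stand-in for a dataset that is slow to instantiate."""

    instances = 0  # number of times the dataset has been constructed

    def __init__(self, root: str = "data", download: bool = False) -> None:
        CountingDataset.instances += 1
        self.root = root
        self.download = download


class SketchDataModule:
    """Mimics the prepare_data/setup hooks (assumption: no Lightning dependency)."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs

    def prepare_data(self) -> None:
        # Only build the dataset here when a download was requested.
        if self.kwargs.get("download", False):
            CountingDataset(**self.kwargs)

    def setup(self) -> None:
        # The dataset that is actually used gets built here.
        self.dataset = CountingDataset(**self.kwargs)


def instantiations(**kwargs: Any) -> int:
    """Run both hooks once and report how many datasets were constructed."""
    CountingDataset.instances = 0
    dm = SketchDataModule(**kwargs)
    dm.prepare_data()
    dm.setup()
    return CountingDataset.instances
```

With the guard in place, `instantiations(root="data")` builds the dataset once, while `instantiations(root="data", download=True)` still builds it twice, so the download path is unchanged.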

Alternatives

No response

Additional information

No response

@adamjstewart adamjstewart added the datamodules PyTorch Lightning datamodules label Dec 21, 2022
@adamjstewart adamjstewart added this to the 0.3.2 milestone Dec 21, 2022
@iamhbc

iamhbc commented Dec 25, 2022

The solution is to replace the prepare_data() method with the code above in all of our data modules. This code checks the "download" keyword argument and only instantiates the dataset in prepare_data() when it is set to True. That removes a redundant dataset instantiation whenever no download is requested, which should speed up our data modules.

@adamjstewart adamjstewart modified the milestones: 0.3.2, 0.4.0 Jan 23, 2023