Sharding the dataset #85

Answered by kohpangwei
renmengye asked this question in Q&A

We currently don't have any plans to add sharded data loaders, sorry, though you're definitely welcome to write your own using the underlying WILDSDataset classes, and we'd be happy to look it over! For our own experiments, we found it helpful to first copy (a compressed version of) the data from the network drive to the local disk before running the script. Would that help?
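
The copy-to-local-disk step could look roughly like the sketch below. It is an illustration, not part of WILDS: the function name `stage_dataset`, the `.staged` marker file, and the assumption that the data is stored as a tar archive are all hypothetical.

```python
import shutil
import tarfile
from pathlib import Path


def stage_dataset(archive_on_network: Path, local_root: Path) -> Path:
    """Copy a compressed archive to local disk and extract it there.

    Hypothetical helper (not a WILDS API). Returns the directory the
    archive was extracted into.
    """
    local_root.mkdir(parents=True, exist_ok=True)
    extracted_dir = local_root / "extracted"
    done_marker = local_root / ".staged"

    # Skip the copy if a previous run already staged the data.
    if done_marker.exists():
        return extracted_dir

    # One large sequential read over the network is usually cheaper than
    # many small reads of individual files during training.
    local_archive = local_root / archive_on_network.name
    shutil.copy2(archive_on_network, local_archive)

    extracted_dir.mkdir(exist_ok=True)
    with tarfile.open(local_archive) as tar:
        tar.extractall(extracted_dir)

    local_archive.unlink()  # free local disk space once extracted
    done_marker.touch()
    return extracted_dir
```

The returned directory would then be used as the dataset root when constructing the dataset, so that all reads during training hit the local disk.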

Other things that might help, which you may already be doing, include increasing the number of CPUs available for the job and increasing num_workers for the data loader.
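
As a rough sketch of tying the two together, num_workers can be derived from the CPUs actually allocated to the job. The helper name `pick_num_workers` and its `reserve` and `cap` defaults are assumptions made for illustration, not recommended values.

```python
import os


def pick_num_workers(reserve: int = 1, cap: int = 16) -> int:
    """Suggest a num_workers value from the CPUs available to this process.

    Hypothetical helper: `reserve` CPUs are left for the main training
    process, and `cap` bounds the result.
    """
    try:
        # On Linux this respects the CPU set the scheduler gave the job.
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # Not available on every platform; fall back to the machine total.
        available = os.cpu_count() or 1
    return max(0, min(cap, available - reserve))
```

The result can be passed as the `num_workers` argument of PyTorch's `torch.utils.data.DataLoader`. The best value depends on the dataset and the storage, so it is worth timing a few settings.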

Answer selected by renmengye