Sharding the dataset #85

renmengye · 2021-09-20T15:46:25Z


Will there be support for a sharded version of the data loader (e.g. for iWildCam) instead of reading from individual JPG images? I find that reading the data is sometimes very slow on network drives. Any suggestions? Thanks!

Answered by kohpangwei


kohpangwei (Maintainer) · 2021-09-20T21:33:52Z

We currently don't have any plans to add sharded data loaders, sorry. That said, you're definitely welcome to write your own using the underlying WILDSDataset classes, and we'd be happy to look it over! For our own experiments, we found it helpful to first copy (a compressed version of) the data from the network drive to the local disk before running the script. Would that help?
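The copy-to-local-disk step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script used for the WILDS experiments: the function name `stage_dataset`, the `.staged` marker file, and the `data` subdirectory are all hypothetical, and the archive is assumed to be a tar file (optionally compressed).

```python
import shutil
import tarfile
from pathlib import Path


def stage_dataset(archive_path, local_root):
    """Copy a compressed dataset archive to local disk and unpack it there.

    Hypothetical helper: returns the directory the archive was extracted into.
    """
    archive_path = Path(archive_path)
    local_root = Path(local_root)
    local_root.mkdir(parents=True, exist_ok=True)

    extract_dir = local_root / "data"
    done_marker = local_root / ".staged"
    if done_marker.exists():
        # A previous run already staged the data, so skip the slow copy.
        return extract_dir

    # One large sequential copy is usually cheaper on a network drive
    # than opening many small image files one by one.
    local_archive = local_root / archive_path.name
    shutil.copy2(archive_path, local_archive)

    # tarfile.open detects gzip/bz2/xz compression automatically.
    with tarfile.open(local_archive) as tar:
        tar.extractall(extract_dir)

    local_archive.unlink()  # free the space used by the copied archive
    done_marker.touch()     # written last, so a partial run is retried
    return extract_dir
```

The returned directory would then be used as the dataset's root directory when constructing the dataset, assuming the archive preserves the folder layout the dataset class expects.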

Other things that could help, which you might already be doing, include increasing the number of CPUs available for the job and increasing num_workers for the data loader.
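As a rough illustration of tying num_workers to the CPUs a job actually has, here is a small heuristic sketch. The function `suggest_num_workers` and its `reserve` and `cap` parameters are hypothetical and not part of WILDS or PyTorch; the "leave one core free" rule is just an assumed starting point to tune from.

```python
import os


def suggest_num_workers(reserve=1, cap=None):
    """Heuristic worker count based on the CPUs this process may use."""
    try:
        # On Linux this reflects the CPU affinity mask of the process,
        # which some job schedulers set to the cores granted to the job.
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # sched_getaffinity is not available on macOS or Windows.
        available = os.cpu_count() or 1

    # Keep `reserve` cores free for the main training process.
    workers = max(available - reserve, 0)
    if cap is not None:
        workers = min(workers, cap)
    return workers
```

The result would be passed as num_workers when creating the data loader. With PyTorch's DataLoader, a value of 0 means the data is loaded in the main process, so on a single-CPU job this heuristic falls back to no worker processes.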
