ohinds changed the title from "Shard size is automatically determined to produce ~100MB tfrecrods files" to "Shard size is automatically determined to produce ~100MB tfrecords files" on Aug 25, 2023.
100MB doesn't make sense on fast disk systems like we have on openmind, or for brain imaging data. I believe we have played with TB-sized shards as well. I would make this a user-controllable parameter.
Well, the default currently produces tfrecord files of about 20MB, which makes even less sense. I'm suggesting an automatically determined default, with the ability for people to override it if they want something else.
Also, specifying a shard size in bytes makes far more sense than specifying a number of examples, as it currently works.
According to the TensorFlow performance guide, tfrecords files should be ~100MB (https://github.com/tensorflow/docs/blob/master/site/en/r1/guide/performance/overview.md). When tfrecords datasets are constructed from files, the shard size could be automatically computed to follow this guidance.
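A minimal sketch of how such an automatic default might be computed. This is not existing library code: the function names, the 100MB constant, and the assumption that shard size is derived from the average serialized example size are all illustrative.

```python
# Hypothetical sketch: derive an examples-per-shard count from a target
# shard size in bytes, instead of asking users for an example count.
# TARGET_SHARD_BYTES follows the ~100MB suggestion in the TF guide and
# would itself be a user-overridable parameter.

TARGET_SHARD_BYTES = 100 * 1024**2  # ~100 MB default, user-overridable


def examples_per_shard(total_bytes: int, n_examples: int,
                       target_shard_bytes: int = TARGET_SHARD_BYTES) -> int:
    """How many examples fit in one shard of roughly the target size."""
    avg_example_bytes = total_bytes / n_examples
    # Always put at least one example per shard, even when a single
    # example exceeds the target size.
    return max(1, int(target_shard_bytes // avg_example_bytes))


def n_shards(total_bytes: int, n_examples: int,
             target_shard_bytes: int = TARGET_SHARD_BYTES) -> int:
    """Number of shards needed so each is roughly the target size."""
    per_shard = examples_per_shard(total_bytes, n_examples,
                                   target_shard_bytes)
    return -(-n_examples // per_shard)  # ceiling division
```

For example, with 1000 examples averaging 20MB each (the file size the current default produces), this yields 5 examples per shard and 200 shards of ~100MB each.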