When your dataset generates random numbers (e.g. for data augmentation), the seeding in dataloader workers is only handled correctly for torch; other libraries like numpy end up with an identical seed in all workers [1]. The result is that data augmentations are the same across workers, which can obviously hurt performance.
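A minimal sketch of how the issue shows up (the dataset and sizes here are made up for illustration). With the default fork start method on Linux, each worker inherits a copy of numpy's global RNG state, so the numpy draws repeat across workers:

```python
import numpy as np
from torch.utils.data import DataLoader, Dataset

class AugmentedDataset(Dataset):
    """Stand-in dataset whose 'augmentation' draws from numpy's global RNG."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # e.g. a random crop offset; every worker draws the same sequence
        return np.random.randint(0, 1000)

loader = DataLoader(AugmentedDataset(), num_workers=4)
# With batch_size=1 the workers serve items round-robin, so the first
# four values come out identical, then the next four, and so on.
print([int(batch) for batch in loader])
```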
Lightning could provide a default worker init fn that takes care of this issue. Even if we don't add a default, the question is how a user would do it properly in Lightning today. The tricky part is that the seed needs to be reset every epoch (whenever the dataloader is consumed completely).
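One way to do it manually today, as a sketch following the pattern from the PyTorch docs: derive numpy's (and `random`'s) seed from torch's per-worker seed. Since torch draws a fresh base seed each time a new dataloader iterator is created, this also covers the per-epoch reset (caveat: with `persistent_workers=True` the workers, and thus their seeds, survive across epochs):

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader

def seed_worker(worker_id: int) -> None:
    # Inside a worker, torch.initial_seed() returns base_seed + worker_id,
    # and base_seed changes every time a new iterator is created (i.e.
    # every epoch), so numpy/random get a unique seed per worker and epoch.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

# reusing the AugmentedDataset from the sketch above
loader = DataLoader(AugmentedDataset(), num_workers=4, worker_init_fn=seed_worker)
```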
Note: An easy way out of this is to not use numpy at all to generate random numbers in the dataloader. When using torch, you're fine.
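For completeness, a sketch of that torch-only alternative: torch's per-worker RNG is already seeded correctly, so no worker init fn is needed at all.

```python
import torch
from torch.utils.data import Dataset

class TorchAugmentedDataset(Dataset):
    """Same stand-in dataset, but drawing from torch's RNG instead of numpy's."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # torch seeds each worker (and each epoch) differently out of the box
        return int(torch.randint(0, 1000, (1,)))
```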
[1] https://tanelp.github.io/posts/a-bug-that-plagues-thousands-of-open-source-ml-projects/