Out of memory on shuffling huge datasets #21
This might be a bug in Marian. Memory shouldn't grow after
Related Marian issue: marian-nmt/marian-dev#148
I suspect the out-of-memory error, even when `--shuffle-in-ram` is not used, comes from here:
Assuming that's actually the cause, we could replace it with a two-pass shuffle:
Edit: or do it like this
Edit: for why
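For reference, a two-pass shuffle keeps memory bounded by scattering lines into random on-disk buckets first and only shuffling one bucket in RAM at a time. The sketch below is a minimal standalone Python illustration, not Marian's code; the function name `two_pass_shuffle` and the `num_buckets` parameter are hypothetical choices made for this example.

```python
import os
import random
import tempfile


def two_pass_shuffle(in_path, out_path, num_buckets=64, seed=None):
    """Hypothetical helper: shuffle the lines of in_path into out_path.

    Pass 1 streams the input and appends each line to a uniformly random
    temporary bucket file. Pass 2 loads one bucket at a time, shuffles it
    in memory and appends it to the output. Peak memory is roughly the size
    of the largest bucket (about 1/num_buckets of the input).
    """
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp_dir:
        bucket_paths = [os.path.join(tmp_dir, f"bucket_{i}.txt") for i in range(num_buckets)]
        # newline="\n": only "\n" ends a line and nothing is translated,
        # so stray "\r" characters inside a sentence are preserved as-is.
        buckets = [open(p, "w", encoding="utf-8", newline="\n") for p in bucket_paths]
        try:
            # Pass 1: scatter lines into random buckets (streaming, O(1) memory).
            with open(in_path, encoding="utf-8", newline="\n") as src:
                for line in src:
                    if not line.endswith("\n"):
                        line += "\n"  # normalize a missing final newline
                    buckets[rng.randrange(num_buckets)].write(line)
        finally:
            for f in buckets:
                f.close()

        # Pass 2: shuffle each bucket in memory and concatenate them.
        with open(out_path, "w", encoding="utf-8", newline="\n") as dst:
            for p in bucket_paths:
                with open(p, encoding="utf-8", newline="\n") as f:
                    lines = f.readlines()
                rng.shuffle(lines)
                dst.writelines(lines)
```

Random bucket assignment followed by a uniform in-bucket shuffle yields a uniformly random permutation overall, so the result is as good as an in-memory shuffle while only one bucket is ever held in RAM.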
I haven't seen this for some time, and I assume it's fixed by using OpusTrainer.
300M dataset, 128 GB RAM.
The workaround is to shuffle the dataset after the merge step, disable `--shuffle-in-ram`, and use `--shuffle batches`.