Add num_proc to map and filter calls #706
@alextrott16 I vaguely recall you had an issue with adding this at one point. Do you remember this? Seems to work fine for me now.
LGTM! question and super nit
LGTM. Thanks for doing this.
@dakinggg I don't have a particularly clear memory of running into any issues with using num_proc. If you feel confident that our test coverage would catch any issues resulting from this leading to wonkiness, then I'm satisfied.
Adds the num_proc arg to the calls to huggingface dataset map/filter.
before:
num-proc-test-baseline-1-cDb1Vk
after:
num-proc-test-1-4nqb1g
The dataset map step goes from ~27 seconds to ~4 seconds for dolly_hhrlhf. This should be a bigger win for bigger datasets.
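For readers unfamiliar with the change, here is a minimal sketch of the pattern, not the actual diff in this PR. It assumes only that the dataset object exposes map and filter methods accepting a num_proc keyword, as Hugging Face datasets.Dataset does. The helper names pick_num_proc and map_then_filter and the max_procs cap of 8 are hypothetical, chosen for illustration.

```python
import os


def pick_num_proc(max_procs=8):
    """Hypothetical helper: choose a worker count, capped at max_procs.

    The cap of 8 is an illustrative assumption, not a value from this PR.
    """
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, min(available, max_procs))


def map_then_filter(dataset, map_fn, keep_fn, num_proc=None):
    """Run map and then filter, passing the same num_proc to both calls.

    `dataset` is assumed to behave like datasets.Dataset, whose map and
    filter methods both accept a num_proc keyword argument.
    """
    if num_proc is None:
        num_proc = pick_num_proc()
    # Each call splits the dataset into shards and processes them in
    # num_proc worker processes instead of a single process.
    mapped = dataset.map(map_fn, num_proc=num_proc)
    return mapped.filter(keep_fn, num_proc=num_proc)
```

With a real dataset, a call might look like map_then_filter(ds, tokenize_fn, keep_fn, num_proc=8), where ds, tokenize_fn, and keep_fn are placeholders. The functions passed in need to be picklable so they can be sent to worker processes, and the speedup will depend on dataset size and on how expensive the mapped function is.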