-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement BatchAsyncMapper #1044
Conversation
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
return new_batch | ||
|
||
|
||
@functional_datapipe("async_map_batches") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, any naming suggestion is welcome
self, | ||
source_datapipe: IterDataPipe, | ||
async_fn: Callable, | ||
input_col=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious if it will becomes something like input_cols/output_cols
or it always has to be single column?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current state:
input_col
accepts multiple elementsoutput_col
is not supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Question:
- Will we add a
AsyncMapper
or this will be all? It seems like it will be mostly be the same except maybe instead of asking forbatch_size
, automatically set it tobatch_size = max-concurrency
. - Providing some async options for IO DataPipes will be nice. Or somehow use this DataPipe in combination with those as an example.
I am not sure. Probably not until there is a solid request. Rather than setting
It's hard to combine with other DataPipes because this DataPipe takes async function directly rather than relying on if prior DataPipe has an async operation. |
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Changes
BatchAsyncMapper
to support async processingbatch().async_map().flatmap()
input_col
/output_col
are added to align the behavior toMapper