Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement BatchAsyncMapper #1044

Closed
wants to merge 5 commits into from
Closed

Implement BatchAsyncMapper #1044

wants to merge 5 commits into from

Conversation

ejguan
Copy link
Contributor

@ejguan ejguan commented Feb 24, 2023

Changes

  • Add BatchAsyncMapper to support async processing
    • Add syntax sugar to combine batch().async_map().flatmap()
    • input_col/output_col are added to align the behavior to Mapper
    • Add unit tests

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 24, 2023
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

return new_batch


@functional_datapipe("async_map_batches")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, any naming suggestion is welcome

self,
source_datapipe: IterDataPipe,
async_fn: Callable,
input_col=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious if it will becomes something like input_cols/output_cols or it always has to be single column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current state:

  • input_col accepts multiple elements
  • output_col is not supported

Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Question:

  1. Will we add a AsyncMapper or this will be all? It seems like it will be mostly be the same except maybe instead of asking for batch_size, automatically set it to batch_size = max-concurrency.
  2. Providing some async options for IO DataPipes will be nice. Or somehow use this DataPipe in combination with those as an example.

@ejguan
Copy link
Contributor Author

ejguan commented Feb 24, 2023

1. Will we add a AsyncMapper or this will be all? It seems like it will be mostly be the same except maybe instead of asking for batch_size, automatically set it to batch_size = max-concurrency.

I am not sure. Probably not until there is a solid request. Rather than setting batch_size = max-concurrency, i think it might be better to do max-concurrency=batch_size

2. Providing some async options for IO DataPipes will be nice. Or somehow use this DataPipe in combination with those as an example.

It's hard to combine with other DataPipes because this DataPipe takes async function directly rather than relying on if prior DataPipe has an async operation.
I will try to come up with an example for async io operations.

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ejguan merged this pull request in daabdac.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged topic: new feature topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants