Add various utility meta-transforms to Beam. #32445

Merged: 7 commits, Oct 18, 2024

robertwb
Contributor

These were inspired by some discussions at the Beam summit.


Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @liferoad for label python.
R: @Abacn for label java.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@liferoad
Collaborator

Shall we add this to https://beam.apache.org/documentation/programming-guide/#flatten as one of the core PTransforms?

@robertwb
Contributor Author

Added a note about the Flatten alternative. I don't think Tee is important/core enough to add to the guide itself, but have been thinking it might be good to do a blog post calling out these and other utility transforms (like BatchElements).

@hjtran
Contributor

hjtran commented Sep 26, 2024

These look really convenient, especially for pipelines that might write out intermediate results or merge older PCollections (something the Schrodinger use cases do a lot).

> Added a note about the Flatten alternative. I don't think Tee is important/core enough to add to the guide itself, but have been thinking it might be good to do a blog post calling out these and other utility transforms (like BatchElements).

I personally discover useful transforms through the Beam transform catalog, so it'd be nice if some examples were included there.
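
For readers unfamiliar with the idea, the "write out intermediate results" use case mentioned above can be illustrated without Beam at all. The following is a toy sketch in plain Python, not Beam code: `tee` and `pipe` are hypothetical helpers invented for illustration, and it assumes that Tee behaves like the shell `tee` command, passing its input through unchanged while also handing it to side consumers.

```python
from typing import Callable, List


def tee(*consumers: Callable[[List[int]], None]) -> Callable[[List[int]], List[int]]:
    """Hypothetical helper: feed the input to each consumer, then pass it on unchanged."""
    def step(elements: List[int]) -> List[int]:
        for consume in consumers:
            consume(list(elements))  # each side branch receives its own copy
        return elements
    return step


def pipe(elements, *steps):
    """Hypothetical helper: apply the steps left to right, like a chain of `|`."""
    for s in steps:
        elements = s(elements)
    return elements


written: List[int] = []  # stand-in for an intermediate sink such as a file

result = pipe(
    [1, 2, 3],
    lambda xs: [x * x for x in xs],   # first stage
    tee(written.extend),              # write out the intermediate result
    lambda xs: [x + 1 for x in xs],   # the main chain continues uninterrupted
)
print(result, written)
```

The point of the shape is that the side output does not force the chain to be split into named intermediate variables.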

Contributor

github-actions bot commented Oct 4, 2024

Reminder, please take a look at this PR: @liferoad @Abacn

Contributor

Assigning a new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @tvalentyn for label python.
R: @damondouglas for label java.
R: @melap for label website.


@Override
public String getKindString() {
  return "Flatten.With";
Contributor

Merge.With might be a possible alternative name, but maybe it adds more confusion since we already have Flatten for a similar concept.

Contributor Author

Yeah, I chose this name because it is literally syntactic sugar for the same primitive Flatten operation. (Personally, I'd prefer disjoint union, but that's probably too obscure, let alone too late to change now...)
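
To make the "syntactic sugar" point concrete, here is a minimal toy model in plain Python. It is a sketch under stated assumptions, not Beam's implementation: `Coll`, `flatten`, `flatten_with`, and `map_each` are hypothetical names standing in for a PCollection, the primitive Flatten, the chainable form, and a simple per-element transform.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Coll:
    """Toy stand-in for a PCollection: an immutable bag of elements."""
    items: tuple

    def __or__(self, transform: Callable[["Coll"], "Coll"]) -> "Coll":
        # `coll | transform` applies the transform, mimicking pipe syntax.
        return transform(self)


def flatten(colls: Iterable[Coll]) -> Coll:
    """The primitive: merge several collections into one."""
    merged = []
    for c in colls:
        merged.extend(c.items)
    return Coll(tuple(merged))


def flatten_with(*others: Coll) -> Callable[[Coll], Coll]:
    """Chain-friendly sugar: flatten the piped-in input together with `others`."""
    return lambda main: flatten([main, *others])


def map_each(fn) -> Callable[[Coll], Coll]:
    """Hypothetical per-element transform used to build a chain."""
    return lambda coll: Coll(tuple(fn(x) for x in coll.items))


pcoll1, pcoll2, pcoll3 = Coll((1, 2)), Coll((3,)), Coll((4, 5))

# Without the sugar, the chain is broken to call the primitive.
intermediate = pcoll1 | map_each(lambda x: x * 10)
unchained = flatten([intermediate, pcoll2, pcoll3]) | map_each(str)

# With the sugar, the merge is just one more step in the chain.
chained = (
    pcoll1
    | map_each(lambda x: x * 10)
    | flatten_with(pcoll2, pcoll3)
    | map_each(str))
```

In this model both forms produce the same collection; the chainable form only changes how the pipeline reads.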

pcoll1 = partitioned[0]
pcoll2 = partitioned[1]
pcoll3 = partitioned[2]
SomeTransform = lambda: beam.Map(lambda x: x)
Contributor

Should we have another example for merging with a transform's output? Feel free to create a bug to add it. Examples are just as important as having the capability, so I think we should highlight these everywhere (Beam Playground, snippets, website docs, etc.). This can be done with follow-up/starter bugs if you don't have time to do all that in one change.

Contributor Author

Added this example, filed #32840 for follow-up. It would be good to think about how we could structure things to further reduce redundancy between these various forms of documentation.

Contributor Author

@robertwb robertwb left a comment

Thanks for taking a look. I added one more example, but filed an issue for further documentation in the interest of not blocking things.


@tvalentyn
Contributor

> I personally discover useful transforms through the beam transform catalog so it'd be nice if some examples were included there.

@robertwb do we need an issue or commit for this as well?

PTAL at the website failures, and please look at the staged website content before merging to confirm that the changes reflect your intent. The link should be available in the Summary tab of the Stage_GCS GitHub Actions run.

LGTM otherwise, thanks!

@robertwb
Contributor Author

Thanks. Looking into the website failures...

@robertwb
Contributor Author

> I personally discover useful transforms through the beam transform catalog so it'd be nice if some examples were included there.

> @robertwb do we need an issue or commit for this as well?

Updated the description to mention both.

@robertwb robertwb merged commit 3b839d1 into apache:master Oct 18, 2024
118 of 120 checks passed
4 participants