-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-14347] Add function for simple function registration #17650
[BEAM-14347] Add function for simple function registration #17650
Conversation
R: @lostluck here's the simple function registration you suggested. Arguably we don't need it, but it was simple enough and gives us full consistency |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
Codecov Report
@@ Coverage Diff @@
## master #17650 +/- ##
=======================================
Coverage 73.95% 73.95%
=======================================
Files 693 693
Lines 91721 91721
=======================================
Hits 67834 67834
Misses 22638 22638
Partials 1249 1249
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
We do need it for the same reason. Please add a benchmark for wrapping and invoking a simple function caller, and you'll see. The stats combiners are simple binary functions eg for minIntFn The various struct wrappers happened 2nd originally, and piggy backed on the work to make functions faster (partly because that was faster than an alternative method receiver passing approach). |
Can one of the admins verify this patch? |
3 similar comments
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
{{if (or $processElementIn $processElementOut)}}// DoFn input and output parameter types should be provided in order as the generic constraints. | ||
{{end}}func DoFn{{$processElementIn}}x{{$processElementOut}}{{(genericTypingRepresentation $processElementIn $processElementOut true)}}(doFn genericDoFn{{$processElementIn}}x{{$processElementOut}}{{(genericTypingRepresentation $processElementIn $processElementOut false)}}) { | ||
registerDoFnTypes(doFn) | ||
registerDoFn{{$processElementIn}}x{{$processElementOut}}StructWrappersAndFuncs{{(genericTypingRepresentation $processElementIn $processElementOut false)}}(doFn) | ||
} | ||
|
||
// Function{{$processElementIn}}x{{$processElementOut}}{{(genericTypingRepresentation $processElementIn $processElementOut true)}} registers your non-structural DoFn to optimize execution at runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Function{{$processElementIn}}x{{$processElementOut}}{{(genericTypingRepresentation $processElementIn $processElementOut true)}} registers your non-structural DoFn to optimize execution at runtime. | |
// Function{{$processElementIn}}x{{$processElementOut}}{{(genericTypingRepresentation $processElementIn $processElementOut true)}} registers your functional DoFn to optimize execution at runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't commit it directly since it needs to match the registered version, but agreed this is better
Oh yeah, my point isn't that it isn't much better in terms of performance, just that its not a huge usability win vs the user manually calling these themselves. The performance bump is evident from the existing benchmarks on I should've been more clear in my original comment that I do still think this is a good idea, I just think it is less useful because its not too hard for users to do the right thing themselves already. |
Oh fair! |
Ah, you're right! Good callout, I'll fix it |
Fixed with benchmarks! Thanks for calling this out, I'd gotten confused on what actually gets generated for these ones (which is also why I thought it was an easy one for users to do themselves) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Run GoPortable PreCommit |
commit 5064cc2 Author: Robert Burke <[email protected]> Date: Sat May 14 15:49:04 2022 -0700 [BEAM-14470] Use Generic Registrations in loadtests. (apache#17673) commit 780ad62 Author: Danny McCormick <[email protected]> Date: Fri May 13 19:04:58 2022 -0400 [BEAM-14371] (and BEAM-14372) - enable a couple staticchecks (apache#17670) commit 66e85da Author: Robert Bradshaw <[email protected]> Date: Fri May 13 15:00:42 2022 -0700 Add some auto-starting runners to the typescript SDK. (apache#17580) Adds out-of-the-box support for FlinkRunner, DataflowRunner, and the Python Universal Local runner. Also adds a DefaultRunner which chooses between the DirectRunner and the ULR depending on the properties of the pipeline. commit c110365 Author: Jack McCluskey <[email protected]> Date: Fri May 13 17:20:49 2022 -0400 [BEAM-14469] Allow nil primary returns from TrySplit in a single-window context (apache#17667) commit b774133 Author: Ryan Thompson <[email protected]> Date: Fri May 13 15:28:22 2022 -0400 [BEAM-14014] Support impersonation credentials in dataflow runner (apache#17244) commit 04f4984 Merge: 0a2aed7 26b16d0 Author: Heejong Lee <[email protected]> Date: Fri May 13 11:37:31 2022 -0700 Merge pull request apache#17605 from ihji/BEAM-14455 [BEAM-14455] Add UUID to sub-schemas for PythonExternalTransform commit 0a2aed7 Author: Marco Robles <[email protected]> Date: Fri May 13 12:50:53 2022 -0500 Merge pull request apache#17365 from [BEAM-12482] Update Schema Destination during Bigquery load job when using temporary tables using zeroloadjob * Added a zero row job to bigquery writeTable * added unit test * add sdf for update schema destination * add loadjob with inputstreamcontent parameter * add updateschemadest with zeroloadjob before copying to dest * add test for write temp tables * add nullness supress warning * remove unnecesary variable and use global variable for updateschemadestination-dofn Co-authored-by: Miguel Anzo <[email protected]> commit 2d57753 Merge: 9085345 2d36feb Author: Heejong Lee <[email protected]> Date: Fri May 13 10:29:55 2022 -0700 Merge pull request apache#17608 from ihji/BEAM-14430 [BEAM-14430] Adding a logical type support for Python callables to Row schema commit 9085345 Merge: 2b7aab0 b8d78b7 Author: Robert Bradshaw <[email protected]> Date: Fri May 13 09:23:25 2022 -0700 Merge pull request apache#17653 Revert "Better test assertion." commit 2b7aab0 Author: Jack McCluskey <[email protected]> Date: Fri May 13 12:16:31 2022 -0400 [BEAM-14465] Reduce DefaultS3ClientBuilderFactory logging to debug level (apache#17645) commit 4ce6056 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Fri May 13 09:14:26 2022 -0700 Bump github.com/spf13/cobra from 1.3.0 to 1.4.0 in /sdks (apache#17647) Bumps [github.com/spf13/cobra](https://github.com/spf13/cobra) from 1.3.0 to 1.4.0. - [Release notes](https://github.com/spf13/cobra/releases) - [Changelog](https://github.com/spf13/cobra/blob/v1.4.0/CHANGELOG.md) - [Commits](spf13/cobra@v1.3.0...v1.4.0) --- updated-dependencies: - dependency-name: github.com/spf13/cobra dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 787479f Author: Brian Hulette <[email protected]> Date: Thu May 12 23:56:21 2022 -0700 Drop dataclasses requirement, we only support python 3.7+ (apache#17640) commit b8d78b7 Author: tvalentyn <[email protected]> Date: Fri May 13 08:54:11 2022 +0200 Revert "Better test assertion. (apache#17551)" This reverts commit dc20185. commit 969f5b8 Author: Danny McCormick <[email protected]> Date: Thu May 12 22:12:53 2022 -0400 [BEAM-14347] Add function for simple function registration (apache#17650) commit 2d36feb Author: Heejong Lee <[email protected]> Date: Thu May 12 18:02:09 2022 -0700 add comment commit 4ac6820 Merge: 1c476e7 6c18bcf Author: Ahmet Altay <[email protected]> Date: Thu May 12 16:34:49 2022 -0700 Merge pull request apache#17631 from chamikaramj/update_release_md commit 2fce769 Author: Heejong Lee <[email protected]> Date: Thu May 12 16:12:46 2022 -0700 put a default type hint for PythonCallableSource commit 1c476e7 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu May 12 15:25:59 2022 -0700 Bump cloud.google.com/go/bigquery from 1.28.0 to 1.32.0 in /sdks (apache#17625) Bumps [cloud.google.com/go/bigquery](https://github.com/googleapis/google-cloud-go) from 1.28.0 to 1.32.0. - [Release notes](https://github.com/googleapis/google-cloud-go/releases) - [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md) - [Commits](googleapis/google-cloud-go@spanner/v1.28.0...spanner/v1.32.0) --- updated-dependencies: - dependency-name: cloud.google.com/go/bigquery dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 514e325 Author: Heejong Lee <[email protected]> Date: Thu May 12 15:10:50 2022 -0700 add micros_instant urn commit 792baa2 Author: Heejong Lee <[email protected]> Date: Thu May 12 14:46:24 2022 -0700 move logical types def commit 704ecc6 Author: bullet03 <[email protected]> Date: Fri May 13 02:48:41 2022 +0600 Merge pull request apache#17408 from [BEAM-14312] [Website] change section order, move socials to footer * [BEAM-14312] [Website] change section order, move socials to footer * [BEAM-14312] [Website] change margin * [BEAM-14312] [Website] change link * [BEAM-14312] [Website] delete section, change title and links of the pillars * [BEAM-14312] [Website] add pipelines section * [BEAM-14312] [Website] change image in graphic section * [BEAM-14312] [Website] delete width of 95%, change flink link * [BEAM-14312] [Website] move css style, delete unused html, optimized logos size * [BEAM-14312] [Website] substitute images, fix css styles for pipelines and logos sections * [BEAM-14312] [Website] add kinesis runner * [BEAM-14312] [Website] add kinesis runner, add margin-right to pipelines-logos * [BEAM-14312] [Website] change amazon kinesis logo * [BEAM-14312] change structure of logos in the footer * [BEAM-14312] add license header to linkedin.svg * [BEAM-14312] change margin in quote section commit c36f544 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu May 12 13:44:23 2022 -0700 Bump cloud.google.com/go/pubsub from 1.18.0 to 1.21.1 in /sdks (apache#17646) Bumps [cloud.google.com/go/pubsub](https://github.com/googleapis/google-cloud-go) from 1.18.0 to 1.21.1. - [Release notes](https://github.com/googleapis/google-cloud-go/releases) - [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md) - [Commits](googleapis/google-cloud-go@pubsub/v1.18.0...pubsub/v1.21.1) --- updated-dependencies: - dependency-name: cloud.google.com/go/pubsub dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 6680261 Author: Pablo <[email protected]> Date: Thu May 12 16:38:55 2022 -0400 Merge pull request apache#17584 from [BEAM-14415] Exception handling tests and logging for partial failure BQIO * [BEAM-14415] Exception handling tests and logging for partial failures in BQ IO * fix DLQ integration test * fix lint * fix postcommit * fix formatter * Fixing tests and adding test info * fix skipping tests commit 3cca763 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu May 12 13:06:09 2022 -0700 Bump github.com/lib/pq from 1.10.4 to 1.10.5 in /sdks (apache#17626) Bumps [github.com/lib/pq](https://github.com/lib/pq) from 1.10.4 to 1.10.5. - [Release notes](https://github.com/lib/pq/releases) - [Commits](lib/pq@v1.10.4...v1.10.5) --- updated-dependencies: - dependency-name: github.com/lib/pq dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit cc571a3 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu May 12 13:05:11 2022 -0700 Bump github.com/testcontainers/testcontainers-go in /sdks (apache#17627) Bumps [github.com/testcontainers/testcontainers-go](https://github.com/testcontainers/testcontainers-go) from 0.12.0 to 0.13.0. - [Release notes](https://github.com/testcontainers/testcontainers-go/releases) - [Commits](testcontainers/testcontainers-go@v0.12.0...v0.13.0) --- updated-dependencies: - dependency-name: github.com/testcontainers/testcontainers-go dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit 59212f3 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Thu May 12 13:04:42 2022 -0700 Bump github.com/google/go-cmp from 0.5.7 to 0.5.8 in /sdks (apache#17628) Bumps [github.com/google/go-cmp](https://github.com/google/go-cmp) from 0.5.7 to 0.5.8. - [Release notes](https://github.com/google/go-cmp/releases) - [Commits](google/go-cmp@v0.5.7...v0.5.8) --- updated-dependencies: - dependency-name: github.com/google/go-cmp dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit dc20185 Author: Robert Bradshaw <[email protected]> Date: Thu May 12 13:02:14 2022 -0700 Better test assertion. (apache#17551) commit 5bb50d1 Author: Danny McCormick <[email protected]> Date: Thu May 12 16:00:55 2022 -0400 [BEAM-14347] Add generic registration feature to CHANGES (apache#17643) commit d5df315 Merge: ec85d28 7c23d52 Author: Aizhamal Nurmamat kyzy <[email protected]> Date: Thu May 12 10:13:12 2022 -0700 Merge pull request apache#17588 from damccorm/users/damccorm/issueTemplates [BEAM-14442] Add GitHub issue templates commit ec85d28 Author: Yichi Zhang <[email protected]> Date: Thu May 12 10:06:16 2022 -0700 Revert "[BEAM-14429] Force java load test on dataflow runner v2 forceNumInitialBundles to 1 (apache#17576)" (apache#17609) This reverts commit 178441d. commit fd61a90 Author: Danny McCormick <[email protected]> Date: Thu May 12 10:37:27 2022 -0400 Trigger go precommits on go mod/sum changes (apache#17636) commit 7c23d52 Author: Danny McCormick <[email protected]> Date: Thu May 12 09:30:51 2022 -0400 Pare down to fewer templates commit 30f1a0c Author: Robert Burke <[email protected]> Date: Thu May 12 06:17:35 2022 -0700 Typo & link update (apache#17633) commit a6ee885 Author: Moritz Mack <[email protected]> Date: Thu May 12 15:08:19 2022 +0200 [BEAM-14334] Fix leakage of SparkContext in Spark runner tests to remove forkEvery 1 (apache#17406) * [BEAM-14334] Fix leakage of SparkContext in Spark runner tests to remove forkEvery 1 and set provided SparkContext via SparkContextFactory to avoid losing it during a serde roundtrip in TestPipenline. commit e2b2966 Author: Heejong Lee <[email protected]> Date: Wed May 11 22:29:12 2022 -0700 fix lint errors commit 6797787 Author: Heejong Lee <[email protected]> Date: Wed May 11 20:34:20 2022 -0700 add urn, type inference for PythonCallableSource commit a167424 Author: Kiley Sok <[email protected]> Date: Wed May 11 14:48:57 2022 -0700 [BEAM-13695] Add jamm jvm options to Java 11 (apache#17178) * Add jamm jvm options to Java 11 * fix test comments commit 6ee09b6 Author: Danny McCormick <[email protected]> Date: Wed May 11 17:16:50 2022 -0400 Correctly route go dependency changes to go label (apache#17632) commit 6c18bcf Author: Chamikara Jayalath <[email protected]> Date: Wed May 11 13:33:08 2022 -0700 Updates CHANGES.md to include some recently discovered known issues commit 59cdb58 Author: Danny McCormick <[email protected]> Date: Wed May 11 15:59:21 2022 -0400 [BEAM-14347] Add some benchmarks for generic registration (apache#17613) commit 559877a Author: Danny McCormick <[email protected]> Date: Wed May 11 15:57:56 2022 -0400 Cut p4 commit b215351 Author: Danny McCormick <[email protected]> Date: Wed May 11 15:48:56 2022 -0400 We don't need outage commit 51f8dc8 Author: Danny McCormick <[email protected]> Date: Wed May 11 15:45:03 2022 -0400 Ask for beam version + other dependencies commit f81119a Author: Yichi Zhang <[email protected]> Date: Wed May 11 12:08:45 2022 -0700 Remove python 3.6 postcommit from mass_comment.py (apache#17630) commit d6df306 Merge: fbc495c 86b68aa Author: Kenn Knowles <[email protected]> Date: Wed May 11 11:46:04 2022 -0700 Merge pull request apache#17519: [BEAM-14096] bump junit-quickcheck to 1.0 commit fbc495c Author: Danny McCormick <[email protected]> Date: Wed May 11 14:09:39 2022 -0400 [BEAM-12526] Add Dependabot (apache#17563) * Create dependabot.yml * Apache header * See if dependabot will parse now * Explanatory comment * Reduce duplication * Bug + style from feedback * Change format to make dependabot happy commit f30e4e5 Author: Igor Krasavin <[email protected]> Date: Wed May 11 19:23:23 2022 +0300 [BEAM-14081] [CdapIO] Add context classes for CDAP plugins (apache#17104) * [BEAM-14048] Add ConfigWrapper for building CDAP PluginConfigs * [BEAM-14048] Fix checkstyle * [BEAM-14048] Fix warnings * [BEAM-14048] Fix warnings * [BEAM-14048] Fix warning * [BEAM-14048] Fix warning * [BEAM-14048] Remove unused dependencies * [BEAM-14048] Add needed dependencies * [BEAM-14048] Fix spotless * [BEAM-14048] Fix typo * [BEAM-14048] Use fori instead of stream * [BEAM-14048] Suppress warning * [BEAM-14048] Add used undeclared artifacts * [BEAM-14048] Change dependencies to test * Add context. * Fix dependencies issue * Add null annotation * [BEAM-14048] Refactoring * Add SuppressWarning. * Fix style. * Determine dependencies. * [BEAM-14048] Use CDAP InstantiatorFactory for creating config objects * [BEAM-14048] Suppress warning * [BEAM-14081] Refactoring * Update maven repo * Update build.gradle * [BEAM-14081] Refactoring * [BEAM-14048] Use ServiceNow CDAP dependency from Maven central * [BEAM-14048] Set macroFields * [BEAM-14081] Fix javadoc * [BEAM-14081] Make BatchContextImpl class abstract Co-authored-by: vitaly.terentyev <[email protected]> Co-authored-by: Alex Kosolapov <[email protected]> Co-authored-by: Elizaveta Lomteva <[email protected]> Co-authored-by: Elizaveta Lomteva <[email protected]> commit 43cc865 Author: Jack McCluskey <[email protected]> Date: Wed May 11 11:58:53 2022 -0400 [BEAM-11104] Add self-checkpointing to CHANGES.md (apache#17612) commit 0f38c82 Author: tvalentyn <[email protected]> Date: Wed May 11 15:58:59 2022 +0200 [BEAM-14396] Bump httplib2 upper bound. (apache#17602) commit a181053 Author: andoni-guzman <[email protected]> Date: Wed May 11 08:29:32 2022 -0500 [BEAM-5492] Python Dataflow integration tests should export the pipeline console output to Jenkins Test Result section (apache#17530) commit 5a5e51e Author: Heejong Lee <[email protected]> Date: Thu May 5 20:26:31 2022 -0700 [BEAM-14430] Adding a logical type support for Python callables to Row schema commit 26b16d0 Author: Heejong Lee <[email protected]> Date: Tue May 10 14:05:50 2022 -0700 [BEAM-14455] Add UUID to sub-schemas for PythonExternalTransform commit d181445 Author: Danny McCormick <[email protected]> Date: Mon May 9 11:19:30 2022 -0400 [BEAM-14441] Add GitHub issue templates commit 86b68aa Author: masahitojp <[email protected]> Date: Sun May 1 18:52:09 2022 +0900 [BEAM-14096] bump junit-quickcheck to 1.0
This adds a basic function for non-structural DoFn registration. This doesn't actually need to be generic, and it doesn't have to be aware of its parameters (we could just accept an interface). I chose to do it this way because (1) it gives us greater type safety and (2) it matches the other registration functions
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.