-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YAML] Implement basic java mapping operations. #28657
Conversation
R: @Polber |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
|
||
@Nullable | ||
@SuppressWarnings("all") | ||
public abstract ErrorHandling getError_handling(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for naming these with underscores when they can be remapped by the YAML provider?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I can go ahead and fix this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be a UDF file test? testNamedFunctionUdf
and testClassMethodUdf
cover the name
parameter, but what about the path
?
Also, should this functionality (specifying only name
) be included in the python MapToFields
logic? It could essentially replace PyTransform
, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating a full jar seemed a bit out-of-scope for a unit test (and that's not where the complexity was). IMHO that could wait until we have the integration testing framework.
As for using name for Python, this is essentially the equivalent to "callable" where you can pass a fully-qualified python name (as functions and classes are first class objects in Python, unlike Java). (Also, in Java, even if the file is specified, one still needs the fully qualified name.)
The Python failures are irrelevant. |
private static FunctionAndType createFunctionFromName(String name, String path) | ||
throws ReflectiveOperationException, MalformedURLException { | ||
ClassLoader classLoader = | ||
path == null | ||
? ClassLoader.getSystemClassLoader() | ||
: new URLClassLoader( | ||
new URL[] {new URL("file://" + path)}, ClassLoader.getSystemClassLoader()); | ||
String className, methodName = null; | ||
if (name.indexOf("::") == -1) { | ||
className = name; | ||
methodName = null; | ||
} else { | ||
String[] parts = name.split("::", 2); | ||
className = parts[0]; | ||
methodName = parts[1]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robertwb Shouldn't there be support for reading a file from GCS in this logic by 2.52?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see that as as critical for 2.52 (v.s. the basic functionality of supporting expressions), but we could try to get that in as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern would be if a user specified a GCS path in their pipeline which uses the python MapToFields, but then updates the pipeline to use more Java transforms, causing the MapToFields to switch to the Java implementation.
Is there a way to force an SDK implementation for a transform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the language is explicitly specified; there's no automatic switching between the two. (E.g. a python script is not interchangeable with a java jar).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and implemented this. Ideally it'd use the artifact service, but we don't yet have a way to declare that from a transform.
Codecov Report
@@ Coverage Diff @@
## master #28657 +/- ##
==========================================
- Coverage 38.37% 38.32% -0.05%
==========================================
Files 686 688 +2
Lines 101685 101844 +159
==========================================
+ Hits 39017 39033 +16
- Misses 61088 61231 +143
Partials 1580 1580
Flags with carried forward coverage won't be shown. Click here to find out more. see 12 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Run Java PreCommit |
This will allow one to use the
MapToFields
transform with java expressions (callables, etc.)This hasn't been wired up to Python yet, but is a good intermediate milestone. We'll also want to follow up with similar implementations for
Filter
(which can borrow the same UDF logic) andExplode
(no UDFs involved).Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.