-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support DECIMAL logical type in python SDK #23014
Conversation
3ce2ce8
to
801a3cd
Compare
Codecov Report
@@ Coverage Diff @@
## master #23014 +/- ##
==========================================
- Coverage 73.58% 73.48% -0.11%
==========================================
Files 716 718 +2
Lines 95311 95692 +381
==========================================
+ Hits 70138 70316 +178
- Misses 23877 24080 +203
Partials 1296 1296
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Run Java PreCommit |
Run Python_PVR_Flink PreCommit |
Run Python 3.8 PostCommit |
Run Java PreCommit |
Assigning reviewers. If you would like to opt out of this review, comment R: @y1chi for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control |
* migrate FixedPrecisionNumeric jdbclogicaltype to portable logical type * Support DECIMAL type in python sdk xlang jdbc transform * Support standard logical type with argument in java sdk * proto support logical type's argument type in java sdk. Support of logical type's argument value is still pending * Implement BigIntegerCoder and DecimalCoder in python sdk
b897925
to
5158b36
Compare
Run Python 3.8 PostCommit |
Run Java PreCommit |
Java PreCommit build stuck twice at Task :sdks:java:core:buildNeeded. Also occasionally happens on master (e.g. https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/5900/, https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/5896/console) should be irrelevant. Read to review. |
Run Java PreCommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Abacn, and sorry about the delay. I have a few suggestions
@@ -65,7 +67,7 @@ | |||
}) | |||
public class SchemaTranslation { | |||
|
|||
private static final String URN_BEAM_LOGICAL_DECIMAL = "beam:logical_type:decimal:v1"; | |||
private static final String URN_BEAM_LOGICAL_DECIMAL = FixedPrecisionNumeric.BASE_IDENTIFIER; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add this in schema.proto and reference that here? Then we can document the format in schema.proto as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added "beam:logical_type:decimal:v1" into schema.proto
(base == null) ? null : base.precision(), | ||
scale, | ||
(base == null) ? null : base.scale()); | ||
return base; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic in this class is a little bit complicated, what do you think about adding a unit test just for this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is just moving org.apache.beam.sdk.io.jdbc.LogicalTypes.FixedPrecisionNumeric class to a separate file under schemas/logicaltypes. Yes will do add unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry about that, thank you
.../java/core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/FixedPrecisionNumeric.java
Show resolved
Hide resolved
public static final String IDENTIFIER = "beam:logical_type:fixed_decimal:v1"; | ||
|
||
// TODO(https://github.com/apache/beam/issues/19817) implement beam:logical_type:decimal:v1 as | ||
// CoderLogicalType (once CoderLogicalType is implemented). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is CoderLogicalType? I don't a reference to it in that issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah the link in TODO is not accurate. This is a suggestion I got from @reuvenlax. The reference was here: #7865 (comment). Should we create an issue for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entered #23374
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
logical_type.representation_type()))) | ||
else: | ||
# TODO(bhulette,yathu): Complete support for logical types that require | ||
# arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please file an issue for this and link it here. Also note that the option_to_runner_api and option_from_runner_api has logic for working with FieldValues, we should re-use that when we take this on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entered #23373
"CREATE TABLE {}(f_int INTEGER, f_timestamp TIMESTAMP)".format( | ||
table_name)) | ||
"CREATE TABLE {}(f_int INTEGER, f_timestamp TIMESTAMP, f_decimal DECIMAL(10,2))" # pylint: disable=line-too-long | ||
.format(table_name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a test(s) for this in standard_coders.yaml?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in standard_coders.yaml
Run Python 3.8 PostCommit |
501bf54
to
5f182d1
Compare
Run Python 3.8 PostCommit |
5f182d1
to
e849f1f
Compare
Run Python 3.8 PostCommit |
PostCommit test passed. Ready for review |
Run Java PreCommit |
(base == null) ? null : base.precision(), | ||
scale, | ||
(base == null) ? null : base.scale()); | ||
return base; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry about that, thank you
model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/schema.proto
Outdated
Show resolved
Hide resolved
public static final String IDENTIFIER = "beam:logical_type:fixed_decimal:v1"; | ||
|
||
// TODO(https://github.com/apache/beam/issues/19817) implement beam:logical_type:decimal:v1 as | ||
// CoderLogicalType (once CoderLogicalType is implemented). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
row = Row.withSchema(schema).addValues(decimal).build(); | ||
assertEquals(decimal, row.getLogicalTypeValue(0, BigDecimal.class)); | ||
|
||
decimal = BigDecimal.valueOf(random.nextLong() + 100_000_000_000L, scale); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'm a little nervous that the random component can make this flaky. Can we just select some constants istead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I made a typo. These two cases were intended be (random.nextInt() + 100_000_000_000L) which guarantees to generate a positive number with at least 11 digits and beyond the precision of DECIMAL(precision=10). Yes we can just use 100_000_000_001L here
|
||
// check argument invalid case | ||
decimal = BigDecimal.valueOf(random.nextLong() + 100_000_000_000L, scale); | ||
assertThrows(IllegalArgumentException.class, Row.withSchema(schema).addValues(decimal)::build); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like how concise this is :)
IllegalArgumentException.class, | ||
() -> | ||
FixedPrecisionNumeric.of( | ||
Row.withSchema(invalidArgumentSchema).addValues(precision, scale).build())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think unit testing best practice would be to put each of these test cases in a different test method. On the other hand this pattern is consistent with the rest of the file, so I'm fine if you want to leave it as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I agree in general this is best practice, I observed the file LogicalTypesTest.java
has each test testing one logical type and keep the new test following this. If tests become more loaded we can split test source files.
|
||
@classmethod | ||
def _from_typing(cls, typ): | ||
print(typ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a debug statement that was left in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops thanks for the catch, will double check
|
||
/** Fixed precision numeric types used to represent jdbc NUMERIC and DECIMAL types. */ | ||
public class FixedPrecisionNumeric extends PassThroughLogicalType<BigDecimal> { | ||
public static final String IDENTIFIER = "beam:logical_type:fixed_decimal:v1"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is being used cross-language can we define the URN in schema.proto and document it there as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered to put it in schema.proto as beam:logical_type:decimal:v1
but the problem is that I do not know how to generate a test case of it in standard_coders.yaml yet because the logical types with argument is currently not fully supported. I would like to leave it now and make it into schema.proto until we are confident that this implementation is standardized(i.e. stable). Open to opinions for sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that's fine with me.
…1/schema.proto fix incorrect documentation of Decimal logical type Co-authored-by: Brian Hulette <[email protected]>
go precommit fails due to
first time see go precommit flakes java precommit failed for another different test
should be irrelevant either |
Run Go PreCommit |
Run Java PreCommit |
New flakes seen on Java PreCommit and entered #23449 |
Run Java PreCommit |
Run Python 3.8 PostCommit |
1 similar comment
Run Python 3.8 PostCommit |
Run Java PreCommit |
Run Python 3.8 PostCommit |
Thanks for taking care of the tests @TheNeuralBit! Test reliability has been terrible. Last two Java PreCommit fails in different flakes (no test fails for both)
Python 3.8 PostCommit:
there was a success run (rare) of postcommit on the third commit: https://ci-beam.apache.org/job/beam_PostCommit_Python38_PR/618/ the last two commits of this PR are trivial and should not introduce surprise. We could wait for the running postcommit and give it one more chance :). Update: looks like too much post commit run hit the BigQuery quota. |
Run Python 3.8 PostCommit |
Ok, given the previous successful run I'm going to go ahead and merge. Thanks @Abacn! |
The second part of #19817 (keep the issue open for as go support is still pending).
This change makes decimal type work for xlang transforms (such as jdbc) in python sdk. It also contains prerequisite changes for the minimum support of a portable logical type with argument in both Java and Python sdk.
Python SDK still cannot get the argument value from proto because
SchemaTranslation.value_to_runner_api
method does not exist yet (it isSchemaTranslation.fieldValueToProto
in Java SDK). Nevertheless because the argument is just a wrapper, and the Decimal value can be recovered from the encoded data alone. This does not affect the decimal type.Because the change is already large, value_to_runner_api and the complete support of argument values in python is let as followup.
migrate FixedPrecisionNumeric jdbclogicaltype to portable
logical type
Support DECIMAL type in python sdk xlang jdbc transform
Support standard logical type with argument in java sdk
proto support logical type's argument type in java sdk.
Support of logical type's argument value is still pending
Implement BigIntegerCoder and DecimalCoder in python sdk
Please add a meaningful description for your change here
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.