[Bug]: with_metadata not working in Python SDK ReadFromKafka #28200
Comments
This also happens with the Java SDK. I'm using the 2.51.0 Java SDK and the Google Cloud Dataflow runner. What we observed is that the error below comes up whenever a producer sends a Kafka message with a metadata header. We resolved the issue by omitting the metadata Kafka headers in the producers.
java.lang.IllegalArgumentException: Unable to encode element 'ValueWithRecordId{id=[], value=org.apache.beam.sdk.io.kafka.KafkaRecord@d6357d6b}' with coder 'ValueWithRecordId$ValueWithRecordIdCoder(KafkaRecordCoder(NullableCoder(StringUtf8Coder),NullableCoder(StringUtf8Coder)))'.
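A narrower variant of this workaround may be enough if the failure is only triggered by headers whose value is null, as the reply below suggests. The following is a minimal sketch under that assumption; `drop_null_headers` is a hypothetical helper, not part of Beam or of any Kafka client, and headers are modeled as plain `(key, value)` tuples.

```python
from typing import Iterable, List, Optional, Tuple

# A Kafka header modeled as (key, value); the value may be None (null).
Header = Tuple[str, Optional[bytes]]


def drop_null_headers(headers: Iterable[Header]) -> List[Tuple[str, bytes]]:
    """Return only the headers whose value is not None (hypothetical helper)."""
    return [(key, value) for key, value in headers if value is not None]


# Example: the second header carries a null value and is filtered out.
headers = [("trace-id", b"abc123"), ("tenant", None)]
safe_headers = drop_null_headers(headers)
```

A producer would pass `safe_headers` instead of `headers` when building the record, so that no null header value reaches the consumer-side coder.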
I think I found the issue with the Kafka record coder. It looks like you have a header whose value is null, which is breaking the encoder. I'll make that value nullable.
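To illustrate the idea of making a value nullable in an encoder, here is a small self-contained sketch in plain Python. It is not Beam's actual coder implementation; the function names and the byte layout (a one-byte presence marker followed by a length-prefixed payload) are assumptions chosen for illustration only.

```python
from typing import Optional


def encode_bytes(value: bytes) -> bytes:
    """Length-prefixed encoding that assumes a non-null value."""
    return len(value).to_bytes(4, "big") + value


def encode_nullable_bytes(value: Optional[bytes]) -> bytes:
    """Write a presence marker first: 0x00 for null, 0x01 plus the payload."""
    if value is None:
        return b"\x00"
    return b"\x01" + encode_bytes(value)


def decode_nullable_bytes(data: bytes) -> Optional[bytes]:
    """Inverse of encode_nullable_bytes."""
    if data[0] == 0:
        return None
    length = int.from_bytes(data[1:5], "big")
    return data[5:5 + length]
```

With the non-nullable encoder, `encode_bytes(None)` raises a `TypeError`, which is analogous to the "Unable to encode element" failure above. The nullable wrapper round-trips both `None` and ordinary byte strings.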
Thank you, @johnjcasey. Do you know when these changes will be merged into the main branch?
As soon as the tests pass, I'll merge them in.
The fix has been merged.
What happened?
Here's the error generated when with_metadata is set to True in ReadFromKafka. It seems the SchemaCoder does not match the KafkaConsumer record, which has an extra timestamp_type at position 4 that is missing from the SchemaCoder below. Please check.
generic::unknown: org.apache.beam.sdk.util.UserCodeException: java.lang.IllegalArgumentException: Unable to encode element 'org.apache.beam.sdk.io.kafka.KafkaIO$ByteArrayKafkaRecord@45b2487a' with coder 'SchemaCoder<Schema: Fields: Field{name=topic, description=, type=STRING NOT NULL, options={{}}} Field{name=partition, description=, type=INT32 NOT NULL, options={{}}} Field{name=offset, description=, type=INT64 NOT NULL, options={{}}} Field{name=timestamp, description=, type=INT64 NOT NULL, options={{}}} Field{name=key, description=, type=BYTES, options={{}}} Field{name=value, description=, type=BYTES, options={{}}} Field{name=headers, description=, type=ARRAY<ROW<key STRING NOT NULL, value BYTES NOT NULL> NOT NULL>, options={{}}} Field{name=timestampTypeId, description=, type=INT32 NOT NULL, options={{}}} Field{name=timestampTypeName, description=, type=STRING NOT NULL, options={{}}} Encoding positions: {headers=6, timestampTypeName=8, partition=1, offset=2, topic=0, value=5, key=4, timestamp=3, timestampTypeId=7}
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components