
[Bug]: with_metadata not working in Python SDK ReadFromKafka #28200

Closed · 2 of 15 tasks
prasanthkn83 opened this issue Aug 29, 2023 · 5 comments

@prasanthkn83
What happened?

Here's the error generated when with_metadata is set to True in ReadFromKafka. It seems the SchemaCoder does not match the KafkaConsumer record, which has an extra timestamp_type field at position 4 that is missing from the SchemaCoder below. Please check.

```
generic::unknown: org.apache.beam.sdk.util.UserCodeException: java.lang.IllegalArgumentException:
Unable to encode element 'org.apache.beam.sdk.io.kafka.KafkaIO$ByteArrayKafkaRecord@45b2487a' with coder 'SchemaCoder<Schema:
Fields:
Field{name=topic, description=, type=STRING NOT NULL, options={{}}}
Field{name=partition, description=, type=INT32 NOT NULL, options={{}}}
Field{name=offset, description=, type=INT64 NOT NULL, options={{}}}
Field{name=timestamp, description=, type=INT64 NOT NULL, options={{}}}
Field{name=key, description=, type=BYTES, options={{}}}
Field{name=value, description=, type=BYTES, options={{}}}
Field{name=headers, description=, type=ARRAY<ROW<key STRING NOT NULL, value BYTES NOT NULL> NOT NULL>, options={{}}}
Field{name=timestampTypeId, description=, type=INT32 NOT NULL, options={{}}}
Field{name=timestampTypeName, description=, type=STRING NOT NULL, options={{}}}
Encoding positions: {headers=6, timestampTypeName=8, partition=1, offset=2, topic=0, value=5, key=4, timestamp=3, timestampTypeId=7}
```
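To illustrate the kind of mismatch described above, here is a minimal, self-contained toy analogue (not Beam's actual coder machinery): a coder built against a fixed field list fails when the record carries a field the schema does not know about, the way the hypothetical `timestamp_type` field does here.

```python
# Toy analogue of a positional schema coder: it can only encode records
# whose fields exactly match the schema it was built with.
class ToySchemaCoder:
    def __init__(self, fields):
        self.fields = fields  # ordered field names the coder knows about

    def encode(self, record):
        extra = set(record) - set(self.fields)
        if extra:
            # Mirrors the "Unable to encode element" failure mode.
            raise ValueError(
                f"Unable to encode element: unknown fields {sorted(extra)}")
        return [record[f] for f in self.fields]


coder = ToySchemaCoder(
    ["topic", "partition", "offset", "timestamp", "key", "value"])

record = {
    "topic": "t", "partition": 0, "offset": 1, "timestamp": 2,
    "timestamp_type": 0,  # extra field at "position 4", as in the report
    "key": b"k", "value": b"v",
}

try:
    coder.encode(record)
except ValueError as e:
    print(e)  # Unable to encode element: unknown fields ['timestamp_type']
```

The sketch only shows why an extra field breaks a coder that assumes fixed encoding positions; the real fix has to happen in Beam's KafkaIO schema, not in user code.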

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@shiftaltdelete23 commented Oct 26, 2023

This is also happening with the Java SDK.

I'm using the 2.51.0 Java SDK with the Google Cloud Dataflow Runner. What we observed is that when a metadata header is sent with a Kafka message from any producer, this error comes up. We worked around the issue by omitting metadata Kafka headers from our producers.

```
java.lang.IllegalArgumentException: Unable to encode element 'ValueWithRecordId{id=[], value=org.apache.beam.sdk.io.kafka.KafkaRecord@d6357d6b}' with coder 'ValueWithRecordId$ValueWithRecordIdCoder(KafkaRecordCoder(NullableCoder(StringUtf8Coder),NullableCoder(StringUtf8Coder)))'.
at org.apache.beam.sdk.coders.Coder.getEncodedElementByteSize(Coder.java:295)
at org.apache.beam.sdk.coders.Coder.registerByteSizeObserver(Coder.java:286)
at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:642)
at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:558)
at org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(IntrinsicMapTaskExecutorFactory.java:384)
at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputObjectAndByteCounter.update(OutputObjectAndByteCounter.java:128)
at org.apache.beam.runners.dataflow.worker.DataflowOutputCounter.update(DataflowOutputCounter.java:67)
at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:48)
at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:218)
at org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:169)
at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:83)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1404)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$800(StreamingDataflowWorker.java:154)
at org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$4.run(StreamingDataflowWorker.java:1044)
at org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor.lambda$executeLockHeld$0(BoundedQueueExecutor.java:133)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.beam.sdk.coders.CoderException: cannot encode a null byte[]
at org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:63)
at org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:73)
at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:63)
at org.apache.beam.sdk.coders.KvCoder.encode(KvCoder.java:37)
at org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:111)
at org.apache.beam.sdk.coders.IterableLikeCoder.encode(IterableLikeCoder.java:58)
at org.apache.beam.sdk.io.kafka.KafkaRecordCoder.encode(KafkaRecordCoder.java:66)
at org.apache.beam.sdk.io.kafka.KafkaRecordCoder.encode(KafkaRecordCoder.java:41)
at org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:106)
at org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:100)
at org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:82)
at org.apache.beam.sdk.coders.Coder.getEncodedElementByteSize(Coder.java:292)
... 17 more
```
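The root cause in the trace is `CoderException: cannot encode a null byte[]`: a Kafka header whose value is null reaches a byte-array coder that rejects null. A minimal, self-contained sketch (a toy in Python, not Beam's Java implementation) shows both the failure and the shape of the fix, which prefixes a presence byte the way a nullable wrapper coder does:

```python
# Toy analogue of a strict byte-array coder: rejects None outright.
class ByteArrayCoder:
    def encode(self, value):
        if value is None:
            raise ValueError("cannot encode a null byte[]")
        return len(value).to_bytes(4, "big") + value


# Toy analogue of a nullable wrapper: writes a presence byte first, so
# None becomes encodable instead of raising.
class NullableCoder:
    def __init__(self, inner):
        self.inner = inner

    def encode(self, value):
        if value is None:
            return b"\x00"  # presence byte: value absent
        return b"\x01" + self.inner.encode(value)


plain = ByteArrayCoder()
nullable = NullableCoder(ByteArrayCoder())

# A header with a null value breaks the strict coder...
try:
    plain.encode(None)
except ValueError as e:
    print(e)  # cannot encode a null byte[]

# ...but encodes fine once the value coder is wrapped as nullable.
print(nullable.encode(None))   # b'\x00'
print(nullable.encode(b"hv"))  # b'\x01\x00\x00\x00\x02hv'
```

This mirrors the workaround the commenter used (never send null-valued headers) and the eventual fix (make the header value coder nullable inside KafkaIO).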

@liferoad added the java label Oct 27, 2023
@johnjcasey (Contributor)

I think I found the issue with the Kafka record coder. It looks like you have a header whose value is a null byte array, which is breaking the encoder. I'll enable that field to be nullable.

@prasanthkn83 (Author)

Thank you @johnjcasey. Do you know when these changes will be merged into the main branch?

@johnjcasey (Contributor)

As soon as the tests pass, I'll merge them in.

@johnjcasey (Contributor)

The fix has been merged.

github-actions bot added this to the 2.52.0 Release milestone Nov 1, 2023