Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: ReadFromKafkaViaSDF ignores configuration from redistribute and allow duplicates and always enables them #32196

Closed
1 of 17 tasks
scwhittle opened this issue Aug 15, 2024 · 3 comments

Comments

@scwhittle
Copy link
Contributor

What happened?

Introduced in 9cbdda1 so this is affecting the 2.58 beam release. The Dataflow v2 Runner uses this version for Kafka by default.

Since this can introduce duplicates and is unexpected, marking as P1 and considering to release a patched sdk version to address it.

Issue Priority

Priority: 1 (data loss / total loss of function)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@scwhittle
Copy link
Contributor Author

scwhittle commented Aug 15, 2024

Possible work-arounds:

  • use a another SDK version than 2.58
  • disable the ReadFromKafkaViaSDF usage via experiment use_unbounded_sdf_wrapper
  • modify pipeline to use KafkaIO.readFromDescriptors instead of KafkaIO.read/KafkaIO.readBytes

@scwhittle
Copy link
Contributor Author

Upon further investigation, the incorrect options are set on the ReadSourceDescriptors transform and not used for determinining redistribute and allowed duplicates of the read elements as that uses the original kafkaRead here

The effect of the misconfiguration is if commitoffets is enabled it is not performed due to the logic here

@scwhittle
Copy link
Contributor Author

Fixed in 2.58.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant