Kafka metrics #32402

Naireen · 2024-09-05T19:30:02Z

Add a per-worker metric for Kafka poll latency. This is specific to the Dataflow V1 runner.

Next steps would be to extend this to add latency for other RPC calls.
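As a rough illustration of what recording poll latency per worker could look like, here is a minimal, dependency-free sketch. It is not the code added in this PR: the class name `PollLatencySketch`, the bucket bounds, and the `timed` helper are all hypothetical, and the `Supplier` merely stands in for a call such as a Kafka consumer poll.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.function.Supplier;

/** Hypothetical per-worker latency recorder; not the class added in this PR. */
final class PollLatencySketch {
  // Assumed upper bounds (ms) of the histogram buckets; the last bucket is unbounded.
  private static final long[] BOUNDS_MS = {1, 10, 100, 1000};
  private final AtomicLongArray counts = new AtomicLongArray(BOUNDS_MS.length + 1);

  /** Adds one observation to the first bucket whose upper bound covers it. */
  void record(Duration elapsed) {
    long ms = elapsed.toMillis();
    int i = 0;
    while (i < BOUNDS_MS.length && ms > BOUNDS_MS[i]) {
      i++;
    }
    counts.incrementAndGet(i);
  }

  /** Runs the call (standing in for a poll) and records how long it took. */
  <T> T timed(Supplier<T> call) {
    long start = System.nanoTime();
    try {
      return call.get();
    } finally {
      record(Duration.ofNanos(System.nanoTime() - start));
    }
  }

  /** Number of observations that landed in the given bucket. */
  long count(int bucket) {
    return counts.get(bucket);
  }
}
```

The same wrapper shape would apply to other RPC calls, since only the supplied call changes.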


github-actions · 2024-09-17T04:07:29Z

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

Naireen · 2024-09-17T05:08:57Z

R: @scwhittle

github-actions · 2024-09-17T05:10:11Z

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

scwhittle

Can you clarify in the PR description that this is for the Dataflow V1 runner?

Adding John as he is more familiar with BeamIO in general and may have suggestions on monitoring.

sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java

.../java/org/apache/beam/runners/dataflow/worker/MetricsToPerStepNamespaceMetricsConverter.java

...w-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/streaming/StageInfo.java

sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaMetrics.java

sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java

[Dataflow Streaming] Use isolated windmill streams based on job settings (apache#32503)

Naireen · 2024-09-27T19:05:55Z

Run Java PreCommit

sjvanrossum · 2024-10-24T14:38:06Z

@Naireen @johnjcasey The changes introduced in this PR assume that a single KafkaUnboundedSource will be assigned a single Kafka topic. As far as I'm aware, the transform does not restrict users to that configuration: withTopics(), withTopicPartitions(), and withTopicPattern() all permit multiple topics. If the number of splits is less than the number of topic partitions, a single split may end up with an assignment consisting of partitions from multiple topics.
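To make the concern concrete, here is a small dependency-free sketch of how one split can end up reading from several topics. It assumes partitions are sorted by topic and partition and then dealt round-robin across splits; that distribution strategy is an assumption for illustration, and `Tp`, `assign`, and `topicsOf` are hypothetical names, not Beam or Kafka APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class SplitAssignmentSketch {
  /** Stand-in for Kafka's TopicPartition, to keep the sketch dependency-free. */
  static final class Tp {
    final String topic;
    final int partition;

    Tp(String topic, int partition) {
      this.topic = topic;
      this.partition = partition;
    }
  }

  /** Sorts by topic then partition and deals partitions round-robin (assumed strategy). */
  static List<List<Tp>> assign(List<Tp> partitions, int numSplits) {
    List<Tp> sorted = new ArrayList<>(partitions);
    sorted.sort(Comparator.comparing((Tp t) -> t.topic).thenComparingInt(t -> t.partition));
    List<List<Tp>> splits = new ArrayList<>();
    for (int i = 0; i < numSplits; i++) {
      splits.add(new ArrayList<>());
    }
    for (int i = 0; i < sorted.size(); i++) {
      splits.get(i % numSplits).add(sorted.get(i));
    }
    return splits;
  }

  /** Distinct topics a single split would read from, in sorted order. */
  static Set<String> topicsOf(List<Tp> split) {
    Set<String> topics = new TreeSet<>();
    for (Tp tp : split) {
      topics.add(tp.topic);
    }
    return topics;
  }
}
```

With two topics of two partitions each and two splits, each split under this assumed strategy receives one partition from each topic, so a metric keyed on "the" topic of a source would be ambiguous.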

Abacn · 2024-12-06T00:33:17Z

runners/google-cloud-dataflow-java/worker/build.gradle

@@ -54,6 +54,7 @@ def sdk_provided_project_dependencies = [
        ":runners:google-cloud-dataflow-java",
        ":sdks:java:extensions:avro",
        ":sdks:java:extensions:google-cloud-platform-core",
+        ":sdks:java:io:kafka", // For metric propagation into worker


This is bad code coupling. Fetching a static name isn't sufficient justification for introducing this mandatory dependency into the Dataflow worker jar. It also makes the Confluent repository mandatory for user projects.

Abacn · 2024-12-06T00:35:17Z

...va/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java

@@ -668,6 +669,10 @@ public static void main(String[] args) throws Exception {
      enableBigQueryMetrics();
    }

+    if (DataflowRunner.hasExperiment(options, "enable_kafka_metrics")) {
+      KafkaSinkMetrics.setSupportKafkaMetrics(true);


Instead of setting IO flags in the worker, one can use the JvmInitializer.runBeforeProcessing mechanism: introduce a JvmInitializer implementation in org.apache.beam.sdk.io.kafka that initializes the flags on the worker. This avoids the need for a Kafka dependency in the worker.
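A dependency-free sketch of the decoupling idea follows. Everything in it is a stand-in: `Initializer` only mimics the shape of a JvmInitializer-style hook (the real Beam interface receives PipelineOptions rather than a set of strings), `KafkaFlags` stands in for the flag holder that the diff calls KafkaSinkMetrics, and the initializers are passed in explicitly here rather than discovered at runtime.

```java
import java.util.List;
import java.util.Set;

final class InitializerSketch {
  /** Stand-in for a JvmInitializer-style hook; hypothetical signature. */
  interface Initializer {
    void beforeProcessing(Set<String> experiments);
  }

  /** Stand-in for the IO-side flag holder (KafkaSinkMetrics in the diff). */
  static final class KafkaFlags {
    static volatile boolean metricsEnabled = false;
  }

  /** Would live in the IO module, so the worker never names Kafka classes. */
  static final class KafkaInitializer implements Initializer {
    @Override
    public void beforeProcessing(Set<String> experiments) {
      if (experiments.contains("enable_kafka_metrics")) {
        KafkaFlags.metricsEnabled = true;
      }
    }
  }

  /** Worker side: runs every initializer it was handed, knowing only the interface. */
  static void runBeforeProcessing(List<Initializer> discovered, Set<String> experiments) {
    for (Initializer init : discovered) {
      init.beforeProcessing(experiments);
    }
  }
}
```

The design point is that the worker depends only on the hook interface; the Kafka module contributes its own implementation and is needed on the classpath only when a pipeline actually uses Kafka.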

github-actions bot added java io runners dataflow kafka labels Sep 5, 2024

Naireen force-pushed the kafka_metrics branch 16 times, most recently from a1ab6c0 to 627ad7c Compare September 17, 2024 00:05

Naireen marked this pull request as ready for review September 17, 2024 04:04

scwhittle requested a review from johnjcasey September 18, 2024 09:04

scwhittle requested changes Sep 18, 2024

sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java (outdated)

sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaUnboundedReader.java (outdated)

johnjcasey reviewed Sep 26, 2024

github-actions bot added python go labels Sep 26, 2024

Naireen added 3 commits September 26, 2024 22:25

Add kafka poll latency metrics

129094b

Address Sam's comments

c03a6e3

[Dataflow Streaming] Use isolated windmill streams based on job settings (apache#32503)

Add kafka poll latency metrics

3f9d233

Naireen force-pushed the kafka_metrics branch from ba9250c to dc0900a Compare September 26, 2024 22:26

github-actions bot removed python go build website infra examples extensions learning gcp jms mqtt yaml bigtable labels Sep 26, 2024

Naireen force-pushed the kafka_metrics branch from dc0900a to 0929e39 Compare September 26, 2024 22:33

address comments

d90b748

Naireen force-pushed the kafka_metrics branch from 0929e39 to d90b748 Compare September 26, 2024 22:34

Naireen mentioned this pull request Oct 3, 2024

Kafka poll interval #32162

Open

3 tasks

johnjcasey approved these changes Oct 8, 2024

Ensure this is disabled for now until flag to enable it is explicitly

9b7983b

Naireen force-pushed the kafka_metrics branch from a842623 to 9b7983b Compare October 9, 2024 20:38

johnjcasey merged commit 0ee13b2 into apache:master Oct 23, 2024
24 checks passed

Abacn reviewed Dec 6, 2024

Abacn mentioned this pull request Dec 6, 2024

Remove mandatory beam-sdks-io-kafka dependency for dataflow worker jar #33302

Merged

3 tasks
