Kafkaproducer: "buffer is full" messages after upgrading to v1.0.0-beta1 with round-robin partitioner for Kafka #800

Closed
plazmorez opened this issue Sep 9, 2024 · 5 comments
Labels: bug (Something isn't working), waiting feedback

@plazmorez commented Sep 9, 2024

Many thanks for your time and effort with go-dnscollector!

After upgrading go-dnscollector to v1.0.0-beta1 and enabling the round-robin partitioner for kafkaproducer, we started seeing "buffer is full" messages. However, when a specific partition number is set, the buffer does not overflow and the issue does not occur.

The average DNS load is around 1400 RPS.

Config:

```yaml
global:
  trace:
    verbose: true

  server-identity: "dns-collector"
  text-format: "timestamp-rfc3339ns localtime identity operation rcode queryip protocol length qname qtype"

multiplexer:
  collectors:
    - name: tap
      dnstap:
        sock-path: /var/run/dnstap.sock
      transforms:
        normalize:
          qname-lowercase: true

  loggers:
    - name: prometheus
      prometheus:
        ...
        listen-port: 9998
        prometheus-prefix: "dnscollector"
        top-n: 10

    - name: kafkaproducer
      kafkaproducer:
        remote-address: host.fqdn
        remote-port: 9999
        connect-timeout: 5
        retry-interval: 10
        flush-interval: 30
        tls-support: false
        tls-insecure: false
        tls-min-version: 1.2
        sasl-support: true
        sasl-mechanism: SCRAM-SHA-512
        sasl-username: ""
        sasl-password: ""
        mode: json
        buffer-size: 1000
        topic: "topic"
        partition: null
        chan-buffer-size: 800000
        compression: "snappy"

  routes:
    - from: [ tap ]
      to: [ prometheus, kafkaproducer ]
```

Logs (newest first):

```
2024/09/09 17:58:47.544421 worker - [tap] (conn #1) dnstap processor - worker[kafkaproducer] buffer is full, 10279 dnsmessage(s) dropped
2024/09/09 17:58:37.543725 worker - [tap] (conn #1) dnstap processor - worker[kafkaproducer] buffer is full, 9123 dnsmessage(s) dropped
2024/09/09 17:58:27.543366 worker - [tap] (conn #1) dnstap processor - worker[kafkaproducer] buffer is full, 5072 dnsmessage(s) dropped
2024/09/09 17:31:27.456113 worker - [tap] dnstap - conn #1 - receiver framestream initialized
2024/09/09 17:31:27.455869 worker - [tap] (conn #1) transformers applied: [normalize:qname-lowercase]
2024/09/09 17:31:27.455773 worker - [tap] (conn #1) dnstap processor - starting data collection
2024/09/09 17:31:27.455736 worker - [tap] (conn #1) dnstap processor - starting monitoring - refresh every 10s
2024/09/09 17:31:27.455545 worker - [tap] (conn #1) dnstap processor - enabled
2024/09/09 17:31:27.455527 worker - [tap] dnstap - conn #1 - new connection from @ (@)
2024/09/09 17:31:27.445308 worker - [kafkaproducer] kafka - connected with success
2024/09/09 17:31:27.397325 worker - [prometheus] prometheus - is listening on :9998
2024/09/09 17:31:27.397284 worker - [prometheus] prometheus - starting http server...
2024/09/09 17:31:27.397272 worker - [prometheus] prometheus - logging has started
2024/09/09 17:31:27.397199 worker - [kafkaproducer] kafka - connecting to kafka=host.fqdn:9999 partition=all topic=topic.>
2024/09/09 17:31:27.397176 worker - [prometheus] prometheus - starting data collection
2024/09/09 17:31:27.397159 worker - [kafkaproducer] kafka - logging has started
2024/09/09 17:31:27.397122 worker - [tap] dnstap - listening on /var/run/dnstap.sock
2024/09/09 17:31:27.397059 worker - [kafkaproducer] kafka - starting data collection
2024/09/09 17:31:27.397042 worker - [tap] dnstap - starting data collection
2024/09/09 17:31:27.396841 worker - [tap] dnstap - starting monitoring - refresh every 10s
2024/09/09 17:31:27.396818 worker - [kafkaproducer] kafka - starting monitoring - refresh every 10s
2024/09/09 17:31:27.395413 main - routing: collector[tap] send to logger[kafkaproducer]
2024/09/09 17:31:27.395382 main - routing: collector[tap] send to logger[prometheus]
2024/09/09 17:31:27.380180 worker - [tap] dnstap - enabled
2024/09/09 17:31:27.379516 main - loading collectors...
2024/09/09 17:31:27.333355 worker - [kafkaproducer] kafka - enabled
2024/09/09 17:31:27.332822 worker - [prometheus] prometheus - starting monitoring - refresh every 10s
2024/09/09 17:31:27.331285 worker - [prometheus] prometheus - enabled
2024/09/09 17:31:27.330633 main - loading loggers...
2024/09/09 17:31:27.330630 main - The multiplexer mode is deprecated. Please switch to the pipelines mode.
2024/09/09 17:31:27.330627 main - running in multiplexer mode
2024/09/09 17:31:27.330566 main - telemetry enabled on local address: :9997
2024/09/09 17:31:27.330507 main - version=1.0.0-beta1 revision=f961d61d17e5207bf1755ae62e7a7d202baa2473
```

@dmachard (Owner) commented Sep 9, 2024

Thanks for reporting that.
I will try to reproduce it on my side.

@uralmetal commented

@dmachard, as an alternative I suggest using the solution in the kafka-go library. It already provides a ready-made RoundRobin balancer:
https://github.com/segmentio/kafka-go/blob/main/balancer.go#L41
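
A minimal sketch of how that balancer is wired into a kafka-go Writer (the broker address, topic, and payload are placeholders taken from the config above, not dnscollector code):

```go
package main

import (
	"context"

	kafka "github.com/segmentio/kafka-go"
)

func main() {
	// Writer with the built-in RoundRobin balancer: kafka-go chooses the
	// partition per message, cycling through all partitions of the topic.
	w := &kafka.Writer{
		Addr:     kafka.TCP("host.fqdn:9999"), // placeholder broker address
		Topic:    "topic",
		Balancer: &kafka.RoundRobin{},
	}
	defer w.Close()

	// Example payload; dnscollector would send the JSON-encoded DNS message.
	err := w.WriteMessages(context.Background(),
		kafka.Message{Value: []byte(`{"qname":"example.com."}`)},
	)
	if err != nil {
		panic(err)
	}
}
```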

@dmachard (Owner) commented

The current implementation logic is not good because the round robin is performed on each packet to be sent.
I will take a look at using the implementation from the kafka-go library.
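
For contrast, a hedged sketch of the two patterns (illustrative only, not the actual dnscollector implementation; both helper names are hypothetical): writing per packet pays the balance/flush overhead on every message, while flushing a buffered batch in a single call lets the balancer distribute the whole batch in memory.

```go
package kafkasketch

import (
	"context"

	kafka "github.com/segmentio/kafka-go"
)

// writePerPacket (hypothetical) is the costly pattern: one WriteMessages
// call per DNS message, so a synchronous writer blocks until each single
// message's batch is flushed.
func writePerPacket(ctx context.Context, w *kafka.Writer, msgs []kafka.Message) error {
	for _, m := range msgs {
		if err := w.WriteMessages(ctx, m); err != nil {
			return err
		}
	}
	return nil
}

// flushBatch (hypothetical) is the cheaper pattern: accumulate up to
// buffer-size messages and hand the whole batch to the writer at once.
func flushBatch(ctx context.Context, w *kafka.Writer, batch []kafka.Message) error {
	return w.WriteMessages(ctx, batch...)
}
```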

@dmachard (Owner) commented

A new beta release is available; any feedback would be appreciated.

@dmachard (Owner) commented

Fixed since 1.0.0. If you encounter any further issues, please feel free to open a new ticket. Thank you for your feedback!
