DNSCollector with AFPacket Sniffer Hangs on Stop #831

Closed
BarboraAni opened this issue Oct 7, 2024 · 6 comments
Labels: bug, waiting feedback

Comments

@BarboraAni

BarboraAni commented Oct 7, 2024

Describe the bug
We are currently using dnscollector version 1.0.0 with the afpacket sniffer. The issue arises when attempting to stop or restart the dnscollector service. The process does not shut down gracefully. Instead, it seems to hang indefinitely until the timeout is reached, at which point systemd forcefully kills the process. Could this be due to some of the system resources not being released?

INFO: 2024/10/07 14:23:32.077692 worker - [afpacket] afpacket sniffer - stop to listen...

This problem does not occur if there is no incoming traffic on the interface.
To Reproduce

  • Start dnscollector service.
  • Ensure there is incoming traffic on the interface.
  • Attempt to stop or restart the service.

Expected behavior
The service should stop or restart gracefully.

@dmachard
Owner

dmachard commented Oct 7, 2024

Thanks for reporting that!
Does DNScollector get stuck every time? I tried to reproduce it on my side, but without success.

Could you share

  • your complete configuration file
  • the full log output, not just one line
  • a network dump of your DNS traffic from when the hang occurred

@BarboraAni
Author

BarboraAni commented Oct 8, 2024

Unfortunately, I cannot provide a network dump of the DNS traffic due to security concerns.

Below is the configuration file:

global:
  telemetry:
    enabled: true
    prometheus-prefix: dnscollector
    web-listen: :9165
    web-path: /metrics
  text-format: timestamp-rfc3339ns operation rcode queryip queryport family protocol
    length qname qtype latency
  text-format-boundary: '"'
  text-format-delimiter: ' '
  trace:
    filename: /var/log/szn-go-dnscollector/szn-go-dnscollector.log
    log-malformed: false
    max-backups: 10
    max-size: 5
    verbose: true

pipelines:
  - afpacket-sniffer:
      device: enp3s0
    name: afpacket
    routing-policy:
      dropped: []
      forward:
      - prometheus
      - elastic
    transforms:
      normalize:
        qname-lowercase: true
  - elasticsearch:
      flush-interval: 10
      index: dnscollector-staging
      server: https://xxx:9200
    name: elastic
    transforms:
      filtering:
        drop-queryip-file: /etc/szn-go-dnscollector/drop_queryip
  - name: prometheus
    prometheus:
      basic-auth-enable: false
      default-domains-cache-size: 100000
      domains-cache-size: 100000
      histogram-metrics-enabled: true
      listen-ip: 0.0.0.0
      listen-port: 8181
      noerror-domains-cache-size: 100000
      nonexistent-domains-cache-size: 100000
      prometheus-labels:
      - stream_id
      servfail-domains-cache-size: 100000

I wouldn't say it hangs every time, but it seems to get stuck in approximately 1 out of every 5 restarts. This also appears to affect the Prometheus worker.

WARNING: 2024/10/08 12:39:46.587311 main - exiting...
INFO: 2024/10/08 12:39:46.587413 main - telemetry is stopping
INFO: 2024/10/08 12:39:46.587526 worker - [afpacket] afpacket sniffer - stopping monitor...
INFO: 2024/10/08 12:39:46.587544 worker - [afpacket] afpacket sniffer - monitor terminated
INFO: 2024/10/08 12:39:46.587553 worker - [afpacket] afpacket sniffer - stopping collect...
INFO: 2024/10/08 12:39:46.587560 worker - [afpacket] afpacket sniffer - stop to listen...
INFO: 2024/10/08 12:39:47.290366 worker - [afpacket] afpacket sniffer - stopping sniffer...
INFO: 2024/10/08 12:39:47.295623 worker - [afpacket] dns processor - stopping monitor...
INFO: 2024/10/08 12:39:47.295659 worker - [afpacket] dns processor - monitor terminated
INFO: 2024/10/08 12:39:47.295668 worker - [afpacket] dns processor - stopping collect...
INFO: 2024/10/08 12:39:47.295681 worker - [afpacket] dns processor - collection terminated
INFO: 2024/10/08 12:39:47.295696 worker - [afpacket] afpacket sniffer - read data terminated
INFO: 2024/10/08 12:39:47.295722 worker - [afpacket] afpacket sniffer - collection terminated
INFO: 2024/10/08 12:39:47.295739 worker - [elastic] elasticsearch - stopping monitor...
INFO: 2024/10/08 12:39:47.295753 worker - [elastic] elasticsearch - monitor terminated
INFO: 2024/10/08 12:39:47.295763 worker - [elastic] elasticsearch - stopping collect...
INFO: 2024/10/08 12:39:47.295776 worker - [elastic] elasticsearch - logging terminated
INFO: 2024/10/08 12:39:47.295783 worker - [elastic] elasticsearch - collection terminated
INFO: 2024/10/08 12:39:47.295796 worker - [prometheus] prometheus - stopping monitor...
WARNING: 2024/10/08 12:44:30.205475 main - exiting...
INFO: 2024/10/08 12:44:30.205501 main - telemetry is stopping
INFO: 2024/10/08 12:44:30.205591 worker - [afpacket] afpacket sniffer - stopping monitor...
INFO: 2024/10/08 12:44:30.205602 worker - [afpacket] afpacket sniffer - monitor terminated
INFO: 2024/10/08 12:44:30.205608 worker - [afpacket] afpacket sniffer - stopping collect...
INFO: 2024/10/08 12:44:30.205613 worker - [afpacket] afpacket sniffer - stop to listen...

From there, it just hangs until systemd forcefully kills it.

@dmachard added the bug and needs more investigation labels on Oct 9, 2024
@dmachard
Owner

dmachard commented Oct 10, 2024

Just to keep you informed, I have reproduced it on my side.

@dmachard
Owner

A deadlock between two goroutines has been identified on stop.
It should be fixed in v1.1.0-beta3.

@dmachard added this to the v1.1.0 milestone on Oct 11, 2024
@dmachard
Owner

Any feedback would be appreciated.

@BarboraAni
Author

I have tested the new DNSCollector version, and it now restarts normally without any hangs. Thanks!
