Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

file logger: buffer still full despite "chan-buffer-size" changes #636

Closed
romainw opened this issue Mar 5, 2024 · 13 comments
Closed

file logger: buffer still full despite "chan-buffer-size" changes #636

romainw opened this issue Mar 5, 2024 · 13 comments
Labels
question Further information is requested
Milestone

Comments

@romainw
Copy link

romainw commented Mar 5, 2024

Many thanks for your time and effort with go-dnscollector!

We are using go-dnscollector as part of the pDNSSOC project (https://github.com/CERN-CERT/pDNSSOC)

We process incoming dnstap files for a number of resolvers with little/no control over the files size or content.
We feed go-dnscollectors with dnstap files in a directory "watched" by the ingestor. Each file is about 50MB in size.

Unfortunately we suffer from a buffer full, dropping packets error.
We typically loose hundreds of thousands of packets, and our logs are flooded with "buffer full" error messages.

We tried increasing the value of "chan-buffer-size" to a larger value (e.g. 10485760) but it does not change the amount of errors or the amount of lost packets in each error.

I am not sure this is a bug or just a configuration issue, or just us doing it wrong, or what we can do about this.
Any advice welcome.

Current configuration:

    - name: dns_server_1
      file-ingestor:
        enable: true
        watch-dir: /var/pdnssoc_input/dns_server_1
        watch-mode: dnstap
        delete-after: true
        chan-buffer-size: 10485760
      transforms:
        filtering:
          log-queries: true
          log-replies: true
          drop-queryip-file: /etc/dnscollector/dns_servers.txt
@dmachard
Copy link
Owner

dmachard commented Mar 5, 2024

Could you share more details ?

  • the complete config file
  • version used
  • and output logs with buffer full messages

@romainw
Copy link
Author

romainw commented Mar 6, 2024

Sure! Thanks Denis.

We are using 0.41.0.

Config:

global:
  trace:
    verbose: true
  text-format: "timestamp-rfc3339ns identity operation rcode queryip queryport family protocol length qname qtype latency"
  # default text field delimiter
  text-format-delimiter: " "
  # default text field boundary
  text-format-boundary: "\""

multiplexer:
  collectors:
    - name: dnstap-dns-server-1
      file-ingestor:
        enable: true
        watch-dir: /var/pdnssoc_input/dns-server-1
        watch-mode: dnstap
        delete-after: true
        chan-buffer-size: 10485760
      transforms:
        filtering:
          log-queries: true
          log-replies: true
          drop-queryip-file: /root/dns_servers.txt
  loggers:
    - name: filelogdomains
      logfile:
        file-path: /var/dnscollector/matches/matches_domains.json
        mode: json
      transforms:
        filtering:
          keep-fqdn-file: '/var/dnscollector/misp_domains.txt'

    - name: filelogips
      logfile:
        file-path: /var/dnscollector/matches/matches_ips.json
        mode: json
      transforms:
        filtering:
          keep-rdata-file: '/var/dnscollector/misp_ips.txt'

    - name: fileall
      logfile:
        file-path: /var/dnscollector/queries.log
        mode: json
        max-size: 100
        max-files: 100
        postrotate-command: "/var/dnscollector/postrotate_query.sh"
        postrotate-delete-success: true

  routes:
    - from: [ dnstap-dns-server-1 ]
      to: [ filelogdomains, filelogips, fileall ]

Log files:

INFO: 2024/03/06 19:07:30.196568 collector - [dnstap-dns-server-1] fileingestor - processing of [dnstap-2024-03-02-18:56:53.fstrm] terminated
INFO: 2024/03/06 19:07:30.196586 collector - [dnstap-dns-server-1] fileingestor - delete file [dnstap-2024-03-02-18:56:53.fstrm]
INFO: 2024/03/06 19:07:30.439693 collector - [dnstap-dns-server-1] fileingestor - processing of [dnstap-2024-03-02-10:56:53.fstrm] terminated
INFO: 2024/03/06 19:07:30.439721 collector - [dnstap-dns-server-1] fileingestor - delete file [dnstap-2024-03-02-10:56:53.fstrm]
INFO: 2024/03/06 19:07:30.608676 collector - [dnstap-dns-server-1] fileingestor - processing of [dnstap-2024-03-02-02:56:53.fstrm] terminated
INFO: 2024/03/06 19:07:30.608694 collector - [dnstap-dns-server-1] fileingestor - delete file [dnstap-2024-03-02-02:56:53.fstrm]
ERROR: 2024/03/06 19:07:33.763154 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 364989 packet(s) dropped

The error would repeat over and over:

ERROR: 2024/03/03 04:16:40.594323 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 647689 packet(s) dropped
ERROR: 2024/03/03 04:16:50.602279 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 741568 packet(s) dropped
ERROR: 2024/03/03 04:17:00.726706 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 665186 packet(s) dropped
ERROR: 2024/03/03 04:17:10.824914 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 318581 packet(s) dropped
ERROR: 2024/03/03 04:17:20.825447 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 123588 packet(s) dropped
ERROR: 2024/03/03 04:17:30.895391 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 84076 packet(s) dropped
ERROR: 2024/03/03 04:18:43.438253 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 232958 packet(s) dropped
ERROR: 2024/03/03 04:18:53.484891 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 646237 packet(s) dropped
ERROR: 2024/03/03 04:19:03.654005 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 484388 packet(s) dropped
ERROR: 2024/03/03 04:19:13.786948 processor - [dnstap-dns-server-1] dnstap - logger[fileall] buffer is full, 403251 packet(s) dropped

@romainw
Copy link
Author

romainw commented Mar 6, 2024

(Just to give a volume indication, feeding on 34M dnstap file results in " dnstap - logger[fileall] buffer is full, 2003 packet(s) dropped" with the configuration file from the previous message)

@dmachard
Copy link
Owner

dmachard commented Mar 7, 2024

In your case, the bottleneck is the fileall logger, you can see that with logger[fileall] buffer is full, 647689 packet(s) dropped.This message indicates that the buffer of the fileall logger is full.

It occured because this logger is too slow regarding the amount of incoming data. To avoid to block the collector processing, the overhead of packets are dropped.

Could you try some adjustments

  • If you remove the postrotate command, the drop of packet still occurs ?
  • Try to increase the buffer of the logger fileall ? Be carefull with this parameter (big impact on memory usage)
chan-buffer-size: 131070

@romainw
Copy link
Author

romainw commented Mar 8, 2024

Good thinking, thank you!

After testing, it turns out that:

  • The postrotate script is causing the issue -- Disabling it solves the dropped packet issue.
  • The buffer size of the logger fileall does not play a role. In fact, the default 65535 is sufficient.

For the record, the postrotate script we use is:

#!/bin/bash

BACKUP_FOLDER=/var/dnscollector/queries/$(date +%Y-%m-%d)
mkdir -p $BACKUP_FOLDER

FILE_NAME=$(basename $1 .log)

jq -c '. | {timestamp: .dnstap."timestamp-rfc3339ns", query: .dns.qname, client: .network."query-ip", server: .network."response-ip", client_id: .dnstap.identity , answers: .dns."resource-records".an }' $1 > $BACKUP_FOLDER/$FILE_NAME.json && gzip -S .gz_minified $BACKUP_FOLDER/$FILE_NAME.json

I will continue to investigate what is causing the bottleneck there and update this ticket with any possible solution.

@dmachard
Copy link
Owner

dmachard commented Mar 8, 2024

@arvchristos @romainw The best way will be that the pdnssoc-cli uses natively the 'flat-json' model generated by DNScollector to avoid a new JSON generation from an external script. Also the file logger support gzip compression.

I think that the pDNSSOC and DNScollector integration can be greatly improved to support a higher load :)

@dmachard dmachard changed the title Buffer still full despite "chan-buffer-size" changes file logger: buffer still full despite "chan-buffer-size" changes Mar 8, 2024
@arvchristos
Copy link
Contributor

Hey @dmachard !

The main reason why we relied on an external script for JSON generation is to be able to store only the required for correlation fields and save on size. I am not sure whether there is any transformer to remove fields from the JSON output but it seems useful for space optimization (combined with gzip of course as you correctly pointed out).

We essentially need to store minified logs so that we can run correlation on past data with new indicators of compromise, apart from the live correlation we do with various other sources that go-dnscollector supports.

@dmachard
Copy link
Owner

dmachard commented Mar 8, 2024

I am not sure whether there is any transformer to remove fields from the JSON output

Great idea, can you open a ticket to track this?

@arvchristos
Copy link
Contributor

I am not sure whether there is any transformer to remove fields from the JSON output

Great idea, can you open a ticket to track this?

Sure, here it is:

#642

@dmachard
Copy link
Owner

Custom JSON relabeling has been implemented. It's will be implemented in next release v0.43.0.
The documentation is here.
Perhaps it can be used to improved performance.

@dmachard dmachard added question Further information is requested and removed needs more investigation labels Mar 24, 2024
@yadavp27
Copy link

yadavp27 commented Oct 8, 2024

Hi @dmachard

Im seeing similar issue with v1.0.0 while writing it to a local file. if you can suggest any tweak is needed in the config file.
i have tried increasing this values chan-buffer-size: 0 to 30720

`global:
trace:
verbose: true

pipelines:

  • name: tap
    dnstap:
    sock-path: /var/run/dnscollector/dnstap.sock
    routing-policy:
    forward: [ file ]

  • name: file
    logfile:
    file-path: "/var/log/dnscollector/dnstap.log"
    compress: true
    max-size: 30720
    max-files: 7
    mode: json
    chan-buffer-size: 0
    text-format: "timestamp identity qr operation rcode queryip queryport responseip responseport family protocol length qname qtype latency ttl"`

Oct 07 18:06:16 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:06:16.724111 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 1596 dnsmessage(s) dropped
Oct 07 18:06:26 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:06:26.724988 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6556 dnsmessage(s) dropped
Oct 07 18:06:36 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:06:36.725535 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6724 dnsmessage(s) dropped
Oct 07 18:06:46 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:06:46.726453 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5972 dnsmessage(s) dropped
Oct 07 18:06:56 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:06:56.726685 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 7065 dnsmessage(s) dropped
Oct 07 18:07:06 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:07:06.727064 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6029 dnsmessage(s) dropped
Oct 07 18:07:16 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:07:16.727741 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5829 dnsmessage(s) dropped
Oct 07 18:07:26 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:07:26.728751 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6174 dnsmessage(s) dropped
Oct 07 18:07:36 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:07:36.729660 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5778 dnsmessage(s) dropped
Oct 07 18:07:46 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:07:46.730720 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6140 dnsmessage(s) dropped
Oct 07 18:07:56 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:07:56.731152 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6053 dnsmessage(s) dropped
Oct 07 18:08:06 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:08:06.732230 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5895 dnsmessage(s) dropped
Oct 07 18:08:16 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:08:16.732329 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6083 dnsmessage(s) dropped
Oct 07 18:08:26 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:08:26.732412 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 6050 dnsmessage(s) dropped
Oct 07 18:08:36 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:08:36.732935 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5488 dnsmessage(s) dropped
Oct 07 18:08:46 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:08:46.733642 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5914 dnsmessage(s) dropped
Oct 07 18:08:56 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:08:56.733756 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 7116 dnsmessage(s) dropped
Oct 07 18:09:06 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:09:06.733896 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5845 dnsmessage(s) dropped
Oct 07 18:09:16 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:09:16.733973 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5587 dnsmessage(s) dropped
Oct 07 18:09:26 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:09:26.734995 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5570 dnsmessage(s) dropped
Oct 07 18:09:36 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:09:36.735098 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5593 dnsmessage(s) dropped
Oct 07 18:09:46 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:09:46.735690 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5701 dnsmessage(s) dropped
Oct 07 18:09:56 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:09:56.736378 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5875 dnsmessage(s) dropped
Oct 07 18:10:06 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:10:06.736505 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 5680 dnsmessage(s) dropped
Oct 07 18:10:16 dnscollector go-dnscollector[3564495]: WARNING: 2024/10/07 18:10:16.737712 worker - [tap] (conn #1) dnstap processor - worker[file] buffer is full, 4175 dnsmessage(s) dropped

Thanks.

@dmachard
Copy link
Owner

Hi @arvchristos @romainw,

I pushed a fix in the head of the repository to ensure the post-rotate command no longer blocks execution. I think this should be helpful for you!

@dmachard
Copy link
Owner

Fixed in release 1.2.0. If you encounter any further issues, please feel free to open a new ticket. Thank you for your feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants