Getting BufferChunkOverflowError from non-buffered copy plugin! #2928

Closed
pranavmarla opened this issue Apr 2, 2020 · 10 comments

pranavmarla commented Apr 2, 2020

Describe the bug

In my Fluentd config, I use the copy plugin to send the same logs to two Kafka clusters (each using the kafka2 plugin). In each kafka2 section, I set the max buffer chunk size (i.e. chunk_limit_size) to 600 KB (600,000 bytes).
(See Fluentd config below)

I am seeing multiple error messages like this:

{"time":"2020-04-02 09:54:14.883 -0400","level":"error","message":"[publish_logs_to_outputs] ignore emit error error_class=Fluent::Plugin::Buffer::BufferChunkOverflowError error=\"a 465698bytes record is larger than buffer chunk limit size\"","worker_id":5}

{"time":"2020-04-02 11:11:31.630 -0400","level":"error","message":"[publish_logs_to_outputs] ignore emit error error_class=Fluent::Plugin::Buffer::BufferChunkOverflowError error=\"a 35711bytes record is larger than buffer chunk limit size\"","worker_id":5}

There are two problems with these error messages:

  1. The BufferChunkOverflowError is being generated by the copy plugin, which, as far as I know, is non-buffered!
  2. According to the error messages, the BufferChunkOverflowError is being triggered by log sizes (e.g. ~466 KB, ~36 KB) that are all much smaller than the actual chunk_limit_size set in my kafka2 sections: 600 KB!

Expected behavior
What I expect is:

  1. Any BufferChunkOverflowError should only be generated by output plugins that actually have buffers, like the kafka2 plugin -- NOT the copy plugin!
  2. The BufferChunkOverflowError should only be triggered when the log size is greater than the configured chunk_limit_size!

Your Environment

  • td-agent version: 1.9.2
  • fluent-plugin-kafka version: 0.12.3
  • Operating system: Ubuntu 18.04.4 LTS (Bionic Beaver)
  • Kernel version: 4.15.0-88-generic

Your Configuration

...
<match **>

  @type copy
  @id publish_logs_to_outputs

  # Kafka cluster 1
  <store ignore_error>
    
    @type kafka2
    @id kafka_cluster_1

    default_topic xxx
    
    brokers xxx

    sasl_over_ssl true
    username xxx
    password xxx
    ssl_ca_cert xxx
    ssl_verify_hostname false
    
    <format>
      @type json
    </format>

    <buffer>
      @type file
      
      chunk_limit_size 600000
      total_limit_size 500g

      flush_mode interval
      flush_interval 10s
      flush_thread_count 12
    </buffer>

  </store>

  # Kafka cluster 2
  <store ignore_error>
    
    @type kafka2
    @id kafka_cluster_2

    default_topic xxx
    
    brokers xxx

    sasl_over_ssl true
    username xxx
    password xxx
    ssl_ca_cert xxx
    ssl_verify_hostname false
    
    <format>
      @type json
    </format>

    <buffer>
      @type file
      
      chunk_limit_size 600000
      total_limit_size 500g

      flush_mode interval
      flush_interval 10s
      flush_thread_count 12
    </buffer>

  </store>

</match>
...

Your Error Log
(copied from above)


Additional context

To clarify, most of the error messages are what I expect, and indicate that the kafka2 config (and chunk_limit_size config) are working as expected.

For example, most of the error messages look like this:

{"time":"2020-04-02 11:37:29.573 -0400","level":"warn","message":"[kafka_cluster_2] chunk bytes limit exceeds for an emitted event stream: 8293577bytes","worker_id":4}

There is no problem with these error messages, because the error is coming from the buffered kafka2 plugin, and the error is being generated by a log > chunk_limit_size (600 KB) -- i.e. these error messages are expected, because I expect logs > 600 KB to be ignored by the kafka2 plugin.

My problem is that I ALSO sometimes see the weird error messages at the top, which come from the non-buffered copy plugin and complain about logs that are < chunk_limit_size (600 KB).

ganmacs (Member) commented Apr 3, 2020

The BufferChunkOverflowError is being generated from the copy plugin which, as far as I know, is non-buffered!

This is kafka2's error, which the copy plugin surfaces. I agree the message is a bit confusing. How about adding the class name of the output where the error occurs to the error log, like "ignore emit error from #{output}", error: e? By design, errors that occur in output plugins are handled outside of them; the current log line in the copy plugin is:

log.error "ignore emit error", error: e

According to the error messages, the BufferChunkOverflowError is being triggered by log sizes (e.g. ~466 KB, ~36 KB) that are all much smaller than the actual chunk_limit_size set in my kafka2 sections: 600 KB!

That's weird... The error can only happen when a chunk is over chunk_limit_size. These are the relevant lines in the buffer implementation:

    def chunk_size_over?(chunk)

    raise BufferChunkOverflowError, "a #{big_record_size}bytes record is larger than buffer chunk limit size"
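
To make the expectation concrete, here is a minimal, self-contained sketch (an illustration of the size check only, not fluentd's real buffering code; the helper name record_too_large? is made up) applied to the record sizes reported above:

    chunk_limit_size = 600_000 # bytes, as configured in the kafka2 <buffer> sections

    # A record should only trigger BufferChunkOverflowError when it cannot fit
    # into a single chunk, i.e. when its serialized size exceeds chunk_limit_size.
    def record_too_large?(record_bytesize, chunk_limit_size)
      record_bytesize > chunk_limit_size
    end

    [465_698, 35_711, 8_293_577].each do |size|
      if record_too_large?(size, chunk_limit_size)
        puts "a #{size}bytes record is larger than buffer chunk limit size"  # expected only for the ~8.3 MB record
      else
        puts "#{size} bytes fits under chunk_limit_size, so no overflow error is expected"
      end
    end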

pranavmarla (Author)

@ganmacs Interesting ... So you're saying that, even though the error message comes from the copy plugin -- which we know because the error message has the copy plugin's ID: [publish_logs_to_outputs] -- the actual error was generated by the kafka2 plugin?

ganmacs (Member) commented Apr 6, 2020

Yes. Handling such errors outside of plugins is the design of fluentd.

repeatedly (Member)

How about adding the class name of the output where the error occurs to the error log, like "ignore emit error from #{output}"

output.plugin_id?
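
Presumably something along these lines (hypothetical wording, using the plugin_id accessor that fluentd plugins already expose):

    log.error "ignore emit error from #{output.plugin_id}", error: e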

pranavmarla (Author)

@ganmacs Thank you for explaining how the error handling works -- that answers my first question.

Do you have any ideas regarding my second question: Why is BufferChunkOverflowError being triggered by logs that are < chunk_limit_size (600 KB)?

ganmacs (Member) commented Apr 8, 2020

Do you have any ideas regarding my second question: Why is BufferChunkOverflowError being triggered by logs that are < chunk_limit_size (600 KB)?

I'm not sure. I can't reproduce it. Do you have a reproduction config and input data?

pranavmarla (Author) commented Apr 9, 2020

Thank you. I just checked my records -- it is still happening, but not frequently: over 24 hours, it happened about 116 times. Our current config is fairly complex, though -- let me see if I can put together a simpler config that reproduces the same issue, and I will upload it here; if not, I will just upload the complex config.

pranavmarla (Author)

Note: This might be the same as #3072, which was apparently fixed in v1.11.2.

github-actions bot

This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 30 days

github-actions bot added the stale label Jan 25, 2021
github-actions bot

This issue was automatically closed because it had been stale for 30 days.
