
Multiline feature with built-in CRI parser does not separate streams #4387

Open
mdraijer opened this issue Dec 2, 2021 · 32 comments
Labels
long-term Long term issues (exempted by stale bots)

Comments

@mdraijer

mdraijer commented Dec 2, 2021

Bug Report

Describe the bug

When a container application produces log messages that the container runtime splits into multiple parts, and messages are written to stdout and stderr at roughly the same time, Fluent Bit may fail to concatenate those parts back together.

To Reproduce

  • Example log message:
2021-11-18T14:47:34.730832241+00:00 stdout F some short log message A
2021-11-18T14:47:34.730832241+00:00 stdout P some longer log message B part one
2021-11-18T14:47:34.730908438+00:00 stdout F end of the longer message B
2021-11-18T14:47:34.730908438+00:00 stdout P another longer log message  C part one
2021-11-18T14:47:34.730939216+00:00 stdout P message C continued in part two
2021-11-18T14:47:34.735454549+00:00 stderr F some log message D in stderr part one
2021-11-18T14:47:34.735482854+00:00 stdout F message C still continued after a line from a different stream
2021-11-18T14:47:34.735482854+00:00 stderr F and the end of the stderr message D
  • Steps to reproduce the problem:

Create an application that writes a couple of long lines to stdout and stderr, and run it under a CRI-O runtime (e.g. an OpenShift Kubernetes cluster).
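
For illustration, a minimal generator along these lines could look like the sketch below (Lua, matching the filter example later in this thread; the line length and loop count are assumptions chosen so that the runtime's 8 KB/16 KB line limit splits the output into P/F parts):

-- Sketch of a reproduction app: write long lines to stdout and stderr at
-- roughly the same time so the container runtime has to split them.
local long_line = string.rep("x", 20000)  -- assumption: longer than the runtime split limit

for i = 1, 10 do
  io.stdout:write(long_line .. " stdout message " .. i .. "\n")
  io.stdout:flush()
  io.stderr:write(long_line .. " stderr message " .. i .. "\n")
end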

Expected behavior

The output (in our case Elastic) shows each log message complete as a separate document.

Your Environment

  • Version used: container image fluent/fluent-bit:1.8.9
  • Configuration: relevant part:
      [INPUT]
          Name tail
          Path /var/log/containers/*.log
          Multiline.Parser cri
    
  • Environment name and version: Kubernetes 1.20 / Openshift 4.7.37
  • Server type and version:
  • Operating System and version: RHEL 47.84, CRI-O 1.20.5-7
  • Filters and plugins:

Additional context

  • Lines carry a flag in the third field: F for a single-line message or for the concluding line of a multi-line message; P for every part other than the final part of a multi-line message.
  • Log messages from different streams (stdout, stderr) can be mixed up (examples C and D).
  • Log messages can be in JSON and we also apply the JSON parser as filter.
  • Messages that Fluent Bit does not recombine are hard to find in Elastic, and if they are formatted as JSON the individual parts obviously cannot be parsed as JSON, so we are missing a lot of information in Elastic for those lines.

Note that this is not about the multiline filter (see e.g. #4309), nor about the old multiline parser (see Tail - Multiline).
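
For reference, a pipeline along the lines described above would look roughly like this (a sketch; the parser filter stanza is an assumption about how the JSON parsing mentioned above could be applied, using the stock json parser):

[INPUT]
    Name              tail
    Path              /var/log/containers/*.log
    Multiline.Parser  cri

[FILTER]
    Name          parser
    Match         *
    Key_Name      log
    Parser        json
    Reserve_Data  On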

@PettitWesley
Contributor

@edsiper IMO, this is a pretty major miss in the cri parser.

@mdraijer
Author

@edsiper @PettitWesley any idea when a fix can be implemented?

@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@github-actions github-actions bot added the Stale label Mar 26, 2022
@mdraijer
Author

I take it that the "pretty major miss" is still open?

@PettitWesley PettitWesley added long-term Long term issues (exempted by stale bots) and removed Stale labels Mar 28, 2022
@PettitWesley
Contributor

This definitely needs to be fixed @agup006 @edsiper; this is a fundamental thing users expect Fluent Bit to just get right.

@agup006
Member

agup006 commented Mar 28, 2022

I thought we had added partial message support in the initial version of the cri parser. Adding @lecaros: would you be able to help us schedule this with the other multiline work?

@PettitWesley
Contributor

@agup006 @lecaros partial message support for CRI log files exists, but the bug reported here is that lines from stdout and stderr are not properly separated.

@PettitWesley
Contributor

I want to note that fixing this is separate from the work that I did in the multiline filter to add buffered mode and partial_message support.

The work here is net new and different, though the use case/problem statement is similar: split messages from a container runtime need to be rejoined intelligently based on context (such as whether they come from stdout or stderr). My work was aimed at folks who mainly use the Fluentd Docker log driver to send logs to Fluent Bit; this issue is about CRI container log files, which is a similar but distinct case.

@tarruda

tarruda commented Sep 21, 2022

This can be easily accomplished with a Lua filter:

local buffer = {
  stdout = {
    text = {}
  },
  stderr = {
    text = {}
  }
}


function cb_filter(tag, ts, record)
  local log = record.log
  if type(log) ~= 'string' then
    -- no "log" field; keep the record unchanged
    return 0
  end

  local _, _, date, stream, record_type, line = log:find('^(%S+)%s+(%S+)%s+([PF])%s+(.+)$')
  if not stream then
    -- line is not in CRI format; keep the record unchanged
    return 0
  end

  table.insert(buffer[stream].text, line)
  if #(buffer[stream].text) == 1 then
    -- store the date extracted from the first line
    buffer[stream].date = date
  end

  if record_type == 'P' then
    -- not finished
    return -1
  end

  -- flush the buffered message
  record = {
    date = buffer[stream].date,
    stream = stream,
    log = table.concat(buffer[stream].text, '\n')
  }
  buffer[stream].text = {}

  return 1, ts, record
end
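
To wire the script into the pipeline it can be registered as a Lua filter (a sketch; the script file name is an assumption):

[FILTER]
    Name    lua
    Match   *
    script  cri_multiline.lua
    call    cb_filter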

Tested on the fluent-bit Lua playground. Sample input:

{"log":"2021-11-18T14:47:34.730832241+00:00 stdout F some short log message A"}
{"log":"2021-11-18T14:47:34.730832241+00:00 stdout P some longer log message B part one"}
{"log":"2021-11-18T14:47:34.730908438+00:00 stdout F end of the longer message B"}
{"log":"2021-11-18T14:47:34.730908438+00:00 stdout P another longer log message  C part one"}
{"log":"2021-11-18T14:47:34.730939216+00:00 stdout P message C continued in part two"}
{"log":"2021-11-18T14:47:34.735454549+00:00 stderr F some log message D in stderr part one"}
{"log":"2021-11-18T14:47:34.735482854+00:00 stdout F message C still continued after a line from a different stream"}
{"log":"2021-11-18T14:47:34.735482854+00:00 stderr F and the end of the stderr message D"}

Output:

[0,{"date":"2021-11-18T14:47:34.730832241+00:00","stream":"stdout","log":"some short log message A"}]
[0,{"date":"2021-11-18T14:47:34.730832241+00:00","stream":"stdout","log":"some longer log message B part one\nend of the longer message B"}]
[0,{"date":"2021-11-18T14:47:34.735454549+00:00","stream":"stderr","log":"some log message D in stderr part one"}]
[0,{"date":"2021-11-18T14:47:34.730908438+00:00","stream":"stdout","log":"another longer log message  C part one\nmessage C continued in part two\nmessage C still continued after a line from a different stream"}]
[0,{"date":"2021-11-18T14:47:34.735482854+00:00","stream":"stderr","log":"and the end of the stderr message D"}]

Note that this example does not consider multiple files. If grouping streams per file is required, then per file/tag buffers would need to be used, as sketched below.
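
A sketch of that extension, keying the buffers by both tag and stream (the helper name and the pass-through behaviour for non-matching records are assumptions):

-- Sketch: per-tag, per-stream buffers so that multiple files do not share state.
local buffers = {}

local function get_buffer(tag, stream)
  local key = tag .. "/" .. stream
  if not buffers[key] then
    buffers[key] = { text = {} }
  end
  return buffers[key]
end

function cb_filter(tag, ts, record)
  local _, _, date, stream, record_type, line = record.log:find('^(%S+)%s+(%S+)%s+([PF])%s+(.+)$')
  if not stream then
    return 0  -- not a CRI-formatted line; keep the record unchanged
  end

  local buf = get_buffer(tag, stream)
  table.insert(buf.text, line)
  if #buf.text == 1 then
    buf.date = date  -- remember the date of the first part
  end

  if record_type == 'P' then
    return -1  -- still accumulating parts for this tag/stream
  end

  record = {
    date = buf.date,
    stream = stream,
    log = table.concat(buf.text, '\n')
  }
  buf.text = {}
  return 1, ts, record
end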

@mmerickel

mmerickel commented Dec 21, 2022

Has anyone actually seen containerd or cri-o output interleave partials like in the OP's example? From things like https://github.com/containerd/containerd/blob/78cd9d3b6b4c1a8d977a9ee695328f4c51a304d0/pkg/cri/io/logger_test.go#L121 I'm led to believe that they output the full stream in multiple lines before moving on to lines from the other stream, so you should be able to assume all partials in a row are part of the same stream.

This is counter to the original design spec, but I would say that spec is just broken, because it shows partials from stdout being joined with a full line from stderr for "log entry 2", which makes no sense.

tl;dr it feels like this isn't actually a bug if the multiline logic is combining consecutive lines from the same file based on P/F tags?

@frankhetterich

We are experiencing a similar issue, but I'm not sure if it's really the same.
Our applications log in JSON to stdout and we are harvesting the logs with the tail plugin:

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*_default_*.log
    multiline.parser  docker, cri
    DB                /var/log/fluentbit_kube.db
    Mem_Buf_Limit     5MB
    Skip_Long_Lines   Off
    Refresh_Interval  10

Now it happens that K8s rotates the log file and splits the JSON log line at an arbitrary point:
2023-01-27T13:40:28.92620435Z stdout P {"Timestamp":"2023-01-27T13:40:28.9222232+00:00","Level":"Information","MessageTemplate":"Request finished in {ElapsedMilliseconds}ms {StatusCode} {ContentType}","RenderedMessage":"Request finished in 0ms 200 null","Properties":{"ElapsedMilliseconds":0,"St

The log then continues in a new File:
2023-01-27T13:40:28.92735586Z stdout F atusCode":200,"ContentType":null,"EventId":{"Id":2,"Name":"RequestFinished"}

Our expectation is that Fluent Bit will put these two lines back together and send them to Elasticsearch as one line, like this (manually assembled) example:

2023-01-27T13:40:28.92620435Z stdout P {"Timestamp":"2023-01-27T13:40:28.9222232+00:00","Level":"Information","MessageTemplate":"Request finished in {ElapsedMilliseconds}ms {StatusCode} {ContentType}","RenderedMessage":"Request finished in 0ms 200 null","Properties":{"ElapsedMilliseconds":0,"StatusCode":200,"ContentType":null,"EventId":{"Id":2,"Name":"RequestFinished"}

Is this expectation correct? Or is the issue perhaps related to the K8s log rotation?

regards,
Frank

@ryan65

ryan65 commented May 8, 2023

Hi,
I too have a problem with multiline logs from different streams of the same container getting mixed together. Occasionally a stderr log line (debug log) takes over part of a multiline from the stdout stream (info). It doesn't happen most of the time, but it is fairly consistent once in a while.
Is there any workaround?

@leonardo-albertovich
Collaborator

Is this happening in fluent-bit 2.1 or 2.0?

@ryan65

ryan65 commented May 8, 2023

Actually I'm working with 1.9.7, but I didn't see any fix for this so I assume it is still happening.
Just to clarify, this only happens with different streams; the stderr log sometimes takes up the last line or lines of a previous stdout multi-line log.
I use my own multiline parser state machine, not a predefined one.
I was wondering if there is a workaround, for example somehow running multiple parser instances, one for each stream, so that they don't collide.

@leonardo-albertovich
Collaborator

The thing is that there supposedly should be separate multiline component instances, which is why I'm asking about 2.0 or 2.1.

Since 1.9 and 1.8 are no longer supported, the best approach to get this fixed would be to reproduce it in either 2.0 or 2.1 so we can take a look.

@ryan65

ryan65 commented May 9, 2023

Hi Leonardo,
Just FYI:
I see there is some kind of break in my fluent-bit 2.1.1 / fluentd integration (my fluentd is 1.15.2, but I also tried 1.16 with the same results).
When upgrading to 2.1.1 I get error messages in the fluentd logs that seem related to the fluent timestamp:
local-lrc-fluentd-log-collector-1 | 2023-05-09 05:50:47 +0000 [error]: #0 unexpected error on reading data host="172.28.0.3" port=40886 error_class=NoMethodError error="undefined method to_i' for [2023-05-09 05:50:46.662332400 +0000, {}]:Array\n\n time = Fluent::EventTime.now if time.nil? || time.to_i == 0 # to_i == 0` for empty EventTime\n ^^^^^\nDid you mean? to_s\n to_a\n to_h"

It seems that something is wrong with the internal fluent-bit metadata containing the timestamp, or with something else related to metadata.
HOWEVER, I see this is fixed in 2.1.2, but I didn't see any mention of it in the release notes.

Anyway, regarding the mixed multilines between streams:
I will deploy 2.1.2, check whether the issue still happens, and update later in the day.
Thanks.

@leonardo-albertovich
Collaborator

It seems that it could have been missed by the person who wrote the notes. Yes, there was an issue in versions 2.1.0 and 2.1.1: the default value of the retain_metadata_in_forward_mode option in the forward output plugin was not set to false, which meant fluent-bit was serializing records in the new format (which includes metadata).

In fluent-bit 2.1.2 that was amended to retain backwards compatibility with fluentd, older fluent-bit versions and compatible systems. In turn, this means that when a user wants to interconnect two fluent-bit 2.1+ instances using the forward output plugin, they need to explicitly set retain_metadata_in_forward_mode to true in order to retain any existing metadata (only relevant for OpenTelemetry-ingested data at the moment).
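
In configuration terms that means something like the following on the sending instance (a sketch; host and port are placeholders, and the option should only be enabled when the receiver is also fluent-bit 2.1+):

[OUTPUT]
    Name                             forward
    Match                            *
    Host                             receiver.example.com
    Port                             24224
    retain_metadata_in_forward_mode  true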

I hope this makes things a bit clearer.

@ryan65

ryan65 commented May 9, 2023

Hi Leonardo,
So I tested with 2.1.2 and the issue still happens.
Below are logs from the fluent-bit output, to demonstrate the issue.
I wrote a test that prints several logs every 10 seconds (mixed log levels), one of which is a multiline call stack.
If you go to timestamp 1683629046.556653225 you will see the multiline log in stdout, and the last part of the call stack, "at process.processTimers (node:internal/timers:512:7", is missing.
Now if you go to the log at 1683629046.556703940, which was printed with log level error (stderr stream), the end of the missing call stack is actually appended to that log.
Important to note:

  1. This does not happen most of the time; usually everything is OK as expected. You can check all the other logs below which have the word MultiLine and you will see the whole call stack is there.
  2. This definitely happens only when we have logs in different streams (debug, info, error). I never saw this issue when all the logs were printed to the same stream, stdout (info).

2023-05-09T13:43:55+03:00 [0] lrc.saas.srl-worker: [[1683629033.552235588, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:43:53.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||&&&&&&&&&&&&&&&& Same Info 2|| ", "kuberemoved"=>{}
2023-05-09T13:43:57+03:00 [0] lrc.saas.srl-worker: [[1683629036.555438955, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:43:56.555Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||ZZZZZZZZZZZZ Just a single line|| ", "kuberemoved"=>{}
2023-05-09T13:43:57+03:00 [1] lrc.saas.srl-worker: [[1683629036.555537074, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:43:56.555Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||ZZZZZZZZZZZZZ MultiLine Error: test stack
2023-05-09T13:43:57+03:00 at Timeout.different [as _onTimeout] (/opt/testexec.server/hub/app.js:180:54)
2023-05-09T13:43:57+03:00 at listOnTimeout (node:internal/timers:569:17)
2023-05-09T13:43:57+03:00 at process.processTimers (node:internal/timers:512:7)|| ", "kuberemoved"=>{}
2023-05-09T13:43:57+03:00 [2] lrc.saas.srl-worker: [[1683629036.555557797, {}], {"stream"=>"stderr", "logtag"=>"F", "log"=>"2023-05-09T10:43:56.555Z - debug: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||YYYYYYYYYYYYYYYYYYY Debug line|| ", "kuberemoved"=>{}
2023-05-09T13:43:57+03:00 [3] lrc.saas.srl-worker: [[1683629036.555608716, {}], {"stream"=>"stderr", "logtag"=>"F", "log"=>"2023-05-09T10:43:56.555Z - error: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||YYYYYYYYYYYYYYYYYYY Error line|| ", "kuberemoved"=>{}
2023-05-09T13:44:04+03:00 [0] lrc.saas.srl-worker: [[1683629043.552716663, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:03.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||############# Just a single line|| ", "kuberemoved"=>{}
2023-05-09T13:44:04+03:00 [1] lrc.saas.srl-worker: [[1683629043.552794859, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:03.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||############# MultiLine Error: test stack
2023-05-09T13:44:04+03:00 at Timeout._onTimeout (/opt/testexec.server/hub/app.js:188:54)
2023-05-09T13:44:04+03:00 at listOnTimeout (node:internal/timers:569:17)
2023-05-09T13:44:04+03:00 at process.processTimers (node:internal/timers:512:7)|| ", "kuberemoved"=>{}
2023-05-09T13:44:04+03:00 [2] lrc.saas.srl-worker: [[1683629043.552830072, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:03.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||$$$$$$$$$$$$$$$$$$$$ Samine Info|| ", "kuberemoved"=>{}
2023-05-09T13:44:06+03:00 [0] lrc.saas.srl-worker: [[1683629043.552833405, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:03.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||&&&&&&&&&&&&&&&& Same Info 2|| ", "kuberemoved"=>{}
2023-05-09T13:44:07+03:00 [0] lrc.saas.srl-worker: [[1683629046.556515642, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:06.556Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||ZZZZZZZZZZZZ Just a single line|| ", "kuberemoved"=>{}
2023-05-09T13:44:07+03:00 [1] lrc.saas.srl-worker: [[1683629046.556653225, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:06.556Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||ZZZZZZZZZZZZZ MultiLine Error: test stack
2023-05-09T13:44:07+03:00 at Timeout.different [as _onTimeout] (/opt/testexec.server/hub/app.js:180:54)
2023-05-09T13:44:07+03:00 at listOnTimeout (node:internal/timers:569:17)", "kuberemoved"=>{}
2023-05-09T13:44:07+03:00 [2] lrc.saas.srl-worker: [[1683629046.556678249, {}], {"stream"=>"stderr", "logtag"=>"F", "log"=>"2023-05-09T10:44:06.556Z - debug: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||YYYYYYYYYYYYYYYYYYY Debug line|| ", "kuberemoved"=>{}
2023-05-09T13:44:07+03:00 [3] lrc.saas.srl-worker: [[1683629046.556703940, {}], {"stream"=>"stderr", "logtag"=>"F", "log"=>"2023-05-09T10:44:06.556Z - error: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||YYYYYYYYYYYYYYYYYYY Error line||
2023-05-09T13:44:07+03:00 at process.processTimers (node:internal/timers:512:7)|| ", "kuberemoved"=>{}
2023-05-09T13:44:14+03:00 [0] lrc.saas.srl-worker: [[1683629053.551961925, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:13.551Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||############# Just a single line|| ", "kuberemoved"=>{}
2023-05-09T13:44:14+03:00 [1] lrc.saas.srl-worker: [[1683629053.552114257, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:13.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||############# MultiLine Error: test stack
2023-05-09T13:44:14+03:00 at Timeout._onTimeout (/opt/testexec.server/hub/app.js:188:54)
2023-05-09T13:44:14+03:00 at listOnTimeout (node:internal/timers:569:17)
2023-05-09T13:44:14+03:00 at process.processTimers (node:internal/timers:512:7)|| ", "kuberemoved"=>{}
2023-05-09T13:44:14+03:00 [2] lrc.saas.srl-worker: [[1683629053.552150889, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:13.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||$$$$$$$$$$$$$$$$$$$$ Samine Info|| ", "kuberemoved"=>{}
2023-05-09T13:44:16+03:00 [0] lrc.saas.srl-worker: [[1683629053.552154410, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:13.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||&&&&&&&&&&&&&&&& Same Info 2|| ", "kuberemoved"=>{}
2023-05-09T13:44:17+03:00 [0] lrc.saas.srl-worker: [[1683629056.554114944, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:16.554Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||ZZZZZZZZZZZZ Just a single line|| ", "kuberemoved"=>{}
2023-05-09T13:44:17+03:00 [1] lrc.saas.srl-worker: [[1683629056.554279655, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:16.554Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||ZZZZZZZZZZZZZ MultiLine Error: test stack
2023-05-09T13:44:17+03:00 at Timeout.different [as _onTimeout] (/opt/testexec.server/hub/app.js:180:54)
2023-05-09T13:44:17+03:00 at listOnTimeout (node:internal/timers:569:17)
2023-05-09T13:44:17+03:00 at process.processTimers (node:internal/timers:512:7)|| ", "kuberemoved"=>{}
2023-05-09T13:44:17+03:00 [2] lrc.saas.srl-worker: [[1683629056.554320923, {}], {"stream"=>"stderr", "logtag"=>"F", "log"=>"2023-05-09T10:44:16.554Z - debug: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||YYYYYYYYYYYYYYYYYYY Debug line|| ", "kuberemoved"=>{}
2023-05-09T13:44:18+03:00 [0] lrc.saas.srl-worker: [[1683629056.554330939, {}], {"stream"=>"stderr", "logtag"=>"F", "log"=>"2023-05-09T10:44:16.554Z - error: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||YYYYYYYYYYYYYYYYYYY Error line|| ", "kuberemoved"=>{}
2023-05-09T13:44:24+03:00 [0] lrc.saas.srl-worker: [[1683629063.552755280, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:23.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||############# Just a single line|| ", "kuberemoved"=>{}
2023-05-09T13:44:24+03:00 [1] lrc.saas.srl-worker: [[1683629063.552883385, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:23.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||############# MultiLine Error: test stack
2023-05-09T13:44:24+03:00 at Timeout._onTimeout (/opt/testexec.server/hub/app.js:188:54)
2023-05-09T13:44:24+03:00 at listOnTimeout (node:internal/timers:569:17)
2023-05-09T13:44:24+03:00 at process.processTimers (node:internal/timers:512:7)|| ", "kuberemoved"=>{}
2023-05-09T13:44:24+03:00 [2] lrc.saas.srl-worker: [[1683629063.552934892, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:23.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||$$$$$$$$$$$$$$$$$$$$ Samine Info|| ", "kuberemoved"=>{}
2023-05-09T13:44:25+03:00 [0] lrc.saas.srl-worker: [[1683629063.552939775, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:23.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||&&&&&&&&&&&&&&&& Same Info 2|| ", "kuberemoved"=>{}
2023-05-09T13:44:27+03:00 [0] lrc.saas.srl-worker: [[1683629066.554403061, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:26.554Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||ZZZZZZZZZZZZ Just a single line|| ", "kuberemoved"=>{}
2023-05-09T13:44:27+03:00 [1] lrc.saas.srl-worker: [[1683629066.554525319, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:26.554Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||ZZZZZZZZZZZZZ MultiLine Error: test stack
2023-05-09T13:44:27+03:00 at Timeout.different [as _onTimeout] (/opt/testexec.server/hub/app.js:180:54)
2023-05-09T13:44:27+03:00 at listOnTimeout (node:internal/timers:569:17)
2023-05-09T13:44:27+03:00 at process.processTimers (node:internal/timers:512:7)|| ", "kuberemoved"=>{}
2023-05-09T13:44:27+03:00 [2] lrc.saas.srl-worker: [[1683629066.554554719, {}], {"stream"=>"stderr", "logtag"=>"F", "log"=>"2023-05-09T10:44:26.554Z - debug: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||YYYYYYYYYYYYYYYYYYY Debug line|| ", "kuberemoved"=>{}
2023-05-09T13:44:27+03:00 [3] lrc.saas.srl-worker: [[1683629066.554586062, {}], {"stream"=>"stderr", "logtag"=>"F", "log"=>"2023-05-09T10:44:26.554Z - error: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||YYYYYYYYYYYYYYYYYYY Error line|| ", "kuberemoved"=>{}
2023-05-09T13:44:33+03:00 [0] lrc.saas.srl-worker: [[1683629073.071104064, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:33.071Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||RabbitmqMessageTracker : 0 messages are now being handled.|| ", "kuberemoved"=>{}
2023-05-09T13:44:34+03:00 [0] lrc.saas.srl-worker: [[1683629073.552509143, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:33.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||############# Just a single line|| ", "kuberemoved"=>{}
2023-05-09T13:44:34+03:00 [1] lrc.saas.srl-worker: [[1683629073.552636752, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:33.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||############# MultiLine Error: test stack
2023-05-09T13:44:34+03:00 at Timeout._onTimeout (/opt/testexec.server/hub/app.js:188:54)
2023-05-09T13:44:34+03:00 at listOnTimeout (node:internal/timers:569:17)
2023-05-09T13:44:34+03:00 at process.processTimers (node:internal/timers:512:7)|| ", "kuberemoved"=>{}
2023-05-09T13:44:34+03:00 [2] lrc.saas.srl-worker: [[1683629073.552674041, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:44:33.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||$$$$$$$$$$$$$$$$$$$$ Samine Info|| ", "kuberemoved"=>{}

@ryan65

ryan65 commented May 9, 2023

(quoting @leonardo-albertovich's explanation above about retain_metadata_in_forward_mode in 2.1.0/2.1.1 and the change in 2.1.2)

Yes, thanks. Just out of curiosity, do you know when fluentd will be upgraded to support the metadata?

@ryan65

ryan65 commented May 9, 2023

(quoting the 2.1.2 test results and output logs from the earlier comment above)

BTW, just to clarify: I use my own custom multiline parser, which assumes that a log line beginning with a space should be appended to the previous log, since our logs never start with a space.
[MULTILINE_PARSER]
    name multiline-lrc-saas
    type regex
    rule "start_state" "/^[^[\s\]]]/" "cont"
    rule "cont" "/^[\s]]/" "cont"

@leonardo-albertovich
Collaborator

I don't know when / if fluentd will be updated to support metadata. We submitted a forward protocol enhancement proposal but I'm not the one in charge of it and I don't have any information about it at the moment.

Do you think it'd be possible for you to put together a reproduction case we can use to reliably reproduce the issue and work on it?

Ideally it would be a self contained case with the required configuration files, scripts used to produce the input, sample input files (if any) and instructions.

I'm sure that would speed up the process tremendously and would be highly appreciated.

Thank you very much.

@ryan65

ryan65 commented May 9, 2023

Hi Leonardo
What reproduction are you referring to: the mixed logs, or the issue with fluentd and metadata compatibility?
Thanks.

@leonardo-albertovich
Collaborator

The mixed logs.

@ryan65

ryan65 commented May 9, 2023

Hi @leonardo-albertovich
After you asked for log files I actually found something very interesting.
I looked at the container logs themselves and saw that the problem already begins in the container log file itself.
See timestamp 13:46:16.770528105Z where the multiline is written: in between the call stack there is a stderr log,
and then one line after it the end of the call stack is printed.
Questions:

  1. Not sure, but we recently moved from the docker runtime to the containerd runtime and I'm not sure if this is the cause. It might have happened in the past as well and just gone unnoticed; it could also be an issue with the logging library itself (winston, nodejs).
  2. In a case like the one below, would you say that fluent-bit should be able to handle it (e.g. separate multiline accumulation per stream), or is this the expected behaviour of fluent-bit, in which case the problem lies elsewhere, in the ordering of the logs in the container files?

Thanks.

2023-05-09T13:46:13.769279129Z stdout F 2023-05-09T13:46:13.769Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||$$$$$$$$$$$$$$$$$$$$ Samine Info||
2023-05-09T13:46:13.769282954Z stdout F 2023-05-09T13:46:13.769Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||&&&&&&&&&&&&&&&& Same Info 2||
2023-05-09T13:46:16.770435149Z stdout F 2023-05-09T13:46:16.770Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||ZZZZZZZZZZZZ Just a single line||
2023-05-09T13:46:16.770528105Z stdout F 2023-05-09T13:46:16.770Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||ZZZZZZZZZZZZZ MultiLine Error: test stack
2023-05-09T13:46:16.77053802Z stdout F at Timeout.different [as _onTimeout] (/opt/testexec.server/hub/app.js:180:54)
2023-05-09T13:46:16.770544984Z stdout F at listOnTimeout (node:internal/timers:569:17)
2023-05-09T13:46:16.77056145Z stderr F 2023-05-09T13:46:16.770Z - debug: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||YYYYYYYYYYYYYYYYYYY Debug line||
2023-05-09T13:46:16.77056665Z stdout F at process.processTimers (node:internal/timers:512:7)||

2023-05-09T13:46:16.770603504Z stderr F 2023-05-09T13:46:16.770Z - error: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||YYYYYYYYYYYYYYYYYYY Error line||
2023-05-09T13:46:23.769253346Z stdout F 2023-05-09T13:46:23.769Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||############# Just a single line||
2023-05-09T13:46:23.769371769Z stdout F 2023-05-09T13:46:23.769Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||############# MultiLine Error: test stack
2023-05-

@mdraijer
Author

@ryan65 @leonardo-albertovich
Yes, the very interesting thing in the previous comment is exactly what I described in the issue itself; see the top of this page.

Both Docker runtime and CRI-O show this behaviour, although Docker cuts off at 16k and CRI-O at 8k, so you will notice it sooner / more often with CRI-O.

The whole point is that Fluent-Bit should recognize the stream and perform the concatenation logic only for lines from the same stream.

Simple as that.

@ryan65

ryan65 commented May 10, 2023

Hi @mdraijer,
Oh, this takes me by surprise; I wasn't actually sure that this is expected of fluent-bit. In that case this seems to be a bug in the fluent-bit mechanism.
@leonardo-albertovich can you please confirm?

@leonardo-albertovich
Collaborator

From what I can see in the code (and with my limited knowledge of the multiline parser system) the stream segregation feature should be there, but I need to take a deeper look into it to be able to make a proper assessment.

I'll assign this issue to myself and I'll take some time to reproduce it as time permits (probably next week) to wrap it up as soon as possible.

At first I thought it could be an issue with this regular expression having an excessively greedy time pattern, but that would alter the results in a different way, so I doubt that's the case.

@leonardo-albertovich leonardo-albertovich self-assigned this May 10, 2023
@ryan65

ryan65 commented May 10, 2023

@leonardo-albertovich,
Great, thanks very much.

@leonardo-albertovich
Collaborator

Quick update: @Syn3rman will work on this issue next week.

@Syn3rman
Contributor

I've opened a PR for it. Could you give it a try, @ryan65 or @mdraijer, and check whether it fixes the issue? Thanks :)

@ryan65

ryan65 commented May 28, 2023

Hi, I'm not sure how to try it out. Is there a Docker image version tag available to download?
Thanks.

@mdraijer
Author

@Syn3rman I see two merged PRs in this thread. Does that mean that your fix (from May 2023, sorry for the late response) is already in some release? If yes, from which version is it included?
