Multiline feature with built-in CRI parser does not separate streams #4387
Comments
@edsiper IMO, this is a pretty major miss in the …
@edsiper @PettitWesley any idea when a fix can be implemented?
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the …
I take it that the "pretty major miss" is still open?
I thought we had added the partial message support in the initial version of the CRI parser. Adding @lecaros - would you be able to help us schedule this with other multiline work?
I want to note that fixing this is separate from the work I did in the multiline filter to add buffered mode and partial_message support. The work here is net new and different, though the use case/problem statement is similar: split messages from a container runtime need to be rejoined intelligently based on context (like whether they come from stdout or stderr). My work was aimed at folks who mainly use the fluentd Docker log driver to send logs to Fluent Bit; this issue is about CRI container log files, which is related but not the same.
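For readers unfamiliar with the format: CRI runtimes write one line per chunk, tagging each with the stream name and a P (partial) or F (final) marker. The lines below are invented to illustrate the interleaving this issue is about; the complaint is that the built-in CRI handling can join the stderr line into the stdout entry instead of keeping the streams separate:

```
2021-11-22T10:00:00.000000001Z stdout P first half of a long stdout entry
2021-11-22T10:00:00.000000002Z stderr F an unrelated stderr line
2021-11-22T10:00:00.000000003Z stdout F second half of the same stdout entry
```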
This can be easily accomplished with a Lua filter:

```lua
local buffer = {
    stdout = {
        text = {}
    },
    stderr = {
        text = {}
    }
}

function cb_filter(tag, ts, record)
    -- CRI log lines look like: <timestamp> <stream> <P|F> <message>
    local _, _, date, stream, record_type, line = record.log:find('^(%S+)%s+(%S+)%s+([PF])%s+(.+)$')
    table.insert(buffer[stream].text, line)
    if #(buffer[stream].text) == 1 then
        -- store the date extracted from the first line
        buffer[stream].date = date
    end
    if record_type == 'P' then
        -- partial line: keep buffering and drop this record
        return -1
    end
    -- final line: flush the buffered message as a single record
    record = {
        date = buffer[stream].date,
        stream = stream,
        log = table.concat(buffer[stream].text, '\n')
    }
    buffer[stream].text = {}
    return 1, ts, record
end
```

Test it on the fluent-bit Lua playground. Sample input:
Output:
Note that this example does not consider multiple files. If grouping streams per file is required, then per-file/tag buffers would need to be used.
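For completeness, a minimal sketch of how such a script might be hooked up, assuming it is saved as `cri_multiline.lua` (the file name and match pattern are illustrative, not from the thread):

```
[FILTER]
    Name    lua
    Match   kube.*
    script  cri_multiline.lua
    call    cb_filter
```

And a sketch of the per-file/tag buffering the note above suggests, keying the buffers by the record tag so partial lines from different files never mix:

```lua
-- Hypothetical extension: one stdout/stderr buffer pair per tag.
local buffers = {}

local function get_buffer(tag, stream)
    if buffers[tag] == nil then
        buffers[tag] = {
            stdout = { text = {} },
            stderr = { text = {} }
        }
    end
    return buffers[tag][stream]
end

-- In cb_filter, replace each use of buffer[stream] with get_buffer(tag, stream).
```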
Has anyone actually seen containerd or cri-o output interleave partials like in the OP's example? From things like https://github.com/containerd/containerd/blob/78cd9d3b6b4c1a8d977a9ee695328f4c51a304d0/pkg/cri/io/logger_test.go#L121 I'm led to believe that they output the full stream in multiple lines before moving on to lines from the other stream, so you should be able to assume all partials in a row are part of the same stream. This is counter to the original design spec, but I would say that spec is just broken, because it shows partials from stdout being joined with a full line from stderr for "log entry 2", which makes no sense. tl;dr it feels like this isn't actually a bug if the multiline logic is combining consecutive lines from the same file based on P/F tags?
We are experiencing a similar issue, but I'm not sure if it's really the same.
It happens that K8s rotates the log file and truncates the JSON log line randomly: The log then continues in a new file: Our expectation is that fluent-bit will put these two lines back together and send them to Elasticsearch as one line, like this (manual) example:
Is this expectation correct? Or is the issue maybe related to the K8s log rotation? Regards,
Hi
Is this happening in fluent-bit 2.1 or 2.0?
Actually I'm working with 1.9.7, but I didn't see any fix for this so I assume it is still happening.
The thing is that supposedly there should be different multiline component instances, which is why I'm asking about 2.0 or 2.1. Since 1.9 and 1.8 are no longer supported, the best approach to get this fixed would be reproducing it in either 2.0 or 2.1 so we can take a look.
Hi Leonardo, it seems that something is wrong with the internal fluent-bit metadata containing the timestamp, or something else related to metadata, I think. Anyway, regarding the mixed multilines between streams, …
It seems that it could have been missed by the person who wrote the notes. Yes, there was an issue in versions 2.1.0 and 2.1.1 with the default value of the … In fluent-bit 2.1.2 that was amended to retain backwards compatibility with fluentd, older fluent-bit versions and compatible systems, which in turn means that when a user wants to interconnect two fluent-bit 2.1+ instances using the … I hope this makes things a bit clearer.
Hi Leonardo
2023-05-09T13:43:55+03:00 [0] lrc.saas.srl-worker: [[1683629033.552235588, {}], {"stream"=>"stdout", "logtag"=>"F", "log"=>"2023-05-09T10:43:53.552Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||&&&&&&&&&&&&&&&& Same Info 2|| ", "kuberemoved"=>{}
Yes, thanks. Just out of curiosity, do you know when fluentd will be upgraded to support the metadata?
BTW - just to clarify, I use my own custom multiline parser, which assumes that a log line starting with a space should be appended to the previous log, since our logs never start with a space.
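For reference, a custom multiline parser along those lines could be declared in parsers.conf roughly like this; the parser name and regexes are a reconstruction of the described behaviour, not the poster's actual configuration:

```
[MULTILINE_PARSER]
    name          space_indent
    type          regex
    flush_timeout 1000
    # a line that does not start with whitespace begins a new entry
    rule  "start_state"  "/^[^\s]/"  "cont"
    # a line that starts with whitespace continues the previous entry
    rule  "cont"         "/^\s/"     "cont"
```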
I don't know when / if fluentd will be updated to support metadata. We submitted a forward protocol enhancement proposal but I'm not the one in charge of it and I don't have any information about it at the moment. Do you think it'd be possible for you to put together a reproduction case we can use to reliably reproduce the issue and work on it? Ideally it would be a self-contained case with the required configuration files, scripts used to produce the input, sample input files (if any) and instructions. I'm sure that would speed up the process tremendously and would be highly appreciated. Thank you very much.
Hi Leonardo
The mixed logs.
Hi @leonardo-albertovich
Thanks.
2023-05-09T13:46:13.769279129Z stdout F 2023-05-09T13:46:13.769Z - info: EnvName=pcoe-aws|TenantId=hub-system|PID=1|ModuleId=worker|JobId=0||$$$$$$$$$$$$$$$$$$$$ Samine Info||
@ryan65 @leonardo-albertovich Both Docker runtime and CRI-O show this behaviour, although Docker cuts off at 16k and CRI-O at 8k, so you will notice it sooner / more often with CRI-O. The whole point is that Fluent-Bit should recognize the stream and perform the concatenation logic only for lines from the same stream. Simple as that.
Hi @mdraijer
From what I can see in the code (and my limited knowledge of the multiline parser system), the stream segregation feature should be there, but I need to take a deeper look into it to be able to make a proper assessment. I'll assign this issue to myself and I'll take some time to reproduce it as time permits (probably next week) to wrap it up as soon as possible. At first I thought it could be an issue with this regular expression having an excessively greedy time pattern, but that would alter the results, so I doubt that's the case.
@leonardo-albertovich, …
Quick update: @Syn3rman will work on this issue next week.
Hi, not sure how to try it out. Is there a Docker image version tag available to download?
@Syn3rman I see 2 merged PRs in this thread. Does that mean that your fix (from May 2023, sorry for the late response) is already in some release? If yes, from which version is it included?
Bug Report
Describe the bug
When a container application produces log messages that are split by the container runtime into multiple parts, and there are log messages written to stdout and stderr more or less at the same time, it is possible that Fluent Bit does not concatenate those parts.
To Reproduce
Create an application that writes a couple of long lines to stdout and stderr and run it in a CRI-O runtime (e.g. an OpenShift Kubernetes cluster).
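A minimal sketch of such a test application, written in Lua to match the filter example earlier in the thread (the line length is an assumption, chosen to exceed the 8k/16k runtime split thresholds mentioned above):

```lua
-- Emit long lines to stdout and stderr at roughly the same time so the
-- container runtime has to split them into partial (P) chunks.
local long = string.rep("x", 20000)

for i = 1, 10 do
    io.stdout:write(string.format("stdout message %d: %s\n", i, long))
    io.stderr:write(string.format("stderr message %d: %s\n", i, long))
end
```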
Expected behavior
The output (in our case Elastic) shows each log message complete as a separate document.
Your Environment
Additional context
Note that this is not about the multiline filter (see e.g. #4309), nor about the old multiline parser (see Tail - Multiline).