[receiver/filelog] Filelog receiver missing reading log lines in high throughput scenario #35137
There are some circumstances where the receiver would not be able to keep up with log rotation. Specifically, if the files are being rotated to a location where they are not found in subsequent polling intervals, or if the log files are deleted before the receiver has had a chance to consume them (due to a rotation configuration that defines the max number of backups). It sounds like you have a high throughput scenario, so that seems possible. Are the rotated files being deleted eventually, and if so, can we establish the frequency in some way? If you can provide an estimate of the max number of backups and the average log size in bytes, then I think we can work out whether or not such a limitation is in play.

The other factor I'm wondering about is whether the debug exporter is too slow to keep up, and may in fact be causing backpressure on the receiver which unnecessarily delays reading the logs. You could consider using the count connector to print only counts instead of the logs themselves:

receivers:
  filelog:
    include: ...
connectors:
  count:
processors:
  deltatocumulative:
exporters:
  debug:
service:
  pipelines:
    logs/in:
      receivers: [ filelog ]
      exporters: [ count ]
    metrics/out:
      receivers: [ count ]
      processors: [ deltatocumulative ] # aggregate "+1"s into cumulative count
      exporters: [ debug ] # only print the counts, rather than the logs themselves
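If the receiver itself turns out to be the bottleneck, its polling cadence can also be tuned. A minimal sketch of a more aggressive filelog configuration (the include path and the specific values here are hypothetical, not taken from this issue):

```yaml
receivers:
  filelog:
    include: [ /path/to/app/*.log ]  # hypothetical path
    start_at: beginning
    poll_interval: 50ms              # default is 200ms; lower means more frequent reads
    max_concurrent_files: 1024       # raise if many rotated backups must be tracked
```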
Let me share a little more about the directory and how the log rotation is being handled. Below is the directory structure: …

The logs are continuously written to …
Here is the snapshot that describes how quickly …

For handling the case where the debug exporter was too slow: the config you shared with …
I used it without the …

Since the …

Open questions/ideas @djaglowski
@djaglowski It seems like this issue might be related to that one. What do you say? Reducing the polling interval can be beneficial, but only up to a certain point. @ceevaaa I think you can use the fileexporter in this case to gather all the logs and verify the count.
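For reference, a minimal sketch of what that verification pipeline could look like (the path is hypothetical; the fileexporter's config key is `file`):

```yaml
exporters:
  file:
    path: /tmp/collected-logs.json   # hypothetical path; JSON-encoded output for counting
service:
  pipelines:
    logs:
      receivers: [ filelog ]
      exporters: [ file ]
```

Counting lines in the output file then gives an independent check against the number of lines written to the source log.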
Just to confirm, the file is definitely moved and …

I'm not sure the status of the … The count maxing out at 100 is a consequence of batches being emitted by the filelog receiver once they reach 100 logs.

I agree with this. If you're not using the …
Only if the receiver is able to keep up with ingestion. For example, if you set the poll interval to 10ns but it takes 100ms to read each time, then you're essentially just telling it to start a new poll as soon as the previous one finishes. All that said, it sounds like there may be a real issue here. I'd like to reiterate my request for a decent estimate of the average log size. This would help us establish what the behavior should look like. We know:
If we know the average log size, we can compute how often the file should be rotating. This may or may not lead to an explanation but I'd appreciate if we can start there so I can understand the scenario better.
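As a sketch of that arithmetic: given the 250 MB rotation threshold mentioned later in this thread, an average line size and a write rate determine the rotation frequency. The line size and write rate below are hypothetical placeholders, not figures from this issue:

```python
# Sketch of the rotation-frequency arithmetic requested above.
# Only the 250 MB rotation threshold comes from this thread; the
# average line size and write rate are hypothetical placeholders.
ROTATE_BYTES = 250 * 1024 * 1024   # file is rotated at 250 MB
AVG_LINE_BYTES = 500               # hypothetical average log line size
LINES_PER_SECOND = 100_000         # hypothetical write rate

lines_per_file = ROTATE_BYTES // AVG_LINE_BYTES
seconds_between_rotations = lines_per_file / LINES_PER_SECOND
print(lines_per_file, round(seconds_between_rotations, 1))  # → 524288 5.2
```

If rotations happen every few seconds and backups are deleted quickly, the receiver has a very narrow window in which to finish each rotated file.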
@VihasMakwana I don't see the connection. That issue requires there to be multiple separate files. This issue is about a single file written to by multiple processes.
Here are the updated details about the logs (average taken over 5 different log files) @djaglowski.
I have asked the team to confirm this. For reference, we are using …
I used the …

Update as on Sep 16: We set the … At least in the limited testing we could do today (manually picking random …) …

I will continue to test and update you guys here.

PS: Keeping it open for some more time, till we get everything stabilized.
I see. What I meant to say is that reducing the poll interval will only help to a certain extent; if the file is large, it might not respect the poll_interval and could still cause issues.
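The point about the poll interval having a floor can be sketched numerically: the effective polling period is bounded below by how long a single read pass takes, so shrinking the configured interval past that point buys nothing. The numbers here are hypothetical:

```python
def effective_polls(poll_interval_s: float, read_time_s: float, duration_s: float) -> int:
    """How many polls actually complete in duration_s when each read
    pass takes read_time_s: the effective period is the larger of the
    configured interval and the read time."""
    return int(duration_s / max(poll_interval_s, read_time_s))

# Configured 1 ms interval, but each pass takes 100 ms: over 10 s you
# still only get ~100 polls, same as configuring 100 ms outright.
print(effective_polls(0.001, 0.1, 10.0))  # → 100
print(effective_polls(0.1, 0.1, 10.0))    # → 100
```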
""Once the file reaches 250MB in size, it is renamed and archived (rotated), and a new log file begins."" - team Therefore no truncation takes place @djaglowski. |
Also, what do you guys think is better, especially in high throughput scenarios?

A. Reading logs from a file using the filelog receiver.
…

I know the answer might depend on multiple factors, but in general, which is better?
In theory it is more efficient to bypass the files, since you then don't have to write, read, and reinterpret the logs again.
Component(s)
receiver/filelog
What happened?
Description
I am trying to read logs from a log file.
A bit about the log files
Is there a limit on how quickly the filelog receiver can read logs from a file?
A snippet from the log file
Expected Result
All the log lines are read, processed, and sent to the next hop.
Actual Result
The filelog receiver misses reading a lot of log lines. In fact, almost 70% of the log lines are missed.
Collector version
v0.104.0
Environment information
Environment
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
CPU: 4vCPU
RAM: 8GB
OpenTelemetry Collector configuration
Internal Telemetry
In the image below, the filelog receiver skips a large number of lines.