[receiver/filelog] High memory consumption of collector and dropped logs #31129
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
At a glance this appears to be a backpressure problem, but it's unclear where. Have you tried using the exporter's retry & queueing functionality to manage this? Example
cc: @dmitryax
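A minimal sketch of that retry & queueing suggestion, assuming an otlphttp exporter; the endpoint and values are illustrative, not taken from this issue:

```yaml
exporters:
  otlphttp:
    endpoint: http://localhost:4318    # hypothetical backend
    # retry failed exports with exponential backoff instead of dropping them
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    # hold batches in a bounded in-memory queue while the backend is slow or down
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
```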
@djaglowski Thanks for your reply. I tested your suggestion and it didn't help. I don't think I described the test scenario in enough detail, so here it is.
OpenTelemetry Collector configuration
Log output
Result
Here is a second scenario where the collector now works as expected.
OpenTelemetry Collector configuration
I used this script to generate a continuous flow of logs to a file
Log output
2024-02-09T14:08:39.208Z info adapter/receiver.go:45 Starting stanza receiver {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2024-02-09T14:08:39.284Z warn fileconsumer/file.go:51 finding files: no files match the configured criteria {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer"}
2024-02-09T14:08:39.290Z info adapter/receiver.go:45 Starting stanza receiver {"kind": "receiver", "name": "filelog/continues", "data_type": "logs"}
2024-02-09T14:08:39.315Z info [email protected]/service.go:165 Everything is ready. Begin running and processing data.
2024-02-09T14:08:49.330Z info fileconsumer/file.go:268 Started watching file {"kind": "receiver", "name": "filelog/continues", "data_type": "logs", "component": "fileconsumer", "path": "C:\\test.txt"}
2024-02-09T14:08:59.495Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 5}
2024-02-09T14:09:09.505Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 5}
2024-02-09T14:14:09.463Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 5}
2024-02-09T14:14:19.336Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 5}
2024-02-09T14:14:19.420Z info fileconsumer/file.go:268 Started watching file {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "C:\\ERRORLOG"}
2024-02-09T14:14:19.426Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:14:20.337Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:14:20.368Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:14:21.211Z info memorylimiter/memorylimiter.go:222 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 167}
2024-02-09T14:14:21.359Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:14:21.453Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:14:21.847Z info memorylimiter/memorylimiter.go:192 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 178}
2024-02-09T14:14:21.953Z warn memorylimiter/memorylimiter.go:229 Memory usage is above soft limit. Refusing data. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 178}
2024-02-09T14:14:24.212Z warn memorylimiter/memorylimiter.go:203 Memory usage is above hard limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 265}
2024-02-09T14:14:24.453Z info memorylimiter/memorylimiter.go:192 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 76}
2024-02-09T14:14:24.463Z info memorylimiter/memorylimiter.go:215 Memory usage back within limits. Resuming normal operation. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 76}
2024-02-09T14:14:26.020Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:14:26.025Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:14:27.034Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:14:27.121Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:14:27.457Z warn memorylimiter/memorylimiter.go:229 Memory usage is above soft limit. Refusing data. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 205}
2024-02-09T14:14:29.214Z info memorylimiter/memorylimiter.go:215 Memory usage back within limits. Resuming normal operation. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 136}
2024-02-09T14:14:29.341Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 5}
2024-02-09T14:14:30.215Z warn memorylimiter/memorylimiter.go:229 Memory usage is above soft limit. Refusing data. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 151}
2024-02-09T14:16:29.222Z info memorylimiter/memorylimiter.go:215 Memory usage back within limits. Resuming normal operation. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 35}
2024-02-09T14:16:29.779Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 6, "log records": 60}
2024-02-09T14:16:38.800Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 11, "log records": 1044}
2024-02-09T14:16:38.810Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:16:39.811Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:16:39.826Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:16:40.238Z info memorylimiter/memorylimiter.go:222 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 182}
2024-02-09T14:16:40.816Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:16:40.863Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:16:40.869Z info memorylimiter/memorylimiter.go:192 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 125}
2024-02-09T14:16:41.231Z warn memorylimiter/memorylimiter.go:229 Memory usage is above soft limit. Refusing data. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 193}
2024-02-09T14:18:42.212Z info memorylimiter/memorylimiter.go:215 Memory usage back within limits. Resuming normal operation. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 35}
2024-02-09T14:18:53.290Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:18:53.290Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 10, "log records": 1000}
2024-02-09T14:18:55.222Z info memorylimiter/memorylimiter.go:222 Memory usage is above soft limit. Forcing a GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 156}
2024-02-09T14:18:55.282Z info memorylimiter/memorylimiter.go:192 Memory usage after GC. {"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "cur_mem_mib": 42}
2024-02-09T14:18:58.586Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 6, "log records": 65}
2024-02-09T14:19:29.419Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 4}
2024-02-09T14:19:30.314Z info [email protected]/collector.go:258 Received signal from OS {"signal": "interrupt"}
2024-02-09T14:19:30.314Z info [email protected]/service.go:179 Starting shutdown...
2024-02-09T14:19:30.319Z info adapter/receiver.go:140 Stopping stanza receiver {"kind": "receiver", "name": "filelog/continues", "data_type": "logs"}
2024-02-09T14:19:30.364Z info adapter/receiver.go:140 Stopping stanza receiver {"kind": "receiver", "name": "filelog", "data_type": "logs"}
2024-02-09T14:19:30.383Z info extensions/extensions.go:59 Stopping extensions...
2024-02-09T14:19:30.383Z info [email protected]/service.go:193 Shutdown complete.
Result
From my experience with Fluent Bit, a more effective approach to monitoring memory consumption is to check it as early as possible, so that backpressure is minimized. I assume that, with proper configuration, receivers in the Collector should not impact one another.
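For context, a minimal memory_limiter sketch with illustrative values; check_interval controls how early memory pressure is detected:

```yaml
processors:
  memory_limiter:
    check_interval: 1s       # how often process memory is sampled; smaller means earlier detection
    limit_mib: 400           # hard limit: above this the processor forces a GC
    spike_limit_mib: 100     # soft limit = limit_mib - spike_limit_mib; above it new data is refused
```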
To me it is not clear why the pipeline stays paused for some time even after this message is displayed.
I assumed it should resume immediately after memory drops back below the soft limit.
Another test: handling high volumes of logs and metrics with the memory_limiter processor in both pipelines.
OpenTelemetry Collector configuration
Log output
Result
Initially, the OpenTelemetry Collector collected the metrics successfully. However, after processing a file containing a large number of logs (200,000), the collector hit the soft limit, which also affected the metrics pipeline, and only 120,000 of the logs were collected. For logs the situation resembled a "hard limit" scenario: the log flow stopped, and even after a 120-second pause the refused logs were not retried.
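For illustration only (not the author's collapsed configuration): a single memory_limiter placed in both pipelines throttles logs and metrics together, since the limiter checks total process memory rather than per-pipeline usage. Receiver and exporter names below are assumptions:

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 400
    spike_limit_mib: 100

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [memory_limiter]
      exporters: [otlphttp]
    metrics:
      receivers: [hostmetrics]       # assumed metrics source
      processors: [memory_limiter]
      exporters: [otlphttp]
```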
Other test
First OpenTelemetry Collector configuration
Log output for first Collector
Second OpenTelemetry Collector Configuration
Log output for second Collector
Result
Another test: communication between two instances of the OpenTelemetry Collector, both with memory_limiter processors.
First OpenTelemetry Collector configuration
Log output for first Collector
Second OpenTelemetry Collector Configuration
Log output for second Collector
Results
Test scenario:
Initially, the OpenTelemetry Collector was unable to send metrics to the second Collector and encountered a retryable error.
After the second collector became available, metrics were successfully transmitted. When a log file was moved into place and processed by the first collector, both collectors reached their soft limits (the same happens when they reach the hard limit). At this point, the second collector started rejecting data, pushing it back to the receiver. As a result, the first OpenTelemetry Collector encountered errors for both logs and metrics.
I expected the first collector to receive a recoverable error, such as "too many requests", so that when the second collector resumes normal operation, pending requests could be resent instead of being dropped. At the end, after shutting down the second collector, the first collector displayed an error.
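For reference, a minimal sketch of how the two instances could be wired so that refusals from the second collector surface as retryable errors that the first collector queues and resends. These are two separate config snippets shown in one block; names, ports, and values are assumptions, not the collapsed configurations from this issue:

```yaml
# First collector (client side): OTLP exporter with retry and a bounded queue
exporters:
  otlp:
    endpoint: localhost:4317
    tls:
      insecure: true
    retry_on_failure:
      enabled: true          # re-send refused or failed batches with backoff
    sending_queue:
      enabled: true
      queue_size: 10000      # bounds batches waiting to be (re)sent

# Second collector (server side): OTLP receiver guarded by memory_limiter
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 400
    spike_limit_mib: 100
```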
I'd be curious to hear thoughts on how we address this. Do we need to remove the batching behavior in the adapter package? It's caused plenty of problems before, but emitting every log in a separate …
I found the OpenTelemetry specification describing how HTTP error codes should be handled. It looks like the current behavior is not implemented correctly and the error codes are not followed.
Why would we need to implement HTTP error codes in a file-reading receiver?
Sorry, I did not specify: it is not related to the filelog receiver, but to the OTLP receiver. If I understand correctly, when setting up a chain like this: Collector 1 (client) with the OTLP exporter (or Fluent Bit with OTLP output) and Collector 2 (server) with the OTLP receiver, then the receiver on Collector 2 should handle HTTP status codes as specified in the specification. Am I right?
Apologies @marcinsiennicki95, I got wires crossed with #31074. I'm not sure how I can help here. I'm going to close this issue but recommend the following:
Component(s)
receiver/filelog
What happened?
Description
I am currently configuring the OpenTelemetry Collector to gather logs through the filelog receiver and transmit them to Fluent Bit via the OpenTelemetry exporter. Fluent Bit ingests these logs through its OpenTelemetry input and forwards them using the OpenTelemetry output. In this configuration, the OpenTelemetry Collector acts as a proxy for Fluent Bit, allowing it to collect and process data from various other telemetry sources.
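For illustration, a minimal sketch of the setup described above; the log path and the Fluent Bit endpoint are assumptions:

```yaml
receivers:
  filelog:
    include: [ "C:/logs/*.log" ]       # hypothetical path to the generated log file

exporters:
  otlphttp:
    endpoint: http://localhost:4318    # assumed Fluent Bit opentelemetry input

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlphttp]
```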
Steps to Reproduce
Expected Result
Each of the logs should be processed, and no error logs should be visible on the console.
Actual Result
Without the memory_limiter processor, the number of logs displayed in the user interface does not add up to exactly 1 million, and the service consumes a lot of memory.
With the memory_limiter processor: see the log output in the additional context below.
Collector version
v0.93
Environment information
WindowsServer:2019-Datacenter:latest
OpenTelemetry Collector configuration
Log output
No response
Additional context
Log output with memory limiter.
Actual
The OpenTelemetry Collector sends only a limited number of logs before reaching a memory limit, which triggers garbage collection (GC) and stops further log processing. As a result, about 950,000 logs are never sent; only newly generated logs are sent. By comparison, when Fluent Bit reaches its memory limit it pauses log intake and, after a short pause, resumes processing without losing logs.
Fluent Bit configuration.
Does the OpenTelemetry Collector offer a parameter analogous to Fluent Bit's mem_buf_limit for managing memory buffer constraints?
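For comparison, the closest knobs appear to be the process-wide memory_limiter processor and the exporter's bounded sending_queue, rather than a per-receiver buffer limit like mem_buf_limit; a sketch with illustrative values:

```yaml
processors:
  memory_limiter:                      # process-wide limits, not per-receiver
    check_interval: 1s
    limit_mib: 400
    spike_limit_mib: 100

exporters:
  otlphttp:
    endpoint: http://localhost:4318    # hypothetical backend
    sending_queue:
      enabled: true
      queue_size: 5000                 # bounds how many batches can wait in memory for export
```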