[exporter/loki] retry/queue causes any log to be blocked until queue resolves #18060
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
I have mixed feelings about this: you certainly have a good point about all tenants sharing the same pipe to the backend, but perhaps you could use the routing processor and give each tenant their own exporter? I'm not sure the logic to split a connection per tenant belongs in the Loki exporter. I'm eager to hear other component owners' opinions. What do you think @kovrus and @mar4uk?
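For context, a rough sketch of what that could look like with the routing processor. This is an illustration only: it assumes tenants are identified via the X-Scope-OrgID request header, and the loki/default, loki/foo, and loki/bar exporter names are hypothetical.

```yaml
receivers:
  otlp:
    protocols:
      http:
        include_metadata: true   # keep request metadata so the tenant header is available to the router

processors:
  routing:
    attribute_source: context    # read the tenant from request metadata (assumption: header-based tenancy)
    from_attribute: X-Scope-OrgID
    default_exporters: [loki/default]
    table:
      - value: foo
        exporters: [loki/foo]
      - value: bar
        exporters: [loki/bar]

exporters:
  loki/default:
    endpoint: http://loki:3100/loki/api/v1/push
  loki/foo:
    endpoint: http://loki:3100/loki/api/v1/push
  loki/bar:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [routing]
      exporters: [loki/default, loki/foo, loki/bar]
```

Because each named exporter carries its own sending queue and retry state, a tenant whose logs are being rejected would only back up its own exporter rather than the shared one.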
I share your thoughts. In retrospect, I'm wondering which retries could actually cause this. At the moment it's more of a bug that logs rejected by Loki get retried at all. Normally retries should be tied to a service or network outage affecting the entire backend, not a specific stream. But I could be mistaken here.
I agree that the Loki exporter shouldn't be responsible for splitting connections.
This was fixed in #18083.
Fix #18083 has probably eliminated the impact. I don't think we should implement splitting logic on the exporter side. The logic is already implemented on the Loki side by sending
Well, the thing is, even with a separation of concerns from the start we would still have needed that fix, but it would also mean a single tenant could not blow up the entire collector stack for all tenants.
The same concern applies to other aspects of the collector, not just the connection to the backend. I would argue that each tenant should have its own exporter, perhaps tying everything together with connectors, similar to what we have with the routing processor today. cc @kovrus
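A minimal sketch of that connector-based idea, assuming tenants are identified by a resource attribute (here a hypothetical `tenant` attribute) and that the per-tenant pipeline and exporter names are illustrative:

```yaml
connectors:
  routing:
    default_pipelines: [logs/default]
    table:
      - statement: route() where attributes["tenant"] == "foo"   # matches the assumed resource attribute
        pipelines: [logs/foo]
      - statement: route() where attributes["tenant"] == "bar"
        pipelines: [logs/bar]

exporters:
  loki/default:
    endpoint: http://loki:3100/loki/api/v1/push
  loki/foo:
    endpoint: http://loki:3100/loki/api/v1/push
  loki/bar:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    logs/in:                      # ingest pipeline; the connector is its exporter
      receivers: [otlp]
      exporters: [routing]
    logs/default:                 # the connector feeds one pipeline per tenant
      receivers: [routing]
      exporters: [loki/default]
    logs/foo:
      receivers: [routing]
      exporters: [loki/foo]       # each tenant gets its own exporter, queue, and retry state
    logs/bar:
      receivers: [routing]
      exporters: [loki/bar]
```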
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping the code owners.
I created a thread on Slack inviting folks to share their thoughts and ideas around this: https://cloud-native.slack.com/archives/C01N6P7KR6W/p1686168282317269
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity.
@wiardvanrij, a few months later I wanted to check back and see how you have progressed with this. The thread I created generated some thoughts and ideas, but I still have the feeling that we cannot prevent noisy neighbors from influencing other tenants if they share the same collector instance.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity.
Component(s)
exporter/loki
What happened?
Description
If the queue starts building up, it blocks all other logs (which would otherwise ship fine) from being shipped to Loki. For example, it is possible to stop shipping logs to Loki entirely on a ~5k logs/s collector by pushing ~100 logs/s that get rejected by Loki (for ease of testing, use a log with a very old timestamp, which then gets retried due to this bug: #18059).
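For illustration, a sketch of the kind of exporter configuration where this shows up; the endpoint and values below are assumptions, and `retry_on_failure` / `sending_queue` are the standard exporterhelper settings shared by every tenant going through this single exporter.

```yaml
exporters:
  loki:
    endpoint: http://loki:3100/loki/api/v1/push
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s   # rejected batches keep retrying for up to 5 minutes
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000         # one queue shared by all tenants; once retried batches fill it, every tenant blocks
```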
Steps to Reproduce
Expected Result
Tenant bar should have zero impact on tenant foo.
Actual Result
Completely deadlocked environment
Collector version
otel/opentelemetry-collector-contrib:0.66.0
Environment information
Environment
K8s
OpenTelemetry Collector configuration
Log output
No response
Additional context
Example visualisation: