[Proposal] Move batching to exporterhelper
#4646
Comments
A couple quick counterarguments: […]
These are not necessarily strong enough to block the proposal, but I think it is worth thinking about them.
I think we discussed the possibility of adding persistency to the batch processor as well. Admittedly, it may result in an awkward user experience, since you would then need to configure persistency in 2 different places.
Another benefit of eliminating the batch processor is that it is one less thing to configure. This will simplify the user experience.
While responding to your comments, I realized that one possible option is to offer both. Do you think that would be too confusing?
This is a good argument for possibly keeping the processor as well, maybe using the same shared library that the exporterhelper uses. This comes with another downside that I hadn't thought of: if we want fast batching we need to mutate the pdata (move it into the batch), which requires N copies (one per exporter) even if the exporters themselves do not need to mutate the data.
Indeed, I expect the work to be minimal, but we definitely need to think about it.
We need to think about and propose 1-2 components to benchmark for this, maybe the new "transform" processor, since it will be one of the most used? Any other ideas?
Yes, I think it will be confusing if we do batching in 2 places, with 2 different overlapping sets of configuration options available.
The exporters are currently not allowed to mutate data: https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter#data-ownership
Since "transform" doesn't exist yet perhaps test with a couple basic ones for now, like |
You know that nobody reads documentation :)) One of them is the signalfx exporter, which uses https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/batchperresourceattr/batchperresourceattr.go. Luckily, after my change to move "capability" to the Consumer, we now do the right thing for exporters, because they also have capabilities and we clone there as well if we need to. So the documentation is unfortunately out of date.
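For readers unfamiliar with the capabilities mechanism mentioned above, here is a minimal sketch, assuming the current `consumer` and `pdata/ptrace` packages; `cloningFanout` is an illustrative type, not the collector's actual fanout consumer. It shows the trade-off being discussed: data is cloned only for consumers that declare `MutatesData`, so per-exporter batching that mutates pdata would force a clone per exporter.

```go
package fanoutsketch

import (
	"context"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/ptrace"
)

// cloningFanout is an illustrative fanout over several trace consumers: it
// hands a consumer its own copy of the data only when that consumer declares
// that it mutates data, so non-mutating exporters can share the original.
type cloningFanout struct {
	consumers []consumer.Traces
}

func (f *cloningFanout) Capabilities() consumer.Capabilities {
	// The fanout itself does not mutate the original pdata.
	return consumer.Capabilities{MutatesData: false}
}

func (f *cloningFanout) ConsumeTraces(ctx context.Context, td ptrace.Traces) error {
	var lastErr error
	for _, c := range f.consumers {
		data := td
		if c.Capabilities().MutatesData {
			// Clone so a mutating consumer cannot affect the shared original.
			data = ptrace.NewTraces()
			td.CopyTo(data)
		}
		if err := c.ConsumeTraces(ctx, data); err != nil {
			lastErr = err // simplified error handling for the sketch
		}
	}
	return lastErr
}
```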
I think supporting both would be confusing. The 'customization of the "size" function' feature would be really helpful for us: we want batches to stay below our intake API's maximum request size, but currently we can't give a precise number for the maximum batch size, since what we care about is the number of bytes in the exported payload, which can't be recovered from the number of points in a batch.
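To illustrate the byte-based sizing this comment asks for, a hedged sketch assuming the `pdata/pmetric` proto marshaler; `exceedsPayloadLimit` and `maxPayloadBytes` are hypothetical names, not an existing API. Only the exporter side can compute this, because the batch processor never sees the exporter's wire format.

```go
package sizersketch

import (
	"go.opentelemetry.io/collector/pdata/pmetric"
)

// exceedsPayloadLimit reports whether a metrics batch, once serialized to the
// OTLP protobuf wire format, would be larger than the given byte limit.
func exceedsPayloadLimit(md pmetric.Metrics, maxPayloadBytes int) (bool, error) {
	marshaler := &pmetric.ProtoMarshaler{}
	payload, err := marshaler.MarshalMetrics(md)
	if err != nil {
		return false, err
	}
	return len(payload) > maxPayloadBytes, nil
}
```

An exporter-side batcher could use such a check to close a batch before it grows past the intake API's maximum request size.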
@dashpole it would be good if you could drive this to resolution; see open-telemetry/opentelemetry-collector-contrib#7134, which you also need :)
There's an edge case in which […]
Correct me if I'm wrong, but we can lose data this way even without a crash: if we have an exporter with a queue and a batch processor, the data can be dropped because the exporter queue is full, and the batch processor makes it impossible to propagate the error back to the receiver. I don't know that I have an informed opinion on whether batching should happen in a separate processor or in the exporter, but it definitely shouldn't happen in both, for the above reason, unless we're willing to accept this kind of data loss and abandon any hope of applying backpressure to the receiver from any component after the batch processor.
For now, we've implemented batching within the gcp exporter. There is only one "correct" way to batch for GCM, so it seemed sensible to make it work correctly without requiring configuration. It is also needed to work around other quirks of the GCM API, such as #4442. open-telemetry/opentelemetry-collector-contrib#7134 isn't blocking us; we've implemented our conversion in our exporter. That proposal would only be valuable if other exporters that need summary splitting also wanted to use it. I agree with others that supporting both the processor and exporterhelper would be confusing. For now, I think it is reasonable to ask unusual use cases like ours to re-implement batching themselves. Making it available as a library would be nice, but probably not worth the dependency unless we wanted to mark it stable.
@bogdandrutu what do you think about making the batch processor propagate backpressure better in the meantime? Currently, the batch processor simply ignores consumer errors. If a transient error happens later in the pipeline, the data is effectively dropped and this information isn't propagated backwards. This means that if we have an exporter with a full queue at the end of the pipeline, we'll keep batching and dropping data, whereas we really should return a transient error to the receiver so it can make its own flow control decisions. For example, we could: […]
Right now, it's impossible to have both batching and reasonable error propagation. Fixing this should be possible right now, and it wouldn't prevent us from moving batching to exporterhelper later.
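To make the backpressure argument above concrete, a minimal sketch, assuming the current `consumer` and `pdata/plog` packages; this is a simplified illustration, not the batch processor's actual code. The point is structural: `ConsumeLogs` acknowledges data before it is delivered, so a later flush error has nowhere to go.

```go
package batchsketch

import (
	"context"
	"sync"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/plog"
)

// asyncBatcher illustrates why a batching processor breaks error propagation:
// ConsumeLogs only buffers the data and returns nil, so by the time the flush
// fails (for example because the exporter's sending queue is full) the
// receiver has already been told the data was accepted.
type asyncBatcher struct {
	mu      sync.Mutex
	pending []plog.Logs
	next    consumer.Logs
}

func (b *asyncBatcher) Capabilities() consumer.Capabilities {
	return consumer.Capabilities{MutatesData: true}
}

func (b *asyncBatcher) ConsumeLogs(_ context.Context, ld plog.Logs) error {
	b.mu.Lock()
	b.pending = append(b.pending, ld)
	b.mu.Unlock()
	// Returning nil here acknowledges the data before it was really delivered.
	return nil
}

// flush runs later (on a timer or size trigger); any error it gets back can no
// longer be returned to the original caller, so the data is effectively
// dropped unless the batcher adds its own retry/queue, which exporterhelper
// already provides.
func (b *asyncBatcher) flush(ctx context.Context) {
	b.mu.Lock()
	batch := b.pending
	b.pending = nil
	b.mu.Unlock()
	for _, ld := range batch {
		if err := b.next.ConsumeLogs(ctx, ld); err != nil {
			// Nowhere to propagate this; at best we can log and count it.
			_ = err
		}
	}
}
```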
@swiatekm-sumo if you want to help with this, it would be great if you could do the benchmarking to prove that we don't need the separate batch processor to have reasonable performance. If we can demonstrate this, then we should eliminate the batch processor and make batching an exporterhelper feature. This would be preferable to spending time improving the batch processor, which we would end up retiring anyway.
@tigrannajaryan the reason I'd prefer to start with improving the batch processor is that the change would be fairly straightforward and would improve the user experience right now, as opposed to several months from now, after we're done with a migration involving several breaking changes. I'm also not entirely convinced this is the right thing to do. Even putting efficiency aside, removing the batch processor would have various knock-on effects; for example, it would make https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/groupbyattrsprocessor worthless. I can take up the benchmarking either way; I don't think these activities are mutually exclusive. But it seems like a much more open-ended project, as I'd need to come up with a bunch of representative configurations on my own.
@swiatekm-sumo Sure, if there is a maintainer/approver who wants to review the batch processor improvements, please feel free to go ahead with what you suggest (I won't be able to help review the improvements). If you can help with the benchmarking, it will be an added bonus :-)
@swiatekm-sumo I am a bit reluctant because: […]
These are already implemented in the exporterhelper as "queue retry" :) That is yet another argument to unify them.
I like the idea. Another reason to do this is that the batch processor removes the client context and has to be placed at a particular point in the pipeline to avoid causing problems for processors that use the context, which can easily be misconfigured.
Right now batching capability is offered as a processor, and this is extremely generic in our current pipeline model, which is a great thing, but it also comes with limitations: […]

The `sizer` can actually work using the exporter wire format, so batching can happen after data serialization. This will also fix some more problems: […]

The proposal is to look into offering the "batching" capability, similar to the timeout/queue/retry logic that we have in the `exporterhelper`.
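As a closing illustration of the proposal, a hedged sketch of what this could look like from an exporter author's perspective, assuming the exporterhelper options API roughly as it existed around the time of this discussion (names have shifted between releases); `BatcherSettings` and `withBatcher` are hypothetical and shown only to suggest how batching would sit next to the existing timeout/queue/retry helpers.

```go
package exportersketch

import (
	"time"

	"go.opentelemetry.io/collector/exporter/exporterhelper"
)

// BatcherSettings is a hypothetical settings block, mirroring the existing
// timeout/queue/retry settings; it is not part of exporterhelper today.
type BatcherSettings struct {
	Enabled      bool
	FlushTimeout time.Duration
	MinSizeItems int
	MaxSizeItems int
}

// withBatcher is a hypothetical option constructor, shown only to illustrate
// the shape of the idea.
func withBatcher(_ BatcherSettings) exporterhelper.Option {
	// A real implementation would wrap the exporter's push function with a
	// batching layer, the same way WithQueue and WithRetry wrap it today.
	panic("illustrative sketch only")
}

// options shows how an exporter author could compose the helpers: queueing,
// retry, timeout, and (hypothetically) batching configured in one place.
func options() []exporterhelper.Option {
	return []exporterhelper.Option{
		exporterhelper.WithQueue(exporterhelper.NewDefaultQueueSettings()),
		exporterhelper.WithRetry(exporterhelper.NewDefaultRetrySettings()),
		exporterhelper.WithTimeout(exporterhelper.NewDefaultTimeoutSettings()),
		withBatcher(BatcherSettings{Enabled: true, FlushTimeout: 200 * time.Millisecond}),
	}
}
```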