exporterhelper support for back-pressure #8245

Open
jmacd opened this issue Aug 16, 2023 · 4 comments

@jmacd (Contributor) commented Aug 16, 2023

Is your feature request related to a problem? Please describe.

For an exporter that uses exporterhelper, I have the following requirements.

  • When a data item is consumed by the exporter, the exporter should have an option to block until the export succeeds
  • When a data item is consumed by the exporter and the request fails, the exporter should return the error to the caller

These two requirements need to be met whether or not retries are enabled, and whether or not there is a queue, meaning whether or not QueueSettings.Enabled is true. The problem is that when QueueSettings.Enabled is true, which enables parallelism (i.e., NumConsumers > 1), the caller does not get the error from the downstream receiver and has no way to block until the receiver returns.

Here is where today's exporterhelper with QueueSettings.Enabled==false blocks until the call returns, returning the error to the caller:

err := qrs.consumerSender.send(req)

Here is where, when QueueSettings.Enabled==true and the queue is full, the code does not block and instead returns a generic error immediately.

Here is where, when QueueSettings.Enabled==true and the queue is not full, the code returns nil without waiting to hear from the downstream receiver.
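
To make the contrast concrete, here is a minimal sketch of the enqueue path described above, under simplified assumptions: queueSender, boundedQueue, and errSendingQueueIsFull are stand-ins approximating the real exporterhelper code, not the actual implementation.

    package sketch

    import "errors"

    // Simplified stand-ins for the real exporterhelper types.
    type request any

    type boundedQueue interface {
        Produce(item request) bool // false means the queue is full
    }

    type queueSender struct{ queue boundedQueue }

    var errSendingQueueIsFull = errors.New("sending queue is full")

    // send models the behavior described above: the caller never waits
    // for the downstream receiver.
    func (qs *queueSender) send(req request) error {
        if !qs.queue.Produce(req) {
            // Queue full: a generic error is returned immediately;
            // there is no option to block.
            return errSendingQueueIsFull
        }
        // Enqueued: nil is returned right away; the downstream result is
        // delivered to a queue consumer, never to this caller.
        return nil
    }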

As it stands, the only way for an exporter to block on the downstream service and receive its responses synchronously is to disable queueing. When queueing is enabled, there are two problems: (a) there is no option to block when the queue is full, and (b) there is no option to wait for the response.

Describe the solution you'd like
Optional behavior that enables blocking when the queue is full and waiting for a response.
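
As a rough sketch of what the option surface might look like (the Blocking and WaitForResult fields are hypothetical and do not exist in exporterhelper; the other fields mirror the existing QueueSettings):

    package sketch

    // QueueSettings mirrors the existing exporterhelper settings, extended
    // with two hypothetical fields illustrating the requested behavior.
    type QueueSettings struct {
        Enabled      bool
        NumConsumers int
        QueueSize    int

        // Blocking (hypothetical) would make enqueueing block while the
        // queue is full, instead of failing with a generic error.
        Blocking bool

        // WaitForResult (hypothetical) would make send() block until the
        // downstream receiver responds, propagating its error to the caller.
        WaitForResult bool
    }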

Describe alternatives you've considered
No alternatives come to mind.

Additional context
The OTel Arrow project has developed a bridging mode for OpenTelemetry Collectors that enables high compression on the network link between two collectors. For this setup to work reliably under high load, we need the otelarrow exporter to be configured with a blocking pipeline. We would recommend both of the optional behaviors described in this issue to otelarrow users.

@dmitryax (Member) commented

I'm not sure I fully understand the ask here. Enabling the queue introduces asynchronous behavior by design. Why do you need the queue if there is a requirement for the blocking/synchronous pipeline? What's the purpose of the queue in that case?

@jmacd (Contributor, Author) commented Aug 17, 2023

I am more or less requesting a feature that I would expect from the batchprocessor.

As we expect batchprocessor functionality to move into the exporterhelper, I expect the queue will be the place where requests wait to be batched and sent. When I disable the queue, there is no limit on the number of concurrent senders and no possibility of batching.

When I enable the queue, I expect to get batching. More importantly, it lets me limit the number of outgoing RPCs (i.e., NumConsumers), which can be important for controlling load balance.

@jmacd (Contributor, Author) commented Aug 17, 2023

I will put this another way. I believe the batching facility should be independent of the facility for backpressure.

With the backpressure option I'm looking for, and with queueing enabled, the exporterhelper would block the caller until each data item is processed and return the resulting error (while still honoring the caller's context timeout).

I've been working on the OTel Arrow project, which has a gRPC-streaming exporter that does the same sort of thing. For each incoming request, it allocates a make(chan error, 1) to receive the response, places that channel into the queue along with the data, and then falls into a select awaiting either the response or the timeout.

See the code: https://github.com/open-telemetry/otel-arrow/blob/95211c5a139c84f7117905531dd45e2122778f06/collector/exporter/otelarrowexporter/internal/arrow/stream.go#L423
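
The pattern reduces to the following sketch (simplified from the linked stream.go; the names here are illustrative, not the actual OTel Arrow identifiers):

    package sketch

    import "context"

    // queuedRequest pairs the data with a per-request response channel;
    // the channel is buffered (capacity 1) so the consumer never blocks
    // when delivering the reply.
    type queuedRequest struct {
        data  any
        errCh chan error
    }

    // sendAndWait blocks at both ends: the first select applies
    // back-pressure while the queue is full, and the second waits for
    // the downstream response or the caller's context deadline.
    func sendAndWait(ctx context.Context, queue chan<- queuedRequest, data any) error {
        qr := queuedRequest{data: data, errCh: make(chan error, 1)}
        select {
        case queue <- qr:
        case <-ctx.Done():
            return ctx.Err()
        }
        select {
        case err := <-qr.errCh: // downstream result, propagated to the caller
            return err
        case <-ctx.Done():
            return ctx.Err()
        }
    }

A queue consumer completes the exchange by sending the downstream result into qr.errCh once the receiver responds.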

@jmacd (Contributor, Author) commented Jun 18, 2024

I am happy with the latest developments in the batching support. However, we still require the queue sender in order to get concurrency in the batch processor, so we have back-pressure but with an unacceptable loss of concurrency. Therefore, I will leave this open. @moh-osman3 is looking into a concurrency-sender option that would let us replace the concurrentbatchprocessor.
