-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow Simple/Batch processor to concurrently invoke Exporter.Export() #4231
Comments
What's interesting is Logs spec has some language which could be seen as permissive: opentelemetry-specification/specification/logs/sdk.md Lines 289 to 291 in 0a78571
If we consider changing the concurrency of the simple processor an advanced scenario 😄 I have a prototype in .NET of @cijothomas's idea:
Prototype: open-telemetry/opentelemetry-dotnet#5758 If users decorate a log or span exporter like this... [ConcurrencyModes(ConcurrencyModes.Multithreaded | ConcurrencyModes.Reentrant)]
public class MyLogExporter : BaseExporter<LogRecord>
{
// Not shown
}
[ConcurrencyModes(ConcurrencyModes.Multithreaded | ConcurrencyModes.Reentrant)]
public class MySpanExporter : BaseExporter<Activity>
{
// Not shown
} Then the simple processor will allow PS: We have the same design (attribute-based) already in .NET for Push/Pull metrics exporter decoration. |
I think this would be valuable in a few different scenarios I can think of outside the ones that were identified -- especially around export of OTLP-JSON to an intermediate transmission medium (e.g., writing to a Lambda buffer where the OTLP will be reconstructed and forwarded by an extension layer) |
@open-telemetry/technical-committee this seems reasonable, are you ok if we mark it as |
The TC accepts this issue and I have volunteered to act as sponsor. The topic of concurrent export is something I have a lot of experience with across OTel, and I have implemented concurrent batch processor logic in both the SDK and the Collector. As I understand it, we have several concerns here: Enabling Exporter concurrencyCan the exporter accept concurrent calls? This is a "capabilities" question, one that can be answered in language-specific ways. For example, some languages have marker traits, some use interfaces for the same, and so on. Each language should come up with an idiomatic way for the SDK to determine whether the exporter is willing to accept concurrency. Enabling Processor concurrencyThe functional requirement is that Export() can be called concurrently, and it appears useful for both classes of processor in today's specification. However, we cannot always permit unlimited concurrency, and SDKs need some sort of limit or users would not trust them. I have seen two forms of limit that can apply, and both are successful: we can limit actively-used memory, or we can limit concurrency and batch size. The most important aspect of that is that the Exporter is able to impede the Processor to prevent it from consuming more resources when the limit is reached. Simple processor caseThe simple processor is special because it calls Export synchronously while blocking the caller. Without concurrency, each call to the simple exporter blocks until the exporter can give it service. If we permit exporter concurrency in this case, users will get what they expect: their logs and spans are sent one-by-one, and the greater the concurrency factor, the less likely they are to be queued waiting for export. In this case, I would not expect unlimited concurrency to be a problem in most cases. When the user has selected a simple processor, it means the exporter directly impedes the caller. The total resource footprint of the SDK, Processor, and Exporter in this case is proportional to the application's usage of telemetry APIs, thus whatever limits the application has for itself carry over to the SDK and its resources. In other words, we expect a constant-factor increase in memory usage resulting from each call site, but callers are blocked while this happens, so the application naturally limits itself and the simple processor may not need to impose additional concurrency limits. In this case, I am not convinced that it it is worthwhile to support configurable concurrency -- users are probably best served by an unlimited-concurrency simple processor if the exporter supports it, and users are not well-served by a single-concurrent-export processor. However, we have a stable specification so I would not wish to change the existing behavior. This suggests we can could:
I favor the first option because it is the only truly simple alternative. IMO we should not call a processor with a concurrency limit "simple". Batch Span/Log processor caseIgnoring the Metrics SDK, discussed below, there are several ways to limit resource impact. The most important outcome from selecting the batch processor option is that telemetry data will drop instead of exceeding the configured resource footprint. Here are some ways we can limit resources to control when dropping happens:
From experience, I favor the first two approaches. When you have a limit on pending items or bytes, there's no great need to also limit concurrency. However, the third option is the easiest to implement and we can make an incremental change to the batch processor specification to add concurrency under this approach. So, while recognizing that pending item-count or bytes-count limits are closer to how a user thinks about governing SDK resources, extending the batch processor with a configurable concurrency limit (default = 1) makes sense. Metrics caseThe OTel metrics SDK specification does not directly lend itself to concurrent exports, because concurrency introduces a risk of partially-complete data covering an interval. Today's specification has an all-or-none quality which is, I think, a good default. However, I've seen cases where an SDK emits so many metrics that a single collection cycle can exceed maximum-request-size limits in a typical pipeline. When this is the case, a user may desire to split the export into smaller requests and export them concurrently. Equivalent functionality exists in the OpenTelemetry collector, where the batch processor component is capable of both splitting and combining requests into batches with a maximum item count. If users are interested, we could work on specifications for push-based metrics exporters to support splitting requests so that concurrent export is supported, however there is no existing "Metrics Processor" concept in our specification. Additional resourcesThere are two different kinds of batching functionality in the Collector today, and both have a problem with concurrency. I have proposed fixing the batch processor as follows: Others are investigating how to extend the Collector's exporterhelper-based batcher with concurrency, but there is a problem with code organization standing in the way. Lightstep uses custom OTel metrics and tracing SDKs. Our SDKs have replaced the OTel-Go metrics and trace exporters with pipelines based on OTel Collector components. Therefore, our SDKs can be configured with the same batching parameters that the collector has, and we use this both for concurrency (e.g., of spans) and for splitting (e.g., of metrics). In order for this to work, we have developed a custom "concurrentbatchprocessor" component for the collector, which we are currently trying to contribute back to the upstream batch processor component: |
What are you trying to achieve?
Current spec restricts Simple/Batch processors from calling
exporter.export()
concurrently. This can be limiting in some scenarios:In short, there is need for allowing both Simple,Batch processors to call export() concurrently.
To keep back-compatibility, we can do something like:
Additional context.
Previous attempt to make this happen, but only for SimpleProcessor: #4163
Opened for Logs, but equally applicable for Tracing too.
The text was updated successfully, but these errors were encountered: