-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generated pdata code should create interfaces #1346
Comments
If we are getting to the discussion of prioritizing semantics, probably we should use oneof where we need, rather than having logical definitions for that. |
@jmacd also we will deal with interface or concrete types internally in the collector once the protocol is stable. I don't think the way how we generate the internal format should be influencing the protocol. |
I believe we are settled on pdata for 1.0. I am going to close this, please reopen if there are any objections. |
…ry#1346) * test: Add CI scenarios for eBPF Chart. open-telemetry#964 Signed-off-by: Yang, Robin <[email protected]> * test: Add CI scenarios for eBPF Chart. [issue-964] Signed-off-by: Yang, Robin <[email protected]> * test: Add CI scenarios for eBPF Chart. [issue-964] Signed-off-by: Yang, Robin <[email protected]> * test: Add CI scenarios for eBPF Chart. [issue-964] Signed-off-by: Yang, Robin <[email protected]> * test: Add CI scenarios for eBPF Chart. [issue-964] Signed-off-by: Yang, Robin <[email protected]> * test: Add CI scenarios for eBPF Chart. [issue-964] Signed-off-by: Yang, Robin <[email protected]> * fix: EBPF_NET_CRASH_METRIC_PORT env in ebpf chart not being quoted issue. Signed-off-by: Yang, Robin <[email protected]> * feat: grant ebpf k8s collector job watching permission. Signed-off-by: Yang, Robin <[email protected]> * feat: grant ebpf k8s collector job watching permission. Signed-off-by: Yang, Robin <[email protected]> * feat: grant ebpf k8s collector job watching permission. Signed-off-by: Yang, Robin <[email protected]> --------- Signed-off-by: Yang, Robin <[email protected]>
Is your feature request related to a problem? Please describe.
I've been working to answer a performance question about OTLP in the collector, specifically w.r.t. metrics. I have run Tigran's benchmarks and confirmed the expected performance of the change in open-telemetry/opentelemetry-proto#162. It's behaving as expected: slightly higher overhead and slightly higher memory usage, which leaves us with a question: should we let performance dictate the structure of the protocol, or should we let semantics and convenience guide us, and is this really an exclusive choice? A the root of this, protocol buffers suck in Go and we can't really fix that on our own. More in this 🧵
Describe the solution you'd like
I want to propose a long-term solution to this that lets us move quickly in the short term. We can address the performance question later. Here are the details:
The changes in proto#162 degrade performance slightly, as expected, but we could regain this performance (for the most part) by hand-coding a decoder for use in the collector. In some ways the code is already set up for this: we have the consumer/pdata package which contains a wrapper around the protocol structure. We could (in theory, almost) introduce a second representation of OTLP data to facilitate this.
Describe alternatives you've considered
One problem is the generated code created by cmd/pdatagen generates structs, not interfaces. I believe we could change the generated code to use interfaces w/o a loss of performance. Instead of structs containing pointers, we'll use pointers to structs that satisfy an interface. Then we could introduce a second representation that did not use the proto message as the internal representation in a pipeline.
This is a nice story, until some module of code wants to modify or directly access a proto message. The more code that's written to directly access a proto, the worse this situation gets. There is a little bit of code (the batch processor, mainly) that combines proto messages in a pipeline, but it's not doing so recursively, so it's a special case.
Here's what I want: I want to move forward with the protocol change in #162 as I can show that it's not a large change in performance. As secondary goals: (1) change the generated pdata code to use interfaces, to allow multiple representations of OTLP in the collector, (2) potentially add a hand-coded implementation of OTLP that saves CPU and memory, (3) potentially refine the OTLP design to perform better by design: this would include interning of label sets to avoid repetition and to facilitate easier processing in the collector.
Lastly I want to note that before #162 the Metric struct is larger (because it has four slices) and after #162 the DataPoint struct is larger (because it is a logical oneof). Before #162 there are three unused slices (72 bytes wasted), and after #162 there are about the same number of wasted bytes (depending on whether exemplars are separated from raw_values). So when we have a single DataPoint per Metric the change in #162 is a break-even from a memory consumption point of view.
The text was updated successfully, but these errors were encountered: