-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: span compression #2623
feat: span compression #2623
Conversation
@@ -1181,3 +1181,57 @@ require('elastic-apm-node').start({ | |||
] | |||
}) | |||
---- | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Note: docs taken from java agent's docs of the same field.
@@ -7,7 +7,8 @@ const agent = require('../../../..').start({ | |||
metricsInterval: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Note: Span Compression can effect when a span is recorded, which can create race-like conditions in the existing test suites when enabled. To counter this, spanCompressionEnabled: false
has been set in these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with setting spanCompressionEnabled: false
by default until the implementation is complete. It would be helpful to mark that config var as "experimental" in configuration.asciidoc until the implementation is complete, in case we do releases in the interim.
@trentm Design's been changed to explicitly match the pseudo code, per your request. Also, the other feedback I'm addressing has been addressed. Finally -- I wasn't sure if your pseduo code request extended to the fact that the logic for everything was broken out into a |
Co-authored-by: Trent Mick <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This may be something for a separate change and I think it points to an issue in the span-compression spec. I recall Felix mentioning something about the span-compression spec needing to deal with the case of an ended parent.)
If you run the following script with the current PR code:
// A bug in span-compression where a child span gets buffered to an
// *already ended* parent, which results in the child span never being sent.
var apm = require('./').start({
metricsInterval: '0s',
centralConfig: false,
cloudProvider: 'none',
spanCompressionEnabled: true
})
function main () {
const t0 = apm.startTransaction('t0')
const s1 = apm.startSpan('SELECT FROM a', 'db', 'mysql', { exitSpan: true })
s1.setDestinationContext({ service: { name: 'mysql', resource: 'mysql', type: 'db' } })
setTimeout(() => {
s1.end()
}, 100)
// Create an async task which captures the current async run context
// (i.e. apm.currentSpan === s1)
setTimeout(() => {
// s2 is a child of s1 *even if s1 may have already ended*.
const s2 = apm.startSpan('SELECT FROM b', 'db', 'mysql', { exitSpan: true })
s2.setDestinationContext({ service: { name: 'mysql', resource: 'mysql', type: 'db' } })
// When s2 is ended, it is compression eligible, so it is buffered on its
// parent (s1). However, s1 has already ended, so the current span
// compression code will never send the buffered s2.
s2.end()
}, 200)
setTimeout(() => {
t0.end()
}, 300)
}
main()
Then you get a trace like this:
trace 153dcb
`- transaction 139abe "t0" (303.727ms, outcome=unknown)
`- span b57c98 "SELECT FROM a" (103.077ms, mysql)
This is broken because the "SELECT FROM b" span is missing. The issue is that we've created a case where a child span ends after its parent has already ended. This results in the span potentially (if eligible for compression) being buffered on that parent, and then never sent. (While the example above looks contrived, it isn't actually too unexpected in Node.js code. There is also a case with async functions where a user doing manual instrumentation can have this happen unexpectedly: #2611.)
With some debug-prints, this is what happens in the example above:
% node compressionplay-break-children.js
XXX -- Instrumentation.addEndedSpan(<Span "SELECT FROM a">)
XXX parent has no buffered span: buffer <Span "SELECT FROM a"> on <Span "t0">
XXX -- Instrumentation.addEndedSpan(<Span "SELECT FROM b">)
XXX parent has no buffered span: buffer <Span "SELECT FROM b"> on <Span "SELECT FROM a"> <-- The bug is here, span "SELECT FROM a" has already ended.
XXX -- Instrumentation.addEndedTransaction(<Transaction "t0">)
XXX send buffered <Span "SELECT FROM a">
If you set spanCompressionEnabled: false
in this script, you get the expected two spans.
spec issue
I think this points to an oversight in the spec in the onChildEnd
pseudo-code at https://github.com/elastic/apm/blob/main/specs/agents/handling-huge-traces/tracing-spans-compress.md#span-buffering. Perhaps that could change to:
...
void onChildEnd(Span child) {
if (ended || !child.isCompressionEligible()) {
...
Here is a potential fix in this PR:
--- a/lib/instrumentation/index.js
+++ b/lib/instrumentation/index.js
@@ -355,25 +358,31 @@ Instrumentation.prototype.addEndedSpan = function (span) {
this._encodeAndSendSpan(span.getBufferedSpan())
}
- if (!span.isCompressionEligible()) {
+ if (span.getParentSpan().ended || !span.isCompressionEligible()) {
// span is not compressible -- send
const buffered = span.getBufferedSpan()
if (buffered) {
I can open a spec PR for discussion.
|
Co-authored-by: Trent Mick <[email protected]>
Co-authored-by: Trent Mick <[email protected]>
Implemented the suggested fix and adopted the reproduction code as a test in the repository. |
Implements: #2600
Implements an initial version of span compression that compresses/collapses both identical and same kind spans into a single "composite" span.
This PR does not identify compressible spans by the exit span and injected trace context rules -- instead a hard coded list of span types is referenced. The work to use exit spans and injected trace context to identify compressible spans will be completed in
Since this is an initial version we have set the default of
spanCompressionEnabled
tofalse
. We will make thistrue
by default when finishing up in #2604. (I went back and forth on this one -- so if you think we should enable by default just let me know)Checklist