Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: span compression #2623

Merged
merged 31 commits into from
Apr 11, 2022
Merged

feat: span compression #2623

merged 31 commits into from
Apr 11, 2022

Conversation

astorm
Copy link
Contributor

@astorm astorm commented Mar 24, 2022

Implements: #2600

Implements an initial version of span compression that compresses/collapses both identical and same kind spans into a single "composite" span.

This PR does not identify compressible spans by the exit span and injected trace context rules -- instead a hard coded list of span types is referenced. The work to use exit spans and injected trace context to identify compressible spans will be completed in

Since this is an initial version we have set the default of spanCompressionEnabled to false. We will make this true by default when finishing up in #2604. (I went back and forth on this one -- so if you think we should enable by default just let me know)

Checklist

  • Implement code
  • Add tests
  • Update TypeScript typings
  • Update documentation
  • Add CHANGELOG.asciidoc entry

@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Mar 24, 2022
@apmmachine
Copy link
Contributor

apmmachine commented Mar 24, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-04-06T06:12:31.764+0000

  • Duration: 21 min 29 sec

Test stats 🧪

Test Results
Failed 0
Passed 244817
Skipped 0
Total 244817

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run module tests for <modules> : Run TAV tests for one or more modules, where <modules> can be either a comma separated list of modules (e.g. memcached,redis) or the string literal ALL to test all modules

  • run benchmark tests : Run the benchmark test only.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@@ -1181,3 +1181,57 @@ require('elastic-apm-node').start({
]
})
----

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Note: docs taken from java agent's docs of the same field.

@@ -7,7 +7,8 @@ const agent = require('../../../..').start({
metricsInterval: 0,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Note: Span Compression can effect when a span is recorded, which can create race-like conditions in the existing test suites when enabled. To counter this, spanCompressionEnabled: false has been set in these tests.

@astorm astorm marked this pull request as ready for review March 29, 2022 22:05
@astorm astorm requested a review from trentm March 29, 2022 22:05
Copy link
Member

@trentm trentm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with setting spanCompressionEnabled: false by default until the implementation is complete. It would be helpful to mark that config var as "experimental" in configuration.asciidoc until the implementation is complete, in case we do releases in the interim.

docs/configuration.asciidoc Show resolved Hide resolved
CHANGELOG.asciidoc Outdated Show resolved Hide resolved
lib/config.js Outdated Show resolved Hide resolved
docs/configuration.asciidoc Outdated Show resolved Hide resolved
docs/configuration.asciidoc Outdated Show resolved Hide resolved
lib/instrumentation/span-compression.js Show resolved Hide resolved
lib/instrumentation/span-compression.js Show resolved Hide resolved
lib/instrumentation/span-compression.js Outdated Show resolved Hide resolved
lib/instrumentation/span-compression.js Show resolved Hide resolved
lib/instrumentation/span-compression.js Show resolved Hide resolved
@astorm
Copy link
Contributor Author

astorm commented Apr 4, 2022

@trentm Design's been changed to explicitly match the pseudo code, per your request. Also, the other feedback I'm addressing has been addressed.

Finally -- I wasn't sure if your pseduo code request extended to the fact that the logic for everything was broken out into a SpanCompression object/class. If you think this logic is better off in/on the span object itself just let me know -- that shouldn't be too big a refactor with the tests in place.

@astorm astorm requested a review from trentm April 4, 2022 17:10
lib/instrumentation/span.js Outdated Show resolved Hide resolved
@astorm astorm requested a review from trentm April 5, 2022 05:48
Copy link
Member

@trentm trentm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This may be something for a separate change and I think it points to an issue in the span-compression spec. I recall Felix mentioning something about the span-compression spec needing to deal with the case of an ended parent.)

If you run the following script with the current PR code:

// A bug in span-compression where a child span gets buffered to an
// *already ended* parent, which results in the child span never being sent.

var apm = require('./').start({
  metricsInterval: '0s',
  centralConfig: false,
  cloudProvider: 'none',
  spanCompressionEnabled: true
})

function main () {
  const t0 = apm.startTransaction('t0')

  const s1 = apm.startSpan('SELECT FROM a', 'db', 'mysql', { exitSpan: true })
  s1.setDestinationContext({ service: { name: 'mysql', resource: 'mysql', type: 'db' } })

  setTimeout(() => {
    s1.end()
  }, 100)

  // Create an async task which captures the current async run context
  // (i.e. apm.currentSpan === s1)
  setTimeout(() => {
    // s2 is a child of s1 *even if s1 may have already ended*.
    const s2 = apm.startSpan('SELECT FROM b', 'db', 'mysql', { exitSpan: true })
    s2.setDestinationContext({ service: { name: 'mysql', resource: 'mysql', type: 'db' } })

    // When s2 is ended, it is compression eligible, so it is buffered on its
    // parent (s1). However, s1 has already ended, so the current span
    // compression code will never send the buffered s2.
    s2.end()
  }, 200)

  setTimeout(() => {
    t0.end()
  }, 300)
}

main()

Then you get a trace like this:

trace 153dcb
`- transaction 139abe "t0" (303.727ms, outcome=unknown)
    `- span b57c98 "SELECT FROM a" (103.077ms, mysql)

This is broken because the "SELECT FROM b" span is missing. The issue is that we've created a case where a child span ends after its parent has already ended. This results in the span potentially (if eligible for compression) being buffered on that parent, and then never sent. (While the example above looks contrived, it isn't actually too unexpected in Node.js code. There is also a case with async functions where a user doing manual instrumentation can have this happen unexpectedly: #2611.)

With some debug-prints, this is what happens in the example above:

% node compressionplay-break-children.js
XXX -- Instrumentation.addEndedSpan(<Span "SELECT FROM a">)
XXX parent has no buffered span: buffer <Span "SELECT FROM a"> on <Span "t0">
XXX -- Instrumentation.addEndedSpan(<Span "SELECT FROM b">)
XXX parent has no buffered span: buffer <Span "SELECT FROM b"> on <Span "SELECT FROM a">    <-- The bug is here, span "SELECT FROM a" has already ended.
XXX -- Instrumentation.addEndedTransaction(<Transaction "t0">)
XXX send buffered <Span "SELECT FROM a">

If you set spanCompressionEnabled: false in this script, you get the expected two spans.

spec issue

I think this points to an oversight in the spec in the onChildEnd pseudo-code at https://github.com/elastic/apm/blob/main/specs/agents/handling-huge-traces/tracing-spans-compress.md#span-buffering. Perhaps that could change to:

...
void onChildEnd(Span child) {
    if (ended || !child.isCompressionEligible()) {
...

Here is a potential fix in this PR:

--- a/lib/instrumentation/index.js
+++ b/lib/instrumentation/index.js
@@ -355,25 +358,31 @@ Instrumentation.prototype.addEndedSpan = function (span) {
       this._encodeAndSendSpan(span.getBufferedSpan())
     }

-    if (!span.isCompressionEligible()) {
+    if (span.getParentSpan().ended || !span.isCompressionEligible()) {
       // span is not compressible -- send
       const buffered = span.getBufferedSpan()
       if (buffered) {

I can open a spec PR for discussion.

test/instrumentation/span-compression.test.js Outdated Show resolved Hide resolved
test/instrumentation/span-compression.test.js Outdated Show resolved Hide resolved
test/instrumentation/span-compression.test.js Outdated Show resolved Hide resolved
@trentm
Copy link
Member

trentm commented Apr 5, 2022

I can open a spec PR for discussion.

elastic/apm#632

@astorm
Copy link
Contributor Author

astorm commented Apr 6, 2022

(This may be something for a separate change and I think it points to an issue in the span-compression spec. I recall Felix mentioning something about the span-compression spec needing to deal with the case of an ended parent.)

Implemented the suggested fix and adopted the reproduction code as a test in the repository.

@astorm astorm requested a review from trentm April 6, 2022 06:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants