Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run-context for spans created by instrumentations should not bleed out to user code #2430

Closed
38 tasks done
trentm opened this issue Nov 8, 2021 · 0 comments
Closed
38 tasks done
Assignees
Labels
agent-nodejs Make available for APM Agents project planning.
Milestone

Comments

@trentm
Copy link
Member

trentm commented Nov 8, 2021

the issues

Typically a package that makes some external call will have an API like this:

var foo = require("foo")
var client = foo.createClient(/* ... */)

// Callback API
client.doAThing(/* opts */, function afterTheThingIsDone (err, result) {
  // ... do whatever next ...
})
// ... synchronous code after client.doAThing() ...

// And/or a Promise API
client.doAThing(/* opts */)
  .then(function afterTheThingIsDone (result) {
    // ... do whatever next ...
  })
// ... synchronous code after client.doAThing() ...

// Using the Promise API via await.
async function main() {
  const result = await client.doAThing(/* opts */)
  // ... do whatever next ...
}
main()

Typically, the APM agent's instrumentation of these types of packages will create a span (an exit span) for the doAThing external call.

  1. When the user's ... do whatever next ... code is called, it is important that the "run context" does not include that automatically created span -- it should be the same run context when client.doAThing() was called (the "parent" context). If ... do whatever next ... is bound to the create span context, even if that span has now ended, it can result in subsequently created spans being a child of that automatically created span.
  2. Likewise, when the user's ... synchronous code after client.doAThing() ... code is called, it is important that the run context does not include the automatically created span.

In the current redis instrumentation, for example, this code:

var client = redis.createClient('6379', process.env.REDIS_HOST)
agent.startTransaction('t0')
client.set('key1', 'val1', function (err, reply) {
  // ...
})
client.set('key2', 'val2')

results in the span for SET key2 being a child of the span for SET key1. I.e. the redis instrumentation suffers from issue 2. IIRC the @elastic/elasticsearch instrumentation suffers from issues 1 and 2.

This is a problem for handling exit spans. https://github.com/elastic/apm/blob/master/specs/agents/tracing-spans.md#child-spans-of-exit-spans says:

Exit spans MUST not have child spans that have a different type or subtype.

And exit span handling is a prerequisite for span compression and work for span.context.destination.service.resource.

checklist

Review of all instrumentations and the need for: bindFunction, .with, startSpan -> createSpan, etc. Basically this involves adding test/asserts for the apm.currentSpan in the callbacks, event handlers, etc. for these integrations. Often creating a examples/trace-$thing.js is helpful to learn and illustrate. When an item is marked "N/A" in the following list, that indicates that the instrumentation does not createSpan, so there is no expectation of bleeding spans. There is still possibility that these instrumentations could better be ins.bind*()'ing for user expectations.

@trentm trentm added this to the 8.0 milestone Nov 8, 2021
@trentm trentm self-assigned this Nov 8, 2021
@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Nov 8, 2021
@trentm trentm changed the title spans created by instrumentations should not bleed out to user code run-context for spans created by instrumentations should not bleed out to user code Nov 8, 2021
trentm added a commit that referenced this issue Nov 16, 2021
When disableInstrumentations support was added in #353 it only
half-disabled these instrumentations, briefly mentioning "continuation
patches". I believe this was about user-land callback queue handling
that should no longer be necessary after run-context and #2430 work.
trentm added a commit that referenced this issue Nov 19, 2021
…ntext, captureError

- This fixes run-context handling for proper parent/child relationships.
  The run-context in code in the same tick and the optional callback are
  no longer changed.
- Add destination context, so ioredis usage shows redis on the Service Map.
- Capture an error and set the span outcome to "failure" if the command
  errors.

Refs: #2430
trentm added a commit that referenced this issue Nov 23, 2021
This fixes the 'mysql' instrumentation to not have the mysql span
context be active in user code. This ensures that user code cannot
create a child span of the mysql span, which would (a) be misleading
and (b) cause problems for coming exit span and compressed span work.

Refs: #2430
trentm added a commit that referenced this issue Nov 23, 2021
…2463)

This fixes the 'mysql' instrumentation to not have the mysql span
context be active in user code. This ensures that user code cannot
create a child span of the mysql span, which would (a) be misleading
and (b) cause problems for coming exit span and compressed span work.

Refs: #2430
trentm added a commit that referenced this issue Nov 24, 2021
…ntext, captureError (#2460)

- This fixes run-context handling for proper parent/child relationships.
  The run-context in code in the same tick and the optional callback are
  no longer changed.
- Add destination context, so ioredis usage shows redis on the Service Map.
- Capture an error and set the span outcome to "failure" if the command
  errors.
- No longer double-instrument queued ioredis commands
- Limit the number of ioredis versions tested, it is one of the 
  top 5 longest to run.

Fixes: #2459
Refs: #2430
trentm added a commit that referenced this issue Nov 25, 2021
…ill into user code

This fixes the 'http' and 'https' instrumentation for outgoing requests
to not have the 'http' span context be active in user code. This ensures
that user code cannot create a child span of the http span, which would
(a) be misleading and (b) cause problems for coming exit span and
compressed span work.

Refs: #2430
trentm added a commit that referenced this issue Nov 25, 2021
This fixes aws-sdk (S3, SQS, SNS) instrumentation to ensure the
automatically created AWS spans do not spill into user code. This also
ensures that any captured APM error for a failed AWS command is a child
of the AWS span.

Refs: #2430
trentm added a commit that referenced this issue Nov 26, 2021
This ensures that a created 'memcached' span is not the currentSpan
in user code after the call.

Refs: #2430
trentm added a commit that referenced this issue Dec 1, 2021
…spill into user code (#2470)

This fixes the 'http' and 'https' instrumentation for outgoing requests
to not have the 'http' span context be active in user code. This ensures
that user code cannot create a child span of the http span, which would
(a) be misleading and (b) cause problems for coming exit span and
compressed span work.

Also, fix a bug in the https instrumentation in older versions of node
(version <9.0.0) where the instrumentation of `https.request` relied on
intercepting `http.request` (that Node's `https.request()` would call). 
The agent didn't guarantee that the 'http' module was instrumented. 
A user program that used `https.request` without indirectly
`require('http')`ing would not get an HTTP span.

Refs: #2430
trentm added a commit that referenced this issue Dec 1, 2021
…spill into user code (#2470)

This fixes the 'http' and 'https' instrumentation for outgoing requests
to not have the 'http' span context be active in user code. This ensures
that user code cannot create a child span of the http span, which would
(a) be misleading and (b) cause problems for coming exit span and
compressed span work.

Also, fix a bug in the https instrumentation in older versions of node
(version <9.0.0) where the instrumentation of `https.request` relied on
intercepting `http.request` (that Node's `https.request()` would call). 
The agent didn't guarantee that the 'http' module was instrumented. 
A user program that used `https.request` without indirectly
`require('http')`ing would not get an HTTP span.

Refs: #2430
trentm added a commit that referenced this issue Dec 7, 2021
This fixes the 'mysql2' instrumentation to not have the mysql span
context be active in user code. This ensures that user code cannot
create a child span of the mysql span, which would (a) be misleading
and (b) cause problems for coming exit span and compressed span work.

Refs: #2430
trentm added a commit that referenced this issue Jan 6, 2022
- Add 'mongodb' to test-all-versions (TAV) testing.
- Ensure a mongodb span is not accidentally a child of an inflight
  mongodb span
- Fix an issue where instrumentation of `new MongoClient(url)` would
  fail because the internal handling of `arguments` would add
  `arguments[1]` but arguments.length stayed at 1. Fix by first
  changing to an Array.
- Fix instrument of a connection made using the MongoClient.connect
  static method. #2467

Refs: #2430
Fixes: #2467
trentm added a commit that referenced this issue Jan 27, 2022
…ntation

This is a re-write of the @elastic/elasticsearch instrumentation that
stops using the ES client observability events
    https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/observability.html
and switches to patching Transport#request() instead. This is necessary
to be able to bind that `request()` invocation to the RunContext for the
Span we've created; without using `apm.startSpan(...)`, which bleeds the
RunContext out to user code. (Patching Transport#request() is what the
legacy 'elasticsearch' instrumentation is also doing.)

Refs: #2430
trentm added a commit that referenced this issue Jan 31, 2022
…ntation (#2550)

This is a re-write of the @elastic/elasticsearch instrumentation that
stops using the ES client observability events
    https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/observability.html
and switches to patching Transport#request() instead. This is necessary
to be able to bind that `request()` invocation to the RunContext for the
Span we've created, without using `apm.startSpan(...)`, which bleeds the
RunContext out to user code. (Patching Transport#request() is what the
legacy 'elasticsearch' instrumentation is also doing.)

Also: '@elastic/elasticsearch-canary' TAV tests were failing because an
ES service container was not being setup for its test run; we didn't see
this in CI before because the Jenkins TAV config doesn't have the
canary module.

Refs: #2430
@trentm trentm closed this as completed Jan 31, 2022
trentm added a commit that referenced this issue Jan 31, 2022
When disableInstrumentations support was added in #353 it only
half-disabled these instrumentations, briefly mentioning "continuation
patches". I believe this was about user-land callback queue handling
that should no longer be necessary after run-context and #2430 work.

In #2498 it was discussed and decided that if listed in
disableInstrumentations the agent should not touch the module at all.

Refs: #2498
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning.
Projects
None yet
Development

No branches or pull requests

2 participants