Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sentry produces incorrect data when async code is involved #4071

Closed
4 tasks done
srcspider opened this issue Oct 16, 2021 · 46 comments
Closed
4 tasks done

Sentry produces incorrect data when async code is involved #4071

srcspider opened this issue Oct 16, 2021 · 46 comments

Comments

@srcspider
Copy link

srcspider commented Oct 16, 2021

Package + Version

  • @sentry/node

Version:

6.13.3

Description

When dealing with async code sentry fails to properly record data. Not only does it fail to record data, it also constantly adds bad data.

Exception 1: The problem is somewhat hidden when you deal with a context that sits in a sandbox environment which only exists for the duration of ONE AND ONLY ONE transaction. The most popular example would be a browser session, but since we're speaking node scenarios the other example would be a "cloud function" (ie. pops into existence per request, and only ever handles hopefully the one transactions); there's also the scenarios that use actual threads (which are rare in the current ecosystem). Needless to say, those 3 I just mentioned are also technically only fine so long as they use only one transactions; if more then one is used for any reason they're suddenly going to have bad data.

Exception 2: The problem is also hidden when transactions are completely dependent on user input (eg. user has to click some navigation item) and also happen faster then user input. This creates an artificial serial nature to them, hiding the fact that if more then one run at once they botch things since no more then one would ever run at once. However, the more complex the system, the less likely this exception will save you.

The problem is most evident and most aggravating in the scenario of a http server (which arguably covers a lot of the use cases for sentry with nodejs). Since a http server will pickup requests and work on them as-they-come, regardless of if the previous one is finished, sentry has to work perfectly with 2, 3, 1000 transactions running simultaneously.

In general, with your average http server (and similar use cases), it's your job to keep your stuff in one request out of the others. This isn't particularly hard normally since all it means is: don't use globals. Unfortunately sentry seems to do just that, for everything.

I've read though maybe 50 issues, and this problem has been brought before under various forms, some more direct then others. The solution has also been mentioned several times. And I've also seen the reply of "works for us" with the not-very obvious subtext "in this environment where only one transaction would ever exist before the process exists".

So, I'll make this very very simple, by giving you some code you can easily run (and is in accordence to what your documentation, responses to issues, etc are saying is the "correct way") along with screenshots of your dashboards with datapoints that are as obvious as I can possibly make them for you.


There will be only one file, so just copy all the blocks in a main.mjs file; .mjs is the current nodejs of saying the file uses modern imports.

First some basic boilerplate.

import Sentry from '@sentry/node';

const SENTRY_DSN = ""; // add your test project DSN here

Sentry.init({

	dsn: SENTRY_DSN,

	release: 'v1', // increment to get clean test sample

	// Set tracesSampleRate to 1.0 to capture 100%
	// of transactions for performance monitoring.
	// We recommend adjusting this value in production
	tracesSampleRate: 1.0

});

Next we'll add a function to help simulate "real world long async operations"

async function delay(time, fn) {
	return new Promise((resolve) => {
		setTimeout(async () => {
			resolve(await fn());
		}, time);
	});
}

Next we'll add a function that simulates all the breadcrumbs sentry adds whenever some random 3rd party library is used god knows where in dependencies. The parameter theRealTransaction is only used to make it obvious what the breadcrumbs and spans actually originate from.

async function randomThirdPartyIntegration(theRealTransaction) {

	// eg. "Got" library or whatever you guys are doing that's capturing HTTP

	// This context intentionally doesn't have access to objects in parameters
	// or other methods since that's how it typically happens in the real world.

	let hub = Sentry.getCurrentHub();
	let scope = hub.getScope();
	let transaction = scope.getTransaction();

	let span;
	if (transaction) {
		span = transaction.startChild({
			op: 'http_req',
			description: `http://google.com (real transaction: ${theRealTransaction})`
		});
	}

	scope.addBreadcrumb({
		category: 'http',
		message: `REQ http://google.com [Start] (real transaction: ${theRealTransaction})`
	});

	return await delay(300, () => {
		scope.addBreadcrumb({
			category: 'http',
			message: `REQ http://google.com [End] (real transaction: ${theRealTransaction})`
		});
		if (span) {
			span.finish();
		}
	});

}

Next we'll create a function that simulates what may happen in typical http server (or whatever else) request handle; we'll call it exampleInput since it will be the only function that takes input in theory (our randomThirdPartyIntegration is also a dependency of it).

Edit 1: Original function had incorrect tag code. (As such tags errors were ignored, does not affect rest of issue) Scope has been added to exception handler, to be even more correct. (Still produces same problems)

async function exampleInput ({
	name, // the name of this transaction
	user, // the user for the transaction
	throwsError,
	timers: { // delays between actions
		action1,
		action2,
		action3
	}
}) {

	const tagName = name.replace('#', '');

	const hub = Sentry.getCurrentHub();
	const scope = hub.pushScope();

	scope.setUser({ username: user });

	const transaction = hub.startTransaction({
		op: "test",
		name: `Test Transaction ${name}`,
	});

	scope.setSpan(transaction);

	try {

		// Tag

		scope.setTag(`T${tagName} Tag 1`, 'yes');

		// Action 1

		let span1 = transaction.startChild({
			op: 'action',
			description: `Transaction ${name} Action #1`
		})

		try {
			await delay(action1, () => {
				scope.addBreadcrumb({
					category: 'step',
					message: `Transaction ${name} Action #1`
				});
			});
		}
		finally {
			span1.finish();
		}

		// Tag

		scope.setTag(`T${tagName} Tag 2`, 'yes')

		// Action 2

		let span2 = transaction.startChild({
			op: 'action',
			description: `Transaction ${name} Action #2`
		})

		try {
			await delay(action2, () => {
				scope.addBreadcrumb({
					category: 'step',
					message: `Transaction ${name} Action #2`
				});
			});
		}
		finally {
			span2.finish();
		}

		// Random Third Party Dependency

		let span3 = transaction.startChild({
			op: 'third_party',
			description: `Transaction ${name} Third-Party Thing`
		});

		try {
			await randomThirdPartyIntegration(name);
		}
		finally {
			span3.finish();
		}

		// Tag

		scope.setTag(`T${tagName} Tag 3`, 'yes')

		if (throwsError) {
			throw new Error("Test Error: Something went wrong");
		}

		// Action 3

		let span4 = transaction.startChild({
			op: 'action',
			description: `Transaction ${name} Action #3`
		});

		try {
			await delay(action3, () => {
				scope.addBreadcrumb({
					category: 'step',
					message: `Transaction ${name} Action #3`
				});
			});
		}
		finally {
			span4.finish();
		}

		// Tag

		scope.setTag(`T${tagName} Tag 4`, 'yes');

	}
	catch (err) {
		Sentry.captureException(err, scope)
	}
	finally {
		hub.popScope();
		transaction.finish();
	}

}

And, finally we'll simulate some transactions. I've included a Timeline, but be aware that the timeline isn't something that will necessarily happen guaranteed as it suggests, its simply where the current settings in the example actually align if plotted on a timeline.

// The following simulates what a basic http server would do
// We don't wait for one transaction to finish before other starts because
// no sane server in the world does that sh*t

// a1 = action1, a2 = action2, a3 = action3
//
// Timeline (ms)      0        300  400  500  600  700  800  900  1000  1100
// -----------------------------------------------------------------------------
// Transaction 1      start              a1             a2              a3
// Transaction 2      start    a1             a2             a3
// Transaction 3      start         a1             a2             a3
// Transaction 3      start                   a1   a2 -> err

exampleInput ({
	name: '#1',
	user: 'transaction1_user',
	throwsError: false,
	timers: {
		action1: 500,
		action2: 300,
		action3: 300
	}
})

exampleInput ({
	name: '#2',
	user: 'transaction2_user',
	throwsError: false,
	timers: {
		action1: 300,
		action2: 300,
		action3: 300
	}
})

exampleInput ({
	name: '#3',
	user: 'transaction3_user',
	throwsError: false,
	timers: {
		action1: 400,
		action2: 300,
		action3: 300
	}
})

exampleInput ({
	name: '#4',
	user: 'transaction4_user',
	throwsError: true,
	timers: {
		action1: 600,
		action2: 100,
		action3: 100
	}
})

To run this just do. (after you provide the SENTRY_DSN)

node ./main.mjs

You can also change the release to have a clean copy you can filter to in transactions.


Now lets look at what sentry has to say after running it once. I'll highlight all the mistakes I can spot with red arrows (there may be more).

transactions

In the above transaction 1 doesn't have a user, even though we set user on every transactions.

transaction1

Above is Transaction 1. We're missing the spans generated by the randomThirdPartyIntegration function itself, and we're missing all breadcrumbs.

transaction2

Second transaction is also incorrect.

transaction3

Third transaction somehow got wrong user and for some reason managed to miss the 3rd action breadcrumb.

transaction4

Fourth transaction also got wrong user and also has 3rd party spans from transaction 2 and 3.

error

The 4th transaction was set to also throw a error. Above we can see what we get in the error. Why are we getting in the error breadcrumbs that never appear in the transactions themselves?


Anyway, so how is this fixed?

Technically speaking there are three approaches.

The first is to use is to use Async Context Tracking that is made available as stable api from nodejs 16 onwards. This doesn't solve all the problems with how (from a quick glance at the code) sentry seems to work, but it does mean sentry is able to consistently get the real context with out the use of globals. Specifically the only parts it doesnt' solve are how sentry currently works with scope; which is not even clear to me how exactly this "ever merging" scope is suppose to work with what is essentially "isolated contexts" (eg. http requests, work queues, etc)

The second method will work in all versions of node, including prior versions (and people paranoid about perf impact of async context tracking), and the method is to just REQUIRE every operation pass the transaction object. So things like Sentry.captureException(err) should really be Sentry.captureException(err, transaction) where the transaction is ignored if null (and I mean IGNORED, no guessing of what it might be, there is none). Scope should just be transaction.getScope().

The third method, and this is purely only for these special cases that sentry devs claim "works for them" is to have a method Sentry.setMasterTransaction that works as follows: if no transaction is set, set the transaction as the master; if a master is already set and not finished then error out and invalidate the master and the current transaction; if the master is set but finished (not counting transport costs), replace the current master with the new transaction. Essentially, you can have your "global transaction" but sentry guarantees you won't shoot yourself in the foot with it.

@andreialecu
Copy link
Contributor

I'm not super familiar with the sentry api yet but captureException does take a second parameter which can be set to the scope:

Sentry.captureException(err, scope), if you add the scope there does it help?

@srcspider
Copy link
Author

@andreialecu I've added that and also updated the original post, no significant change, still bleed everywhere and otherwise incorrect data. I've also fixed a problem with the tag code (was wrong originally).

Tag code is now correct in original post. And yes the data around tags is wrong. Note: given usage, each transaction should have unique tags.

image

image

image

@andreialecu
Copy link
Contributor

Indeed, I just took a look at the code here:

public pushScope(): Scope {
// We want to clone the content of prev scope
const scope = Scope.clone(this.getScope());
this.getStack().push({
client: this.getClient(),
scope,
});
return scope;
}
/**
* @inheritDoc
*/
public popScope(): boolean {
if (this.getStack().length <= 1) return false;
return !!this.getStack().pop();
}

It seems evident that this isn't going to support async processes properly. The stack is a singleton, and popping and pushing from it might happen at any given time.

I notice some related discussion in #3572 along with a potential ugly workaround. I'm currently also running into this issue in my own projects and will investigate further.

@andreialecu
Copy link
Contributor

A possible solution might be to instantiate a new Hub(client) instance manually on every "thread" and use that instead of using Sentry.getCurrentHub(); which seems to return a singleton.

The documentation does seem to hint at it here:
https://docs.sentry.io/platforms/javascript/enriching-events/scopes/#whats-a-scope-whats-a-hub

You can think of the hub as the central point that our SDKs use to route an event to Sentry. When you call init() a hub is created and a client and a blank scope are created on it. That hub is then associated with the current thread and will internally hold a stack of scopes.

This mentions it is associated with the current thread, but from what I can read from the code that doesn't happen. It's being reused for the entire app hence why events are all over the place.

Maybe you could check what happens if instead of let hub = Sentry.getCurrentHub(); you create a new hub instance.

@derekrprice
Copy link

derekrprice commented Oct 19, 2021

@andreialecu, the problem with that is that your hub is still "global". There is no way to find the right one from an arbitrary location anymore than the right scope could be found. My workaround on #3572 fakes isolated scope by replacing withScope and getScope with versions that store a unique hash key in a function name that can be read later via a stack trace. It works fine but I haven't done any performance testing. The "right" way would be to use async context but, as the original poster noted, that isn't available until Node 16 and my team, at least, is still stuck using Node 14 for the time being. Perhaps the Sentry SDK could implement a fix using async context when it can and with my workaround implemented as a polyfill?

Sentry's response on #3572 has been that they can't implement this until they rewrite a bunch of other stuff, but I don't see why not. My workaround just replaces those two functions in a backwards compatible way.

Oh, @srcspider implies in his original post that this problem is hidden with "cloud functions", but this is just dependent on transaction volume and concurrency, just like with his HTTP server. This is exactly the case where we encountered the issue, with cloud functions processing hundreds of concurrent transactions. Sentry's scope and transaction tracking is almost useless in our environment. All the breadcrumb data for hundreds of transactions ends up getting submitted with a single transaction.

@andreialecu
Copy link
Contributor

andreialecu commented Oct 19, 2021

async_hooks has been experimental since node 8, so the implementation in node 14 is likely fine.

I have just been experimenting with it today in node 16 however, and it's seems it easily loses the context and doesn't integrate with EventEmitter.

Here are some reports:
nodejs/help#2537
nodejs/help#2546
nodejs/node#33723

In my case I'm not currently able to pass the context properly, it seems to be undefined. Probably because of the EventEmitter issue.

Edit: actually the problem is mongodb related ref: https://jira.mongodb.org/browse/NODE-3010 and Automattic/mongoose#10020 - so AsyncLocalStorage still comes with its own set of problems.

@andreialecu
Copy link
Contributor

andreialecu commented Oct 19, 2021

An update, I've been able to properly separate requests from what I can see from some initial tests.

The trick is this (I'm using a NestJS interceptor to isolate requests):

  async intercept(
    context: ExecutionContext,
    next: CallHandler,
  ): Promise<Observable<unknown>> {
    // trick is here (not sure if there's a better way to get the client)
    const client = this.sentry.instance().getCurrentHub().getClient();
    const hub = new Sentry.Hub(client);

    const transaction = hub.startTransaction({
      op: 'resolver',
      name: [context.getClass().name, context.getHandler().name].join('.'),
      tags: {
        class: context.getClass().name,
        handler: context.getHandler().name,
        // others
      },
    });

    hub.configureScope((scope) => {
      scope.setSpan(transaction);

      scope.setUser(sentryUser);
    });

    return new Observable((observer) => {
      runWithSentry(transaction, () => {
        next
          .handle()
          .pipe(
            tap({
              complete: () => {
                transaction.finish();
              },
              error: (exception) => {
                transaction.setStatus(SpanStatus.InternalError);
                transaction.finish();
                hub.withScope((scope) => {
                  if (req) {
                    const data = Sentry.Handlers.parseRequest({}, req);
                    scope.setExtra('req', data.request);
                    if (data.extra) scope.setExtras(data.extra);
                  }
                  scope.setUser(sentryUser);

                  hub.captureException(exception);
                });
              },
            }),
          )
          .subscribe({
            next: (res) => observer.next(res),
            error: (error) => observer.error(error),
            complete: () => observer.complete(),
          });
      });
    });
  }

It's important to call all methods on the hub instead of on the global Sentry object.

There's also this helper module for async hooks:

import { Transaction } from '@sentry/types';
import { AsyncLocalStorage } from 'async_hooks';

const storage = new AsyncLocalStorage<Transaction>();

export const getCurrentTransaction = (): Transaction | undefined =>
  storage.getStore();

export const runWithSentry = storage.run.bind(storage);

I'm now able to call const transaction = getCurrentTransaction(); elsewhere and get a proper reference to the transaction and add children to it.

Hope this helps.

@derekrprice
Copy link

@andreialecu, are you saying that this code works as a wrapper around the current SDK?

@andreialecu
Copy link
Contributor

andreialecu commented Oct 19, 2021

@derekrprice I just shared a sample of how I'm using it, hoping it helps. Not sure what you mean by wrapper.

@derekrprice
Copy link

derekrprice commented Oct 19, 2021

@andreialecu, I just mean that your code is something that will work without modifying the current Sentry SDK, as opposed to a demonstration of how to fix the SDK. I think you just confirmed that. I should have some time to put it to the test next week. Thanks!

@andreialecu
Copy link
Contributor

andreialecu commented Oct 19, 2021

Yes, the sample above does not modify the SDK in any way, but I think the way it's being used is unconventional.

I'm not sure if any memory leaks would occur yet from creating new hub instances for every request. I haven't checked.

It also appears that it may repeatedly set up "integrations" unnecessarily on the client on each Hub instantiation:

if (client && client.setupIntegrations) {
client.setupIntegrations();
}

I'm hoping this issue gets triaged and we get an official response. The issue above can easily be solved with a PR.

@srcspider
Copy link
Author

@andreialecu

I have just been experimenting with it today in node 16 however, and it's seems it easily loses the context and doesn't integrate with EventEmitter.

Here are some reports:
nodejs/help#2537
nodejs/help#2546
nodejs/node#33723

In all 3 example issues with problem from my understanding looking at them is that the functions in questions are not called, they're declared. The AsyncLocalStorage only works if it's called (I think), since the store only passes though functions that are called.

I haven't fully read though discussion there but seems all 3 are fixed by just doing something like binding the store to the callbacks, so it's available when they are called, then repassing it though a new AsyncLocalStorage call.

From my understanding (and it's more guessing) the rule of thumb with it should be that if it's not in your stacktrace it doesn't go though.

So in all those "doesn't work cases" you just do something like this... (copy paste everything in a single main.mjs file)

Boilerplate:

import { AsyncLocalStorage } from 'async_hooks';

const asyncLocalStorage = new AsyncLocalStorage();

Helper:

/**
 * Forces async store to pass though sync/async callback.
 */
function asyncstore_bind(fn) {
	let data = asyncLocalStorage.getStore();

	if (data != null) {
		return (...args) => {
			return asyncLocalStorage.run(data, () => {
				fn(...args)
			});
		}
	}
	// else: no store, just return the function as-is

	return fn;
}

We'll create something that takes a callback and uses it later (which is what all 3 of those issues seem to have)

class Example {

	handler = null;

	updateHandler(fn) {
		this.handler = fn;
	}

	runHandler(name) {
		this.handler(name);
	}

}

And now we run some test cases:

let expectedValue = 42;

// Bad Example (should not work)
// ===========

let test1 = new Example();

asyncLocalStorage.run(expectedValue, () => {
	test1.updateHandler(name => {
		let value = asyncLocalStorage.getStore();
		console.log(`${name}: ${expectedValue == value}`);
	})
});

test1.runHandler('Test 1')

// Good Example (should work)
// ============

let test2 = new Example();

asyncLocalStorage.run(expectedValue, () => {
	test2.updateHandler (
		asyncstore_bind (
			name => {
				let value = asyncLocalStorage.getStore();
				console.log(`${name}: ${expectedValue == value}`);
			}
		)
	)
});

test2.runHandler('Test 2');

// Good Example (async version)
// ============

let test3 = new Example();

asyncLocalStorage.run(expectedValue, () => {
	test3.updateHandler (
		asyncstore_bind (
			async (name) => {
				let value = asyncLocalStorage.getStore();
				console.log(`${name}: ${expectedValue == value}`);
			}
		)
	)
});

test3.runHandler('Test 3');

Output:

$ node main.mjs
Test 1: false
Test 2: true
Test 3: true

@andreialecu
Copy link
Contributor

@srcspider it's not always that simple, unfortunately. Depending on which libraries you use, you can still lose the context in certain instances.

Take a look at https://stackoverflow.com/questions/64330910/nodejs-asynclocalstorage-getstore-return-undefined for some other reports about context being lost just by using bodyparser for express.

Also a report apollographql/apollo-server#2042 which also applies to AsyncLocalStorage about context being lost in Apollo server.

See the "Edit" I added in that post, in my case I'm using mongoose which uses mongodb via callbacks instead of promises internally. The callbacks lose the async local storage context. Without patching mongoose or mongodb this can't easily be fixed.

I have however found a workaround which I posted in Automattic/mongoose#10020 (comment)

Anyways, this discussion is a bit offtopic for this thread.

See my second post where I mentioned I got everything working. 🙂

@AbhiPrasad
Copy link
Member

Scope propogation for async code is something we are aware of, but would require breaking changes to fully address: #3751. It's on our radar though, we are actively thinking about the problem for both node and browser.

Also, async hook support is something we are considering also. See: #3660

@andreialecu
Copy link
Contributor

@AbhiPrasad I've been able to work around this by creating a hub via (ref: #4071 (comment)):

const client = Sentry.getCurrentHub().getClient();
const hub = new Sentry.Hub(client);

Then I can call methods and start transactions on that hub instance instead of the global sentry object and things properly get isolated.

Is this the right approach? I haven't seen it documented, and the global hub is unusable in nodejs otherwise (as explained in the OP).

@AbhiPrasad
Copy link
Member

AbhiPrasad commented Oct 21, 2021

Yup explicit propagation of hubs and scopes will always work - the issues arise when you try to implicitly propagate it (such as when you have automatic instrumentation of libraries). In the case of implicit propagation, the auto-instrumentation has to "pick" the correct hub/scope to call methods on, which breaks down in async environments. Using async hooks and domains is the way to work toward a solution here.

On the browser though, this is still something that needs to be figured out (unless one uses a library like https://www.npmjs.com/package/zone.js)

@AbhiPrasad
Copy link
Member

We have further discussion about this here: https://develop.sentry.dev/sdk/research/performance/#scope-propagation

@derekrprice
Copy link

@AbhiPrasad, have you had a look at my code on #3572? It works around the async issue without changing the Sentry SDK API by taking advantage of the fact that JS stack traces preserve the correct call chain even with async functions. Using that fact, it moves the scope stack into an object and uses a withScope() method to wrap anything you wish to trace with a function with a name containing the nested scope's key for that object. Basically, the technique should work as a polyfill for the async context feature available after node 1.16. I read through your associated links, but I still don't understand why this isn't an acceptable way to find the correct nested scope. Can you be more specific? Thanks!

@AbhiPrasad
Copy link
Member

AbhiPrasad commented Oct 21, 2021

Yup we have @derekrprice - your polyfill is a great way to approach a possible solution - though some care would have to be put in to manage call stack issues.

We actually want to extend the API so that users can select whatever scope management strategy they prefer - hence we are waiting for the next major version to roll out these changes. So for example, you can use async storage, your solution, or something completely custom.

The hope is that this will scale so that we can have a common scope propagation API that node, browser, electron, react native, and any other js platform can all share.

@srcspider
Copy link
Author

srcspider commented Oct 21, 2021

@AbhiPrasad

Scope propogation for async code is something we are aware of, but would require breaking changes to fully address: #3751. It's on our radar though, we are actively thinking about the problem for both node and browser.

I assume it's on radar for v8? since v7 doesn't have it: https://github.com/getsentry/sentry-javascript/milestone/7

the issues arise when you try to implicitly propagate it (such as when you have automatic instrumentation of libraries)

Do you have an option or workaround to disable all of that? I assume that's not a "breaking change" to add.

[edit]

Also want to mention that, while it causes probably problems in browsers too (read though your research paper), I actually don't care as much there as I do for a nodejs server. It's just so aggrieving on nodejs compared to the browser.

Also solutions like the client/hub one from @andreialecu should just be in your docs.

@srcspider
Copy link
Author

srcspider commented Oct 21, 2021

@AbhiPrasad

Yup explicit propagation of hubs and scopes will always work

In my opening post example, replacing all instances of Sentry. with hub created via @andreialecu's method and even passing hub to the randomThirdPartyIntegration and only using that will mostly work and is sure hell of a lot better then nothing, BUT breadcrumbs will vanish on all transactions, users will not get recorded on all transactions and tags will not get recorded on all transactions. The error will however have the user, tags and breadcrumbs (and correct ones at that). So yes a lot better, but far from "will always work"

You may need to revisit other aspects of what exactly Sentry does vs just hub does. Since there's either an extra step or something else to make it a full workaround.

@andreialecu
Copy link
Contributor

users will not get recorded on all transactions

Users are being logged for all transactions for me and I have a relatively high traffic app, make sure you attach the user to the transaction. This seems to work for me:

hub.configureScope((scope) => {
  scope.setSpan(transaction);
  scope.setUser(sentryUser);
});

@srcspider
Copy link
Author

Tested a bunch of variations. You're right only passing any scope change though hub.configureScope over and over seems to work. Trying to use things like const scope = hub.pushScope(); while in theory should work (since the configureScope method is only grabbing the top scope) they only actually produce anything when used with the Sentry global version and produce nothing when used with the explicit hub version.

Thanks @andreialecu

@srcspider
Copy link
Author

For anyone stumbling on this, here's a wrapper.

Only tested on my original example code (which doesn't cover all cases). Still, produces perfect result when refactored to use the wrapper; including when randomThirdPartyIntegration is refactored to use the wrapper's findTransaction method that uses async_hooks.

import Sentry from '@sentry/node';
import { AsyncLocalStorage } from 'async_hooks';
import { Client, Hub, Span, SpanContext, Transaction, User } from '@sentry/types';

const TrasanctionStorage = new AsyncLocalStorage<TracerTransaction>();

type SpanOptions = Pick<SpanContext, "status" | "tags" | "startTimestamp" | "endTimestamp" | "data">;

enum TracerSeverity {
    Fatal = "fatal",
    Error = "error",
    Warning = "warning",
    Log = "log",
    Info = "info",
    Debug = "debug",
    Critical = "critical"
}

enum TracerSpanStatus {
    /** The operation completed successfully. */
    Ok = "ok",
    /** Deadline expired before operation could complete. */
    DeadlineExceeded = "deadline_exceeded",
    /** 401 Unauthorized (actually does mean unauthenticated according to RFC 7235) */
    Unauthenticated = "unauthenticated",
    /** 403 Forbidden */
    PermissionDenied = "permission_denied",
    /** 404 Not Found. Some requested entity (file or directory) was not found. */
    NotFound = "not_found",
    /** 429 Too Many Requests */
    ResourceExhausted = "resource_exhausted",
    /** Client specified an invalid argument. 4xx. */
    InvalidArgument = "invalid_argument",
    /** 501 Not Implemented */
    Unimplemented = "unimplemented",
    /** 503 Service Unavailable */
    Unavailable = "unavailable",
    /** Other/generic 5xx. */
    InternalError = "internal_error",
    /** Unknown. Any non-standard HTTP status code. */
    UnknownError = "unknown_error",
    /** The operation was cancelled (typically by the user). */
    Cancelled = "cancelled",
    /** Already exists (409) */
    AlreadyExists = "already_exists",
    /** Operation was rejected because the system is not in a state required for the operation's */
    FailedPrecondition = "failed_precondition",
    /** The operation was aborted, typically due to a concurrency issue. */
    Aborted = "aborted",
    /** Operation was attempted past the valid range. */
    OutOfRange = "out_of_range",
    /** Unrecoverable data loss or corruption */
    DataLoss = "data_loss"
}

class TracerSpan {

    public parent: TracerTransaction;
    public alreadyFinished: boolean = false;
    public span: Span;

    constructor (
        parent: TracerTransaction,
        op: string,
        description: string,
        opts: SpanOptions = {}
    ) {
        this.parent = parent;
        if ( ! parent.noop) {
            this.span = parent.transaction.startChild({
                op, description, ...opts
            })
        }
    }

    status(status: TracerSpanStatus): TracerSpan {
        this.span?.setStatus(status);
        return this;
    }

    httpStatus(httpStatus: number): TracerSpan {
        this.span?.setHttpStatus(httpStatus);
        return this;
    }

    ok(): TracerSpan {
        this.span?.setStatus(TracerSpanStatus.Ok);
        return this;
    }

    end(): void {
        if (this.alreadyFinished) {
            throw new Error('Span end() called multiple times.');
        }

        if (this.parent.alreadyFinished) {
            throw new Error('Span end() called after transaction ended.');
        }

        this.alreadyFinished = true;
        this.span?.finish();
    }
}

class TracerTransaction {

    public noop: boolean;

    public hub: Hub;
    public transaction: Transaction;
    public alreadyFinished: boolean = false;

    constructor() {
        this.noop = true; // by default unstarted, do nothing
    }

    start (
        op: string,
        name: string,
    ): void {
        this.noop = false;

        const client = Sentry.getCurrentHub().getClient();
        this.hub = new Sentry.Hub(client);

        this.transaction = this.hub.startTransaction({ op, name });

        this.hub.configureScope(scope => {
            scope.setSpan(this.transaction);
        });
    }

    /**
     * Binds transaction to context and executes context.
     * Returns whatever context returns.
     */
    bind<T>(context: (...args: any[]) => T): T {
        return TrasanctionStorage.run(this, context);
    }

    /**
     * Start new Span on the transaction.
     */
    span (
        op: string,
        description: string,
        opts: SpanOptions = {}
    ): TracerSpan {
        return new TracerSpan (
            this, op, description, opts
        );
    }

    setUser(user: User): TracerTransaction {
        this.hub?.configureScope(scope => {
            scope.setUser(user);
        });

        return this;
    }

    addTag (
        tagName: string,
        value: string | number
    ): TracerTransaction {
        this.hub?.configureScope(scope => {
            scope.setTag(tagName, value);
        });

        return this;
    }

    addBreadcrumb (
        category: string,
        message: string,
        type: string = null,
        severity: TracerSeverity = TracerSeverity.Info,
        data?: {
            [key: string]: any;
        }
    ): TracerTransaction {
        this.hub?.configureScope(scope => {
            scope.addBreadcrumb({
                category,
                message,
                type,
                level: Sentry.Severity.fromString(severity),
                data
            });
        });

        return this;
    }

    end(): void {
        if (this.alreadyFinished) {
            throw new Error('Transaction end() called multiple times.');
        }

        this.alreadyFinished = true;
        this.transaction.finish();
    }

    captureError(err: any): void {
        if (this.hub) {
            this.hub.captureException(err);
        }
        else { // no huh (ie. noop mode)
            Sentry.captureException(err);
        }
    }

    captureMessage (
        message: string,
        severity: TracerSeverity = TracerSeverity.Info
    ): void {
        if (this.hub) {
            this.hub.captureMessage(message, Sentry.Severity.fromString(severity));
        }
        else { // no huh (ie. noop mode)
            Sentry.captureMessage(message, Sentry.Severity.fromString(severity));
        }
    }

}

export class Tracer {

    // used only by setMasterTransaction / getMasterTransaction
    protected static master: TracerTransaction;

    /**
     * Optional functionality. Ideally just ensure you never have to find it
     * and always just know it by placing it in some context of the unit of
     * work in question.
     */
    static findTransaction(): TracerTransaction {
        let transaction: TracerTransaction | null
            = TrasanctionStorage.getStore() as TracerTransaction;

        if (transaction) {
            return transaction;
        }
        else { // failed to find transaction; just return a noop version
            return new TracerTransaction();
        }
    }

    /**
     * Limited in functionality due to use of global, but avoids use
     * of async_hooks for situations where they are not available and there's
     * only ever one transaction or transactions can be guranteed to
     * happen in a series (rather then parallel).
     *
     * See: getMaster
     */
    static setMaster(transaction: TracerTransaction): void {
        if (this.master && ! this.master.alreadyFinished) {
            throw new Error('Tried to set new master while current master was still ongoing.');
        }

        this.master = transaction;
    }

    /**
     * Limited in functionality due to use of global, but avoids use
     * of async_hooks for situations where they are not available.
     *
     * See: setMaster
     */
    static getMaster(): TracerTransaction {
        if (this.master && ! this.master.alreadyFinished) {
            // this isn't fully correct, but better then nothing
            return this.master;
        }
        else { // no master transaction was set; just return noop
            return new TracerTransaction();
        }
    }

    /**
     * Create new transaction and start it.
     *
     * Use the method transaction.bind if you wish to propagate the
     * transaction via async_hooks
     */
    static transaction (
        op: string,
        name: string
    ): TracerTransaction {
        let transaction = new TracerTransaction();
        transaction.start(op, name);
        return transaction;
    }

}

Usage

const transaction = Tracer.transaction (
    "test", `Test Transaction ${name}`
);
transaction.setUser({ username: user });
transaction.addTag(`T${tagName} Tag 1`, 'yes');
transaction.addBreadcrumb (
    'step', `Transaction ${name} Action #1`
);
await transaction.bind(async () => {
    await randomThirdPartyIntegration(name);
});
// at any level after you called transaction.bind on something
const transaction = Tracer.findTransaction(); // returns noop version if it doesn't find one
try {
    // ...
}
catch (err) {
    transaction.captureError(err)
}
finally {
    transaction.end();
}

etc

@elliotleelewis
Copy link

One solution that worked for my team was to new up a Hub to pass around with each client request received by the NodeJS App (in our scenario we have an Apollo + Express GraphQL service). It's a little ugly, because you need to pass your client instance to the Hub constructor, and calling the global init method returns void instead of the client instance so you need to find which global methods give you that instance, but it does work.

@JVMartin
Copy link

This documentation: https://develop.sentry.dev/sdk/research/performance/#scope-propagation

Is incorrect, when it states:

JavaScript, for example, is single-threaded with an event loop and async code execution. There is no standard way to carry contextual data that works across async calls.

The standard way is https://nodejs.org/api/async_context.html

@JVMartin
Copy link

When this issue: #3751

Claims that:

@sentry/node that automatically clones a new hub per Node Domain

What is it talking about? @sentry/node does NOT use AsyncLocalStorage, so it's not possible to use @sentry/node without your spans getting jumbled into whatever random transaction is latest when you're running many async handlers in parallel.

@AbhiPrasad
Copy link
Member

@JVMartin - this is specific document referring to browser JavaScript, which currently has no mechanisms similar to async_hooks. For node.js we isolate contextual data using domains.

@JVMartin
Copy link

JVMartin commented Jan 31, 2023

@AbhiPrasad But @sentry/node doesn't use AsyncLocalStorage, so it suffers the jumbling of spans problem that this issue is tracking. It's not possible to use Sentry in Node without your spans getting randomly assigned to the latest transaction, instead of the unit of concurrency they should be. For instance, an http server processing 20 requests at the same time, where each request for example hits the pg auto-instrumentation, which calls the same global hub from all locations. Your pg query spans will be assigned to the wrong transactions, because obviously 20 different units of concurrency are all clobbering the same global.

@AbhiPrasad
Copy link
Member

But @sentry/node doesn't use AsyncLocalStorage

This is something we are tracking and working toward! #3660. It's also part of our roadmap this year: #5878

For instance, an http server processing 20 requests at the same time, where each request for example hits the pg auto-instrumentation, which calls the same global hub from all locations

If each request is isolated to it's own domain, it should get a unique hub, which means that we isolate scope/transactions/breadcrumbs correctly. We do this out of the box for our express, nextjs, and remix integrations.

If you wrap your request handler (or whatever your concurrency generation mechanism is) with a new domain, it should stay isolated.

I know domains are deprecated, but for now they work - and support all the wide variety of Sentry users we have (different node versions, deployment configs, etc.). This year we are going to look at adding async hooks (or similar) - so look forward to news there about this.

We might also take a slighty different approach though, like around what cloudflare workers did with AsyncLocalStorage, seeing if we can generate a more performant approach by directly hooking into v8 or similar. We've started investigating this as part of our production profiling sdk and support for local variables in stacktraces - still need to experiment! Any feedback/thoughts about that is greatly appreciated.

@JVMartin
Copy link

JVMartin commented Jan 31, 2023

@AbhiPrasad Thank you for your guidance. I'm glad to hear that there is a roadmap towards a resolution using AsyncLocalStorage here.

Can you please give me some advice for my specific use case? I use GraphQL via Apollo server on NestJS. There is no auto-instrumentation for this, since the auto-instrumentation does not support the way NestJS initializes Apollo: #5808 (comment)

I need each "unit of parallelism" to be a GraphQL query/mutation, not an http request, so I cannot use the Express implementation. (Each http request has many GraphQL queries that are processed in parallel.)

Is there anywhere that clearly documents how I can instrument anything (GraphQL or otherwise) in Sentry using Domains so that we don't get jumbled spans? Perhaps I could "reverse engineer" this from the @sentry/node instrumentation for Express, but that seems like a huge pain.

For instance, how does the pg auto-instrumentation use the same hub as the current Express request? Is this documented anywhere clearly such that it's easy to replicate for anything else we want to instrument, such as GraphQL queries or TypeORM queries, etc.?

@andreialecu
Copy link
Contributor

andreialecu commented Feb 1, 2023

Hey @JVMartin - I have exactly this working, but with mongoose.

I put up a gist here for inspiration: https://gist.github.com/andreialecu/40cb13c01b0d88c8163bbec3de59c580

@JVMartin
Copy link

JVMartin commented Feb 1, 2023

@andreialecu I don't understand - why would you use AsyncLocalStorage when Sentry uses Domains?

If you use AsyncLocalStorage instead of Domains, that means that none of the Sentry auto-instrumented monkey patched spans will be attached to the proper transactions. So you would need to re-instrument all of the packages that Sentry auto-instruments, which would be an extremely time intensive task.

Until Sentry switches to AsyncLocalStorage as the standard way for us all to share transaction context, I think we should all be using Domains to share transaction context, so we aren't fighting the framework.

@JVMartin
Copy link

JVMartin commented Feb 1, 2023

@andreialecu If you had just used Domains instead of AsyncLocalStorage, you wouldn't have needed to re-instrument mongoose since it's already auto-instrumented by Sentry: https://docs.sentry.io/platforms/node/performance/database/auto-instrument/

@andreialecu
Copy link
Contributor

I just tried to help you to get this working. 🤷‍♂️

Sentry's instrumentation was extremely slow every time I used it. This custom implementation has almost no performance impact.

@JVMartin
Copy link

JVMartin commented Feb 1, 2023

@andreialecu And I really appreciate you. I would love if you could help me instrument GraphQL using Domains, so that I don't have to spend months re-instrumenting every auto-instrumented package. I understand how to do it using AsyncLocalStorage, but I don't understand how to do it using Domains, and I would have to reverse-engineer e.g. the Express middleware that Sentry provides to try to figure it out.

@andreialecu
Copy link
Contributor

I'm not sure what "Domains" refers to. Is it the long deprecated nodejs feature: https://nodejs.org/api/domain.html ?

If Sentry is already using Domains for this, then I believe it's already clear by now that it's not practical and doesn't work well.

I think AsyncLocalStorage is the way to go about this issue. Perhaps the code I shared helps someone figure out how to integrate it.

@JVMartin
Copy link

JVMartin commented Feb 1, 2023

@andreialecu If you follow this thread above, you will see that Sentry is already knees-deep into moving from Domains to AsyncLocalStorage. However, it could be a year or longer before they ship that, so in the meantime, we should all use Domains so that we aren't fighting the framework. Or, if you want to use AsyncLocalStorage, first PR into the actual Sentry SDKs for Express / pg / mongoose / etc.

@timfish
Copy link
Collaborator

timfish commented Apr 6, 2023

There's an issue tracking progress towards fixing this.

@derekrprice
Copy link

@timfish, that tracks the Node.JS solution. Browsers will still be affected.

@timfish
Copy link
Collaborator

timfish commented Apr 6, 2023

That issue is tracking for Node.js but the abstraction that's being added could be used with Zone.js if you're ok with the bundle size impact and if/when the tc39 async context proposal lands in browsers that can be plugged in there too.

@AbhiPrasad
Copy link
Member

With the 7.48.0 release of the JS SDK we use AsyncLocalStorage instead of domains under the hood.

If you want to manually wrap methods in an async context to isolate breadcrumbs/scope/spans, you can use Sentry.runWithAsyncContext, which is documented here: https://docs.sentry.io/platforms/node/configuration/async-context/

Note the above only works with Node - in Browser JavaScript environments there is no async context mechanism we can use to accomplish this (like @timfish mentioned above).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests