Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serverless Function Tracing Needs an Isolated Scope #3572

Closed
5 tasks done
derekrprice opened this issue May 19, 2021 · 19 comments
Closed
5 tasks done

Serverless Function Tracing Needs an Isolated Scope #3572

derekrprice opened this issue May 19, 2021 · 19 comments
Labels
Package: node Issues related to the Sentry Node SDK Package: serverless Issues related to the Sentry Serverless SDK Type: Bug

Comments

@derekrprice
Copy link

Package + Version

  • @sentry/node 6.3.6

Version:

6.3.6

Description

Here is my withSentry wrapper for Azure serverless Function invocations:

const Sentry = require("@sentry/node");
const Tracing = require("@sentry/tracing");
const mongodb = require("mongodb");

const config = require("../../config/sentry");

if (config.dsn) {
    Sentry.init({
        ...config,
        integrations: [
            // enable DB calls tracing
            new Tracing.Integrations.Mongo({ useMongoose: true })
            // new Tracing.Integrations.Mysql()  // No integration available for mysql2.
        ]
    });
}

/**
 * Add Sentry transaction information to outgoing messages.
 * @param {*} transaction
 * @param {*} results
 * @returns
 */
const propagateTransaction = (transaction, results) => {
    if (results && transaction) {
        for (const message of Object.values(results)) {
            Object.assign(message, {
                sentryParentSpanId: transaction.spanId,
                sentryTraceId: transaction.traceId,
                sentrySampled: transaction.sampled
            });
        }
    }
    return results;
};

/**
 * A higher order function which logs error data to Sentry.
 *
 * @param {*} callback          The actual function which might throw errors.
 */
module.exports = callback => async (context, trigger, ...args) => {
    let transaction;
    if (config.dsn) {
        // Define a transaction for performance tracking.
        const transactionContext = {
            op: "serverless-task",
            name: context?.executionContext?.functionName || "test-function"
        };
        if ("sentryParentSpanId" in trigger) {
            Object.assign(transactionContext, {
                parentSpanId: trigger.sentryParentSpanId,
                traceId: trigger.sentryTraceId,
                sampled: trigger.sentrySampled
            });
        }
        transaction = Sentry.startTransaction(transactionContext);

        Sentry.configureScope(scope => {
            scope.setSpan(transaction);
            scope.setContext(
                "executionContext",
                context?.executionContext || {
                    errorMessage:
                        "No executionContext defined.  This could be a notification from a test environment."
                }
            );
            scope.setContext("functionTrigger", trigger);
        });
    }

    try {
        return propagateTransaction(
            transaction,
            await callback(context, trigger, ...args)
        );
    } catch (error) {
        if (config.dsn) {
            Sentry.captureException(error, scope => {
                scope.setContext(
                    "executionContext",
                    context?.executionContext || {
                        errorMessage:
                            "No executionContext defined.  This could be a notification from a test environment."
                    }
                );
                scope.setContext("functionTrigger", trigger);
                scope.setContext(
                    "theFullContext",
                    Object.assign(
                        {},
                        ...Object.keys(context).map(k => ({
                            [k]: JSON.stringify(context[k])
                        }))
                    )
                );
            });
            scope.setSpan(transaction);
            await Sentry.flush();
        }

        throw error;
    } finally {
        await transaction?.finish();
    }
};

Defining an Azure Function like this:

module.exports = withSentry(processLogFile);

This logs everything correctly for exceptions since the scope is set immediately before sending the log, but performance tracing associates DB children with the wrong span and other context. This appears to be because the DB integrations use the span set on the scope. Since we have a high volume application (on the order of 3400 invocations per minute), many function invocations run concurrently and, since Sentry.configureScope() configures the span in a global scope, logged child DB transactions are associated not with the invocation they actually ran during but instead with context of the function invocation that last called Sentry.configureScope(). In these images, there are dozens of MongoDB calls associated with the *_sessions function even though the inserts are actually all called by the parent *_timeseries function:
image
image

@derekrprice
Copy link
Author

Presumably, I would want to wrap calls in Sentry.withScope(), but it doesn't look like that can handle an async callback, so that won't work here.

@derekrprice
Copy link
Author

derekrprice commented May 19, 2021

Every Azure Function invocation is passed a unique "invocation ID" which you could probably use to index a map of scope stacks but the current Sentry SDK implementation's single global scope stack isn't going to cut it here.

@derekrprice
Copy link
Author

derekrprice commented May 20, 2021

Not the most elegant code, but so far I haven't been able to come up with anything better within the limitations of JS.
Something like this should work to replace the SDK's existing withScope. I do also need it to be async and return its caller's result because I need to be able to propagate any exceptions thrown inside the callback.

Basically, this maintains the scope map index as a dynamic function name - you would have to replace the SDK's current global stack with a global map, but that shouldn't be complicated. You wouldn't even need to use the function ID I suggested earlier - each call to withScope can simply supply its own retrievable unique key. My key generation is simplistic, but I think it should be sufficient for this use case. This even works when nested:

function getScopeKey() {
    const e = new Error();
    const key = e.stack.match(/(?<=as sentryScopeKeyIs:)(?:\d+)/);
    return key?.[0];
}

let uniqueKey = 1;
async function withScope (callback) {
    const id = uniqueKey++;
    const functionName = "as sentryScopeKeyIs:" + id;
    x[functionName] = function(callback) {
        return callback();
    }

    return x[functionName](callback);
}

// One level.
withScope(getScopeKey).then(scopeKey => console.log(scopeKey));

// Nested scopes.
withScope(async () => {
    return [
        getScopeKey(),
	await withScope(getScopeKey)
    ];
}).then(scopeKey => console.log(scopeKey));

@derekrprice
Copy link
Author

derekrprice commented May 20, 2021

This is tested in chrome, btw. Hopefully it works as well with node's chromium engine. Other browsers may need some sort of polyfill.
image

@derekrprice
Copy link
Author

Bump. Thought about this some more. Obviously having withScope become async would be a breaking change, but that isn't the most important part of this fix and I think if I passed withScope an async function, withScope itself wouldn't need to be async as long as it returned the result of the callback since I can await the returned promise then.

Anyhow, the rest of this is a really important change. Any chance of it getting into a release soon?

@derekrprice
Copy link
Author

Here is the revised code. Looks like I had left the dummy "x" object in my previous code global too, which is unneeded:

function getScopeKey() {
    const e = new Error();
    const key = e.stack.match(/(?<=as sentryScopeKeyIs:)(?:\d+)/);
    return key?.[0];
}

let uniqueKey = 1;
function withScope (callback) {
    const id = uniqueKey++;
    const functionName = "as sentryScopeKeyIs:" + id;
    const x = {
        [functionName]: function(callback) {
            return callback();
        }
    };

    return x[functionName](callback);
}

async function doApplicationStuff() {
    return getScopeKey();
}

// One level.
withScope(doApplicationStuff).then(scopeKey => console.log(scopeKey));

// Nested scopes.
withScope(async () => {
    return [
        getScopeKey(),
	withScope(doApplicationStuff)
    ];
}).then(scopeKey => console.log(scopeKey));

@kamilogorek
Copy link
Contributor

Have you tried to play around with manually pushing/popping scopes? This is how we do it in the AWSLabda case - https://github.com/getsentry/sentry-javascript/blob/master/packages/serverless/src/awslambda.ts#L267-L287

If it won't cut it, then it'll have to wait until we rework how integrations work, as using getCurrentHub doesn't play nicely with multiple client instances or with async code. This, however, requires a lot of internal refactoring, which will take some time until completed.

@derekrprice
Copy link
Author

Pushing and popping scopes does not work with async code. What is wrong with my suggested replacements for withScope and getScope? I believe it would be a non-breaking change.

@richardsimko
Copy link
Contributor

Any particular reason pushing and popping scopes doesn't work for you? I was in a similar situation and was tearing my hair off trying to find a solution to how to keep the Sentry context tied to a request in a web service environment. Passing Koa's context object around is not an option in real applications, so I don't know why the usage of push/pop scopes is so poorly documented when it comes to Sentry tracing.

Anyhow, this solved it for me:

// Mostly code from https://docs.sentry.io/platforms/node/guides/koa/
const transaction = sentry.startTransaction({
    name: `${reqMethod} ${reqUrl}`,
    op: 'http.server'
});
const hub = sentry.getCurrentHub();
try {
    // Push the scope
    const scope = hub.pushScope();
    scope.setSpan(transaction);
    context.res.on('finish', () => {
        transaction.setHttpStatus(context.status);
        transaction.finish();
    });
    await next();
} finally {
    // ... some other cleanup and finishing the transaction
    hub.popScope();
}

@derekrprice
Copy link
Author

@richardsimko: Because we have dozens, sometimes hundreds of threads running in parallel (yes, I understand that JS threads are virtual, but the end result is the same). So, every time a process awaits IO, another thread might pick up control. Actual flow ends up looking like:

thread 1: const scope = hub.pushScope();
thread 1: scope.setSpan(transaction);
thread 1: // ... do stuff that might await IO
thread 2: const scope = hub.pushScope();
thread 2: scope.setSpan(transaction);
thread 2: // ... do stuff that might await IO
thread 1: // do stuff that logs to thread 2's scope & transaction
thread 1: // ... do stuff that might await IO
thread 3: // ... and so on...

In practice, we have an Azure Functions app that receives dozens of queued requests in waves, and the waves sometimes come in quick succession. I believe that each listener is configured to receive up to 32 queued messages at a time and about a dozen listeners are running in parallel, listening to different queues. We end up seeing all the transactions from a wave attached to whichever was the last to call hub.pushScope() and register its transaction, and often transactions are even registered to the wrong app, since the individual application listeners are also running in parallel and pulling 32 messages each. I've seen HTTP calls and DB queries from what looks like over 100 discrete function invocations attached to a single function invocation, with the other all invocations registering as not having logged any HTTP or DB requests to Sentry.

@richardsimko
Copy link
Contributor

Since I don't know the details of your implementation I can't say for sure, but to me this sounds very similar to how a Node web server works. It also handles concurrent (Of course concurrent-ish, since Node is single-threaded as you say) requests and while waiting for slow async tasks it processes other things. This is exactly the scenario we're experiencing and from my understanding the scenario push/pop scope is intended to solve when reading the docs:

The default SDK integrations will push and pop scopes intelligently. For instance web framework integrations will create and destroy scopes around your routes or controllers.
https://docs.sentry.io/platforms/javascript/enriching-events/scopes/#whats-a-scope-whats-a-hub

(Although I disagree on the "intelligently" part, but that's a different discussion)

@derekrprice
Copy link
Author

It is exactly a Node app. Sentry's Node SDK's push and pop is very naïve and does not take into account multiple threads. There is a single shared hub object with a single stack. Push/pop was the first thing I tried. It did not work. Sentry's support and dev teams are aware of the issue but have not issued an ETA on providing a fix. From my discussions with them, they seem to think that it would require a complete rewrite of their scope handlers but, after poring over their code, I disagree. The problem is challenging, but I documented a workaround and a suggested path forward with sample code above.

There is, unfortunately, no way in JS to make a "context" object available to objects down the call stack without actually passing them as arguments to all intervening functions, which makes this a tough problem to solve in JS. My workaround, which encodes a key into a function name on the call stack and then uses a bogus exception to retrieve it from the stack trace, would work, but I haven't had time to make that into a PR for the Sentry SDK yet. I was hoping one of their developers would be inspired by my test code which proves the concept will work, but that hasn't happened yet. I will keep trying to get time to solve this for them here but other pressures are keeping my focus on other projects at the moment.

@CesarLanderos
Copy link

@derekrprice I happened to stumble with this, I was having the same issues, I recommend checking the AsyncLocalStorage API from node, I am pretty sure it will help you.

https://nodejs.org/api/async_context.html

@AbhiPrasad AbhiPrasad added Package: node Issues related to the Sentry Node SDK Type: Bug labels Oct 4, 2021
@andreialecu
Copy link
Contributor

Related: #4071

@derekrprice
Copy link
Author

derekrprice commented Oct 19, 2021

@richardsimko, check out #4071 for a detailed explanation of why you may not have experienced this issue yet with your web app. Basically, the problem is hidden in low traffic scenarios.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 9, 2021

This issue has gone three weeks without activity. In another week, I will close it.

But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!


"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀

@derekrprice
Copy link
Author

Don't be stale.

@mattfysh
Copy link

Could this not also be solved the same way the http handler works? i.e. running the wrapped handler in a new domain?

@HazAT HazAT added the Package: serverless Issues related to the Sentry Serverless SDK label Jan 25, 2023
@AbhiPrasad
Copy link
Member

With the 7.48.0 release of the JS SDK we use AsyncLocalStorage instead of domains under the hood, which should help with some of these problems.

If you want to manually wrap methods in an async context to isolate breadcrumbs/scope/spans, you can use Sentry.runWithAsyncContext, which is documented here: https://docs.sentry.io/platforms/node/configuration/async-context/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Package: node Issues related to the Sentry Node SDK Package: serverless Issues related to the Sentry Serverless SDK Type: Bug
Projects
None yet
Development

No branches or pull requests

8 participants