You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@peated/worker - a faktory-based async worker (think Celery)
@peated/cli - manual command line driven interactions (think a backfill)
Running @sentry/node-experimental wherever possible, otherwise generally latest version of relevant SDKs.
The Objective
We want perfect instrumentation. That means:
Traces start in three locations:
@peated/web (the Remix server hydration)
@peated/cli
@peated/worker (for things like cron jobs)
Traces have OTel-specified standards and semantics, and are complete end to end, with the correct root
All core fields have the correct data (such as the transaction field in errors)
All errors have traces (unless Sentry otherwise hard-prevents us), and spans create an actionable debug path for errors.
A single project, as we're modeling a micro service architecture, using a single DSN and a service tag to replicate upcoming changes.
The Numerous Issues
Traces are really fucking hard to instrument
We rely way too much on magical instrumentation, and while I know this is changing, it hasn't changed fast enough.
I'm using an experimental SDK because not only is it our future direction, but the auto instrumentation is superior.
Its not obvious how to actually use our SDK in many cases, because I can't cleanly always use OTel's documentation. For example, I don't ever create 'tracers', but instead call Sentry's startSpan. Our docs contain completely separate sections on how to actually implement tracing, and none of them are complete.
🎯 Should we expose more of OTel's API?
Sentry's implementation sometimes looks similar to OTel but isn't compatible. For example, startSpan has different args than OTel's implementation. Additionally I believe OTel flips the meaning of startSpan. We should change our terminology to avoid confusing developers.
🎯 Similar to above: are we moving away from OTel too much?
startSpan requires a callback, but not all code is callback driven. For example, my CLI uses commander, which uses a standard hook pattern (e.g. beforeCommand and afterCommand). I can't easily use a callback to instrument that. If I try to use startInactiveSpan it actually doesn't associate with the current parent (at least per our docs) so that has even more issues.
🎯 Again is this something we need to look at the OTel API and enable? Do we have a similar use case with our tools?
// this does not work correctly at all because of the missing parentprogram.name("peated").description("CLI for assisting with Peated").hook("preSubcommand",async(thisCommand,subcommand)=>{(thisCommandasMutatedCommand).span=Sentry.startInactiveSpan({op: "command",name: `commander.${subcommand.name()}`,});}).hook("postAction",async(thisCommand)=>{awaitshutdownClient();if((thisCommandasMutatedCommand).span){(thisCommandasMutatedCommand).span.end();}});```5) Continuing traces is needlessly complicated. OTel's APIs aren't any better here. We need obvious APIs to get the current traceparent, and to pass in a traceparent. These should be available as high level interfaces and not have complex abstractions.For example, to grab the current context:```typescript// pull out traceparent to forward to faktory jobconstactiveContext={};propagation.inject(context.active(),activeContext);awaitclient.job(jobName,args,{traceContext: activeContext}).push();
Using op and name, I believe per OTel's recs, does not play nicely with Sentry. We group on name and ignore op, which means we create transaction summaries that have mixed operations and makes things less actionable.
🎯 can we get reference to specifics of OTel here?
🎯 this leads to txn names and breaking error links and display name in Performance, detail below
Transaction is deprecated and y'all keep ignoring that we have an error monitoring product
scope.setTransaction binds event.transaction, which exists for very explicit reasons. It makes it much faster to triage an error as a human when the culprit is shown as myCoolRouteName than it does with openai.ts(lib:foo). Additionally that same transaction gives me reasonable aggregates if it happens on different routes, and in an ideal world, maps up to the same name as the span (meaning I can cross reference data).
There are additional problems here that have continued to exist for fucking ages, such as needing to call two different APIs to update the current transaction name, and now we have deprecated setTransaction without implementing a valid replacement in the experimental SDK.
🎯 do we just need a setOpName or setTransactionName itself, i think we have a category list of APIs and names
In general, transaction MUST exist on errors, and almost certainly should be tightly coupled to the span (in which the span should probably always exists, rather than having two discrete code paths).
SourceMaps still suck
I can't even be asked to try to spend more time to set these up because its never fun and the debug timelime often takes hours. Not sure if I have any feedback here other than this is very unfun and it'd be a lot easier if I could just publish .map files and Sentry can pick up the slack.
Trace explorer is useless
📈 <3
Half the time I can't get our instrumentation right, and it requires perfection instrumentation.
Even if I get instrumentation right, I will likely get sampled, and again, the trace explorer requires perfect instrumentation
Its broken by design, and more importantly, it makes me think our product doesn't work. If I were a real customer doing a POC, I would have used a competitor by this point because of how broken the product looks.
I need spans to debug JavaScript
Stack traces are useless more and more frequently, and spans give you the picture of inputs. That is, a valid trace and span tree tells me how an error happened, and is almost required to be able to debug modern javascript errors.
🐛 why are they useless?
Except the span tree is buried in a trace explorer which is empty half the time or otherwise broken, and then error page itself is filled with upsells instead of just rendering the information that helps me debug the error.
🐛 issue details page declutter project 🙏
Replays are not the answer, and should be reserved for outlier conditions that are harder to debug (let alone, they are not useful on backend errors in many situations).
My feed is filled with performance issues
I literally only care about errors 99% of the time. I want to seek out performance issues, not be forced to triage them non-stop.
🐛 perf issue noise
trpc errors suck
🐛 otel gives us intsrumention but just performance, meaning SDK data might not be best for errors
🐛 i also always found breadcrumbs to be inconsistent and repetitive but customers like them supposedly
They end up with a loop of information because they're a buffer and not a tree. They need to have a hard sit down on why they continue to exist when a span tree is often the right solution to diagnostics.
For example:
for item in list:
do_a_thing_that_gracefully_fails_and_logs()
Each of those will pollute breadcrumbs even tho most crumbs are only relevant to the specific function call instance.
wtf is Object.<anonymous>
🐛 is there anything we can do with anonymous? Can we inform the user better? Like with figma, detect and inform that is a platform issue
This should somehow have a route name or url or something, and it should have it because of OTel's superior instrumentation, but seemingly I'm missing a trace and I've got an absolute trash error.
The content you are editing has changed. Please copy your edits and refresh the page.
Cramer's Experience
The Setup
Four services, all in TypeScript:
Running @sentry/node-experimental wherever possible, otherwise generally latest version of relevant SDKs.
The Objective
We want perfect instrumentation. That means:
transaction
field in errors)service
tag to replicate upcoming changes.The Numerous Issues
Traces are really fucking hard to instrument
We rely way too much on magical instrumentation, and while I know this is changing, it hasn't changed fast enough.
I'm using an experimental SDK because not only is it our future direction, but the auto instrumentation is superior.
Its not obvious how to actually use our SDK in many cases, because I can't cleanly always use OTel's documentation. For example, I don't ever create 'tracers', but instead call Sentry's
startSpan
. Our docs contain completely separate sections on how to actually implement tracing, and none of them are complete.Should we expose more of OTel's API?
startSpan
has different args than OTel's implementation. Additionally I believe OTel flips the meaning of startSpan. We should change our terminology to avoid confusing developers.Similar to above: are we moving away from OTel too much?
startSpan
requires a callback, but not all code is callback driven. For example, my CLI usescommander
, which uses a standard hook pattern (e.g. beforeCommand and afterCommand). I can't easily use a callback to instrument that. If I try to usestartInactiveSpan
it actually doesn't associate with the current parent (at least per our docs) so that has even more issues.Again is this something we need to look at the OTel API and enable? Do we have a similar use case with our tools?
And injecting it is even worse:
op
andname
, I believe per OTel's recs, does not play nicely with Sentry. We group onname
and ignoreop
, which means we create transaction summaries that have mixed operations and makes things less actionable.can we get reference to specifics of OTel here?
this leads to txn names and breaking error links and display name in Performance, detail below
Transaction is deprecated and y'all keep ignoring that we have an error monitoring product
scope.setTransaction
bindsevent.transaction
, which exists for very explicit reasons. It makes it much faster to triage an error as a human when the culprit is shown asmyCoolRouteName
than it does withopenai.ts(lib:foo)
. Additionally that same transaction gives me reasonable aggregates if it happens on different routes, and in an ideal world, maps up to the same name as the span (meaning I can cross reference data).https://peated.sentry.io/issues/5007350914/?query=is%3Aunresolved&referrer=issue-stream&stream_index=17
There are additional problems here that have continued to exist for fucking ages, such as needing to call two different APIs to update the current transaction name, and now we have deprecated
setTransaction
without implementing a valid replacement in the experimental SDK.do we just need a setOpName or setTransactionName itself, i think we have a category list of APIs and names
In general,
transaction
MUST exist on errors, and almost certainly should be tightly coupled to the span (in which the span should probably always exists, rather than having two discrete code paths).SourceMaps still suck
I can't even be asked to try to spend more time to set these up because its never fun and the debug timelime often takes hours. Not sure if I have any feedback here other than this is very unfun and it'd be a lot easier if I could just publish .map files and Sentry can pick up the slack.
Trace explorer is useless
<3
Half the time I can't get our instrumentation right, and it requires perfection instrumentation.
Even if I get instrumentation right, I will likely get sampled, and again, the trace explorer requires perfect instrumentation
Its broken by design, and more importantly, it makes me think our product doesn't work. If I were a real customer doing a POC, I would have used a competitor by this point because of how broken the product looks.
I need spans to debug JavaScript
Stack traces are useless more and more frequently, and spans give you the picture of inputs. That is, a valid trace and span tree tells me how an error happened, and is almost required to be able to debug modern javascript errors.
why are they useless?
Except the span tree is buried in a trace explorer which is empty half the time or otherwise broken, and then error page itself is filled with upsells instead of just rendering the information that helps me debug the error.
issue details page declutter project
🙏Replays are not the answer, and should be reserved for outlier conditions that are harder to debug (let alone, they are not useful on backend errors in many situations).
My feed is filled with performance issues
I literally only care about errors 99% of the time. I want to seek out performance issues, not be forced to triage them non-stop.
perf issue noise
trpc errors suck
otel gives us intsrumention but just performance, meaning SDK data might not be best for errors
https://peated.sentry.io/issues/4630767004/?query=is%3Aunresolved&referrer=issue-stream&stream_index=2
So do Zod errors and a bunch of other shit.
https://peated.sentry.io/issues/5007350836/?query=is%3Aunresolved&referrer=issue-stream&stream_index=10
Breadcrumbs are useless in so many contexts
i also always found breadcrumbs to be inconsistent and repetitive but customers like them supposedly
They end up with a loop of information because they're a buffer and not a tree. They need to have a hard sit down on why they continue to exist when a span tree is often the right solution to diagnostics.
For example:
Each of those will pollute breadcrumbs even tho most crumbs are only relevant to the specific function call instance.
wtf is
Object.<anonymous>
is there anything we can do with anonymous? Can we inform the user better? Like with figma, detect and inform that is a platform issue
https://peated.sentry.io/issues/4546294867/?query=is%3Aunresolved&referrer=issue-stream&stream_index=13
wtf is
Object captured as exception with keys: data, error, internal, status
data issue or product inform issue?
https://peated.sentry.io/issues/5010841858/?query=is%3Aunresolved&referrer=issue-stream&stream_index=14
Transactions are shit all over the place even with OTel's better instrumentation
need further investigation data quality
https://peated.sentry.io/issues/5006626006/?query=is%3Aunresolved&referrer=issue-stream&statsPeriod=14d&stream_index=0
This should somehow have a route name or url or something, and it should have it because of OTel's superior instrumentation, but seemingly I'm missing a trace and I've got an absolute trash error.
SDK API 🎯
Issues Platform
The text was updated successfully, but these errors were encountered: