feat: open telemetry #4664
2cb7f75 to 1536994
Truly great work!
I mostly had questions since I'm not super familiar with OTel SDK.
I would also very much appreciate getting @edvald's eyes on this, since it touches on some critical parts of the code, in particular around the initialisation.
@@ -122,9 +123,9 @@ Use ${chalk.bold("up/down")} arrow keys to scroll through your command history.
   },
 })

-useInput((input, key) => {
+useInput(bindActiveContext((input, key) => {
Not sure I follow here. Why is this needed?
The SDK tracks the current context using AsyncLocalStorage.
Since those callbacks are executed by an event emitter at some arbitrary later time, they lose the automatic binding that would normally be in place. For that reason, event handlers need to be bound explicitly so that the context keeps propagating into the code triggered by the handler.
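To illustrate the point about lost bindings, here is a minimal sketch using only Node's built-in AsyncLocalStorage and EventEmitter. The bindActiveContext helper below is a hypothetical stand-in built on AsyncResource.bind; the helper in this PR presumably binds the active OTel context instead, and its actual implementation is not shown in this thread.

```typescript
import { AsyncLocalStorage, AsyncResource } from "node:async_hooks"
import { EventEmitter } from "node:events"

// Stand-in for the SDK's context storage (assumption: a plain string as "context").
const storage = new AsyncLocalStorage<string>()
const emitter = new EventEmitter()
const seen: Array<string | undefined> = []

// Hypothetical helper: captures the async context that is active at call time
// and restores it whenever the returned function is invoked.
function bindActiveContext<F extends (...args: any[]) => void>(fn: F): F {
  return AsyncResource.bind(fn)
}

storage.run("span-1", () => {
  // Registered inside the context, but NOT bound to it.
  emitter.on("unbound", () => {
    seen.push(storage.getStore())
  })
  // Registered inside the context AND bound explicitly.
  emitter.on(
    "bound",
    bindActiveContext(() => {
      seen.push(storage.getStore())
    })
  )
})

// Listeners run synchronously in the emitter's context, which here is
// outside of storage.run(), so only the bound handler still sees "span-1".
emitter.emit("unbound")
emitter.emit("bound")
```

After both events fire, the unbound handler has recorded no store while the bound handler has recorded the store from registration time.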
@@ -6,4 +6,7 @@
  * file, You can obtain one at http://mozilla.org/MPL/2.0/.
  */

import { initTracing } from "./util/tracing/tracing"
Why do we need this in multiple places?
I added this here so that as soon as garden is loaded, we immediately install all the hooks.
It might be redundant here, but since the other initialization happens only in /cli, I wanted to ensure that there's at least one entry point in /core itself.
  return otelSDK
}

if (!gardenEnv.GARDEN_ENABLE_TRACING) {
Not super familiar with this flow, but shouldn't we just exit here, as opposed to overwriting the env and continuing the function execution?
We're still going to call OTEL SDK functions even if we don't initialize the SDK, and we're not entirely sure how they behave in that case.
Thus we went with OTEL_SDK_DISABLED, which ensures that all operations are no-ops on the OTEL SDK level.
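A minimal sketch of that choice, assuming the flag is a boolean and the environment is a plain key/value record. The function name disableSdkUnlessEnabled is hypothetical; only the OTEL_SDK_DISABLED variable name comes from this thread.

```typescript
type Env = Record<string, string | undefined>

// Hypothetical helper: rather than returning early and leaving the SDK
// uninitialized, mark the SDK as disabled and let initialization continue.
function disableSdkUnlessEnabled(env: Env, tracingEnabled: boolean): void {
  if (!tracingEnabled) {
    env.OTEL_SDK_DISABLED = "true"
  }
}

// Usage with stand-in environments (in real code this would be process.env).
const disabledEnv: Env = {}
disableSdkUnlessEnabled(disabledEnv, false)

const enabledEnv: Env = {}
disableSdkUnlessEnabled(enabledEnv, true)
```

The idea is that every later tracing call still goes through the same code path, and the SDK itself decides to do nothing.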
@@ -1108,6 +1119,9 @@ export class Garden {
    })
  }

  @OtelTraced({
Some of this is quite granular.
I think the end user will mostly care about:
- Overall command execution
- Config resolution
- Individual actions
That being said, this could be very useful for our own internal debugging. But wondering if we should discern between that and data that users are interested in.
I think we can build upon this and maybe have different detail levels.
I would do that as a follow-up though.
@@ -353,6 +354,7 @@ function runPersistent({
    env: {
      ...getDefaultEnvVars(action),
      ...(env ? mapValues(env, (v) => v + "") : {}),
      ...getTracePropagationEnvVars(),
Is this needed here?
If so, are there other places we could be missing?
This ensures that in the (admittedly very specific) case of someone calling a garden command from within the exec command, but not using gardenCommand, we still trace things through.
Also, in theory any other piece of software that we spawn there and that follows the environment variable spec will be able to pick up the traces.
I didn't find any other obvious spots to add the propagation vars, but I may have missed some.
I looked at all execa calls to see if they do something that might require the headers.
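As a rough sketch of what propagating a trace through environment variables can look like: the code below serializes a span context into a W3C traceparent string and exposes it as a TRACEPARENT variable. The SpanContextLike type, the toTraceparent helper, and the body of getTracePropagationEnvVars are assumptions for illustration; the PR's actual implementation is not shown in this thread.

```typescript
// Assumed minimal shape of a span context (hypothetical type).
interface SpanContextLike {
  traceId: string // 32 lowercase hex characters
  spanId: string // 16 lowercase hex characters
  sampled: boolean
}

// W3C Trace Context header format: version-traceid-parentid-flags
function toTraceparent(ctx: SpanContextLike): string {
  const flags = ctx.sampled ? "01" : "00"
  return `00-${ctx.traceId}-${ctx.spanId}-${flags}`
}

// Hypothetical counterpart to the helper spread into `env` in the diff above.
// Returns an empty object when there is no active span, so spreading is safe.
function getTracePropagationEnvVars(ctx?: SpanContextLike): Record<string, string> {
  return ctx ? { TRACEPARENT: toTraceparent(ctx) } : {}
}

const childEnv = {
  PATH: "/usr/bin",
  ...getTracePropagationEnvVars({
    traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
    spanId: "00f067aa0ba902b7",
    sampled: true,
  }),
}
```

A child process that understands this convention can read TRACEPARENT on startup and continue the same trace instead of starting a new one.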
b79d4ad to d300a06
Thanks for the response! All of that makes sense to me. I'd still very much like to get @edvald's take before we merge, in particular since we're planning on a release on Monday. If Jón is super busy, I'm happy to merge on Monday after the release and just dogfood this some.
wip: trace commands
wip: working tracing decorator
wip: more tracing for build and deploy tasks
chore: update TS
refactor: remove explicitly typed `this` argument
feat: add more traces
feat: automatically propagate session context
feat: make HTTP requests also add the session ID attributes
feat: namespace garden attributes
feat: OTLP exporter and other stuff
chore: clearer naming of internal tasks
feat: trace more things
feat: more detailed tracing
chore: minor fix to OTEL shutdown message
feat: use finally for tracing wrapper
fix: remove redundant tracing
chore: support custom commands
refactor: ensure OTEL SDK is always initialized first before anything else
refactor: some renaming
build: pin otel dependencies correctly
chore: added an env. variable to control tracing, default false
feat: propagate traces to exec commands via environment variables
feat: trace dev mode
chore: adding a timeout to the OTEL SDK shutdown
refactor: decompose into smaller modules
fix: lint error
fix: rebase mistake
fix: licenses
fix: redundant async
d300a06 to fc0c544
-    result = await cli.run({ args, exitOnError })
+    // initialize the tracing to capture the full cli execution
+    result = await withContextFromEnv(() =>
+      wrapActiveSpan("garden", async (span) => {
Not a major thing, but stylistically I'd tend to prefer wrapping these functions in an object/namespace, e.g. tracing.withContextFromEnv and tracing.wrapActiveSpan. That's what I've been doing with the new plugin SDK for example, and it makes it a bit more obvious what these functions are about when reading code.
In that case we'd have to import and re-export all the different functions from the different submodules, since we have for example withContextFromEnv in one module and wrapActiveSpan in another.
I feel like it would then be preferable to use an index.ts that groups everything together so it all lives under util/tracing directly. In there we'd have to group everything into an object, since we can't really enforce the import * as tracing from "tracing" syntax that we could use if we simply re-exported everything.
To me that seems to either introduce redundant ways of importing those modules, which can't really be enforced, or discourage modularization in order to prevent that redundancy.
I'm not sure that's worth it. Maybe it would be better to use clear naming everywhere so it's always obvious, e.g. withTracingContextFromEnv or something like that.
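For reference, the named-object option under discussion could look roughly like the sketch below. The two helpers are inlined as trivial stubs so the example is self-contained; their signatures are simplified assumptions, not the PR's real ones, and in a real layout they would be imported from their own submodules.

```typescript
// Stub standing in for the helper that restores trace context from the
// environment (assumption: simplified to just invoke the callback).
function withContextFromEnv<T>(fn: () => T): T {
  return fn()
}

// Stub standing in for the span wrapper (assumption: passes the span name
// instead of a real span object).
function wrapActiveSpan<T>(name: string, fn: (spanName: string) => T): T {
  return fn(name)
}

// Hypothetical util/tracing/index.ts: a single named object as the public
// surface. In a real module this constant would be exported.
const tracing = { withContextFromEnv, wrapActiveSpan }

// Call sites then read as tracing.<function>, which makes the origin obvious.
const result = tracing.withContextFromEnv(() =>
  tracing.wrapActiveSpan("garden", (spanName) => `ran inside ${spanName}`)
)
```

The trade-off raised above still applies: nothing stops other code from importing the helpers directly from their submodules, so the grouped object is a convention rather than an enforced boundary.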
One quite simple way of doing that is a good ol' underscore prefix on the "private" functions re-exported in one place. I do also tend to prefer a named object over import * from... though.
This is not crucial btw, just a thought for better readability and discoverability going forward.
* feat: OTEL tracing
* chore: document tracing functions
What this PR does / why we need it:
Adds OTEL tracing support to get deep insights into the build performance.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer: