feat: open telemetry #4664

TimBeyer · 2023-06-20T09:51:39Z

What this PR does / why we need it:
Adds OTEL tracing support to get deep insight into build performance.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

eysi09

Truly great work!

I mostly had questions, since I'm not super familiar with the OTel SDK.

I would also very much appreciate getting @edvald's eyes on this, since it touches on some critical parts of the code, in particular around the initialisation.

eysi09 · 2023-06-22T11:28:00Z

core/src/commands/dev.tsx

@@ -122,9 +123,9 @@ Use ${chalk.bold("up/down")} arrow keys to scroll through your command history.
        },
      })

-      useInput((input, key) => {
+      useInput(bindActiveContext((input, key) => {


Not sure I follow here. Why is this needed?

The SDK tracks the current context using AsyncLocalStorage.

Since those callbacks are executed by an event emitter at an arbitrary time in the future, they lose the automatic context binding that would normally be in place. For that reason, event handlers need to be bound explicitly so the context keeps propagating into the code the handler triggers.
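A minimal sketch of the context loss described above. It assumes Node's AsyncLocalStorage as a direct stand-in for the OTEL context manager (the PR's bindActiveContext helper itself isn't shown here). A listener registered inside a context but fired later from outside it sees no store, while one wrapped with AsyncResource.bind re-enters the context captured at bind time. With @opentelemetry/api, the comparable call would be context.bind(context.active(), handler).

```typescript
import { AsyncLocalStorage, AsyncResource } from "node:async_hooks"
import { EventEmitter } from "node:events"

// Stand-in for the OTEL context: the store just holds the active span's name.
const storage = new AsyncLocalStorage<string>()
const emitter = new EventEmitter()

// What each listener observed when it finally ran.
const seen: { unbound?: string; bound?: string } = {}

storage.run("dev-command-span", () => {
  // Registered inside the span's context, but NOT bound to it.
  emitter.on("input", () => {
    seen.unbound = storage.getStore()
  })
  // Bound explicitly: the callback re-enters the context captured right here.
  emitter.on(
    "input",
    AsyncResource.bind(() => {
      seen.bound = storage.getStore()
    })
  )
})

// Emitted from top level, outside any storage.run() scope,
// much like a keypress arriving later on stdin.
emitter.emit("input")
```

After the emit, seen.unbound is undefined while seen.bound is "dev-command-span", which is the difference the useInput change is addressing.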


eysi09 · 2023-06-22T11:30:38Z

core/src/index.ts

@@ -6,4 +6,7 @@
 * file, You can obtain one at http://mozilla.org/MPL/2.0/.
 */

+import { initTracing } from "./util/tracing/tracing"


Why do we need this in multiple places?

I added this here so that as soon as Garden is loaded, we immediately install all the hooks.
It might be redundant, but since the other initialization happened only in /cli, I wanted to ensure there's also at least one entry point in /core itself.
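Calling the initializer from more than one entry point is only safe if it's idempotent. The tracing.ts hunk in the next thread returns an existing otelSDK early, which suggests this shape; the sketch below assumes that pattern and uses a hypothetical FakeSdk in place of the real OTEL SDK.

```typescript
// Hypothetical stand-in for the OTEL NodeSDK; it only counts how often start() runs.
class FakeSdk {
  static starts = 0

  start(): void {
    FakeSdk.starts++
  }
}

// Module-level singleton shared by every entry point that imports this module.
let otelSDK: FakeSdk | undefined

function initTracing(): FakeSdk {
  // Already initialized (e.g. by the cli entry point): reuse it rather than installing hooks twice.
  if (otelSDK) {
    return otelSDK
  }
  otelSDK = new FakeSdk()
  otelSDK.start()
  return otelSDK
}

// Both /cli and /core call this when loaded; only the first call does any work.
const fromCli = initTracing()
const fromCore = initTracing()
```

This relies on both entry points resolving to the same module instance, which Node's module cache normally guarantees, so a second call is cheap and harmless.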

eysi09 · 2023-06-22T11:32:16Z

core/src/util/tracing/tracing.ts

+    return otelSDK
+  }
+
+  if (!gardenEnv.GARDEN_ENABLE_TRACING) {


Not super familiar with this flow, but shouldn't we just return early here, instead of overwriting the env and continuing with the function execution?

We still call OTEL SDK functions in the rest of the code even when we don't initialize the SDK, and we're not entirely sure how they behave in that case.

So we went with OTEL_SDK_DISABLED, which ensures that all operations are no-ops at the OTEL SDK level.
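A sketch of that trade-off, with a hypothetical SketchTracer standing in for the OTEL SDK (the real SDK reads OTEL_SDK_DISABLED on its own). The concern with returning early is that later calls would hit an SDK that was never set up; setting the variable instead keeps every call site valid while turning the calls into no-ops.

```typescript
type Env = Record<string, string | undefined>

// Hypothetical stand-in for the OTEL SDK; the real SDK checks OTEL_SDK_DISABLED itself.
class SketchTracer {
  readonly recorded: string[] = []
  private readonly disabled: boolean

  constructor(disabled: boolean) {
    this.disabled = disabled
  }

  startSpan(name: string): void {
    // When disabled, the call is still valid but does nothing.
    if (!this.disabled) {
      this.recorded.push(name)
    }
  }
}

function initTracing(env: Env, enableTracing: boolean): SketchTracer {
  if (!enableTracing) {
    // Don't return early: mark the SDK as disabled and continue, so every
    // tracing call elsewhere in the codebase still gets a usable object.
    env.OTEL_SDK_DISABLED = "true"
  }
  return new SketchTracer(env.OTEL_SDK_DISABLED === "true")
}

// Tracing off: callers use the same API, but nothing is recorded.
const env: Env = {}
const tracer = initTracing(env, false)
tracer.startSpan("garden")
```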

eysi09 · 2023-06-22T11:34:48Z

core/src/garden.ts

@@ -1108,6 +1119,9 @@ export class Garden {
    })
  }

+  @OtelTraced({


Some of this is quite granular.

I think the end user will mostly care about:
* Overall command execution
* Config resolution
* Individual actions

That being said, this could be very useful for our own internal debugging. But I'm wondering whether we should distinguish between that and the data users are actually interested in.

I think we can build on this and maybe have different detail levels.
I would do that as a follow-up, though.
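One possible shape for that follow-up, as a hypothetical sketch: TraceLevel, tracedAt and setTraceLevel are invented names, not part of the PR. Each span declares a detail level, and spans finer than the configured level run untraced, so user-facing spans such as config resolution stay visible while internal ones can be switched on for debugging. The wrapper ends spans in a finally block so they are recorded even when the wrapped function throws. Only synchronous functions are handled; async ones would need the promise awaited before the span ends.

```typescript
// Hypothetical detail levels: lower numbers are coarser, user-facing spans.
const TraceLevel = { user: 1, internal: 2 } as const
type TraceLevelName = keyof typeof TraceLevel

interface RecordedSpan {
  name: string
  durationMs: number
  error?: string
}

// Hypothetical configured level; a real setup might read it from an env variable.
let configuredLevel: TraceLevelName = "user"
const finished: RecordedSpan[] = []

function setTraceLevel(level: TraceLevelName): void {
  configuredLevel = level
}

// Runs fn inside a span only if its level is within the configured detail level.
function tracedAt<T>(level: TraceLevelName, name: string, fn: () => T): T {
  if (TraceLevel[level] > TraceLevel[configuredLevel]) {
    // Too granular for the current level: run without a span.
    return fn()
  }
  const start = Date.now()
  const span: RecordedSpan = { name, durationMs: 0 }
  try {
    return fn()
  } catch (err) {
    span.error = String(err)
    throw err
  } finally {
    // finally guarantees the span is ended even when fn throws.
    span.durationMs = Date.now() - start
    finished.push(span)
  }
}
```

At the default "user" level, tracedAt("user", "resolveConfig", () => tracedAt("internal", "resolveTemplateStrings", () => 42)) records only the resolveConfig span.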

eysi09 · 2023-06-22T11:37:55Z

core/src/plugins/exec/deploy.ts

@@ -353,6 +354,7 @@ function runPersistent({
    env: {
      ...getDefaultEnvVars(action),
      ...(env ? mapValues(env, (v) => v + "") : {}),
+      ...getTracePropagationEnvVars(),


Is this needed here?

If so, are there other places we could be missing?

This ensures that in the (admittedly very specific) case where someone calls a garden command from within an exec command without using gardenCommand, we still trace things through.

In theory, any other software we spawn there that follows the environment variable spec will also be able to pick up the traces.

I went through all the execa calls to see whether any of them do something that might require the headers.
I didn't find any other obvious spots to add the propagation vars, but I may have missed some.
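A sketch of what such propagation can look like, assuming the context travels as a W3C Trace Context traceparent value in a TRACEPARENT environment variable. The variable name and the toPropagationEnvVars/contextFromEnv helpers are assumptions for illustration, not the PR's getTracePropagationEnvVars.

```typescript
// Minimal span context, mirroring the fields a W3C traceparent value carries.
interface SpanContextLike {
  traceId: string // 32 lowercase hex chars
  spanId: string // 16 lowercase hex chars
  sampled: boolean
}

// Assumed env var name; the name the PR actually uses isn't shown in this thread.
const TRACEPARENT_ENV = "TRACEPARENT"

// version-traceid-parentid-flags; only version 00 is accepted in this sketch.
const TRACEPARENT_RE = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/

// Parent side: serialize a span context into env vars for a child process.
function toPropagationEnvVars(ctx: SpanContextLike): Record<string, string> {
  const flags = ctx.sampled ? "01" : "00"
  return { [TRACEPARENT_ENV]: `00-${ctx.traceId}-${ctx.spanId}-${flags}` }
}

// Child side: recover the parent span context, or undefined if absent or invalid.
function contextFromEnv(env: Record<string, string | undefined>): SpanContextLike | undefined {
  const value = env[TRACEPARENT_ENV]
  const match = value ? TRACEPARENT_RE.exec(value) : null
  if (!match) {
    return undefined
  }
  const [, traceId, spanId, flags] = match
  // All-zero trace or span IDs are invalid per the W3C Trace Context spec.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) {
    return undefined
  }
  // Bit 0 of the flags byte is the "sampled" flag.
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 }
}
```

The parent would merge the result into the spawned process's env, the same way the runPersistent hunk above spreads ...getTracePropagationEnvVars() into env.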

eysi09 · 2023-06-23T15:58:17Z

Thanks for the response! All of that makes sense to me.

I'd still very much like to get @edvald's take before we merge, in particular since we're planning a release on Monday.

If Jón is super busy, I'm happy to merge on Monday after the release and just dogfood this for a bit.

* wip: trace commands
* wip: working tracing decorator
* wip: more tracing for build and deploy tasks
* chore: update TS
* refactor: remove explicitly typed `this` argument
* feat: add more traces
* feat: automatically propagate session context
* feat: make HTTP requests also add the session ID attributes
* feat: namespace garden attributes
* feat: OTLP exporter and other stuff
* chore: clearer naming of internal tasks
* feat: trace more things
* feat: more detailed tracing
* chore: minor fix to OTEL shutdown message
* feat: use finally for tracing wrapper
* fix: remove redundant tracing
* chore: support custom commands
* refactor: ensure OTEL SDK is always initialized first before anything else
* refactor: some renaming
* build: pin otel dependencies correctly
* chore: added an env. variable to control tracing, default false
* feat: propagate traces to exec commands via environment variables
* feat: trace dev mode
* chore: adding a timeout to the OTEL SDK shutdown
* refactor: decompose into smaller modules
* fix: lint error
* fix: rebase mistake
* fix: licenses
* fix: redundant async

edvald · 2023-06-27T14:05:40Z

cli/src/cli.ts

-    result = await cli.run({ args, exitOnError })
+    // initialize the tracing to capture the full cli execution
+    result = await withContextFromEnv(() =>
+      wrapActiveSpan("garden", async (span) => {


Not a major thing, but stylistically I'd tend to prefer wrapping these functions in an object/namespace, e.g. tracing.withContextFromEnv and tracing.wrapActiveSpan. That's what I've been doing with the new plugin SDK for example, and it makes it a bit more obvious what these functions are about when reading code.

In that case we'd have to import and re-export all the functions from the different submodules, since, for example, withContextFromEnv lives in one module and wrapActiveSpan in another.

I'd then prefer an index.ts that groups everything together so it all sits directly under util/tracing. In there we'd have to collect it all into an object, since we can't really enforce the import * as tracing from "tracing" syntax we could otherwise use by simply re-exporting everything.

To me that either introduces several redundant ways of importing those modules, which can't really be enforced, or discourages modularization in order to prevent that redundancy.

I'm not sure that's worth it. Maybe it would be better to use clear naming everywhere, so it's always obvious what a function is about, e.g. withTracingContextFromEnv or something like that.

One quite simple way of doing that is a good ol' underscore prefix on the "private" functions re-exported in one place. I do also tend to prefer a named object over import * from... though.

This is not crucial btw, just a thought for better readability and discoverability going forward.
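A sketch of the underscore-plus-named-object option from the reply above, collapsed into one file: module boundaries are marked with comments, and the function bodies are placeholders rather than the PR's implementations.

```typescript
// --- util/tracing/spans.ts (hypothetical): "private" by underscore convention ---
interface SketchSpan {
  name: string
}

function _wrapActiveSpan<T>(name: string, fn: (span: SketchSpan) => T): T {
  // Placeholder: a real version would start a span, make it active and end it.
  return fn({ name })
}

// --- util/tracing/context.ts (hypothetical) ---
function _withContextFromEnv<T>(fn: () => T): T {
  // Placeholder: a real version would first extract a parent context from env vars.
  return fn()
}

// --- util/tracing/index.ts: the single public entry point, as a named object ---
// (in a real layout this object would be exported from the index module)
const tracing = {
  withContextFromEnv: _withContextFromEnv,
  wrapActiveSpan: _wrapActiveSpan,
} as const

// Call sites read as tracing.*, which makes their purpose obvious at a glance.
const spanName = tracing.withContextFromEnv(() =>
  tracing.wrapActiveSpan("garden", (span) => span.name)
)
```

The trade-off raised in the thread still applies: nothing stops a caller from importing _wrapActiveSpan directly, so the underscore is a convention rather than an enforced boundary.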


TimBeyer requested review from eysi09, edvald and 10ko and removed request for edvald June 20, 2023 10:13

TimBeyer force-pushed the feat/open-telemetry branch from 2cb7f75 to 1536994 Compare June 21, 2023 09:37

eysi09 reviewed Jun 22, 2023


TimBeyer force-pushed the feat/open-telemetry branch from b79d4ad to d300a06 Compare June 22, 2023 15:10

TimBeyer requested a review from edvald June 26, 2023 08:36

TimBeyer force-pushed the feat/open-telemetry branch from d300a06 to fc0c544 Compare June 26, 2023 08:46

eysi09 approved these changes Jun 27, 2023


edvald reviewed Jun 27, 2023


core/src/util/tracing/spans.ts (conversation resolved)

chore: document tracing functions

1e8341e

TimBeyer merged commit 10aee8b into main Jun 28, 2023

TimBeyer deleted the feat/open-telemetry branch June 28, 2023 12:08

ShankyJS pushed a commit that referenced this pull request Jul 10, 2023

feat: open telemetry (#4664)

b225cb4

* feat: OTEL tracing
* chore: document tracing functions
