Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTLP*HTTP Exporter: Custom HTTP client #2632

Open
roidelapluie opened this issue Feb 22, 2022 · 9 comments
Open

OTLP*HTTP Exporter: Custom HTTP client #2632

roidelapluie opened this issue Feb 22, 2022 · 9 comments
Labels
enhancement New feature or request pkg:exporter:otlp Related to the OTLP exporter package

Comments

@roidelapluie
Copy link

Problem Statement

It would be useful to be able to pass a custom HTTP client to this instrumentation library.

Proposed Solution

Add a WithClient() option.

Alternatives

Configuring everything with options, but it's quite complex and it costs a lot when you already have a fully featured http client.

Additional Context

I would use this for Prometheus, which has a fully featured http client.

@roidelapluie roidelapluie added the enhancement New feature or request label Feb 22, 2022
@MrAlias
Copy link
Contributor

MrAlias commented Feb 22, 2022

Is this a request related to the otelhttp instrumentation?

@Aneurysm9
Copy link
Member

The otelhttp instrumentation provides a RoundTripper that should be integrated into your client construction logic. This seems like a logical place to introduce a new option to configure an otelhttp.Transport.

@roidelapluie
Copy link
Author

This is about the HTTP client that sends the traces to the collector, not the HTTP client that we want to trace. E.g. our roundtripper can read passwords from files, and I'd like to be able to use the same abilities to speak to the OTLP receivers.

@Aneurysm9 Aneurysm9 changed the title Custom HTTP client OTLP*HTTP Exporter: Custom HTTP client Feb 22, 2022
@Aneurysm9 Aneurysm9 added the pkg:exporter:otlp Related to the OTLP exporter package label Feb 22, 2022
@MadVikingGod
Copy link
Contributor

MadVikingGod commented Feb 23, 2022

This issue has been raised a number of times. I'm not in favor of it because of the ease with which you can create loops.

Let's say somewhere you use the contribs net/http to do:

// Instrument all of our library HTTP calls
stdhttp.DefaultClient = &stdhttp.Client{
    Transport: contribhttp.NewTransport(...)
}

And later someone comes along and does

exp, err := otlptracehttp.New(context.Background(),
	otlptracehttp.WithHTTPClient(http.DefaultClient),
)

Now, every time an HTTP request is made, it will use the instrumented client to make a request, which will generate more traces. Forever.

Any solution to this should also include some protection that we don't create these loops, or that the instrumentation has some escape hatch to not add tracing to the exports.

@Aneurysm9
Copy link
Member

@lmolkova was working on specifying a mechanism for suppressing instrumentation from multiple layers of clients. I'm not sure if it's perfectly applicable here, but it would probably be worth looking at that OTEP to see if it aligns with the need to prevent instrumentation loops in exporters.

@eloo
Copy link

eloo commented Feb 14, 2023

looks like without this issue the oidc extension is mostly useless at it is not possible to update the used token

https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/oidcauthextension

maybe another possible solution could be to use a function as value for headers.. so it can be implemented more dynamically

@lmolkova
Copy link

lmolkova commented Feb 14, 2023

We're looking into adding OTLP endpoint with AAD (oauth2, etc) auth on Azure Monitor and would be intersted in passing custom client that would allow this.

I.e. we'd want to enable the following flow:

  • exporter sends batch to OTLP HTTP endpoint
  • endpoint responds with 401
  • custom client in exporter (or auth extension in collector) then goes and requests an auth token in any way it wants
  • and retries original export batch request now with valid token. The endpoint from now on (and until token expires) responds with success
  • at some point token expires and is refreshed using the same flow

This should be a common concern for any HTTP-based OTLP endpoint with auth enabled.

If it helps, OTLP exporter in .NET provides a similar option: https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/src/OpenTelemetry.Exporter.OpenTelemetryProtocol#configure-httpclient

@lprakashv
Copy link

@MadVikingGod can we add the feature to atleast provide a specifications (config) of the underlying custom http.Transport and we can build that client internally. This will be really helpful in implementing a trivial task of adding the oauth2 client auth support.

This will eliminate the usage an already instrumented client as the passed config/spec will not be the exact http.Client or http.Transport. An alternate way would be to a clone of provided http.Client and overriding certain fields to avoid the instrumentation to kick in.

@patrislav
Copy link

We've hit this issue when trying to export traces from an environment without external connectivity (a secure enclave.) We need to create a custom http.Transport with the Proxy and DialContext methods overwritten, which allows us to send requests through a non-TCP socket.

Currently, the only workaround that works for us is to not use the otlptracehttp package entirely, relying on a custom exporter instead. However, it'd be much simpler if we could just pass a custom client or at least an http.Transport.

vito added a commit to vito/dagger that referenced this issue Jun 6, 2024
OTel doesn't support injecting a transport, so we have to do this from
the outside. :(

open-telemetry/opentelemetry-go#2632

Signed-off-by: Alex Suraci <[email protected]>
vito added a commit to dagger/dagger that referenced this issue Jun 7, 2024
* use official otel log sdk, dedupe telemetry pkg

* dagger.io/dagger/telemetry is now the canonical package for
  integrating Dagger with OpenTelemetry (engine, CLI, Go SDK)
* add engine/telemetry for engine and CLI specific telemetry (labels,
  cloud, engine pub/sub, etc)

Signed-off-by: Alex Suraci <[email protected]>

* log dialing errors

Signed-off-by: Alex Suraci <[email protected]>

* session: ignore SIGPIPE, don't panic

if either of these happen ther CLI-side span won't be finished and
telemetry won't drain.

The same fix was applied recently for stdout, but I observed SIGPIPE
happening on write(2) - stderr. I suspect this could still happen for
stdout. My guess is it happens when the parent process goes away without
waiting for the process. I couldn't find where that's happening (if it
is happening), so I think the safest bet is to just drop the signal,
since we should be shutting down around then anyway with stdin also
being closed; might as well do it cleanly.

Also, I didn't see the panic(err) become an issue, but it happens in a
goroutine, which AFAIK is un-recoverable and won't run defers. Better to
just log to stderr.

Signed-off-by: Alex Suraci <[email protected]>

* ci: test custom: configurable -count for tests

Signed-off-by: Alex Suraci <[email protected]>

* sdk/go: heartbeatinga (TODO replace)

committing for now since I lost it at one point and it's better than
nothing

but really this responsibility should move outside of the SDK, probably
with an exporter, in both the CLI and the engine

Signed-off-by: Alex Suraci <[email protected]>

* i retire

Signed-off-by: Alex Suraci <[email protected]>

* TODO write commit message

Signed-off-by: Alex Suraci <[email protected]>

* lints

Signed-off-by: Alex Suraci <[email protected]>

* disable test otel integration

Cloud can't handle this yet, bring back later with testctx for proper
instrumentation, ideally also with proper parallelism

Signed-off-by: Alex Suraci <[email protected]>

* wip: enable otel for tests, see if it passes after all

Signed-off-by: Alex Suraci <[email protected]>

* tui: show spans IDs with --debug

very helpful when troubleshooting otel issues

Signed-off-by: Alex Suraci <[email protected]>

* plain frontend: show span IDs with --debug

Signed-off-by: Alex Suraci <[email protected]>

* ci: support building engine and CLI with -race

for hack/dev, set RACE=1

Signed-off-by: Alex Suraci <[email protected]>

* pubsub: finish draining if parent spans complete

this will prevent hangs in situations where a process exits abruptly
without cleaninly finishing all the spans it starts, as seems to happen
with go test -failfast, but in principle could happen just about
anywhere.

Signed-off-by: Alex Suraci <[email protected]>

* pubsub: emit to no one when no clients

Signed-off-by: Alex Suraci <[email protected]>

* add test for multiple clients

this passes reliably against hack/dev but seems to be flaky in dagger
call. still investigating.

Signed-off-by: Alex Suraci <[email protected]>

* assertion against error output

otherwise this can false pass because --debug prints the span that sets
up the code

Signed-off-by: Alex Suraci <[email protected]>

* frontend: show hashes with verbose=1

Signed-off-by: Alex Suraci <[email protected]>

* show client ID, helps with troubleshooting

Signed-off-by: Alex Suraci <[email protected]>

* fix data race with ctx var capture

this one is spooky - theoretically this would put a 10 minute timeout on
the connection

Signed-off-by: Alex Suraci <[email protected]>

* withDeadlineConn: fix data race

Signed-off-by: Alex Suraci <[email protected]>

* include stderr when session exits poorly

Signed-off-by: Alex Suraci <[email protected]>

* fix frontend not printing final frame

root cause of TestModuleConstructors/return_error/go

Signed-off-by: Alex Suraci <[email protected]>

* fix data race in heartbeating

Signed-off-by: Alex Suraci <[email protected]>

* move span heartbeating to Cloud exporter

Signed-off-by: Alex Suraci <[email protected]>

* remove unused exporter

Signed-off-by: Alex Suraci <[email protected]>

* lint

Signed-off-by: Alex Suraci <[email protected]>

* test that logs only get printed once

Signed-off-by: Alex Suraci <[email protected]>

* fix comments and client names

Signed-off-by: Alex Suraci <[email protected]>

* drain logs too, refactor logging infra

* otel logging now respects Close() which sends an 'eof' log record,
  indicated by stdio.eof=true.
* engine pub/sub now drains until it sees stdio.eof=true for each
  stream. this fixes the integration test which needed to seep(1s).
* refactor logging helpers to reduce duplication.
* retire ioctx; redundant with otel logging. still used for dagql tests
  though, so moved to internal.
* use an attr to indicate global logs, rather than a different library
  using library made little sense since you'd still want to know that
  info from the different call sites. using an attr is the same amount
  of effort without overloading a field.

Signed-off-by: Alex Suraci <[email protected]>

* disable gocritic.ifElseChain

this one is silly. switch carries subtle implications, and breaks break.
don't want to have to use labels to make this linter happy.

Signed-off-by: Alex Suraci <[email protected]>

* client stream test: work around log limitations

Signed-off-by: Alex Suraci <[email protected]>

* const-ify

Signed-off-by: Alex Suraci <[email protected]>

* lint

Signed-off-by: Alex Suraci <[email protected]>

* remove accidentally-committed file

Signed-off-by: Alex Suraci <[email protected]>

* sdk/go: use env carrier

Signed-off-by: Alex Suraci <[email protected]>

* auto-refresh Cloud exporter token

OTel doesn't support injecting a transport, so we have to do this from
the outside. :(

open-telemetry/opentelemetry-go#2632

Signed-off-by: Alex Suraci <[email protected]>

* remove early client ID setup

this isn't really load-bearing and raises more questions than it answers

Signed-off-by: Alex Suraci <[email protected]>

* fix heartbeating

this was broken in two ways that canceled each other out:

* deadlocked on first heartbeat
* never cleared out active spans, so it'd constantly make everything
  'running'

Signed-off-by: Alex Suraci <[email protected]>

* send function logs + spans to API call

* retire dagger.io/ui.mask - white whale defeated
* simplify TS/Go/Python SDKs - don't need to do the mask/passthrough
  dance anymore

Signed-off-by: Alex Suraci <[email protected]>

* pubsub: abandon child spans on completion

way simpler implementation

Signed-off-by: Alex Suraci <[email protected]>

* python lint

Signed-off-by: Alex Suraci <[email protected]>

* untie ClientMetadataMetaKey knot

Signed-off-by: Alex Suraci <[email protected]>

* simplify otel proxy setup

(*http.Server).Shutdown handles most of what we needed before

Signed-off-by: Alex Suraci <[email protected]>

* remove dead code

Signed-off-by: Alex Suraci <[email protected]>

* tui: move digests to -vvv

Signed-off-by: Alex Suraci <[email protected]>

* hide exec /runtime when redirecting telemetry

Signed-off-by: Alex Suraci <[email protected]>

* propagate log level config to engine client

* reduce telemetry logs down to debug
* fix debug var being clobbered via double-flag-parsing
  * --debug etc get parsed twice: once before function flags are
    learned, and again after. so we just need to pass in the current
    value. unfortunately --verbosity gets clobbered but that doesn't
    matter yet.

Signed-off-by: Alex Suraci <[email protected]>

* frontend: don't print blank line on EOF

Signed-off-by: Alex Suraci <[email protected]>

* use http.Error, nix comment

Signed-off-by: Alex Suraci <[email protected]>

* add sanity check to avoid infinite loop

Signed-off-by: Alex Suraci <[email protected]>

* link to SIGPIPE docs

Signed-off-by: Alex Suraci <[email protected]>

* plain: handle EOF log record for primary logs

Signed-off-by: Alex Suraci <[email protected]>

* plain: don't render logs if --silent

Signed-off-by: Alex Suraci <[email protected]>

* session: trigger exit on stdout write fail

Signed-off-by: Alex Suraci <[email protected]>

* simplify pub/sub client tracking

* spans don't inherit their parent span's ID anymore. I don't think we
  actually need this, since we always step recursively through the span
  ancestry to collect the set of relevant client IDs. better not to
  funge data if we don't need to.
* clean up spanDone state, not needed with new child abandonment scheme
* clean up all state for a trace when all clients go away. this way we
  also clean up spans that don't have an associated client.

Signed-off-by: Alex Suraci <[email protected]>

* ci: add spans to go lint

Signed-off-by: Alex Suraci <[email protected]>

* fix: plain finalRender should disable the context hold

Signed-off-by: Justin Chadwell <[email protected]>

---------

Signed-off-by: Alex Suraci <[email protected]>
Signed-off-by: Justin Chadwell <[email protected]>
Co-authored-by: Justin Chadwell <[email protected]>
dagger-ci pushed a commit to dagger/dagger-go-sdk that referenced this issue Jun 12, 2024
* use official otel log sdk, dedupe telemetry pkg

* dagger.io/dagger/telemetry is now the canonical package for
  integrating Dagger with OpenTelemetry (engine, CLI, Go SDK)
* add engine/telemetry for engine and CLI specific telemetry (labels,
  cloud, engine pub/sub, etc)

Signed-off-by: Alex Suraci <[email protected]>

* log dialing errors

Signed-off-by: Alex Suraci <[email protected]>

* session: ignore SIGPIPE, don't panic

if either of these happen ther CLI-side span won't be finished and
telemetry won't drain.

The same fix was applied recently for stdout, but I observed SIGPIPE
happening on write(2) - stderr. I suspect this could still happen for
stdout. My guess is it happens when the parent process goes away without
waiting for the process. I couldn't find where that's happening (if it
is happening), so I think the safest bet is to just drop the signal,
since we should be shutting down around then anyway with stdin also
being closed; might as well do it cleanly.

Also, I didn't see the panic(err) become an issue, but it happens in a
goroutine, which AFAIK is un-recoverable and won't run defers. Better to
just log to stderr.

Signed-off-by: Alex Suraci <[email protected]>

* ci: test custom: configurable -count for tests

Signed-off-by: Alex Suraci <[email protected]>

* sdk/go: heartbeatinga (TODO replace)

committing for now since I lost it at one point and it's better than
nothing

but really this responsibility should move outside of the SDK, probably
with an exporter, in both the CLI and the engine

Signed-off-by: Alex Suraci <[email protected]>

* i retire

Signed-off-by: Alex Suraci <[email protected]>

* TODO write commit message

Signed-off-by: Alex Suraci <[email protected]>

* lints

Signed-off-by: Alex Suraci <[email protected]>

* disable test otel integration

Cloud can't handle this yet, bring back later with testctx for proper
instrumentation, ideally also with proper parallelism

Signed-off-by: Alex Suraci <[email protected]>

* wip: enable otel for tests, see if it passes after all

Signed-off-by: Alex Suraci <[email protected]>

* tui: show spans IDs with --debug

very helpful when troubleshooting otel issues

Signed-off-by: Alex Suraci <[email protected]>

* plain frontend: show span IDs with --debug

Signed-off-by: Alex Suraci <[email protected]>

* ci: support building engine and CLI with -race

for hack/dev, set RACE=1

Signed-off-by: Alex Suraci <[email protected]>

* pubsub: finish draining if parent spans complete

this will prevent hangs in situations where a process exits abruptly
without cleaninly finishing all the spans it starts, as seems to happen
with go test -failfast, but in principle could happen just about
anywhere.

Signed-off-by: Alex Suraci <[email protected]>

* pubsub: emit to no one when no clients

Signed-off-by: Alex Suraci <[email protected]>

* add test for multiple clients

this passes reliably against hack/dev but seems to be flaky in dagger
call. still investigating.

Signed-off-by: Alex Suraci <[email protected]>

* assertion against error output

otherwise this can false pass because --debug prints the span that sets
up the code

Signed-off-by: Alex Suraci <[email protected]>

* frontend: show hashes with verbose=1

Signed-off-by: Alex Suraci <[email protected]>

* show client ID, helps with troubleshooting

Signed-off-by: Alex Suraci <[email protected]>

* fix data race with ctx var capture

this one is spooky - theoretically this would put a 10 minute timeout on
the connection

Signed-off-by: Alex Suraci <[email protected]>

* withDeadlineConn: fix data race

Signed-off-by: Alex Suraci <[email protected]>

* include stderr when session exits poorly

Signed-off-by: Alex Suraci <[email protected]>

* fix frontend not printing final frame

root cause of TestModuleConstructors/return_error/go

Signed-off-by: Alex Suraci <[email protected]>

* fix data race in heartbeating

Signed-off-by: Alex Suraci <[email protected]>

* move span heartbeating to Cloud exporter

Signed-off-by: Alex Suraci <[email protected]>

* remove unused exporter

Signed-off-by: Alex Suraci <[email protected]>

* lint

Signed-off-by: Alex Suraci <[email protected]>

* test that logs only get printed once

Signed-off-by: Alex Suraci <[email protected]>

* fix comments and client names

Signed-off-by: Alex Suraci <[email protected]>

* drain logs too, refactor logging infra

* otel logging now respects Close() which sends an 'eof' log record,
  indicated by stdio.eof=true.
* engine pub/sub now drains until it sees stdio.eof=true for each
  stream. this fixes the integration test which needed to seep(1s).
* refactor logging helpers to reduce duplication.
* retire ioctx; redundant with otel logging. still used for dagql tests
  though, so moved to internal.
* use an attr to indicate global logs, rather than a different library
  using library made little sense since you'd still want to know that
  info from the different call sites. using an attr is the same amount
  of effort without overloading a field.

Signed-off-by: Alex Suraci <[email protected]>

* disable gocritic.ifElseChain

this one is silly. switch carries subtle implications, and breaks break.
don't want to have to use labels to make this linter happy.

Signed-off-by: Alex Suraci <[email protected]>

* client stream test: work around log limitations

Signed-off-by: Alex Suraci <[email protected]>

* const-ify

Signed-off-by: Alex Suraci <[email protected]>

* lint

Signed-off-by: Alex Suraci <[email protected]>

* remove accidentally-committed file

Signed-off-by: Alex Suraci <[email protected]>

* sdk/go: use env carrier

Signed-off-by: Alex Suraci <[email protected]>

* auto-refresh Cloud exporter token

OTel doesn't support injecting a transport, so we have to do this from
the outside. :(

open-telemetry/opentelemetry-go#2632

Signed-off-by: Alex Suraci <[email protected]>

* remove early client ID setup

this isn't really load-bearing and raises more questions than it answers

Signed-off-by: Alex Suraci <[email protected]>

* fix heartbeating

this was broken in two ways that canceled each other out:

* deadlocked on first heartbeat
* never cleared out active spans, so it'd constantly make everything
  'running'

Signed-off-by: Alex Suraci <[email protected]>

* send function logs + spans to API call

* retire dagger.io/ui.mask - white whale defeated
* simplify TS/Go/Python SDKs - don't need to do the mask/passthrough
  dance anymore

Signed-off-by: Alex Suraci <[email protected]>

* pubsub: abandon child spans on completion

way simpler implementation

Signed-off-by: Alex Suraci <[email protected]>

* python lint

Signed-off-by: Alex Suraci <[email protected]>

* untie ClientMetadataMetaKey knot

Signed-off-by: Alex Suraci <[email protected]>

* simplify otel proxy setup

(*http.Server).Shutdown handles most of what we needed before

Signed-off-by: Alex Suraci <[email protected]>

* remove dead code

Signed-off-by: Alex Suraci <[email protected]>

* tui: move digests to -vvv

Signed-off-by: Alex Suraci <[email protected]>

* hide exec /runtime when redirecting telemetry

Signed-off-by: Alex Suraci <[email protected]>

* propagate log level config to engine client

* reduce telemetry logs down to debug
* fix debug var being clobbered via double-flag-parsing
  * --debug etc get parsed twice: once before function flags are
    learned, and again after. so we just need to pass in the current
    value. unfortunately --verbosity gets clobbered but that doesn't
    matter yet.

Signed-off-by: Alex Suraci <[email protected]>

* frontend: don't print blank line on EOF

Signed-off-by: Alex Suraci <[email protected]>

* use http.Error, nix comment

Signed-off-by: Alex Suraci <[email protected]>

* add sanity check to avoid infinite loop

Signed-off-by: Alex Suraci <[email protected]>

* link to SIGPIPE docs

Signed-off-by: Alex Suraci <[email protected]>

* plain: handle EOF log record for primary logs

Signed-off-by: Alex Suraci <[email protected]>

* plain: don't render logs if --silent

Signed-off-by: Alex Suraci <[email protected]>

* session: trigger exit on stdout write fail

Signed-off-by: Alex Suraci <[email protected]>

* simplify pub/sub client tracking

* spans don't inherit their parent span's ID anymore. I don't think we
  actually need this, since we always step recursively through the span
  ancestry to collect the set of relevant client IDs. better not to
  funge data if we don't need to.
* clean up spanDone state, not needed with new child abandonment scheme
* clean up all state for a trace when all clients go away. this way we
  also clean up spans that don't have an associated client.

Signed-off-by: Alex Suraci <[email protected]>

* ci: add spans to go lint

Signed-off-by: Alex Suraci <[email protected]>

* fix: plain finalRender should disable the context hold

Signed-off-by: Justin Chadwell <[email protected]>

---------

Signed-off-by: Alex Suraci <[email protected]>
Signed-off-by: Justin Chadwell <[email protected]>
Co-authored-by: Justin Chadwell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pkg:exporter:otlp Related to the OTLP exporter package
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants