Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec: enhance dependencies granularity for other backends/dependencies #646

Open
SylvainJuge opened this issue May 25, 2022 · 14 comments
Open

Comments

@SylvainJuge
Copy link
Member

SylvainJuge commented May 25, 2022

This is a follow-up task of #621 that was focused on relational databases.

For other types of dependencies, we need to define the values for following fields:

The specification will likely rely on the tests/agents/json-specs/span_types.json file that defines the types and sub-types for each kind of backend service which appears in the map or dependencies.

The goal of this specification would be to fill the gaps and remove the question marks in this table :

For each row, we will have service.target.type = subtype.

type subtype destination.service.resource service.target.name related PR/issue
db cassandra casandra, casandra/${cluster} ${cluster} #654
db cosmosdb ? ? #661
db dynamodb dynamodb, dynamodb/${cloud.region.name} ${cloud.region.name} #654
db elasticsearch elasticsearch, elasticsearch/${cluster} ${cluster} #654
db graphql ? ?
db mongodb mongodb, mongodb/${db.instance} ${db.instance} #654
db memcached memcached, ? ?
db redis redis, ? ?
external dubbo ${host}:${port} ${host}:${port}
external grpc ${host}:${port} ${host}:${port}
external http ${host}:${port} ${host}:${port}
external ldap ldap ?
messaging azurequeue azurequeue, azurequeue/${queue.name} #655
messaging azureservicebus azureservicebus, azureservicebus/${queue.name} (or topic). #655
messaging jms jms, jms/${queue.name} ${queue.name} #655
messaging kafka kafka, kafka/${queue.name} ${queue.name} #655
messaging rabbitmq rabbitmq, rabbitmq/${queue.name} ${queue.name} #655
messaging sns #655
messaging sqs #655
storage azureblob
storage azurefile
storage azuretable
storage s3 ?

Given there is quite a lot to define, we might split it into smaller issues / PRs.

@SylvainJuge SylvainJuge changed the title Spec: enhance dependencies granularity for non-relational databases Spec: enhance dependencies granularity for other backends/dependencies May 25, 2022
@adrian-skybaker
Copy link

Where would the right place to raise feedback about the use of ${host}:${port} for HTTP resources? This is quite an obfuscated way of describing them if you have many paths under a single host, eg an organisation wide API Gateway. Some way of allowing the path/uri-template to be introduced in a controlled way would be useful.

@SylvainJuge
Copy link
Member Author

Hi @adrian-skybaker , here we are trying to properly describe the "destination service" of an external call, from the perspective of the caller, for example when using an HTTP client in an application that is instrumented by APM agents.

So, from the perspective of the application, the destination (or service target to fit the new fields names), there is no way to know if a given HTTP call will be routed to a specific host or service, all we know is the next destination that is described by the ${host}:${port} string.

In the case of an API gateway, using an APM agent on the gateway itself could help to show the routing to downstream services. Also, the instrumentation of downstream services should enable to display them in the map properly, albeit not with the intermediate API gateway if the latter is not instrumented.

@adrian-skybaker
Copy link

adrian-skybaker commented Jun 5, 2022

In the case of an API gateway, using an APM agent on the gateway itself could help to show the routing to downstream services.

This isn't possible when using hosted or appliance-type gateway solutions such as AWS API Gateway or Azure API Manager. This problem is compounded if this gateway is used as a proxy for outbound 3rd party requests, in order to secure/manage external communication outside of a secure network: all your external resources now look the same to APM.

Also, the instrumentation of downstream services should enable to display them in the map properly,

This isn't feasible for genuinely external/3rd-party services. An example of a common pattern is https://auth0.com/docs/api/management/v2 - tens or potentially hundreds of different services accessed under a common host. Nor is it feasible for internal legacy services that cannot be modified to attach agents.

I realise that by default {host}:{port} is the only practical default, however the challenge we have is that there appears to be no option in the APM HTTP instrumentation to intervene to override this default. For example, in the Java agent all of the HTTP instrumentation I can see creates a new span and assigns a new resource name in a closed fashion that cannot be controlled (see https://github.com/elastic/apm-agent-java/blob/main/apm-agent-plugins/apm-httpclient-core/src/main/java/co/elastic/apm/agent/httpclient/HttpClientHelper.java).

This means even though the surrounding HTTP client frameworks (Java examples being Feign or JAX-RS client) "know" the resource name (accounting for URI templating to avoid explosions of URIs when data appears in paths), it's still impossible to make APM aware of this.

The result is that the usefulness of our "dependencies" view in APM is quite limited.

@eyalkoren
Copy link
Contributor

@SylvainJuge if the HTTP client provides interceptors (or similar concept) to allow observing requests, can the new related API (e.g. the Java not-yet-release setServiceTarget() API) be used for that? I am not sure whether we currently rely on the server side being aware of the service.target used by the client. If not, maybe this is a way to support that.

Otherwise, it worth considering adding some kind of wildcard matching configuration option like url_groups to allow such custom finer granularity for HTTP.

@SylvainJuge
Copy link
Member Author

As @eyalkoren just said, for now the best option is to use interceptors and call setServiceTarget that are specific to your HTTP client to override the default values.

In the future, if we get other similar feedback on default HTTP destination description (this is the first we got so far) to rely on URL, then adding an option similar to url_groups could be relevant. Also, there could be a high number of other possibilities: HTTP headers, URL parameters, parts of the host name, ..., so we need first to validate that an extra option here would allow to cover most of the use-cases.

@eyalkoren
Copy link
Contributor

Just to make clear how we think interceptors may be used: the way we capture HTTP spans is by instrumenting the client, start a span at the request start and end the span at the request end. During this time, we make sure that the span is made the "active" span on the request-making thread. So if your interceptor's entry is invoked after the HTTP had been started and activated, you can use the API to get it through ElasticApm.currentSpan() and then set the service target thus controlling how your service map will be assembled. Note that this is Java specific, not all agents do the same.

@adrian-skybaker
Copy link

adrian-skybaker commented Jun 7, 2022

So let's take my current stack as an example - targets are declared in https://github.com/OpenFeign/feign, which is delegating to Apache HTTP Client for the actual HTTP communication. And I believe Elastic APM's Java agent is automatically instrumenting Apache HTTP Client (though that's automatic + opaque to me).

By "interceptor", I believe you're proposing using something like https://hc.apache.org/httpcomponents-core-5.1.x/current/httpcore5/apidocs/org/apache/hc/core5/http/HttpRequestInterceptor.html ? I can try this, but I can already see a couple of issues.

  1. Apache HTTP client is a low level component that's unaware of the concept of a "target" as it is declared in Feign. By the time the request reaches this level, any URI templating has already been done. I can try hacks to try and communicate this value using a thread local or a custom request header to pass this information down to the http client, but I can't see it being straightforward. In Java this pattern is the norm - all of the higher level HTTP frameworks I'm aware of where high level "targets" are declared with URI templates (Jersey JAX-RS Client, Spring WebClient, Feign) all then delegate to lower level HTTP client libraries (apache http, the native Java client, etc).

  2. I have to make sure any interceptor runs after the low-level Elastic http instrumentation opens the new span. Delving in to the source, this doesn't look promising - APM instruments org.apache.http.impl.execchain.ClientExecChain, which seems to be called after any HttpRequestInterceptor. So it seems even adding an interceptor wouldn't work (and in any case even if it did, seems like it's quite fragile in relying on a very internal detail of where in the client APM decides to start the span).

It would be much easier if I could just call Elastic APM at the feign interceptor level (this would be preferable even over url_groups, as we can code a single generic interceptor to use everywhere), which is what I first tried. But at this higher level anything set on the span is discarded.

@eyalkoren
Copy link
Contributor

It would be much easier if I could just call Elastic APM at the feign interceptor level

@jackshirazi this sounds like a perfect candidate for an external plugin. Once we have the blog posts to share about that, please do.
@SylvainJuge are there OTel attributes that automatically get mapped to our relevant fields by APM Server- for service maps (destination target etc.), contextual data (status code etc.) and so forth?

@jackshirazi
Copy link
Contributor

https://www.elastic.co/blog/create-your-own-instrumentation-with-the-java-agent-plugin

@jackshirazi this sounds like a perfect candidate for an external plugin. Once we have the blog posts to share about that, please do.

@adrian-skybaker
Copy link

adrian-skybaker commented Jun 9, 2022

I'm not sure how custom instrumentation would solve this problem, at least not without a substantial amount of coding and recreation of built-in instrumentation.

Without having tried it, I don't believe it would be as simple as writing instrumentation just for the higher level Feign layer, because creating an exit span at this higher level would be too early: there are still potentially substantial layers of libraries and code still to be called at this point before the request is sent. As the built-in HTTP client instrumentation does, the exit span needs to be started late, when the actual request is made, down in the low level HTTP code.

Instead it seems like we would need to implement two layers of instrumentation - one to inject the detailed HTTP service name from the higher level framework (in this case Feign). Then a forked version of whichever HTTP client library instrumentation (at least apm-apache-httpclient-plugin, though the client used can be easily changed by config, so potentially 2-3 other equivalents as well). This forked version would be able to "inherit" a service target name at the time parent.createExitSpan(); is called.

The overall challenge I see is that, as far as I can tell, when the exit spans are created by the built-in HTTP instrumentations, there is no way to inherit or inject the service target property from the parent span.

@SylvainJuge
Copy link
Member Author

For me we can split this into two sub-issues/challenges:

  1. define an heuristic to extract the appropriate service.target.name value from an exit span, in the case of an HTTP request it could be part of the URL, some header values, ...
  2. allowing modification of the span to set the service.target.name before it is sent to APM Server.

First, for (1) we have to deal with a very high diversity of protocols, if we limit ourselves to HTTP it could be anything from headers, host name, network port, a part of the URL path or query string, as a consequence it would be quite challenging to cover everything through an API or configuration options.

For (2), some agents provide a way to "post-process" the data before it is sent, which maybe in this case would be a relevant option.

I'm not sure if it's a good idea, but if we had the capability to post-process the spans/transactions in the Java agent (which we don't yet), and the heuristic to set the service.target.name can rely only on the attributes that we already capture, then you could implement your desired behavior through post-processing logic as the original URL would be available in the span data.

In your case, does the backend services behind the gateway are also instrumented by the APM agent ? If yes, then the gateway should not be shown as part of the dependencies nor be visible at all. If those backend services aren't instrumented, then the gateway behaves like an abstraction of those and thus becomes a dependency.

@adrian-skybaker
Copy link

adrian-skybaker commented Jun 16, 2022

My view is that (2) is a much better category of solution, because it means you can write a one-off, generic piece of code for a higher level framework that will work in all cases (eg in Java's Feign it would involve a single feign.RequestInterceptor in library code we would then use as standard in many services). Potentially this type of instrumentation could eventually form part of the agent codebase itself.

An alternative but similar solution to post-processing might be to allow inheritance of service.target.name when parentSpan.startExitSpan(...) is called, in some standardised way. In practice even with post-processing any library code we'd write would try and get us to this state anyway: we'd write a very generic post-processor that looked for something populated by a parent span by more specific processing.

In constrast, (1) seems like more like the url_groups concept: it would likely require manually duplicating URL resource names/patterns in separate Elastic config, for every dependency, in every application.

In your case, does the backend services behind the gateway are also instrumented by the APM agent ?

Generally not, the focus for this issue for us has been dependencies which are effectively "external".

@SylvainJuge
Copy link
Member Author

@adrian-skybaker I think for your use-case you could easily use ingest pipelines, that would allow you to use parts of the collected data (which includes the complete URLs of the HTTP client calls) and set the values you want in the service.target.name and destination.service.resource (the UI relies on the latter as of 8.3, but will rely on both in 8.4 and later).

One limitation of this approach though, is that you can only rely on the data that is already captured by the agents, but in your case that does not seem to be an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants