Adds CorrelationFields in support of in and between process propagation #577

codefromthecrypt · 2018-01-16T06:27:50Z

Overview

This change aims to support correlation fields which may be request-scoped (and propagated downstream), or locally scoped (propagated in process, but not to the next). Example use cases include request-scoped fields like current game or country code, or local fields, such as the current operation name. Where cross-process propagation is desired, the ideal encoding is the w3c Correlation Context.

Background on existing work in Brave

The first propagation mechanics of brave included propagation of the TraceContext in and to the next process. This object's duty is to place spans in the correct position in a potentially distributed operation (via parent IDs). Secondary duties include consistent sampling decisions, and correlation with other contexts such as logging. The data is mostly the same as the trace-context header being defined in w3c.

Brave has two roles in propagation: making data visible in-process, and injecting and extracting data out of process.

The CurrentTraceContext plugin synchronizes a TraceContext struct with implicitly visible state, such as a thread local or a logging context. Users can then call Tracer.getCurrentSpan for example, or read the logging context variable traceId. This in-process propagation is critical for correlation between systems, as often libraries have no other means to coordinate.

The Propagation plugin connects traces across message boundaries by injecting or extracting the trace context from headers. For example, an incoming trace from another process is extracted from headers, the tracer creates a child, and later instrumentation use the above CurrentTraceContext to synchronize the current trace ID with whatever contexts that need to see it.

Notably due to demand created by OpenTracing and Sleuth, we have had requests to push arbitrary data along with the trace-context (implicitly via the above apis described). We've also had a number of non-arbitrary requests to propagate extra data. For example, domain-specific data like the current playstation game, or extra properties needed for Amazon X-Ray.

Towards that end, we added an extension to our trace context object called "extra". Extra is a list of opaque data plugins can use to ensure their propagation needs are met.

We also added a default implementation that handles the common use-case of sending a mix of system and user fields. Here's an example:

Setup your tracing instance with allowed fields:

tracingBuilder.propagationFactory(
  ExtraFieldPropagation.newFactoryBuilder(B3Propagation.FACTORY)
                       .addField("x-vcap-request-id")
                       .addPrefixedFields("baggage-", Arrays.asList("country-code", "user-id"))
                       .build()
);

Later, you can call below to affect the country code of the current trace context

ExtraFieldPropagation.set("country-code", "FO");
String countryCode = ExtraFieldPropagation.get("country-code");

Why a change is needed

The current code is ok, but limited in implementation which makes it difficult to integrate with metrics contexts and scopes smaller than trace.

Using ExtraFieldPropagation, correlation with other fields involve the following trade-offs:

In ExtraFieldPropagation (and OpenTracing) propagating something in-process also means adding overhead out of process
ExtraFieldPropagation is harder to manage systematically as it uses a separate header per field
ExtraFieldPropagation coupling with propagation fields implies a whitelist approach
Users would want to wrap ExtraFieldPropagation as it isn't a great user-level api

For example, if you propagate the span name, to export changes to it for log correlation, you'll need to customize the log correlation plugins and also end up with another header on the wire that isn't useful on the other side.

On the other hand, with a CorrelationFields type, we can decouple the usage of propagated tags, their policy and their encoding scheme (defaulting to the correlation-context header). Like ExtraFieldPropagation, this could be added to B3, allowing a clean transition. In fact, ExtraFieldPropagation could be an implementation. At the end of the day, we end up with flexibly scoped propagation code with coherent usage patterns for context synchronization.

Scoping constraints

An understood limitation is that since this api exists in the scope of Brave (a tracing api), Correlation fields cannot be added before a trace context has been created. For example, an integration with metrics context could imply ordering metrics after tracing, eager provision of an unstarted trace context, or an integration to copy currently scoped metrics tags into the a newly provisioned CorrelationFields object.

yeah wording was a bit crap. (tags are the census name for correlation fields).

The jist is that if the correlation api is separate, it can scope fields before or even if tracing doesn't exist. One use case is to use the same correlation field in metrics and tracing (and further down the line, such as country-code)

To have metrics re-use correlation fields from tracing, for example, it would likely need to start after tracing started (or at the same time). Similar problem exists in logging. If you want logs to contain a correlation field, that field should be scoped as early as possible. Otherwise, you see request handling without and MDC value until sometime later when tracing kicks in.

The idea is that a tagging api could be made at the head of a request, and be simpler or at least less dangerously, ensuring that whether it is tracing metrics or logging (or something else), they all read exactly the same values, with no ordering interdependency apart from tagger.

What I'm calling out here is that if we assign that responsibility to brave (for convenience because parsing, inject extract propagation etc is already handled), there is at least an interdependency problem, which is most easily sorted if trace starts before other subsystems on a request, or at least as early as possible.

Now.. if I did clear up anything (mildly possible), could you help me revise it into a shorter bullet or two? :)

aldex32 · 2018-01-16T16:46:32Z

Hi guys,

is it possible to have the option of setting a custom header by using ExtraFieldPropagation?

codefromthecrypt · 2018-01-17T00:05:04Z

@aldex32 yeah I'd expect ExtraFieldPropagation to use this feature for mutability

codefromthecrypt · 2018-01-17T04:38:19Z

@aldex32 I'm going to address the mutability of ExtraFieldPropagation independently so that there's less time pressure on this PR. I have some discomfort on local changes, particularly around extracting a context which are not a concern with ExtraFieldPropagation

codefromthecrypt · 2018-01-17T05:50:26Z

@aldex32 Actually, there's a limitation in the current Propagation design where it can only affect traces that originated remotely. That impacts this, too.. will raise something independently

codefromthecrypt · 2018-01-17T06:00:29Z

#579 needs to be explored prior to finishing this design. Will work on that

codefromthecrypt · 2018-01-18T10:49:27Z

Had some chats with folks like @ivantopo and I feel I want to think this through a bit more. I generally don't like how this feels like it needs to hook together (want to see if there's another way to avoid special hooks in the tracer or not).

Meanwhile I've rejigged things in support of prefixed keys (and mutation) of the existing extra field propagation code. I'm fairly happy with this, as the logic is completely optional to those not pushing extra fields. Going to cut that below before picking this up again. #581

codefromthecrypt · 2018-01-18T15:42:41Z

I've pushed some interim improvements to extra field propagation in brave 4.13.3 to buy time here. Will revisit shortly

codefromthecrypt · 2018-01-22T02:21:35Z

did a huge revamp on the description. updates around the code will follow next

shakuzen · 2018-01-22T04:41:50Z

Just sharing our use case that I'm hoping this can solve: we have an additional ID that we would like to propagate throughout the trace and add to all services' logs for log correlation based on that ID.

codefromthecrypt · 2018-01-22T04:44:34Z

Just sharing our use case that I'm hoping this can solve: we have an additional ID that we would like to propagate throughout the trace and add to all services' logs for log correlation based on that ID.

Is that ID constant for all downstream requests? Also, does that need to be sent in a specific header, or does it matter as long as in-process correlation occurs?

shakuzen · 2018-01-22T06:04:03Z

Is that ID constant for all downstream requests?

The service that starts the trace (or possibly an uninstrumented service before the trace starts) should discern and set the ID. It shouldn't be changed.

Also, does that need to be sent in a specific header, or does it matter as long as in-process
correlation occurs?

Things aren't quite set in stone yet, but correlation is the primary goal. As alluded to above, the service initially setting this ID may be uninstrumented and therefore a well-known header is probably the best way to receive this in an instrumented app, was my line of thinking. If everything were instrumented, I guess it wouldn't matter how it is propagated.

codefromthecrypt · 2018-01-23T00:07:23Z

@SergeyKanzhelev ps if you like please review the PR description eventhough the code has drifted a bit This is in response to notes in the census issue that came out of the same discussion census-instrumentation/opencensus-specs#45

codefromthecrypt · 2018-01-24T00:03:31Z

feedback from @mpetazzoni included that if this feature didn't require a header-per-field encoding (whitelist), it would be easier to use

In the context of a broader code base where I define the tracer in one place (and then available via DI), it's not really convenient to define that whitelist of fields. Really depends on the service, use-case, etc.

codefromthecrypt · 2018-02-01T04:03:58Z

I've rejigged the code so that it is compilable now, but it isn't right. I'll try to peck at it over a flight

codefromthecrypt · 2018-02-01T16:02:46Z

here are some design thoughts:

An implementation of correlation context could be "accumulate and publish later". For example, data is placed into the correlation fields object, but only made visible when the span is re-scoped.

There are generally 3 scopes of data which one might correlate with:

process scoped: ex current amazon instance ID
request scoped: ex origin user id, country code
span scoped: ex span name

To address the scoping of data, we could consider options such as a flag per field, indicating whether it is to be published downstream or not, or if it is to expire with the span. Note that in the Correlation-Context spec, there's been discussion of a ttl flag (interpreted remotely), which for example the value 1 means drop at the next hop.

codefromthecrypt · 2018-02-05T09:16:57Z

deprioritizing this as during the tracing workshop, there are especially security concerns about adding user data to outgoing requests by default. Basically, what some call "baggage" is not only dangerous to the size of production requests, but also a potentially large security risk (imagine accidentally propagating data to a third party)

We should be careful when implementing it to make 100% optional and something that potentially re-validates per hop whether to send the correlation keys out or not.

In the mean time, users can use the whitelist approach with ExtraFieldPropagation. We may still want to continue with in-process propagation efforts, but at any rate be very careful with remote propagation (cc @garyd203 @SirTyro)

https://docs.google.com/document/d/1YJ0oriH62Ixk3W37z31X0SeuXpgXpXBetiz8EWxG6NU/edit#

codefromthecrypt · 2018-04-05T09:21:03Z

we'll want to make sure this addresses by api "re-scoping" for example, when someone adds a new correlation field and want it to be immediately visible. Currently, someone needs to call Tracer.withSpan or CurrentTraceContext.newScope to re-scope data (so that it is visible in log or other contexts).

This recently bumped again spring-cloud/spring-cloud-sleuth#927 raised by @dhoffi

codefromthecrypt · 2018-11-10T00:35:21Z

still waiting as prefer to see what census decides before continuing this: census-instrumentation/opencensus-specs#200

codefromthecrypt · 2018-11-10T00:39:31Z

Most notably, there is an api for filtering proposed. This is something that's currently possible in brave anyway as the "getter' concept can internally implement filtering for any key (trace id or a tag) OTOH has discussed in this PR, the fields are likely to be concatenated into a single header.. propagation decision logic makes sense.

Also, while this we are interested in what census does, Kamon has already rolled out an impressive feature (hats off @ivantopo @dpsoft) kamon-io/Kamon#552

What we do here will consider all angles in other words (cc also long time friend @nicmunroe)

This lets you use the extra field propagation mechanism to propagate in, but not out of process. This can be used to correlate properties with logs in a way that doesn't add network overhead. This design was chosen as it incurs very little in-process overhead and has the least burden on the code base. It can be used as a stop-gap until a more fully featured correlation context feature is developed. Fixes #929 See also #577

codefromthecrypt · 2020-04-02T08:57:16Z

fixed by #1130

codefromthecrypt mentioned this pull request Jan 16, 2018

More interoperable propagation via extra stuff in TraceContext #390

Closed

codefromthecrypt commented Jan 16, 2018

View reviewed changes

felixbarny reviewed Jan 16, 2018

View reviewed changes

brave/src/main/java/brave/propagation/CorrelationFields.java Outdated Show resolved Hide resolved

codefromthecrypt force-pushed the correlation-fields branch from 37096a5 to 189e2fc Compare January 16, 2018 09:16

ivantopo reviewed Jan 16, 2018

View reviewed changes

codefromthecrypt mentioned this pull request Jan 21, 2018

Create custom Spring scopes for Span (and Trace if possible) spring-cloud/spring-cloud-sleuth#144

Closed

codefromthecrypt changed the title ~~WIP Adds CorrelationFields in support of OpenTracing and Sleuth migration~~ Adds CorrelationFields in support of in and between process propagation Jan 22, 2018

codefromthecrypt force-pushed the correlation-fields branch from 68bc17a to 501f355 Compare February 1, 2018 04:02

codefromthecrypt mentioned this pull request Mar 25, 2018

Add a Scope property on TagKey census-instrumentation/opencensus-specs#69

Closed

codefromthecrypt mentioned this pull request Mar 30, 2018

make baggage more convenient. spring-cloud/spring-cloud-sleuth#921

Closed

This was referenced Apr 5, 2018

spring-cloud-stream channel instrumentation with ExtraFieldPropagation spring-cloud/spring-cloud-sleuth#927

Closed

Consider immediately flushing certain "extra fields" to a log layer #682

Closed

codefromthecrypt mentioned this pull request May 25, 2018

Add baggage openzipkin/b3-propagation#22

Closed

codefromthecrypt mentioned this pull request Jun 12, 2018

[enhancement] Extract baggage entries from HTTP headers and manage baggage entries as SLF4J MDC entries spring-cloud/spring-cloud-sleuth#1003

Closed

WIP Adds CorrelationFields in support of correlation-context

1fe4a25

codefromthecrypt force-pushed the correlation-fields branch from 501f355 to 1fe4a25 Compare August 14, 2018 02:42

codefromthecrypt mentioned this pull request Aug 29, 2018

TracingInterceptor should not override explicitly set headers #776

Closed

This was referenced Nov 3, 2018

Custom Fields Within a Service & Logging to Splunk spring-cloud/spring-cloud-sleuth#1097

Closed

Initial Draft for TagMap and its propagation. census-instrumentation/opencensus-specs#200

Merged

shakuzen mentioned this pull request May 21, 2019

support pattern for propagation_keys spring-cloud/spring-cloud-sleuth#1358

Closed

codefromthecrypt mentioned this pull request Jun 26, 2019

Adds ability to redact fields from downstream propagation #931

Merged

codefromthecrypt mentioned this pull request Aug 13, 2019

Resurrects duplicate handlers and explains all the bugs we can make #961

Merged

codefromthecrypt closed this Apr 2, 2020

codefromthecrypt deleted the correlation-fields branch April 2, 2020 08:57

+                * Census tags are types, encouraging prevalidation and use as constants.
+              * Sharing mutable data across threads can lead to inconsistent reads of fields.
+                * For example, child threads affect the same state their parent reads.
+                * Immutable tag contexts eliminate this

+                * For example, child threads affect the same state their parent reads.
+                * Immutable tag contexts eliminate this
+              * A tracer api is more complex than a tagger api, and has an ordering concern.
+                * To contribute tags, you need to have tracing start before metrics.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds CorrelationFields in support of in and between process propagation #577

Adds CorrelationFields in support of in and between process propagation #577

codefromthecrypt commented Jan 16, 2018 •

edited

Loading

codefromthecrypt commented Jan 16, 2018

codefromthecrypt Jan 16, 2018

codefromthecrypt Jan 16, 2018

felixbarny Jan 16, 2018

codefromthecrypt commented Jan 16, 2018

ivantopo Jan 16, 2018

codefromthecrypt Jan 16, 2018

ivantopo Jan 16, 2018

codefromthecrypt Jan 17, 2018

aldex32 commented Jan 16, 2018 •

edited

Loading

codefromthecrypt commented Jan 17, 2018

codefromthecrypt commented Jan 17, 2018 •

edited

Loading

codefromthecrypt commented Jan 17, 2018

codefromthecrypt commented Jan 17, 2018

codefromthecrypt commented Jan 18, 2018

codefromthecrypt commented Jan 18, 2018 •

edited

Loading

codefromthecrypt commented Jan 22, 2018

shakuzen commented Jan 22, 2018

codefromthecrypt commented Jan 22, 2018 via email

shakuzen commented Jan 22, 2018

codefromthecrypt commented Jan 23, 2018

codefromthecrypt commented Jan 24, 2018

codefromthecrypt commented Feb 1, 2018

codefromthecrypt commented Feb 1, 2018

codefromthecrypt commented Feb 5, 2018

codefromthecrypt commented Apr 5, 2018

codefromthecrypt commented Nov 10, 2018

codefromthecrypt commented Nov 10, 2018

codefromthecrypt commented Apr 2, 2020

+                  }
+                };
+                static final class Real extends CorrelationFields {

+                /**
+                 * Retrieves a correlation key by name or returns null if not found.
+                 *
+                 * @throws IllegalArgumentException if the input name is not valid

+                  }
+                  private static boolean log(String message) {
+                    return false;

Adds CorrelationFields in support of in and between process propagation #577

Adds CorrelationFields in support of in and between process propagation #577

Conversation

codefromthecrypt commented Jan 16, 2018 • edited Loading

Overview

Background on existing work in Brave

Why a change is needed

Scoping constraints

See also

codefromthecrypt commented Jan 16, 2018

codefromthecrypt Jan 16, 2018

Choose a reason for hiding this comment

codefromthecrypt Jan 16, 2018

Choose a reason for hiding this comment

felixbarny Jan 16, 2018

Choose a reason for hiding this comment

codefromthecrypt commented Jan 16, 2018

ivantopo Jan 16, 2018

Choose a reason for hiding this comment

codefromthecrypt Jan 16, 2018

Choose a reason for hiding this comment

ivantopo Jan 16, 2018

Choose a reason for hiding this comment

codefromthecrypt Jan 17, 2018

Choose a reason for hiding this comment

aldex32 commented Jan 16, 2018 • edited Loading

codefromthecrypt commented Jan 17, 2018

codefromthecrypt commented Jan 17, 2018 • edited Loading

codefromthecrypt commented Jan 17, 2018

codefromthecrypt commented Jan 17, 2018

codefromthecrypt commented Jan 18, 2018

codefromthecrypt commented Jan 18, 2018 • edited Loading

codefromthecrypt commented Jan 22, 2018

shakuzen commented Jan 22, 2018

codefromthecrypt commented Jan 22, 2018 via email

shakuzen commented Jan 22, 2018

codefromthecrypt commented Jan 23, 2018

codefromthecrypt commented Jan 24, 2018

codefromthecrypt commented Feb 1, 2018

codefromthecrypt commented Feb 1, 2018

codefromthecrypt commented Feb 5, 2018

codefromthecrypt commented Apr 5, 2018

codefromthecrypt commented Nov 10, 2018

codefromthecrypt commented Nov 10, 2018

codefromthecrypt commented Apr 2, 2020

codefromthecrypt commented Jan 16, 2018 •

edited

Loading

aldex32 commented Jan 16, 2018 •

edited

Loading

codefromthecrypt commented Jan 17, 2018 •

edited

Loading

codefromthecrypt commented Jan 18, 2018 •

edited

Loading