
Prototype for JFR events using Context interceptor #963

Merged (14 commits), Nov 24, 2020

Conversation

sfriberg (Contributor) commented Mar 4, 2020

No description provided.

jkwatson (Contributor) commented Mar 5, 2020

Looks like we need to make sure CI doesn't try to build this module when building with Java 8.

sfriberg (Contributor, Author) commented Mar 6, 2020

@bogdandrutu

As you can see in the SpanProcessor I need to keep state, currently just a basic ConcurrentHashMap. The risk is that this will cause extra overhead and contention when a lot of spans are flowing through, as it needs to be thread safe.

I think there are two possible ideas for how to handle this without requiring the processor to keep internal state.

  1. Ability to add object(s) to the Span itself that can then be extracted.
    This could be a map where the hash key is the processor itself, or something more generic if we find it would be useful outside of the SpanProcessor. (A hypothetical sketch follows the code example below.)

  2. Change the SpanProcessor API so that the start method returns an optional callback.
    The span end method would iterate through a list of these callbacks, as in the example below.

  public Consumer<ReadableSpan> onStart(ReadableSpan rs) {
    if (rs.getSpanContext().isValid()) {
      SpanEvent event = new SpanEvent(rs.toSpanData());
      event.begin();
      // Callback to be executed when the span is ended
      return (readableSpan) -> event.commit();
      // In JDK 7 style (would of course be a different interface):
      // return new Consumer<ReadableSpan>() {
      //   @Override
      //   public void accept(ReadableSpan readableSpan) {
      //     event.commit();
      //   }
      // };
    }
    // No callback
    return null;
  }
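For comparison, idea 1 could look roughly like the sketch below. The setAttachment/getAttachment methods on ReadableSpan are hypothetical (they do not exist in the current API); the sketch only illustrates how a processor-keyed attachment would replace the internal map:

  // Hypothetical attachment API on ReadableSpan, shown for illustration only.
  public void onStart(ReadableSpan rs) {
    if (rs.getSpanContext().isValid()) {
      SpanEvent event = new SpanEvent(rs.toSpanData());
      event.begin();
      // Key the attachment on this processor so several processors can coexist.
      rs.setAttachment(this, event);
    }
  }

  public void onEnd(ReadableSpan rs) {
    // Retrieve and commit the event stored by onStart, if any.
    SpanEvent event = (SpanEvent) rs.getAttachment(this);
    if (event != null) {
      event.commit();
    }
  }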

@Description(
"Open Telemetry trace event corresponding to the span currently "
+ "in scope/active on this thread.")
class ScopeEvent extends Event {
Contributor:

Should this be public? With the JFR streaming APIs, is this going to be handed directly to the stream listeners?

Contributor Author:

I haven't done much with the JFR stream, but I doubt you will get the actual event class; rather, you get a generic event type on which you can look up the name and fields.
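For reference, a JFR streaming consumer (JDK 14+) only sees a generic RecordedEvent and looks fields up by name; it never gets the ScopeEvent class itself. A minimal sketch, where the event name "io.opentelemetry.Scope" and the "traceId" field are assumptions for illustration rather than what this PR actually registers:

  import jdk.jfr.consumer.RecordedEvent;
  import jdk.jfr.consumer.RecordingStream;

  public class ScopeEventListener {
    public static void main(String[] args) {
      try (RecordingStream stream = new RecordingStream()) {
        // Event and field names are assumed; check the actual @Name/@Label
        // annotations on the event class.
        stream.enable("io.opentelemetry.Scope");
        stream.onEvent("io.opentelemetry.Scope", ScopeEventListener::print);
        stream.start(); // blocks; use startAsync() to run in the background
      }
    }

    private static void print(RecordedEvent event) {
      // Fields are looked up by name on the generic RecordedEvent.
      System.out.println(event.getString("traceId") + " active for " + event.getDuration());
    }
  }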

jkwatson (Contributor) commented Apr 1, 2020

This is very cool, @sfriberg. Is your intent to move this out of "prototype" status and get it spruced up for official inclusion as a contrib module?

sfriberg (Contributor, Author) commented Apr 3, 2020

Thanks for the review @jkwatson. I think the main blocker is how we want to do the gRPC context integration.

The other one would be whether we should change the SpanProcessor API so it is possible to avoid keeping state here to capture when a Span is closed (see my comment above about ideas).

The second part I don't see as a blocker and we can improve as we go, but I think we need to understand how we will handle gRPC before this is merged.

@sfriberg sfriberg requested a review from thisthat as a code owner May 15, 2020 23:37
settings.gradle (outdated)
@@ -42,6 +42,10 @@ include ":opentelemetry-all",
":opentelemetry-sdk-contrib-jaeger-remote-sampler",
":opentelemetry-bom"

if(JavaVersion.current().isJava11Compatible()) {


Have you considered testing against any Java 8 distributions that include JFR?

Contributor Author:

I have not; once that becomes readily available we can look at building for JDK 8 as well. The first step, I think, is to focus on JDK 11 and make that work well.

@sfriberg sfriberg requested a review from anuraaga as a code owner October 22, 2020 18:11
linux-foundation-easycla bot commented Oct 22, 2020

CLA Check
The committers are authorized under a signed CLA.

codecov bot commented Oct 22, 2020

Codecov Report

Merging #963 (7eb048d) into master (e599d0e) will increase coverage by 85.35%.
The diff coverage is n/a.


@@              Coverage Diff              @@
##             master     #963       +/-   ##
=============================================
+ Coverage          0   85.35%   +85.35%     
- Complexity        0     2117     +2117     
=============================================
  Files             0      241      +241     
  Lines             0     8086     +8086     
  Branches          0      893      +893     
=============================================
+ Hits              0     6902     +6902     
- Misses            0      853      +853     
- Partials          0      331      +331     
Impacted Files | Coverage Δ | Complexity Δ
...va/io/opentelemetry/api/baggage/EntryMetadata.java | 100.00% <0.00%> (ø) | 4.00% <0.00%> (?%)
.../api/baggage/propagation/W3CBaggagePropagator.java | 93.02% <0.00%> (ø) | 14.00% <0.00%> (?%)
...a/io/opentelemetry/sdk/logging/data/LogRecord.java | 76.19% <0.00%> (ø) | 2.00% <0.00%> (?%)
...in/java/io/opentelemetry/api/trace/TraceFlags.java | 100.00% <0.00%> (ø) | 6.00% <0.00%> (?%)
...ava/io/opentelemetry/api/DefaultOpenTelemetry.java | 88.88% <0.00%> (ø) | 17.00% <0.00%> (?%)
...extension/trace/propagation/AwsXRayPropagator.java | 87.05% <0.00%> (ø) | 26.00% <0.00%> (?%)
...o/opentelemetry/sdk/metrics/view/Aggregations.java | 71.87% <0.00%> (ø) | 4.00% <0.00%> (?%)
...ntelemetry/exporter/zipkin/ZipkinSpanExporter.java | 90.22% <0.00%> (ø) | 31.00% <0.00%> (?%)
...main/java/io/opentelemetry/api/internal/Utils.java | 72.22% <0.00%> (ø) | 11.00% <0.00%> (?%)
.../src/main/java/io/opentelemetry/context/Scope.java | 0.00% <0.00%> (ø) | 0.00% <0.00%> (?%)
... and 231 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e599d0e...7eb048d.

sfriberg (Contributor, Author) commented Oct 22, 2020

For some reason this fails and even crashes the JVM when running the test. It seems to be due to the build setup of the project, as things work if I lift this out as a single, separate Gradle project instead.

@anuraaga I believe you made some of the recent changes to the build process; do you have a clue what might be causing it?

sfriberg (Contributor, Author):

Span capturing is still less efficient than desirable, but it works. The following issue tracks it: #1105.

Jacoco has been disabled due to a bug in the JDK, which I believe breaks the code coverage.

Despite that, I think this should be possible to merge now.

anuraaga (Contributor) left a comment:

Thanks @sfriberg - this seems like a cool implementation :)

import io.opentelemetry.context.ContextStorage;
import io.opentelemetry.context.Scope;

public class JfrContextStorageWrapper implements ContextStorage {
Contributor:

Does this class need to be public?

Contributor Author:

Yes, since it needs to be possible for any user to register it:

ContextStorage.addWrapper(JfrContextStorageWrapper::new);

ScopeEvent event = new ScopeEvent(Span.fromContext(toAttach).getSpanContext());
event.begin();
return () -> {
event.commit();
Contributor:

What happens if the event is never committed? Is there a way to detect it in JFR? If so, it looks like it could be a nice scope-debugging mechanism.

Contributor Author:

No way to detect it; it basically just gets GC'ed, never to be seen or heard from again :).
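Pieced together from the fragments in this PR, the wrapper is roughly the sketch below (the delegate field and the exact ScopeEvent constructor are assumptions, not copied verbatim). An event that is begun but never committed is simply never written to the recording, which is why it just gets collected silently:

  import io.opentelemetry.api.trace.Span;
  import io.opentelemetry.context.Context;
  import io.opentelemetry.context.ContextStorage;
  import io.opentelemetry.context.Scope;

  public class JfrContextStorageWrapper implements ContextStorage {

    private final ContextStorage wrapped;

    JfrContextStorageWrapper(ContextStorage wrapped) {
      this.wrapped = wrapped;
    }

    @Override
    public Scope attach(Context toAttach) {
      Scope scope = wrapped.attach(toAttach);
      ScopeEvent event = new ScopeEvent(Span.fromContext(toAttach).getSpanContext());
      event.begin();
      return () -> {
        // Only committed events end up in the recording; an abandoned scope
        // leaves an uncommitted event that is simply garbage collected.
        event.commit();
        scope.close();
      };
    }

    @Override
    public Context current() {
      return wrapped.current();
    }
  }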

*/
public class JfrSpanProcessor implements SpanProcessor {

private final Map<SpanContext, SpanEvent> spanEvents = new ConcurrentHashMap<>();
Contributor:

Should we use weak keys for this? It's bad for tracing if a bug means spans are never ended and leaked, but this would cause an actual memory leak I think.

Contributor:

By the way I've been thinking of shading in https://github.com/raphw/weak-lock-free/blob/master/src/main/java/com/blogspot/mydailyjava/weaklockfree/WeakConcurrentMap.java for our use for a different Context-related use case, if that'll help I can prioritize it :)

Contributor Author:

Good catch. By using Guava there should be no need to shade in another class.
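A minimal sketch of the Guava-based replacement for the field above, assuming the processor keeps keying on SpanContext. Note that MapMaker's weakKeys() switches key comparison to identity, so the same SpanContext instance must be used when storing and removing:

  import com.google.common.collect.MapMaker;

  // Weak keys let an entry be collected once its SpanContext is no longer
  // referenced anywhere, so a span that is never ended does not pin its
  // event in memory forever.
  private final Map<SpanContext, SpanEvent> spanEvents =
      new MapMaker().weakKeys().makeMap();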


@Override
public CompletableResultCode shutdown() {
spanEvents.forEach((id, event) -> event.commit());
Contributor:

Hmm - not really for this PR, but this doesn't seem like how shutdown is supposed to work. For example, an exporter that is shutting down would probably stop accepting spans and wait for in-flight spans to end naturally. But I can't think of any improvement here, so I'm just writing this for reference.

Contributor Author:

It would probably be good to improve the JavaDoc; I'm not sure what 'not yet processed' means:

  • Processes all span events that have not yet been processed and closes used resources.

Similarly for forceFlush:

  • Processes all span events that have not yet been processed.

This sounds more like it is about finished spans rather than spans that haven't been closed yet.

Force flush would commit any ready events, as would shutdown, but shutdown would also clean up all resources, which I take to mean dropping all non-completed spans. Once something is shut down I wouldn't expect things to continue to flow, unless the CompletableResultCode would wait for all spans, but that could potentially take indefinitely long.

I will remove the event.commit() part as that feels wrong here.
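One possible reading of those semantics, sketched below (not necessarily what the final PR does): forceFlush() has nothing to buffer because events are committed synchronously in onEnd(), and shutdown() drops the state of spans that were never ended, so their events never reach the recording:

  @Override
  public CompletableResultCode forceFlush() {
    // Events are committed in onEnd(), so there is nothing buffered to flush.
    return CompletableResultCode.ofSuccess();
  }

  @Override
  public CompletableResultCode shutdown() {
    // Drop all non-completed spans; their events are never committed and
    // therefore never appear in the recording.
    spanEvents.clear();
    return CompletableResultCode.ofSuccess();
  }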

Contributor:

Agreed, we may need to improve the docs, or otherwise the expectations (maybe the spec) for this (again, not related to this PR). I would generally call this at the beginning of a graceful shutdown and expect pending requests, and exported spans, to complete before termination. /cc @jkwatson @Oberon00

Contributor:

What does "pending requests" mean here? Un-ended spans? Or just spans that were in the process of being sent off to exporters?

sfriberg (Contributor, Author):

@anuraaga Thanks for the review. I believe I have addressed all your comments.

anuraaga (Contributor) left a comment:

Thanks! Just a small comment idea.

jkwatson (Contributor) left a comment:

Cool!

jkwatson merged commit 82cac7a into open-telemetry:master on Nov 24, 2020