Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype for JFR events using Context interceptor #963

Merged
merged 14 commits into from
Nov 24, 2020
2 changes: 1 addition & 1 deletion build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -370,7 +370,7 @@ configure(opentelemetryProjects) {
configProperties["rootDir"] = rootDir
}

jacoco { toolVersion = "0.8.5" }
jacoco { toolVersion = "0.8.6" }

spotless {
java {
Expand Down
19 changes: 19 additions & 0 deletions sdk-extensions/jfr-events/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
OpenTelemetry SDK Extension Java Flight Recorder (JFR) Events
======================================================

Create JFR events that can be recorded and viewed in Java Mission Control (JMC).
* Creates Open Telemetry Tracing/Span events for spans
* The thread and stracktrace will be of the thead ending the span which might be different than the thread creating the span.
* Has the fields
* Operation Name
* Trace ID
* Parent Span ID
* Span ID
* Creates Open Telemetry Tracing/Scope events for scopes
* Thread will match the thread the scope was active in and the stacktrace will be when scope was closed
* Multiple scopes might be collected for a single span
* Has the fields
* Trace ID
* Span ID
* Supports the Open Source version of JFR in Java 11.
* Might support back port to OpenJDK 8, but not tested and classes are built with JDK 11 bytecode.
24 changes: 24 additions & 0 deletions sdk-extensions/jfr-events/build.gradle
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
plugins {
id 'java'
}

description = 'OpenTelemetry SDK Extension JFR'
ext.moduleName = 'io.opentelemetry.sdk.extension.jfr'

dependencies {
implementation project(':opentelemetry-api'),
project(':opentelemetry-sdk')
}

tasks.withType(JavaCompile) {
it.options.release = 11
}

testJava8 {
enabled = false
}

test {
// Disabled due to https://bugs.openjdk.java.net/browse/JDK-8245283
jacoco.enabled = false
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
/*
* Copyright The OpenTelemetry Authors
* SPDX-License-Identifier: Apache-2.0
*/

package io.opentelemetry.sdk.extension.jfr;

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.ContextStorage;
import io.opentelemetry.context.Scope;

public class JfrContextStorageWrapper implements ContextStorage {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this class need to be public?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, since it needs to be possible to register for any user.

ContextStorage.addWrapper(JfrContextStorageWrapper::new);


private final ContextStorage wrapped;

public JfrContextStorageWrapper(ContextStorage wrapped) {
this.wrapped = wrapped;
}

@Override
public Scope attach(Context toAttach) {
Scope scope = wrapped.attach(toAttach);
ScopeEvent event = new ScopeEvent(Span.fromContext(toAttach).getSpanContext());
event.begin();
return () -> {
event.commit();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the event is never committed? Is there a way to detect it in JFR? Then it looks like it could be a nice scope debugging mechanism.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No way to detect, it basically is just GC:ed never to be seen or heard from again :).

scope.close();
};
}

@Override
public Context current() {
return wrapped.current();
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
/*
* Copyright The OpenTelemetry Authors
* SPDX-License-Identifier: Apache-2.0
*/

package io.opentelemetry.sdk.extension.jfr;

import static java.util.Objects.nonNull;

import io.opentelemetry.api.trace.SpanContext;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.common.CompletableResultCode;
import io.opentelemetry.sdk.trace.ReadWriteSpan;
import io.opentelemetry.sdk.trace.ReadableSpan;
import io.opentelemetry.sdk.trace.SpanProcessor;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
* Span processor to create new JFR events for the Span as they are started, and commit on end.
*
* <p>NOTE: JfrSpanProcessor must be running synchronously to ensure that duration is correctly
sfriberg marked this conversation as resolved.
Show resolved Hide resolved
* captured.
*/
public class JfrSpanProcessor implements SpanProcessor {

private final Map<SpanContext, SpanEvent> spanEvents = new ConcurrentHashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use weak keys for this? It's bad for tracing if a bug means spans are never ended and leaked, but this would cause an actual memory leak I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way I've been thinking of shading in https://github.com/raphw/weak-lock-free/blob/master/src/main/java/com/blogspot/mydailyjava/weaklockfree/WeakConcurrentMap.java for our use for a different Context-related use case, if that'll help I can prioritize it :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. By using Guava there should be no need to shade another class


@Override
public void onStart(Context parentContext, ReadWriteSpan span) {
if (span.getSpanContext().isValid()) {
SpanEvent event = new SpanEvent(span.toSpanData());
event.begin();
spanEvents.put(span.getSpanContext(), event);
}
}

@Override
public boolean isStartRequired() {
return true;
}

@Override
public void onEnd(ReadableSpan rs) {
SpanEvent event = spanEvents.remove(rs.getSpanContext());
if (nonNull(event) && event.shouldCommit()) {
sfriberg marked this conversation as resolved.
Show resolved Hide resolved
event.commit();
}
}

@Override
public boolean isEndRequired() {
return true;
}

@Override
public CompletableResultCode shutdown() {
spanEvents.forEach((id, event) -> event.commit());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - not really for this PR but this doesn't seem like how shutdown is supposed to work. For example, we would probably stop accepting spans, and wait for spans to end naturally in an exporter when it is shutting down I think. But can't think of any improvement here so just writing for reference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably would be good to improve the JavaDoc, not sure what 'not yet processed means'.

  • Processes all span events that have not yet been processed and closes used resources.

Similar for forceFlush

  • Processes all span events that have not yet been processed.

Sound more like this would be for finished spans rather than span that hasn't been closed yet.

Force flush would commit any ready events, as would shutdown but that would also clear up all resources, which I would take as drop all non-completed spans. Once something is shutdown I wouldn't expect things to continue to flow, unless the CompleteResultCode would wait for all spans, but that would potentially be indefinite.

will remove the event.commit part as that feels wrong here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree we may need to improve docs, or otherwise expectations (maybe spec) for this (again not related to this PR). I would generally call this at the beginning of a graceful shutdown and expect pending requests, and exported spans, to complete before termination. /cc @jkwatson @Oberon00

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "pending requests" mean here? Un-ended spans? Or just spans that were in the process of being sent off to exporters?

spanEvents.clear();
return CompletableResultCode.ofSuccess();
}

@Override
public CompletableResultCode forceFlush() {
return CompletableResultCode.ofSuccess();
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
/*
* Copyright The OpenTelemetry Authors
* SPDX-License-Identifier: Apache-2.0
*/

package io.opentelemetry.sdk.extension.jfr;

import io.opentelemetry.api.trace.SpanContext;
import jdk.jfr.Category;
import jdk.jfr.Description;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

@Name("io.opentelemetry.context.Scope")
@Label("Scope")
@Category("Open Telemetry Tracing")
@Description(
"Open Telemetry trace event corresponding to the span currently "
+ "in scope/active on this thread.")
class ScopeEvent extends Event {

@Label("Trace Id")
sfriberg marked this conversation as resolved.
Show resolved Hide resolved
private final String traceId;

@Label("Span Id")
private final String spanId;

ScopeEvent(SpanContext spanContext) {
this.traceId = spanContext.getTraceIdAsHexString();
this.spanId = spanContext.getSpanIdAsHexString();
}

public String getTraceId() {
return traceId;
}

public String getSpanId() {
return spanId;
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
/*
* Copyright The OpenTelemetry Authors
* SPDX-License-Identifier: Apache-2.0
*/

package io.opentelemetry.sdk.extension.jfr;

import io.opentelemetry.sdk.trace.data.SpanData;
import jdk.jfr.Category;
import jdk.jfr.Description;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

@Label("Span")
@Name("io.opentelemetry.trace.Span")
@Category("Open Telemetry Tracing")
@Description("Open Telemetry trace event corresponding to a span.")
class SpanEvent extends Event {

@Label("Operation Name")
private final String operationName;

@Label("Trace Id")
private final String traceId;

@Label("Span Id")
private final String spanId;

@Label("Parent Id")
private final String parentId;

SpanEvent(SpanData spanData) {
this.operationName = spanData.getName();
this.traceId = spanData.getTraceId();
this.spanId = spanData.getSpanId();
this.parentId = spanData.getParentSpanId();
}

public String getOperationName() {
return operationName;
}

public String getTraceId() {
return traceId;
}

public String getSpanId() {
return spanId;
}

public String getParentId() {
return parentId;
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
/*
* Copyright The OpenTelemetry Authors
* SPDX-License-Identifier: Apache-2.0
*/

/**
* Capture Spans and Scopes as events in JFR recordings.
*
* @see io.opentelemetry.sdk.extension.jfr.JfrSpanProcessor
*/
@ParametersAreNonnullByDefault
package io.opentelemetry.sdk.extension.jfr;

import javax.annotation.ParametersAreNonnullByDefault;
Original file line number Diff line number Diff line change
@@ -0,0 +1,118 @@
/*
* Copyright The OpenTelemetry Authors
* SPDX-License-Identifier: Apache-2.0
*/

package io.opentelemetry.sdk.extension.jfr;

import static org.junit.jupiter.api.Assertions.assertEquals;

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.ContextStorage;
import io.opentelemetry.context.Scope;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;
import org.junit.jupiter.api.Test;

public class JfrSpanProcessorTest {
sfriberg marked this conversation as resolved.
Show resolved Hide resolved

private static final String OPERATION_NAME = "Test Span";
private final Tracer tracer;

static {
ContextStorage.addWrapper(JfrContextStorageWrapper::new);
OpenTelemetrySdk.getGlobalTracerManagement().addSpanProcessor(new JfrSpanProcessor());
}

/** Simple test to validate JFR events for Span and Scope. */
public JfrSpanProcessorTest() {
tracer = OpenTelemetry.getGlobalTracer("JfrSpanProcessorTest");
}

/**
* Test basic single span.
*
* @throws java.io.IOException on io error
*/
@Test
public void basicSpan() throws IOException {
Path output = Files.createTempFile("test-basic-span", ".jfr");

try {
Recording recording = new Recording();
recording.start();
Span span;

try (recording) {

span = tracer.spanBuilder(OPERATION_NAME).setNoParent().startSpan();
span.end();

recording.dump(output);
}

List<RecordedEvent> events = RecordingFile.readAllEvents(output);
assertEquals(1, events.size());
events.stream()
.forEach(
e -> {
assertEquals(span.getSpanContext().getTraceIdAsHexString(), e.getValue("traceId"));
assertEquals(span.getSpanContext().getSpanIdAsHexString(), e.getValue("spanId"));
assertEquals(OPERATION_NAME, e.getValue("operationName"));
});

} finally {
Files.delete(output);
}
}

/**
* Test basic single span with a scope.
*
* @throws java.io.IOException on io error
* @throws java.lang.InterruptedException interrupted sleep
*/
@Test
public void basicSpanWithScope() throws IOException, InterruptedException {
Path output = Files.createTempFile("test-basic-span-with-scope", ".jfr");

try {
Recording recording = new Recording();
recording.start();
Span span;

try (recording) {
span = tracer.spanBuilder(OPERATION_NAME).setNoParent().startSpan();
try (Scope s = span.makeCurrent()) {
Thread.sleep(10);
}
span.end();

recording.dump(output);
}

List<RecordedEvent> events = RecordingFile.readAllEvents(output);
assertEquals(2, events.size());
events.stream()
.forEach(
e -> {
assertEquals(span.getSpanContext().getTraceIdAsHexString(), e.getValue("traceId"));
assertEquals(span.getSpanContext().getSpanIdAsHexString(), e.getValue("spanId"));
if ("Span".equals(e.getEventType().getLabel())) {
assertEquals(OPERATION_NAME, e.getValue("operationName"));
}
});

} finally {
Files.delete(output);
}
}
}
1 change: 1 addition & 0 deletions settings.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ include ":opentelemetry-all",
":opentelemetry-sdk-extension-resources",
":opentelemetry-sdk-extension-tracing-incubator",
":opentelemetry-sdk-extension-jaeger-remote-sampler",
":opentelemetry-sdk-extension-jfr-events",
":opentelemetry-sdk-extension-zpages",
":opentelemetry-sdk-testing",
":opentelemetry-bom",
Expand Down