Skip to content

Commit

Permalink
Merge pull request #1155 from newrelic/NR-84180-fix-httpurlconnection
Browse files Browse the repository at this point in the history
Nr 84180 fix httpurlconnection
  • Loading branch information
jtduffy authored Feb 21, 2023
2 parents b228300 + 29fd6eb commit 501d510
Show file tree
Hide file tree
Showing 6 changed files with 288 additions and 67 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ public class HttpURLConnectionTest {
private static final String POST_DATA = "post_data";
private static final String TEST_CLASS = "com.newrelic.agent.instrumentation.pointcuts.net.HttpURLConnectionTest";
// This timeout is required if connect is the only HttpURLConnection API called
private static final int TEST_SLEEP_TIME_MILLIS = 70_000;
private static final int TEST_SLEEP_TIME_MILLIS = 7_000;

@BeforeClass
public static void beforeClass() {
Expand Down
110 changes: 103 additions & 7 deletions instrumentation/httpurlconnection/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,21 +2,117 @@

This is a very old HTTP client and the APIs can be called in many combinations which can lead to unpredictable scenarios when trying to instrument it.

There is also the added complication that once the `connect` method has been called, whether explicitly or not, the `HttpURLConnection` headers become immutable, and we can no longer add our distributed tracing headers.
There is also the added complication that once the `connect` method has been called, whether explicitly or not, the `HttpURLConnection` headers become
immutable, and we can no longer add our distributed tracing headers.

Because of these things, we need to keep some state on the order that `HttpURLConnection` APIs are invoked and in some cases we need to defer making a decision on whether an external call should be recorded or not until we have a more complete picture on what APIs were called and in what order.
Furthermore, legacy CAT (Cross Application Tracing) can cause additional problems if `reportAsExternal` is called too early which causes exceptions.

Because of these things, we need to keep some state on the order that `HttpURLConnection` APIs are invoked and in some cases we need to defer making a decision
on whether an external call should be recorded or not until we have a more complete picture on what APIs were called and in what order.

## Usage scenarios/implementation

This instrumentation creates a segment to do timing of external calls, starting when the first `HttpURLConnection` API is called and lasting until the external call is recorded. Because of this the timing isn't as exact as tracing the individual methods. The reason for doing the timing in this manner is that it allows the instrumentation to call `addOutboundRequestHeaders` and `reportAsExternal` on the same segment regardless of the order that the `HttpURLConnection` APIs are called in.
This instrumentation creates a segment to do timing of external calls, starting when the first `HttpURLConnection` API is called and lasting until the external
call is recorded. Because of this the timing isn't as exact as tracing the individual methods. The reason for doing the timing in this manner is that it allows
the instrumentation to call `addOutboundRequestHeaders` and `reportAsExternal` on the same segment regardless of the order that the `HttpURLConnection` APIs are called in.

Calling `addOutboundRequestHeaders` will result in a span (which could be generated from a segment or a tracer) that links one APM entity to another APM entity in a distributed trace, whereas calling `reportAsExternal` will result in a span being marked as "External HTTP" that links one APM entity to any host (whether it is monitored by New Relic, or not). Calling both methods on a single segment instance ensures that both scenarios are associated with a single span in a distributed trace. This mitigates a bug in previous versions of this instrumentation that was using tracers instead of a segment. The bug was that a tracer associated with one method (typically `connect`) would call `addOutboundRequestHeaders` and a tracer for another method would call `reportAsExternal`, resulting in two different spans in a single distributed trace showing the same external call being made twice, one linked to an APM entity and the other linked to the hostname.
Calling `addOutboundRequestHeaders` will result in a span (which could be generated from a segment or a tracer) that links one APM entity to another APM entity
in a distributed trace, whereas calling `reportAsExternal` will result in a span being marked as "External HTTP" that links one APM entity to any host (whether
it is monitored by New Relic, or not). Calling both methods on a single segment instance ensures that both scenarios are associated with a single span in a
distributed trace. This mitigates a bug in previous versions of this instrumentation that was using tracers instead of a segment. The bug was that a tracer
associated with one method (typically `connect`) would call `addOutboundRequestHeaders` and a tracer for another method would call `reportAsExternal`, resulting
in two different spans in a single distributed trace showing the same external call being made twice, one linked to an APM entity and the other linked to an external host.

Much of this complication is due to the fact that once the `connect` method has been called, the `HttpURLConnection` header map can no longer be updated and any calls to `addOutboundRequestHeaders` will fail. This means that if `connect` is explicitly called first we have no choice but to call `addOutboundRequestHeaders` at that point and try to predict what sequence of events happens next and whether an external call should be recorded and, if so, how to call `reportAsExternal` on the same tracer/segment so that only a single external span gets generated.
Much of this complication is due to the fact that once the `connect` method has been called, the `HttpURLConnection` header map can no longer be updated and any
calls to `addOutboundRequestHeaders` will fail. This means that if `connect` is explicitly called first we have no choice but to call `addOutboundRequestHeaders`
at that point and try to predict what sequence of events happens next and whether an external call should be recorded and, if so, how to call `reportAsExternal`
on the same tracer/segment so that only a single external span gets generated.

Behavior expected when `HttpURLConnection` APIs are called in different combos and orders.
* If only `connect` is called, then NO request is made over the wire and NO external call is reported. The instrumentation starts a `TimerTask` if `connect` is called first, waits for a set period of time to determine if any further `HttpURLConnection` APIs are called before deciding how to proceed. If no other API is called, then the segment is just ignored and no external is reported. If any other method is called an external call will be recorded.
* Calling any of `getOutputStream`, `getInputStream`, `getResponseCode`, or `getHeaderFields` all result in an external call being recorded.
* If only `connect` is called, then NO request is made over the wire and NO external call is reported. The instrumentation starts a `SegmentCleanupTask` if`connect` is called first, waits for a set period of time to determine if any further `HttpURLConnection` APIs are called before deciding how to proceed. If no other API is called, then the segment is just ignored and no external is reported. If any other method is called an external call will be recorded.
* If `getOutputStream` is called alone or in combination with `connect`, then an external will only be recorded with DT (not CAT due to it causing exceptions when it reads from the input stream too early). When using CAT a `SegmentCleanupTask` will be started to determine if any methods were called after `getOutputStream` that would warrant reporting an external, or if the segment should be ignored.
* Calling any of `getInputStream`, `getResponseCode`, or `getHeaderFields` all result in an external call being recorded.

## SegmentCleanupTask and ScheduledThreadPoolExecutor tuning

For each external request where `connect` (or `getOutputStream` combined with CAT) is called first a `SegmentCleanupTask` `Runnable` gets scheduled to execute on a `ScheduledThreadPoolExecutor`. The
`SegmentCleanupTask` waits for a configurable period of time (default is 5 seconds) to determine if any other `HttpURLConnection` APIs are called that result in
a network request being issued. In most cases, another method will be called within this time period which will result in the scheduled task being cancelled and
removed from the executor's scheduled queue. In cases where no method is called that issues a network request, the `SegmentCleanupTask` `Runnable` will execute
and mark the Segment started when connect was called as ignored so that it doesn't appear as an external call in the related distributed trace.

The `ScheduledThreadPoolExecutor` is shared amongst all `HttpURLConnection` instances making external requests and the size of its managed threadpool can be
configured via `thread_pool_size`. The delay after which each `SegmentCleanupTask` queued up with the `ScheduledThreadPoolExecutor` will be executed can be
configured with `delay_ms`.

Config examples:

Yaml
```yaml
class_transformer:
com.newrelic.instrumentation.httpurlconnection:
enabled: true
segment_cleanup_task:
thread_pool_size: 5
delay_ms: 5000
```
System property
```
-Dnewrelic.config.class_transformer.com.newrelic.instrumentation.httpurlconnection.segment_cleanup_task.thread_pool_size=5
-Dnewrelic.config.class_transformer.com.newrelic.instrumentation.httpurlconnection.segment_cleanup_task.delay_ms=5000
```

Environment variable
```
NEW_RELIC_CLASS_TRANSFORMER_COM_NEWRELIC_INSTRUMENTATION_HTTPURLCONNECTION_SEGMENT_CLEANUP_TASK_THREAD_POOL_SIZE=5
NEW_RELIC_CLASS_TRANSFORMER_COM_NEWRELIC_INSTRUMENTATION_HTTPURLCONNECTION_SEGMENT_CLEANUP_TASK_DELAY_MS=5000
```

## Troubleshooting

Whenever a `SegmentCleanupTask` runs and marks the external `Segment` as ignored it means that no external call will be recorded for the `Segment`.
When this happens it will generate a supportability metric named `Supportability/HttpURLConnection/SegmentIgnore/connect`. This means that either one of two
things:
1. There is a scenario where `connect` is called and never followed by any other `HttpURLConnection` APIs that issue a network request. In this case, it is correct that the `Segment` was ignored and no external call was recorded.
2. There is a scenario where `connect` is called and any other `HttpURLConnection` APIs that issue a network request did not occur before the `SegmentCleanupTask` ran. In this case, you can increase the `delay_ms` so that the `SegmentCleanupTask` is delayed long enough for the external call to successfully take place.
3. There is a scenario where CAT is enabled and `getOutputStream` is called and never followed by any other `HttpURLConnection` APIs that issue a network request. In this case, it is correct that the `Segment` was ignored and no external call was recorded.

At `FINEST` level agent logs the number of queued tasks in the executor at the time of each external request will be logged. You can find these logs by
searching for the string `HttpURLConnection - number of queued cleanup tasks`. This can be useful to see if there are too many runnable tasks queued up at once
that could lead to memory pressure. Setting the `delay_ms` to a lower value will have the biggest impact on lowering the number of tasks that can be queued at
a given time, while increasing `thread_pool_size` can help remove queued tasks quicker.

Threads in the threadpool managed by the executor will be named `New Relic HttpURLConnection Segment Cleanup Task`.

Another issue to be aware of is that legacy CAT (Cross Application Tracing) can cause additional problems if `reportAsExternal` is called too early. As seen in the stacktrace below
it tries to process the inbound response headers which will trigger a call to `getHeaderField` on the `HttpURLConnection` instance which
forces it to connect and read from the input stream. Unfortunately, for users of `HttpURLConnection` this has the unexpected effect of rendering the
`HttpURLConnection` header map immutable as well as causing `ProtocolException: Cannot write output after reading input` and
`IOException: Stream is closed` exceptions when attempting to write to the output stream. Because the headers become immutable some customers saw the `Content-Length` header disappear from certain outgoing HTTP requests from their applications, which resulted in external servers rejecting requests with HTTP `411 Length Required` errors.

```java
"java.base/sun.net.www.protocol.http.HttpURLConnection$StreamingOutputStream.close(HttpURLConnection.java:3834)"
"java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1622)"
"java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)"
"java.base/sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:3221)"
"com.nr.agent.instrumentation.httpurlconnection.InboundWrapper.getHeader(InboundWrapper.java:26)"
"com.newrelic.agent.HeadersUtil.getAppDataHeader(HeadersUtil.java:122)"
"com.newrelic.agent.CrossProcessTransactionStateImpl.processInboundResponseHeaders(CrossProcessTransactionStateImpl.java:330)"
"com.newrelic.agent.tracers.DefaultTracer.recordInboundResponseHeaders(DefaultTracer.java:685)"
"com.newrelic.agent.tracers.DefaultTracer.recordExternalMetricsHttp(DefaultTracer.java:722)"
"com.newrelic.agent.tracers.DefaultTracer.recordExternalMetrics(DefaultTracer.java:664)"
"com.newrelic.agent.tracers.DefaultTracer.recordMetrics(DefaultTracer.java:469)"
"com.newrelic.agent.tracers.DefaultTracer.performFinishWork(DefaultTracer.java:284)"
"com.newrelic.agent.tracers.DefaultTracer.finish(DefaultTracer.java:236)"
"com.newrelic.agent.Transaction.finishTracer(Transaction.java:2370)"
"com.newrelic.agent.Transaction.finishSegment(Transaction.java:2365)"
"com.newrelic.agent.Segment$1.run(Segment.java:202)"
"com.newrelic.agent.ExpirationService.expireSegmentInline(ExpirationService.java:47)"
"com.newrelic.agent.Segment.finish(Segment.java:214)"
"com.newrelic.agent.Segment.end(Segment.java:144)"
"com.nr.agent.instrumentation.httpurlconnection.MetricState.reportExternalCall(MetricState.java:255)"
```

## Example HttpURLConnection usage

Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
/*
*
* * Copyright 2023 New Relic Corporation. All rights reserved.
* * SPDX-License-Identifier: Apache-2.0
*
*/

package com.nr.agent.instrumentation.httpurlconnection;

import com.newrelic.agent.bridge.AgentBridge;
import com.newrelic.api.agent.NewRelic;

import java.util.logging.Level;

/**
* Provides some config options for tuning the instrumentation's use of a SegmentCleanupTask runnable.
* Specifically, thread_pool_size configures the number of threads used by the executor to execute
* the runnables and delay_ms configures the delay before a scheduled runnable task is executed.
* <p>
* See README for config examples.
*/
public class HttpURLConnectionConfig {
private static final String configPrefix = "class_transformer.com.newrelic.instrumentation.httpurlconnection.segment_cleanup_task.";
/*
* The following tests do a Thread.sleep to account for this delay. If this value is changed then the tests will also need to be updated.
* functional_test/src/test/java/com/newrelic/agent/instrumentation/pointcuts/net/HttpURLConnectionTest
* instrumentation/httpurlconnection/src/test/java/com/nr/agent/instrumentation/httpurlconnection/MetricStateConnectTest
*/
private static final int DEFAULT_TASK_DELAY_MS = 5_000;
private static final int DEFAULT_THREAD_POOL_SIZE = 5;
private static final boolean DEFAULT_DISTRIBUTED_TRACING_ENABLED = true;

private HttpURLConnectionConfig() {
}

public static int getThreadPoolSize() {
int threadPoolSize = DEFAULT_THREAD_POOL_SIZE;
try {
threadPoolSize = NewRelic.getAgent().getConfig().getValue(configPrefix + "thread_pool_size", DEFAULT_THREAD_POOL_SIZE);
} catch (Exception e) {
AgentBridge.getAgent().getLogger().log(Level.FINEST, "HttpURLConnection - using default thread_pool_size: " + threadPoolSize);
}
return threadPoolSize;
}

public static int getDelayMs() {
int delayMs = DEFAULT_TASK_DELAY_MS;
try {
delayMs = NewRelic.getAgent().getConfig().getValue(configPrefix + "delay_ms", DEFAULT_TASK_DELAY_MS);
} catch (Exception e) {
AgentBridge.getAgent().getLogger().log(Level.FINEST, "HttpURLConnection - using default task delay_ms: " + delayMs);
}
return delayMs;
}

public static boolean distributedTracingEnabled() {
boolean dtEnabled = DEFAULT_DISTRIBUTED_TRACING_ENABLED;
try {
dtEnabled = NewRelic.getAgent().getConfig().getValue("distributed_tracing.enabled", DEFAULT_DISTRIBUTED_TRACING_ENABLED);
} catch (Exception ignored) {
AgentBridge.getAgent().getLogger().log(Level.FINEST, "HttpURLConnection - using default distributed tracing enabled: " + dtEnabled);
}
return dtEnabled;
}
}
Loading

0 comments on commit 501d510

Please sign in to comment.