-
Notifications
You must be signed in to change notification settings - Fork 922
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide blockhound integrations #4493
Conversation
Note: There is a known issue where a |
90e89e3
to
11e8e18
Compare
As of Netty 4.1.85, FastThreadLocalThread permits to perform blocking calls. If the flag is enabled, the Netty Blockhound integration module does not assume the thread is a non-blocking thread. netty/netty#12978 |
Hi, let me report the following after trying blockhound with Armeria 1.20.3 Zipkin tracing can block when flushing spans
Though it sounds ignorable, I'm not sure if armeria integration should be the one ignoring it, as opposed to each application using it. Reading Prometheus gauges (ExecutorServiceMetrics) can block
This looks ignorable. recording in a Hdr histogram can cause a sleep
The sleep is about 1ms at most, so I suppose it can be ignored, but let's be aware that it could happen for each histogram that is being updated in a non-blocking thread. Submitting a task to a blocking scheduler can block
The lock seems like it's never held for long. Guava's MapMaker's ConcurrentMap can block
Again, the lock seems never held for long. |
Is there any progress on this? |
Let me try working on this again when I have more time 😅 At the moment, the difficult part is figuring out why failed tests are failing. The pattern most often found is:
Let me look into this again next week 😅 Sorry about the delay |
If we introduce a general interface that marks all methods of its subclasses are safe to execute in an event loop, it would be easier to implement the BlockHound integration module. The following example was borrowed from the change in #4837 public interface EventLoopSafe {}
class DestroyHandler implements BiFunction<Object, Throwable, Void>, EventLoopSafe {
@Override
public Void apply(Object unused, Throwable throwable) {
// BlockHound will report this blocking operation without EventLoopSafe
lock.lock();
try {
...
} finally {
lock.unlock();
}
return null;
}
} |
I'm not sure if I understood the intention, but could it be easier if we just introduce a lock which is ignorable by Blockhound? |
In #4837, it is likely that Question: Is it feasible to extend class ReentrantShortLock extends ReentrantLock { ... }
BlockHound.Builder builder = ...;
builder.allowBlockingCallsInside("com.linecorp.armeria.internal.common.ReentrantShortLock",
"lock"); |
ddbdbc3
to
9967be7
Compare
Sorry about the delay 🙏 I believe this PR is (finally) ready for review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job! 🙇
brave/src/main/resources/META-INF/services/reactor.blockhound.integration.BlockHoundIntegration
Outdated
Show resolved
Hide resolved
public void applyTo(Builder builder) { | ||
// short locks | ||
builder.allowBlockingCallsInside("com.linecorp.armeria.client.HttpClientFactory", | ||
"pool"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain which part of the method is a blocking call? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the report from @mauhiz contains the stacktrace. I was able to reproduce this failure from our CI as well.
at java.base/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:267)
at com.linecorp.armeria.internal.shaded.guava.collect.MapMakerInternalMap$Segment.put(MapMakerInternalMap.java:1476)
at com.linecorp.armeria.internal.shaded.guava.collect.MapMakerInternalMap.putIfAbsent(MapMakerInternalMap.java:2419)
at java.base/java.util.concurrent.ConcurrentMap.computeIfAbsent(ConcurrentMap.java:331)
at com.linecorp.armeria.client.HttpClientFactory.pool(HttpClientFactory.java:404)
at com.linecorp.armeria.client.HttpClientDelegate.acquireConnectionAndExecute0(HttpClientDelegate.java:169)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thanks! 😉
builder.allowBlockingCallsInside("org.HdrHistogram.ConcurrentHistogram", "getCountAtIndex"); | ||
builder.allowBlockingCallsInside("org.HdrHistogram.WriterReaderPhaser", "flipPhase"); | ||
|
||
// StreamMessageInputStream internally uses a blocking queue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it uses add instead offer? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like LinkedBlockingQueue
internally delegates add
to offer
https://github.com/openjdk/jdk/blob/f5a6b7f7c03c00c96d0055f9be31057675205e13/src/java.base/share/classes/java/util/concurrent/LinkedBlockingQueue.java#LL81C45-L81C58
https://github.com/openjdk/jdk/blob/f5a6b7f7c03c00c96d0055f9be31057675205e13/src/java.base/share/classes/java/util/AbstractQueue.java#L94
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your hard work. 🙇♂️ This should be very useful to diagnose event loop blocking problems.
core/src/test/java/com/linecorp/armeria/client/Http1HeaderNamingTest.java
Show resolved
Hide resolved
core/src/main/java/com/linecorp/armeria/common/util/AbstractListenable.java
Show resolved
Hide resolved
core/src/main/java/com/linecorp/armeria/common/CoreBlockhoundIntegration.java
Outdated
Show resolved
Hide resolved
@Override | ||
public void applyTo(Builder builder) { | ||
// UUID.randomUUID is called by default | ||
builder.allowBlockingCallsInside("graphql.execution.ExecutionId", "generate"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of removing this rule and using ctx.id().text()
as the ExecutionId
?
- I don't see that users would want to use the
UUID.randomUUID
for the default execution ID. - It would be a better choice to correlate an HTTP request with a GraphQL query.
final ExecutionInput executionInput =
builder.context(ctx)
.executionId(ExecutionId.from(ctx.id().text()))
...
armeria/graphql/src/main/java/com/linecorp/armeria/server/graphql/DefaultGraphqlService.java
Lines 90 to 92 in e65da63
builder.context(ctx) | |
.graphQLContext(GraphqlServiceContexts.graphqlContext(ctx)) | |
.dataLoaderRegistry(dataLoaderRegistry) |
We may add an API that takes an
ExecutionId
factory from GraphqlServiceBuilder
later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an issue for this. #4867
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Seems like a simple enough workaround to include in this PR 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! 💯 👍
add4a98
to
7b6dd68
Compare
* A {@link BlockHoundIntegration} for the brave module. | ||
*/ | ||
@UnstableApi | ||
public final class BraveBlockHoundIntegration implements BlockHoundIntegration { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any case a user needs to refer to our BlockHoundIntegration
implementations manually? If not, can we move all BlockHoundIntegration
into the internal
package, given that they are loaded automatically via SPI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought users may want to exclude some BlockHoundIntegration
s (not necessarily armeria's)
e.g.
BlockHound.install(BlockHoundIntegration... integrations) - same as BlockHound.install(), but adds user-provided integrations to the list.
BlockHound.builder().install() - will create a new builder, without discovering any integrations.
You may install them manually by using BlockHound.builder().with(new MyIntegration()).install().
https://github.com/reactor/BlockHound/blob/master/docs/customization.md#customization
If there is no need, I'm fine with hiding them. Let me know what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Then it's fine as it is :-)
a0deb15
to
7b6dd68
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank a lot for tackling into this, @jrhee17. 🙇🙇🙇
Thank you! |
Motivation:
Armeria users often use
BlockHound
to determine whether there are any blocking calls in their applications.Providing a basic
BlockHound
integration can help de-duplicate some code and debugging efforts.I propose the following:
BlockHound
integration up-to-date with changes.ThreadPoolExecutor
). As long as these locks are verified to be short-living, we can add it to our integration.Micrometer
orHdrHistogram
to provideBlockhound
integrations. Since Armeria usesMicrometer
to record metrics from event loops, I think it is reasonable that we addHdrHistogram
related calls to ourBlockhound
integration.Also note that there are some blocking calls within the codebase, but this PR attempts to just introduce blockhound rather than remove such calls.
Modifications:
BlockHound
integrations forcore
,brave
,grpc
,retrofit2
,scala
,sangria
modulesBlockHound
integration which allows test-specific blocking callsblockhound
gradle property flag is specified, run tests withBlockHound
enabledBlockHound.install
is not invoked and not enabled in the code-baseReentrantShortLock
which Armeria'sBlockHound
whitelists and change all usages ofReentrantLock
BlockingFastThreadLocalThread
is now used for non-event loops since netty whitelistsFastThreadLocalThread
withpermitBlockingCalls==true
ShutdownSupport::shutdown
is now called fromstartStopExecutor
. Although this method returns a CF, the call can actually be blockingSTART_STOP_EXECUTOR
is now a non-eventloop daemon thread instead of using theGlobalEventExecutor
GlobalEventExecutor
is an event loop.BlockingUtils#blockingRun
in case users want to run blocking calls from testskotlinx-coroutines-debug
dependency so that CoroutinesBlockHound
integration is loadedThreadFactory
when creating aEventLoopExtension
orEventLoopGroupExtension
SamlService
,SamlDecorator
now runs some calls from the blocking executorZookeeperEndpointGroup
now uses a non-event-loop thread factoryResult:
BlockHound
integrations provided by armeria