Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try reuse existing Netty pooled allocator singleton (Fixes #5168) #5292

Closed
wants to merge 1 commit into from

Conversation

franz1981
Copy link
Contributor

@franz1981 franz1981 commented Sep 1, 2024

Fixes #5168

This is part of #5262

non-SSL connections configure PooledByteBufAllocator.DEFAULT as per

if (options.isSsl()) {
bootstrap.childOption(ChannelOption.ALLOCATOR, PartialPooledByteBufAllocator.INSTANCE);
} else {
bootstrap.childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);
}

In this scenario, hitting these call sites:

ByteBuf buffer = VertxByteBufAllocator.DEFAULT.heapBuffer(buf.readableBytes());

ByteBuf buffer = VertxByteBufAllocator.DEFAULT.heapBuffer(buf.readableBytes());

or

public BufferImpl(int initialSizeHint) {
buffer = VertxByteBufAllocator.DEFAULT.heapBuffer(initialSizeHint, Integer.MAX_VALUE);
}
public BufferImpl(byte[] bytes) {
buffer = VertxByteBufAllocator.DEFAULT.heapBuffer(bytes.length, Integer.MAX_VALUE).writeBytes(bytes);

referencing VertxByteBufAllocator.DEFAULT , it triggers the initialization of VertxByteBufAllocator.POOLED_ALLOCATOR as well and by consequence its inner structures - making it a GC Root.

It results into a duplication of allocators i.e. PooledByteBufAllocator.DEFAULT and VertxByteBufAllocator.POOLED_ALLOCATOR.

This patch aim to reuse the existing singleton i.e. PooledByteBufAllocator.DEFAULT if the user doesn't mess-up with -Dio.netty.noPreferDirect, in order to guarantee the original behaviour to be preserved.

@franz1981 franz1981 force-pushed the 4.x_unified_allocator branch from ca6c707 to a1fff79 Compare September 1, 2024 13:10
@cescoffier
Copy link
Contributor

Hold on before merging this one. We need to discuss versioning schemes.

@franz1981
Copy link
Contributor Author

franz1981 commented Sep 3, 2024

@cescoffier @vietj did you decided what could be best to do? 🙏

@cescoffier
Copy link
Contributor

Right now, it would be Vertx 5 only, as we won't have a Vertx 4.6

@franz1981
Copy link
Contributor Author

franz1981 commented Sep 3, 2024

@cescoffier
For quarkus - what is the expected time frame where vertx 5 can be integrated?
Remember that this change I did here could easily be backported too, given that it brings tangible advantage with zero effort, in quarkus

@cescoffier
Copy link
Contributor

For Quarkus, Vert.x 5 integration will start in spring 2025 and end (optimistically) in fall 2025.

The fact that we don't really know when useDirect is true or false (not the same value in tests, if you use JPMS, if you have access to usafe ... ) and that it can drastically change the behavior makes it hard to accept in a version that is intended to be used in the LTS.

@franz1981
Copy link
Contributor Author

franz1981 commented Sep 3, 2024

The fact that we don't really know when useDirect is true or false (not the same value in tests, if you use JPMS, if you have access to usafe ... ) and that it can drastically change the behavior makes it hard to accept in a version that is intended to be used in the LTS.

This is the same that will do Netty under the hood actually; it's not at all related to Unsafe but is instead related the sys property -Dio.netty.noPreferDirect (unrelated to unsafe) - and accessible with PooledByteBufAllocator.defaultPreferDirect (which is agnostic to unsafe).

Currently Netty allocates its own singleton(s) reading the same property; the point of checking it upfront here is to preserve the same behaviour the vertx users has benefitted till now.

If this is still not enough to mitigate the risk; I can think of another patch which further narrow the problem - with clearly some tradeoffs, let me know your opinion 👍

@cescoffier
Copy link
Contributor

It's not what we saw with Julien yesterday, where we got prefer direct true in some cases and false in other cases - without setting any system property.

@cescoffier
Copy link
Contributor

I will only accept the risk if it does not change the current default except if explicitly configured (like with a system property) - but as I said - it seems to set the default value differently depending on various things (like unsafe, JVM version, JPMS, NEtty 4.1 vs. 4.2...)

@franz1981
Copy link
Contributor Author

franz1981 commented Sep 3, 2024

I can see why @cescoffier thanks for sharing, that helped:

It is likely due to the presence of cleaner's API accessible, see https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L202

which transitively depends by Unsafe presence, see https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/CleanerJava9.java#L79

that raise some concern regardless this PR, because, right now in Vertx:

  1. without SSL: if unsafe is not present -> cleaner is noop -> Vertx uses Netty's allocator singleton which does prefer heap pooled buffers: this is bad, see
    https://github.com/netty/netty/blob/7893d5f150aefc99bc720387875395afad94d807/codec-http/src/main/java/io/netty/handler/codec/http/HttpObjectEncoder.java#L328
    causing Heap arenas to be allocated (which cost heap, not off-heap) and additional on-the-flight direct buffers while interacting with NIO sockets i.e. more memory/heap used, more cpu used
  2. with SSL: if unsafe is not present -> cleaner is noop -> Vertx uses its own partial allocator which force using direct buffers (without a CLEANER! - can OOM a container if a direct arena is freed, cause we leave it at the mercy of the GC), but which doesn't pool heap buffers (and causing SSL JDK a performance issue)

In short: if Unsafe is not present, it is a disaster, from a performance perspective.
So, if you have any run which have no Cleaner API accessible due to lack on Unsafe, we need to address it regardless, because it is bad, performance, but most importantly, stability-wise, for our users.

I'll perform some deeper analysis tomorrow, but my current conclusion is, with the current PR:

  1. the cleaner won't be available - we will get in the case 2, but for non SSL as well: this is a change, but not really, given that with SSL is already like that - and we still save the memory because the Netty singleton won't be used (but I have to make sure it will NEVER be referenced - which can cause allocating it)
  2. if the cleaner will be available - nothing change from now, but we will save to allocate the vertx allocator, saving memory

This is what happen in detail with this PR, is it more clear? I wasn't fully aware of all the changes

@cescoffier
Copy link
Contributor

Well, and it will likely be different between JVM and native.
We need more investigation to understand all the consequences, even if today's solution is far from ideal.

As I said, we cannot change the behavior in an LTS.

@franz1981
Copy link
Contributor Author

franz1981 commented Sep 3, 2024

@cescoffier

Well, and it will likely be different between JVM and native.

which is kind of surprising (to me - because I didn't profiled native image apps enough - mea culpa :( )

We need more investigation to understand all the consequences, even if today's solution is far from ideal.

+100
And I'll fill some issue with all the consequences and details: this stuff is NOT documented in Netty and only knowing how it works help - so better doing it once for "all" , if you agree

As I said, we cannot change the behavior in an LTS.

I'll pick the choice of using a sys property to not impact anyone - but I still need to dug into this enough to make sure what we get/loose and risks

@franz1981
Copy link
Contributor Author

franz1981 commented Sep 5, 2024

@cescoffier @zakkak

I'm a bit concerned about

It's not what we saw with Julien yesterday, where we got prefer direct true in some cases and false in other cases - without setting any system property.

In JDK 17, 21 and native image with 21-graal the behaviour is as expected, see

image

and at

image

where you can spot AbstractByteBufAllocator::buffer calling AbstractByteBufAllocator::directBuffer

and allocating io.netty.buffer.PooledUnsafeDirectByteBuf which is what we would expected.

in short: native image with latest quarkus and 21-graal is correctly finding both Unsafe and the Cleaner
and using the "right" buffers (using the Cleaners).

@cescoffier @vietj

can you help clarifying in which context both have observed defaultPreferDirect to return false?

I believe instead that you observed a different thing, let me show what's my hypothesis:

In short:

  • VertxByteBufAllocator.DEFAULT (of type VertxByteBufAllocator) is used to allocate unpooled heap buffer(s) without reference counting within vertx - they are not used within Netty nor exposed as singleton which Netty can use somehow. They have preferDirect to false, always (and I believe is what you have observed) - this allocator is used in vert.x/vertx-core/src/main/java/io/vertx/core/buffer/impl/BufferImpl.java
  • VertxByteBufAllocator.POOLED_ALLOCATOR instead is NOT of type VertxByteBufAllocator and is the subject of the changes in this PR; it can (with SSL on) be used by Vertx as a Netty allocator (i.e. within Netty) and it was hard-coding to preferDirect = true: the purpose of this PR is to avoid being created, if not necessary (by reusing the Netty already existing singleton) while respecting the expected behavior in the past (preferDirect == true)

@franz1981 franz1981 force-pushed the 4.x_unified_allocator branch from a1fff79 to 65e745d Compare September 5, 2024 20:41
@franz1981
Copy link
Contributor Author

franz1981 commented Sep 5, 2024

PTAL @cescoffier

I've added a new property vertx.reuseNettyAllocators, which is false by default, to preserve the existing behaviour.
When merged in Quarkus we can set this property to true saving the doubling of RSS mentioned at quarkusio/quarkus#42224

If the approach is sound I can add a similar sys property to allow Netty heap buffers to be pooled with JDK SSL engine, fixing quarkusio/quarkus#41880 (comment) TLDR poor performance with JDK SSL due to big heap allocations in the hot path.

@cescoffier
Copy link
Contributor

That looks more acceptable, but I do not see where it's documented and tested (mainly because we may detect behavior change between Vert.x 4.x and 5.x)

@franz1981
Copy link
Contributor Author

Given that it depends by a static final property, if I set it to try it, is going to impact all the tests after it, randomly (because it depends how junit order test execution).
I have tested that the default has not changed, instead. We can speak with @vietj on how to make it testable, without running the whole test suite again with this property in.
Consider that it doesn't change the classes used to perform the pooled allocations, but just which instance and currently there are no specific tests on vertx to cover what exist now.
I am opened to suggestions.

@cescoffier
Copy link
Contributor

We may have a bit more flexibility to test it in Quarkus (I don't care where the tests are, just that we need some coverage)

@franz1981
Copy link
Contributor Author

@vietj Any other concern for this @cescoffier @vietj ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants