HTTP pipelining causes resource leak #30801

Closed
danielmitterdorfer opened this issue May 23, 2018 · 4 comments
Labels: >bug, :Distributed Coordination/Network (Http and internode communication implementations), v7.0.0-beta1

Comments

@danielmitterdorfer
Member

Elasticsearch version: 7.0.0-alpha1-SNAPSHOT (distribution flavor OSS), commit 31251c9

Description of the problem including expected versus actual behavior:

Elasticsearch dies with an OutOfMemoryError (OOME) in our benchmarks. This is caused by a resource leak in the network layer.

Steps to reproduce:

Run the following benchmark with Rally (it builds the correct revision of Elasticsearch automatically):

esrally --revision=1918a30 --challenge=append-no-conflicts-index-only --on-error=abort

A few minutes into the benchmark (at roughly 17% progress), Elasticsearch dies with an OOME.

Provide logs (if relevant):

In the logs we see:

[2018-05-23T07:31:31,811][ERROR][i.n.u.ResourceLeakDetector] LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.

If we start Elasticsearch with -Dio.netty.leakDetection.level=advanced, we get more detailed leak records.
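One way to pass this system property, assuming a tarball install started from the command line (a hypothetical invocation; adjust for your setup), is via ES_JAVA_OPTS:

ES_JAVA_OPTS="-Dio.netty.leakDetection.level=advanced" ./bin/elasticsearch

With advanced leak detection enabled, the detector reports: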

[2018-05-23T07:31:31,811][ERROR][i.n.u.ResourceLeakDetector] LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
WARNING: 7 leak records were discarded because the leak record count is limited to 4. Use system property io.netty.leakDetection.maxRecords to increase the limit.
Recent access records: 4
#4:
	Hint: 'read_timeout' will handle the message from this point.
	io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:116)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:345)
	io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
	io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
	io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
	io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545)
	io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
	io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	java.base/java.lang.Thread.run(Thread.java:844)
#3:
	Hint: 'openChannels' will handle the message from this point.
	io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:116)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:345)
	io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
	io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
	io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
	io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545)
	io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
	io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	java.base/java.lang.Thread.run(Thread.java:844)
#2:
	Hint: 'DefaultChannelPipeline$HeadContext#0' will handle the message from this point.
	io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:116)
	io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:345)
	io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
	io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
	io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
	io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545)
	io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
	io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	java.base/java.lang.Thread.run(Thread.java:844)
#1:
	io.netty.buffer.AdvancedLeakAwareByteBuf.writeBytes(AdvancedLeakAwareByteBuf.java:630)
	io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
	io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
	io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
	io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545)
	io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
	io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	java.base/java.lang.Thread.run(Thread.java:844)
Created at:
	io.netty.buffer.PooledByteBufAllocator.newHeapBuffer(PooledByteBufAllocator.java:314)
	io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:162)
	io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:153)
	io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:135)
	io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:80)
	io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:122)
	io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
	io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545)
	io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
	io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	java.base/java.lang.Thread.run(Thread.java:844)

Can you please have a look at this @tbrooks8?

@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this issue May 23, 2018
This is related to elastic#30801. When we call the http pipelining
aggregator on an inbound request, we retain the netty request. However,
this is unnecessary as the pipelining aggregator does not store the
request. This worked in the past because we released the request manually
and netty also released it automatically. At this point we do not
implement the ref-counted interface after the pipelining step, which
means that netty no longer automatically handles this second retain.

This commit removes that retain.
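For context, here is a minimal sketch of the Netty reference-counting contract the commit message refers to. The handler below is hypothetical and not the actual Elasticsearch pipelining code; it only illustrates why an unmatched retain() leaks.

```java
// Hypothetical pass-through handler illustrating Netty's reference-counting
// contract; not the actual Elasticsearch pipelining handler. Every retain()
// must be matched by a release(), otherwise the ResourceLeakDetector reports
// the backing ByteBuf once it is garbage-collected.
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.ReferenceCountUtil;

class PassThroughHandlerSketch extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        // Leak-prone pattern: retaining a message that this handler neither
        // stores nor later releases leaves the extra reference with no owner,
        // so the matching release() never happens.
        // ReferenceCountUtil.retain(msg);

        // Pass-through pattern: forward the message unchanged and let the
        // downstream consumer perform the single, final release().
        ctx.fireChannelRead(msg);
    }
}
```

In the issue above, the extra retain happened before the pipelining step while the code after that step no longer released it automatically, so the aggregated request buffers leaked and eventually exhausted the heap.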
@vineelyalamarthy

Is this issue still open, or was it closed with the patches that were submitted?

@Tim-Brooks
Contributor

I believe it is fixed. But I think we should wait until the nightly benchmarks run to be sure.

@jasontedor
Member

Closed by #30820
