TransportException when calling `_snapshot/repo/_all` on non-master #26906

patrobinson · 2017-10-06T00:53:25Z

Elasticsearch version: Version: 5.4.3, Build: eed30a8/2017-06-22T00:34:03.743Z, JVM: 1.8.0_131

Plugins installed: s3-repository

JVM version:

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

OS version: Linux ip-10-200-7-5 4.4.0-1020-aws #29-Ubuntu SMP Wed Jun 14 15:54:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Executing `curl localhost:9200/_snapshot/prod/_all` on a non-master node results in:

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[8Q0KK9w][10.200.7.5:9300][cluster:admin/snapshot/get]"}],"type":"null_pointer_exception","reason":null},"status":500}

I expected it to return all snapshots. When run against the master node, the same request succeeds.

Steps to reproduce:

Create an s3 repository.
Fill it with some 250 snapshots, each containing ~2,000 indices.
Attempt to call `_snapshot/repository_name/_all` on a non-master node.

Provide logs (if relevant):

[2017-10-06T00:35:54,433][WARN ][r.suppressed             ] path: /_snapshot/prod/_all, params: {repository=prod, snapshot=_all}
org.elasticsearch.transport.RemoteTransportException: [8Q0KK9w][10.200.7.5:9300][cluster:admin/snapshot/get]
Caused by: java.lang.NullPointerException

jasontedor · 2017-10-06T01:02:33Z

The node 10.200.7.5 will hopefully have a stack trace in its logs. If so, can you share it here? If not, can you send the request again with the URL parameter ?error_trace=true?
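For anyone repeating this request, here is a minimal Python sketch of the call with `error_trace=true` appended. The helper names `snapshot_list_url` and `fetch_json` are made up for illustration, and the base URL and repository name are whatever your cluster uses. It assumes the node is reachable over plain HTTP without authentication.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def snapshot_list_url(base_url, repository, error_trace=True):
    """Build the URL for GET _snapshot/<repository>/_all (hypothetical helper)."""
    path = "/_snapshot/{}/_all".format(urllib.parse.quote(repository, safe=""))
    query = urllib.parse.urlencode({"error_trace": "true"}) if error_trace else ""
    return base_url.rstrip("/") + path + ("?" + query if query else "")


def fetch_json(url, timeout=30):
    """GET the URL and return (status, parsed_body), also for HTTP error responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"), strict=False)
    except urllib.error.HTTPError as err:
        # The 500 response in this issue still carries a JSON body with the
        # error details, so read and parse it instead of re-raising.
        # strict=False tolerates raw newlines inside string values.
        return err.code, json.loads(err.read().decode("utf-8"), strict=False)
```

Calling `fetch_json(snapshot_list_url("http://localhost:9200", "prod"))` against a non-master node and then against the master should make the difference between the two responses easy to compare.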

patrobinson · 2017-10-06T01:05:50Z

No stack trace on the master node. Here's the output with `error_trace`:

{
    "error": {
        "root_cause": [
            {
                "type": "remote_transport_exception",
                "reason": "[8Q0KK9w][10.200.7.5:9300][cluster:admin/snapshot/get]",
                "stack_trace": "[[8Q0KK9w][10.200.7.5:9300][cluster:admin/snapshot/get]]; nested: RemoteTransportException[[8Q0KK9w][10.200.7.5:9300][cluster:admin/snapshot/get]]; nested: NullPointerException;
	at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:618)
	at org.elasticsearch.ElasticsearchException.generateFailureXContent(ElasticsearchException.java:563)
	at org.elasticsearch.rest.BytesRestResponse.build(BytesRestResponse.java:144)
	at org.elasticsearch.rest.BytesRestResponse.<init>(BytesRestResponse.java:101)
	at org.elasticsearch.rest.BytesRestResponse.<init>(BytesRestResponse.java:92)
	at org.elasticsearch.rest.action.RestActionListener.onFailure(RestActionListener.java:58)
	at org.elasticsearch.action.support.TransportAction$1.onFailure(TransportAction.java:94)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$3.handleException(TransportMasterNodeAction.java:185)
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1050)
	at org.elasticsearch.transport.TcpTransport.lambda$handleException$18(TcpTransport.java:1478)
	at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:110)
	at org.elasticsearch.transport.TcpTransport.handleException(TcpTransport.java:1476)
	at org.elasticsearch.transport.TcpTransport.handlerResponseError(TcpTransport.java:1468)
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1412)
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at java.lang.Thread.run(Thread.java:748)
Caused by: RemoteTransportException[[8Q0KK9w][10.200.7.5:9300][cluster:admin/snapshot/get]]; nested: NullPointerException;
Caused by: java.lang.NullPointerException
"
            }
        ],
        "type": "null_pointer_exception",
        "reason": null,
        "stack_trace": "java.lang.NullPointerException
"
    },
    "status": 500
}
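A response like the one above is easier to scan once reduced to its key fields. The sketch below does that in Python; `summarize_error` is a hypothetical helper, and reading the `reason` string as `[node][address][action]` is an assumption based only on the shape seen in this issue. `strict=False` is used because the `stack_trace` values contain raw newlines, which the default JSON parser rejects.

```python
import json
import re

# Assumed layout of the "reason" field, inferred from this issue only:
# [node name][transport address][action name]
_REASON = re.compile(r"^\[([^\]]*)\]\[([^\]]*)\]\[([^\]]*)\]$")


def summarize_error(body_text):
    """Return status, top-level error type, and parsed root causes."""
    body = json.loads(body_text, strict=False)
    error = body.get("error", {})
    causes = []
    for cause in error.get("root_cause", []):
        match = _REASON.match(cause.get("reason") or "")
        node, address, action = match.groups() if match else (None, None, None)
        causes.append({
            "type": cause.get("type"),
            "node": node,
            "address": address,
            "action": action,
        })
    return {
        "status": body.get("status"),
        "type": error.get("type"),
        "root_causes": causes,
    }
```

Applied to the response in this issue, the summary shows a `null_pointer_exception` wrapped in a `remote_transport_exception` from `10.200.7.5:9300` for the action `cluster:admin/snapshot/get`.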

patrobinson · 2017-10-06T01:06:45Z

My guess is that this is caused by the sheer size of the response body (over 10 million characters).

ywelsch · 2017-10-06T07:03:15Z

I think you're encountering a bug fixed by #26127.
Please upgrade to 5.6 and check whether you still encounter the same issue.

tlrx · 2018-01-09T13:51:35Z

Closed by #26127

patrobinson changed the title ~~TransportException when requesting calling _snapshot/repo/_all on non-master~~ TransportException when calling _snapshot/repo/_all on non-master Oct 6, 2017

andyb-elastic added the :Distributed Coordination/Snapshot/Restore label Oct 10, 2017

tlrx closed this as completed Jan 9, 2018
