TransportException when calling _snapshot/repo/_all on non-master #26906

Closed
patrobinson opened this issue Oct 6, 2017 · 5 comments
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs

Comments

@patrobinson

Elasticsearch version: 5.4.3, Build: eed30a8/2017-06-22T00:34:03.743Z, JVM: 1.8.0_131

Plugins installed: s3-repository

JVM version:

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

OS version: Linux ip-10-200-7-5 4.4.0-1020-aws #29-Ubuntu SMP Wed Jun 14 15:54:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Executing curl localhost:9200/_snapshot/prod/_all on a non-master node results in:

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[8Q0KK9w][10.200.7.5:9300][cluster:admin/snapshot/get]"}],"type":"null_pointer_exception","reason":null},"status":500}

Expected it to return all snapshots. When run against the master node this returns fine.
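
The `reason` string in the error above packs three bracketed fields. A minimal sketch of pulling them apart, assuming the layout `[node name][transport address][action]` seen in this one response; the helper name `parse_remote_reason` is hypothetical and not part of any Elasticsearch client:

```python
import json
import re

# Error body copied verbatim from the report above.
ERROR_BODY = (
    '{"error":{"root_cause":[{"type":"remote_transport_exception",'
    '"reason":"[8Q0KK9w][10.200.7.5:9300][cluster:admin/snapshot/get]"}],'
    '"type":"null_pointer_exception","reason":null},"status":500}'
)


def parse_remote_reason(reason):
    """Split '[node][host:port][action]' into its three bracketed fields.

    Hypothetical helper: the three-field layout is an assumption based on
    the single response shown in this issue.
    """
    fields = re.findall(r"\[([^\]]*)\]", reason)
    if len(fields) != 3:
        raise ValueError("unexpected reason layout: %r" % reason)
    node, address, action = fields
    return {"node": node, "address": address, "action": action}


body = json.loads(ERROR_BODY)
root_cause = body["error"]["root_cause"][0]
info = parse_remote_reason(root_cause["reason"])

print(info["node"], info["address"], info["action"])
print(body["error"]["type"], body["status"])
```

The address field points at the node that raised the exception remotely, so that node's logs are the first place to look for a stack trace.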

Steps to reproduce:

1. Create an S3 repository.
2. Fill it with some 250 snapshots, each containing ~2,000 indices.
3. Attempt to call _snapshot/repository_name/_all on a non-master node.

Provide logs (if relevant):

[2017-10-06T00:35:54,433][WARN ][r.suppressed             ] path: /_snapshot/prod/_all, params: {repository=prod, snapshot=_all}
org.elasticsearch.transport.RemoteTransportException: [8Q0KK9w][10.200.7.5:9300][cluster:admin/snapshot/get]
Caused by: java.lang.NullPointerException
@patrobinson changed the title from "TransportException when requesting calling _snapshot/repo/_all on non-master" to "TransportException when calling _snapshot/repo/_all on non-master" on Oct 6, 2017
@jasontedor
Member

The node 10.200.7.5 will hopefully have a stack trace in its logs. If so, can you share it here? If not, can you send the request again with the URL parameter ?error_trace=true?
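
For anyone scripting the retry, here is a rough sketch of adding `error_trace=true` to the request URL and reading the body even when the server answers with a 500. It uses only the Python standard library; the helper names `with_error_trace` and `fetch` are made up for illustration, and the host and repository name are the ones from the report:

```python
import urllib.error
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_error_trace(url):
    """Return url with error_trace=true added, keeping existing parameters."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["error_trace"] = "true"
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch(url, timeout=30):
    """Return (status, body) even for HTTP error responses such as a 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # The error body is where the stack_trace field would show up.
        return err.code, err.read().decode("utf-8")


url = with_error_trace("http://localhost:9200/_snapshot/prod/_all")
print(url)

# Sending the request needs a reachable node, so it is left commented out:
# status, body = fetch(url)
```

Pointing this at a non-master node should reproduce the failure described above, with the trace included in the response body.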

@patrobinson
Author

No stack trace on the master node. Here's the response with error_trace=true:

{
    "error": {
        "root_cause": [
            {
                "type": "remote_transport_exception",
                "reason": "[8Q0KK9w][10.200.7.5:9300][cluster:admin/snapshot/get]",
                "stack_trace": "[[8Q0KK9w][10.200.7.5:9300][cluster:admin/snapshot/get]]; nested: RemoteTransportException[[8Q0KK9w][10.200.7.5:9300][cluster:admin/snapshot/get]]; nested: NullPointerException;
	at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:618)
	at org.elasticsearch.ElasticsearchException.generateFailureXContent(ElasticsearchException.java:563)
	at org.elasticsearch.rest.BytesRestResponse.build(BytesRestResponse.java:144)
	at org.elasticsearch.rest.BytesRestResponse.<init>(BytesRestResponse.java:101)
	at org.elasticsearch.rest.BytesRestResponse.<init>(BytesRestResponse.java:92)
	at org.elasticsearch.rest.action.RestActionListener.onFailure(RestActionListener.java:58)
	at org.elasticsearch.action.support.TransportAction$1.onFailure(TransportAction.java:94)
	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$3.handleException(TransportMasterNodeAction.java:185)
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1050)
	at org.elasticsearch.transport.TcpTransport.lambda$handleException$18(TcpTransport.java:1478)
	at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:110)
	at org.elasticsearch.transport.TcpTransport.handleException(TcpTransport.java:1476)
	at org.elasticsearch.transport.TcpTransport.handlerResponseError(TcpTransport.java:1468)
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1412)
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at java.lang.Thread.run(Thread.java:748)
Caused by: RemoteTransportException[[8Q0KK9w][10.200.7.5:9300][cluster:admin/snapshot/get]]; nested: NullPointerException;
Caused by: java.lang.NullPointerException
"
            }
        ],
        "type": "null_pointer_exception",
        "reason": null,
        "stack_trace": "java.lang.NullPointerException
"
    },
    "status": 500
}

@patrobinson
Author

My guess is that this is caused by the sheer size of the response body (over 10 million characters).

@ywelsch
Contributor

ywelsch commented Oct 6, 2017

I think that you're encountering a bug fixed by #26127.
Please upgrade to 5.6 and check whether you still encounter the same issue.
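
A quick way to check whether a node is already on a release at or above the suggested one is to compare the `version.number` field returned by the root endpoint (`GET /`). The sketch below assumes the 5.6 threshold from the comment above and uses a trimmed, made-up sample response; `version_tuple` and `at_least` are hypothetical helper names:

```python
import json

# Trimmed, illustrative sample of a root endpoint response (not captured
# from the reporter's cluster); only version.number is used here.
SAMPLE_ROOT_RESPONSE = '{"version": {"number": "5.4.3"}}'


def version_tuple(number):
    """Turn '5.4.3' into (5, 4, 3), ignoring a suffix such as '-SNAPSHOT'."""
    core = number.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def at_least(number, minimum=(5, 6)):
    """True if the version is at or above minimum (5.6 per the comment above)."""
    return version_tuple(number) >= minimum


running = json.loads(SAMPLE_ROOT_RESPONSE)["version"]["number"]
print(running, "meets the suggested version:", at_least(running))
```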

@andyb-elastic added the `:Distributed Coordination/Snapshot/Restore` label on Oct 10, 2017
@tlrx
Member

tlrx commented Jan 9, 2018

Closed by #26127

@tlrx closed this as completed on Jan 9, 2018