[BUG] [class_not_found_exception] during rolling upgrade on security enabled cluster #1259

Closed
skkosuri-amzn opened this issue Jun 16, 2021 · 25 comments

Comments

@skkosuri-amzn
Contributor

Describe the bug

curl https://localhost:9202/_cluster/settings?pretty -u admin:admin --insecure
{
  "error" : {
    "root_cause" : [
      {
        "type" : "class_not_found_exception",
        "reason" : "class_not_found_exception: org.opensearch.security.user.User"
      }
    ],
    "type" : "exception",
    "reason" : "java.lang.ClassNotFoundException: org.opensearch.security.user.User",
    "caused_by" : {
      "type" : "class_not_found_exception",
      "reason" : "class_not_found_exception: org.opensearch.security.user.User"
    }
  },
  "status" : 500
}

To Reproduce
Steps to reproduce the behavior:

  1. Set up a 3-node ODFE cluster (yml changes: ports 9300, 9301, 9302).
  2. Ingest data into an index.
  3. Set a few cluster settings.

curl -X PUT -H 'Content-Type:application/json' https://localhost:9200/_cluster/settings?pretty -u admin:admin --insecure -d'
{
  "persistent" : {
    "indices.recovery.max_bytes_per_sec" : "50mb"
  }
}
'
curl -X PUT -H 'Content-Type:application/json' https://localhost:9200/_cluster/settings?pretty -u admin:admin --insecure -d'
{
  "transient" : {
    "indices.recovery.max_bytes_per_sec" : "10mb"
  }
}
'

  4. Shut down one node and replace it with an OpenSearch node.
    Copy the data folder and yml config to OpenSearch.
  5. Run GET /_cluster/settings on the OpenSearch node.
  6. The class_not_found_exception shown above is returned.

Host/Environment (please complete the following information):
Linux AL2

Additional context

  1. After the OpenSearch node is added, I see that the OpenSearch-security shard is not initialized on the OpenSearch node. That could be the main reason for this error.

curl https://localhost:9200/_cluster/health?pretty -u admin:admin --insecure
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 8,
  "active_shards" : 16,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 94.11764705882352
}

@skkosuri-amzn
Contributor Author

All calls to the OpenSearch node fail because of this.

@saratvemulapalli saratvemulapalli transferred this issue from opensearch-project/OpenSearch Jun 16, 2021
@saratvemulapalli
Member

I took a look at it.
OpenSearch itself works as expected for a rolling upgrade; it seems the security plugin somehow cannot find the user information on the new node.

Moving this to security to understand the problem.

@dblock dblock added bug Something isn't working Severity-Critical v1.0.0 labels Jun 16, 2021
@cliu123
Copy link
Member

cliu123 commented Jun 17, 2021

We were able to reproduce the issue. Investigating the root cause.

@cliu123
Member

cliu123 commented Jun 17, 2021

The issue only happens in a mixed-cluster scenario (the cluster has both ES and OpenSearch nodes). The renamed namespace (opensearch) is not recognized by the old node (ODFE).
In a mixed cluster, when the new node receives a REST request, it sends a transport request to other nodes (both old and new). The opensearch namespace sent as part of that transport request cannot be deserialized on the old node, which causes the error.
While reproducing the issue, we found that it doesn't happen when REST requests are sent to old nodes.
There was a discussion on the other issue.
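
For readers unfamiliar with why a package rename can surface as a ClassNotFoundException: Java's built-in serialization writes the fully qualified class name into the byte stream, and the receiver resolves that name with its own class loader. A minimal sketch of the first half of that, where `SerializedNameDemo` and its nested `User` are hypothetical stand-ins and not the plugin's classes:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class SerializedNameDemo {

    // Hypothetical stand-in for a serializable user object.
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        User(String name) {
            this.name = name;
        }
    }

    // Serialize an object with standard Java serialization.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Check whether the class's binary name appears verbatim in the stream.
    // ISO-8859-1 maps every byte to one char, so ASCII names survive intact.
    static boolean streamContainsClassName(byte[] stream, Class<?> type) {
        String raw = new String(stream, StandardCharsets.ISO_8859_1);
        return raw.contains(type.getName());
    }

    public static void main(String[] args) {
        byte[] stream = serialize(new User("admin"));
        System.out.println("class name embedded in stream: "
                + streamContainsClassName(stream, User.class));
    }
}
```

Because the name travels with the data, a node whose class path only contains the class under a different package has nothing to resolve it to.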

@dblock
Member

dblock commented Jun 17, 2021

How do we fix this? cc: @nknize

@cliu123
Member

cliu123 commented Jun 17, 2021

The issue only happens in a mixed cluster. Once all nodes are upgraded from ES to OpenSearch, the issue doesn't happen.

@saratvemulapalli
Member

> The issue only happens in a mixed cluster. Once all nodes are upgraded from ES to OpenSearch, the issue doesn't happen.

This definitely helps. In the worst case, we could say the new nodes are not reachable during the upgrade, but everything works once all the nodes are upgraded.

On the other hand, we need to understand why the class name identifier is passed through the transport layer during serialization and deserialization.
We might see the same symptoms in other plugins.

@andy840314
Contributor

andy840314 commented Jun 18, 2021

I tested a mixed cluster with 1 OpenSearch master node and 2 Elasticsearch data nodes.

node1: OpenSearch (localhost:9200)
node2: Elasticsearch (localhost:9201)
node3: Elasticsearch (localhost:9202)

A REST request to the master node works fine.

dev-dsk-ndylin-2b-47a4b34b % curl -k -i -XGET https://localhost:9200/_cat/nodes\?v=true -u admin:admin
HTTP/1.1 200 OK
content-type: text/plain; charset=UTF-8
content-length: 347

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1           42          24   0    0.00    0.02     0.00 dimr      *      node1
127.0.0.1                                                       dimr      -      node2
127.0.0.1                                                       dimr      -      node3

A REST request to a data node that involves a transport request fails:

dev-dsk-ndylin-2b-47a4b34b % curl -k -i -XGET https://localhost:9201/_cat/nodes\?v=true -u admin:admin
HTTP/1.1 500 Internal Server Error
content-type: application/json; charset=UTF-8
content-length: 436

{"error":{"root_cause":[{"type":"class_not_found_exception","reason":"class_not_found_exception: com.amazon.opendistroforelasticsearch.security.user.User"}],"type":"exception","reason":"java.lang.ClassNotFoundException: com.amazon.opendistroforelasticsearch.security.user.User","caused_by":{"type":"class_not_found_exception","reason":"class_not_found_exception: com.amazon.opendistroforelasticsearch.security.user.User"}},"status":500}

As we can see, the exception is different this time: class_not_found_exception: com.amazon.opendistroforelasticsearch.security.user.User. It seems that REST requests that involve a transport request are always sent to the master node. In this case the data node is ODFE and the master node is OpenSearch, so the OpenSearch node cannot recognize the class com.amazon.opendistroforelasticsearch.security.user.User.

Stack trace

[2021-06-17T17:08:23,414][WARN ][r.suppressed             ] [node2] path: /_cat/nodes, params: {v=true}
org.elasticsearch.transport.RemoteTransportException: [node1][127.0.0.1:9300][cluster:monitor/state]
Caused by: org.elasticsearch.ElasticsearchException: java.lang.ClassNotFoundException: com.amazon.opendistroforelasticsearch.security.user.User
	at org.opensearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:185) ~[?:?]
	at org.opensearch.security.auditlog.impl.AbstractAuditLog.getUser(AbstractAuditLog.java:629) ~[?:?]
	at org.opensearch.security.auditlog.impl.AbstractAuditLog.logMissingPrivileges(AbstractAuditLog.java:234) ~[?:?]
	at org.opensearch.security.auditlog.impl.AuditLogImpl.logMissingPrivileges(AuditLogImpl.java:176) ~[?:?]
	at org.opensearch.security.auditlog.AuditLogSslExceptionHandler.logError(AuditLogSslExceptionHandler.java:77) ~[?:?]
	at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:158) ~[?:?]
	at org.opensearch.security.OpenSearchSecurityPlugin$7$1.messageReceived(OpenSearchSecurityPlugin.java:639) ~[?:?]
	at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:85) ~[?:?]
	at org.opensearch.transport.InboundHandler.handleRequest(InboundHandler.java:220) ~[?:?]
	at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:120) ~[?:?]
	at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:102) ~[?:?]
	at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:713) ~[?:?]
	at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:155) ~[?:?]
	at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:130) ~[?:?]
	at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:95) ~[?:?]
	at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:87) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1533) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1282) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1329) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:620) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:583) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.49.Final.jar:4.1.49.Final]
	at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: class_not_found_exception: com.amazon.opendistroforelasticsearch.security.user.User
	at java.net.URLClassLoader.findClass(URLClassLoader.java:435) ~[?:?]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:589) ~[?:?]
	at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:855) ~[?:?]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:522) ~[?:?]
	at java.lang.Class.forName0(Native Method) ~[?:?]
	at java.lang.Class.forName(Class.java:468) ~[?:?]
	at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:782) ~[?:?]
	at org.opensearch.security.support.Base64Helper$SafeObjectInputStream.resolveClass(Base64Helper.java:198) ~[?:?]
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2028) ~[?:?]
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1895) ~[?:?]
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2202) ~[?:?]
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1712) ~[?:?]
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:519) ~[?:?]
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:477) ~[?:?]
	at org.opensearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:183) ~[?:?]
	at org.opensearch.security.auditlog.impl.AbstractAuditLog.getUser(AbstractAuditLog.java:629) ~[?:?]
	at org.opensearch.security.auditlog.impl.AbstractAuditLog.logMissingPrivileges(AbstractAuditLog.java:234) ~[?:?]
	at org.opensearch.security.auditlog.impl.AuditLogImpl.logMissingPrivileges(AuditLogImpl.java:176) ~[?:?]
	at org.opensearch.security.auditlog.AuditLogSslExceptionHandler.logError(AuditLogSslExceptionHandler.java:77) ~[?:?]
	at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:158) ~[?:?]
	at org.opensearch.security.OpenSearchSecurityPlugin$7$1.messageReceived(OpenSearchSecurityPlugin.java:639) ~[?:?]
	at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:85) ~[?:?]
	at org.opensearch.transport.InboundHandler.handleRequest(InboundHandler.java:220) ~[?:?]
	at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:120) ~[?:?]
	at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:102) ~[?:?]
	at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:713) ~[?:?]
	at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:155) ~[?:?]
	at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:130) ~[?:?]
	at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:95) ~[?:?]
	at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:87) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) ~[netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1533) ~[netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1282) ~[netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1329) ~[netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508) ~[netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447) ~[netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:620) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:583) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[?:?]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
	at java.lang.Thread.run(Thread.java:832) ~[?:?]

The root cause is here:
https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/support/Base64Helper.java#L176
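
One possible way to tolerate a renamed package on the reading side is to swap the class descriptor while deserializing. The sketch below is an illustration only and is not taken from the security plugin: `PackageRemapDemo`, `OldNs`, `NewNs`, and `RemappingObjectInputStream` are hypothetical names, with two nested holder classes standing in for the old and new packages. It assumes both `User` classes declare the same fields and the same serialVersionUID.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class PackageRemapDemo {

    // Hypothetical stand-ins for the same class under two different namespaces.
    static class OldNs {
        static class User implements Serializable {
            private static final long serialVersionUID = 1L;
            final String name;
            User(String name) { this.name = name; }
        }
    }

    static class NewNs {
        static class User implements Serializable {
            private static final long serialVersionUID = 1L;
            final String name;
            User(String name) { this.name = name; }
        }
    }

    // Rewrites class names that start with oldPrefix so they resolve under newPrefix.
    static class RemappingObjectInputStream extends ObjectInputStream {
        private final String oldPrefix;
        private final String newPrefix;

        RemappingObjectInputStream(InputStream in, String oldPrefix, String newPrefix)
                throws IOException {
            super(in);
            this.oldPrefix = oldPrefix;
            this.newPrefix = newPrefix;
        }

        @Override
        protected ObjectStreamClass readClassDescriptor()
                throws IOException, ClassNotFoundException {
            // Always consume the descriptor from the stream first.
            ObjectStreamClass fromStream = super.readClassDescriptor();
            String name = fromStream.getName();
            if (!name.startsWith(oldPrefix)) {
                return fromStream;
            }
            // Substitute the descriptor of the locally available class.
            Class<?> local = Class.forName(newPrefix + name.substring(oldPrefix.length()));
            ObjectStreamClass replacement = ObjectStreamClass.lookup(local);
            return replacement != null ? replacement : fromStream;
        }
    }

    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data, String oldPrefix, String newPrefix) {
        try (ObjectInputStream in = new RemappingObjectInputStream(
                new ByteArrayInputStream(data), oldPrefix, newPrefix)) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = serialize(new OldNs.User("admin"));
        Object revived = deserialize(wire,
                OldNs.class.getName() + "$", NewNs.class.getName() + "$");
        System.out.println(revived.getClass().getName());
    }
}
```

In a mixed cluster, remapping on the reading side alone would presumably not be enough, since nodes still running the old plugin cannot be patched; a new node would also need to write the old name when talking to them. Any such remapping would also have to respect whatever class checks the SafeObjectInputStream seen in the stack trace applies.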

@cliu123
Member

cliu123 commented Jun 18, 2021

Another test case:
I tested a mixed cluster scenario during upgrade testing.
node 1: Master node, ES
node 2: Data node, ES
node 3: Data node, OpenSearch

On node 1 and node 2:

% curl -XGET https://localhost:9200/_cat/nodes -u 'admin:admin' -k
ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
172.22.0.3           41          48   0    0.10    0.05     0.01 dimr      *      elasticsearch-1
172.22.0.2                                                       dimr      -      elasticsearch-2
172.22.0.4            7          48   0    0.10    0.05     0.01 dimr      -      opensearch-3

The error happens only on node 3, and this time it is the OpenSearch User class (org.opensearch.security.user.User) that is not found. In the test case above, by comparison, it was the Open Distro User class that was not found:

org.opensearch.transport.RemoteTransportException: [elasticsearch-1][172.22.0.3:9300][cluster:monitor/state]
Caused by: org.opensearch.OpenSearchException: java.lang.ClassNotFoundException: org.opensearch.security.user.User
	at com.amazon.opendistroforelasticsearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:185) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.impl.AbstractAuditLog.getUser(AbstractAuditLog.java:627) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.impl.AbstractAuditLog.logMissingPrivileges(AbstractAuditLog.java:235) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.impl.AuditLogImpl.logMissingPrivileges(AuditLogImpl.java:176) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.AuditLogSslExceptionHandler.logError(AuditLogSslExceptionHandler.java:77) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceived(OpenDistroSecuritySSLRequestHandler.java:158) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin$7$1.messageReceived(OpenDistroSecurityPlugin.java:639) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:72) ~[?:?]
	at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:207) ~[?:?]
	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:107) ~[?:?]
	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:89) ~[?:?]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:700) ~[?:?]
	at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142) ~[?:?]
	at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117) ~[?:?]
	at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82) ~[?:?]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) [netty-handler-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[netty-codec-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1518) [netty-handler-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267) [netty-handler-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314) [netty-handler-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) [netty-codec-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) [netty-codec-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.59.Final.jar:4.1.59.Final]
	at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.opensearch.common.io.stream.NotSerializableExceptionWrapper: class_not_found_exception: org.opensearch.security.user.User
	at java.net.URLClassLoader.findClass(URLClassLoader.java:435) ~[?:?]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:589) ~[?:?]
	at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:855) ~[?:?]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:522) ~[?:?]
	at java.lang.Class.forName0(Native Method) ~[?:?]
	at java.lang.Class.forName(Class.java:468) ~[?:?]
	at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:782) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.support.Base64Helper$SafeObjectInputStream.resolveClass(Base64Helper.java:198) ~[?:?]
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2028) ~[?:?]
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1895) ~[?:?]
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2202) ~[?:?]
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1712) ~[?:?]
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:519) ~[?:?]
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:477) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:183) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.impl.AbstractAuditLog.getUser(AbstractAuditLog.java:627) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.impl.AbstractAuditLog.logMissingPrivileges(AbstractAuditLog.java:235) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.impl.AuditLogImpl.logMissingPrivileges(AuditLogImpl.java:176) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.AuditLogSslExceptionHandler.logError(AuditLogSslExceptionHandler.java:77) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceived(OpenDistroSecuritySSLRequestHandler.java:158) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin$7$1.messageReceived(OpenDistroSecurityPlugin.java:639) ~[?:?]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:72) ~[?:?]
	at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:207) ~[?:?]
	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:107) ~[?:?]
	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:89) ~[?:?]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:700) ~[?:?]
	at org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142) ~[?:?]
	at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117) ~[?:?]
	at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82) ~[?:?]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) ~[netty-handler-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[netty-codec-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1518) ~[netty-handler-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1267) ~[netty-handler-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1314) ~[netty-handler-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) ~[netty-codec-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) ~[netty-codec-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[netty-codec-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.59.Final.jar:4.1.59.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578) ~[?:?]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) ~[?:?]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[?:?]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
	at java.lang.Thread.run(Thread.java:832) ~[?:?]

@skkosuri-amzn
Contributor Author

Could you create a test build of the OpenSearch security plugin with org.opendistro.security.user.User and try it?

@nknize

nknize commented Jun 18, 2021

The renamed namespace(opensearch) is not recognized by the old node(ODFE).

Why would an old node even need to be aware of the new namespace at runtime? This is in the security plugin implementation so outside my knowledge sphere... is reflection being used somewhere where the namespace is being passed over the wire?

@cliu123
Member

cliu123 commented Jun 18, 2021

  • Need to support transport requests in both directions: OpenSearch node -> ODFE node and ODFE node -> OpenSearch node.

  • Trying to confirm 2 approaches to fix this issue:

  1. On the OpenSearch node, translate the namespace/classname at the serialization boundary. When sending, replace org.opensearch with com.amazon.opendistroforelasticsearch right before serialization. When receiving com.amazon.opendistroforelasticsearch, replace it with org.opensearch during deserialization.
  2. Revert the namespace/classname renaming changes in the security plugin classes involved in deserialization.
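
A minimal sketch of what approach 1 could look like on the receiving side, for illustration only. The class and method names (NamespaceCompat, toLegacy, toCurrent, CompatObjectInputStream) are hypothetical and are not taken from the security plugin or from #1278. The two package prefixes come from the stack traces in this thread. The sketch assumes the renamed classes keep the same simple name, serialVersionUID and field layout. The sender-side rewrite is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

final class NamespaceCompat {
    // Package prefixes as they appear in the stack traces in this thread.
    static final String LEGACY_PREFIX = "com.amazon.opendistroforelasticsearch.security.";
    static final String CURRENT_PREFIX = "org.opensearch.security.";

    private NamespaceCompat() {}

    /** Class name an OpenSearch node would put on the wire for an ODFE peer. */
    static String toLegacy(String className) {
        return swapPrefix(className, CURRENT_PREFIX, LEGACY_PREFIX);
    }

    /** Class name an OpenSearch node would look up for a name sent by an ODFE peer. */
    static String toCurrent(String className) {
        return swapPrefix(className, LEGACY_PREFIX, CURRENT_PREFIX);
    }

    private static String swapPrefix(String name, String from, String to) {
        return name.startsWith(from) ? to + name.substring(from.length()) : name;
    }

    /** Hypothetical reader-side hook: resolve legacy names against local classes. */
    static final class CompatObjectInputStream extends ObjectInputStream {
        CompatObjectInputStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            String mapped = toCurrent(desc.getName());
            if (mapped.equals(desc.getName())) {
                return super.resolveClass(desc); // not a renamed class
            }
            // Load the renamed local class without initializing it.
            return Class.forName(mapped, false, NamespaceCompat.class.getClassLoader());
        }
    }
}
```

For example, NamespaceCompat.toLegacy("org.opensearch.security.user.User") yields the ODFE name from the original error, and names outside the security namespace pass through unchanged.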

@nknize

nknize commented Jun 18, 2021

is reflection being used somewhere

	at com.amazon.opendistroforelasticsearch.security.support.Base64Helper$SafeObjectInputStream.resolveClass(Base64Helper.java:198) ~[?:?]
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2028) ~[?:?]
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1895) ~[?:?]
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2202) ~[?:?]
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1712) ~[?:?]
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:519) ~[?:?]
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:477) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:183) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.impl.AbstractAuditLog.getUser(AbstractAuditLog.java:627) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.impl.AbstractAuditLog.logMissingPrivileges(AbstractAuditLog.java:235) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.impl.AuditLogImpl.logMissingPrivileges(AuditLogImpl.java:176) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.AuditLogSslExceptionHandler.logError(AuditLogSslExceptionHandler.java:77) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceived(OpenDistroSecuritySSLRequestHandler.java:158) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin$7$1.messageReceived(OpenDistroSecurityPlugin.java:639) ~[?:?]

Oh..of course it is.

@nknize

nknize commented Jun 18, 2021

On OpenSearch node, replace org.opensearch with com.amazon.opendistroforelasticsearch in the namespace/classname right before serialization on the sender node. & On OpenSearch node, if the receiver node receives com.amazon.opendistroforelasticsearch, replace com.amazon.opendistroforelasticsearch with org.opensearch in deserialization.

gah, the full fix would be to include replacing the InputStream variable here with StreamInput so transport layer compatibility is handled correctly across breaking changes. That might be a longer term fix, though. Either way we really need to wrap the core logic changes with backward and forward compatibility version checks.
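
To illustrate the idea of an explicit, version-gated wire format (as opposed to Java object serialization), here is a rough sketch. It uses java.io.DataOutputStream and DataInputStream purely as stand-ins for the transport layer's StreamOutput and StreamInput. The User fields, the V1/V2 constants and the ExplicitUserCodec class are all hypothetical. This is not the plugin's code and not the change made in #1278.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ExplicitUserCodec {
    // Hypothetical wire versions, for illustration only.
    static final int V1 = 1; // name + roles
    static final int V2 = 2; // adds tenant

    /** Hypothetical stand-in for the plugin's user object. */
    static final class User {
        final String name;
        final List<String> roles;
        final String tenant;

        User(String name, List<String> roles, String tenant) {
            this.name = name;
            this.roles = roles;
            this.tenant = tenant;
        }
    }

    private ExplicitUserCodec() {}

    /** Writes fields one by one; no class name ever goes on the wire. */
    static byte[] write(User user, int peerVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(user.name);
            out.writeInt(user.roles.size());
            for (String role : user.roles) {
                out.writeUTF(role);
            }
            if (peerVersion >= V2) { // only send what the peer understands
                out.writeUTF(user.tenant);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the same layout, gated on the version of the sending peer. */
    static User read(byte[] data, int peerVersion) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            String name = in.readUTF();
            int count = in.readInt();
            List<String> roles = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                roles.add(in.readUTF());
            }
            String tenant = peerVersion >= V2 ? in.readUTF() : "";
            return new User(name, roles, tenant);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the reader and writer agree on a field order instead of a class name, renaming the package of User would not affect what is exchanged between nodes in this sketch.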

@vrozov
Contributor

vrozov commented Jun 18, 2021

On OpenSearch node, replace org.opensearch with com.amazon.opendistroforelasticsearch in the namespace/classname right before serialization on the sender node. & On OpenSearch node, if the receiver node receives com.amazon.opendistroforelasticsearch, replace com.amazon.opendistroforelasticsearch with org.opensearch in deserialization.

gah, the full fix would be to include replacing the InputStream variable here with StreamInput so transport layer compatibility is handled correctly across breaking changes. That might be a longer term fix, though. Either way we really need to wrap the core logic changes with backward and forward compatibility version checks.

@nknize see #1278 for the POC that I did. The change from InputStream to StreamInput will break the wire protocol for the security plugin between nodes. Unfortunately, support for versioning within the security plugin is quite limited and was not designed properly in the first place.

@vrozov
Contributor

vrozov commented Jun 18, 2021

is reflection being used somewhere

	at com.amazon.opendistroforelasticsearch.security.support.Base64Helper$SafeObjectInputStream.resolveClass(Base64Helper.java:198) ~[?:?]
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2028) ~[?:?]
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1895) ~[?:?]
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2202) ~[?:?]
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1712) ~[?:?]
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:519) ~[?:?]
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:477) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:183) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.impl.AbstractAuditLog.getUser(AbstractAuditLog.java:627) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.impl.AbstractAuditLog.logMissingPrivileges(AbstractAuditLog.java:235) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.impl.AuditLogImpl.logMissingPrivileges(AuditLogImpl.java:176) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.auditlog.AuditLogSslExceptionHandler.logError(AuditLogSslExceptionHandler.java:77) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.ssl.transport.OpenDistroSecuritySSLRequestHandler.messageReceived(OpenDistroSecuritySSLRequestHandler.java:158) ~[?:?]
	at com.amazon.opendistroforelasticsearch.security.OpenDistroSecurityPlugin$7$1.messageReceived(OpenDistroSecurityPlugin.java:639) ~[?:?]

Oh..of course it is.

It is a pure Java serialization/deserialization issue: Java deserialization resolves classes by name using reflection. Any other change that is incompatible with Java serialization would cause the same problem.
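
A small self-contained demonstration of why a package rename breaks this: standard Java serialization writes the fully qualified class name into the stream, and the reader looks that name up. The SerializedNameDemo class and its nested User are hypothetical stand-ins, not the plugin's classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class SerializedNameDemo {
    /** Hypothetical stand-in for a serializable user object. */
    static final class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        User(String name) {
            this.name = name;
        }
    }

    private SerializedNameDemo() {}

    /** Serializes an object with plain Java serialization. */
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Checks whether the binary class name of the type appears in the raw stream. */
    static boolean streamContainsClassName(byte[] stream, Class<?> type) {
        // ISO-8859-1 maps every byte to one char, so ASCII class names stay searchable.
        String raw = new String(stream, StandardCharsets.ISO_8859_1);
        return raw.contains(type.getName());
    }
}
```

Calling streamContainsClassName(serialize(new User("admin")), User.class) shows the class name embedded in the bytes, which is what a node running the other namespace then fails to resolve.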

@vrozov
Contributor

vrozov commented Jun 18, 2021

  • Need to support transport request OpenSearch node -> ODFE node & ODFE node -> OpenSearch node
  • Trying to confirm 2 approaches to fix this issue:
  1. On OpenSearch node, replace org.opensearch with com.amazon.opendistroforelasticsearch in the namespace/classname right before serialization on the sender node. & On OpenSearch node, if the receiver node receives com.amazon.opendistroforelasticsearch, replace com.amazon.opendistroforelasticsearch with org.opensearch in deserialization.
  2. Revert back the namespace/classname renaming changes in the classes in security plugin involved in deserialization.

@cliu123 it would be good to give me credit when presenting option 1. In open source projects it is common practice to wait for the author to present a solution, or at least to give the author credit.

@vrozov
Contributor

vrozov commented Jun 21, 2021

Fixed by #1278. If not, please reopen.

@vrozov vrozov closed this as completed Jun 21, 2021
@skkosuri-amzn
Contributor Author

Fixed by #1278. If not, please reopen.

Will do.

@skkosuri-amzn
Contributor Author

skkosuri-amzn commented Jun 22, 2021

Re-opening this.

[2021-06-21T21:11:16,444][WARN ][o.e.a.b.TransportShardBulkAction] [node-0] [[alerting-test][0]] failed to perform indices:data/write/bulk[s] on replica [alerting-test][0], node[TgIrGyPSS4-PyQfXhw6I9Q], [R], s[STARTED], a[id=WusIBP8eSS2gZnoEqQuwBw]
org.elasticsearch.transport.RemoteTransportException: [node-1][127.0.0.1:9301][indices:data/write/bulk[s][r]]
Caused by: org.elasticsearch.ElasticsearchException: java.lang.ClassNotFoundException: com.amazon.opendistroforelasticsearch.security.user.User
	at org.opensearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:185) ~[?:?]
	at org.opensearch.security.auditlog.impl.AbstractAuditLog.getUser(AbstractAuditLog.java:629) ~[?:?]
	at org.opensearch.security.auditlog.impl.AbstractAuditLog.logMissingPrivileges(AbstractAuditLog.java:234) ~[?:?]
	at org.opensearch.security.auditlog.impl.AuditLogImpl.logMissingPrivileges(AuditLogImpl.java:176) ~[?:?]
	at org.opensearch.security.auditlog.AuditLogSslExceptionHandler.logError(AuditLogSslExceptionHandler.java:77) ~[?:?]
	at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:158) ~[?:?]
	at org.opensearch.security.OpenSearchSecurityPlugin$7$1.messageReceived(OpenSearchSecurityPlugin.java:639) ~[?:?]
	at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:64) ~[?:?]
	at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:85) ~[?:?]
	at org.opensearch.transport.InboundHandler.handleRequest(InboundHandler.java:220) ~[?:?]
	at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:120) ~[?:?]
	at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:102) ~[?:?]
	at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:713) ~[?:?]
	at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:155) ~[?:?]
	at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:130) ~[?:?]
	at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:95) ~[?:?]
	at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:87) ~[?:?]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1533) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1282) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1329) [netty-handler-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) [netty-codec-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:620) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:583) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-transport-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.49.Final.jar:4.1.49.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.49.Final.jar:4.1.49.Final]
	at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: class_not_found_exception: com.amazon.opendistroforelasticsearch.security.user.User
	at java.net.URLClassLoader.findClass(URLClassLoader.java:435) ~[?:?]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:589) ~[?:?]
	at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:855) ~[?:?]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:522) ~[?:?]
	at java.lang.Class.forName0(Native Method) ~[?:?]
	at java.lang.Class.forName(Class.java:427) ~[?:?]
	at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:762) ~[?:?]
	at org.opensearch.security.support.Base64Helper$SafeObjectInputStream.resolveClass(Base64Helper.java:198) ~[?:?]
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1980) ~[?:?]
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1866) ~[?:?]
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2159) ~[?:?]
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1685) ~[?:?]
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:499) ~[?:?]
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:457) ~[?:?]

@skkosuri-amzn
Contributor Author

Looks like I don't have permission to reopen :-(

@andy840314 andy840314 reopened this Jun 22, 2021
@andy840314
Contributor

Can you provide the test case?

@skkosuri-amzn
Contributor Author

skkosuri-amzn commented Jun 22, 2021

Looks like this is what happened:
node-0 is OpenDistro and node-1 is OpenSearch (mixed cluster, rolling upgrade in progress).
I tried to index a doc on node-0 to test alerting, and it went to the [alerting-test][0] shard. The primary of shard [0] is on node-0 and the replica is on node-1.

@vrozov
Contributor

vrozov commented Jun 22, 2021

Please double-check that the fix from #1278 is applied on the node. The stack trace points to line 185, but with the fix applied it should be line 250 in the security plugin.

@skkosuri-amzn
Contributor Author

The fix was missing on one node. After adding the fix to that node, we are good. Closing this 👍
