master not discovered yet, this node has not previously joined a bootstrapped #68

Closed
ypogeios opened this issue Sep 27, 2021 · 13 comments · Fixed by #69
Labels
bug (Something isn't working), untriaged (Issues that have not yet been triaged)

Comments

@ypogeios

Deployment Error

I tried to deploy OpenSearch with Helm on a Kubernetes cluster, version 1.21.3.

All pods deploy.

When I run
kubectl exec -it opensearch-cluster-master-0 -- /bin/bash
I first get the message: Defaulted container "opensearch" out of: opensearch, fsgroup-volume (init)

After that I'm in the pod, and I run
curl -XGET https://localhost:9200 -u 'admin:admin' --insecure
which gives the following output: OpenSearch Security not initialized.[opensearch@opensearch-cluster-master-0 ~]$

The logs of the pod, from the command
kubectl logs opensearch-cluster-master-0
[2021-09-27T12:11:26,745][WARN ][o.o.c.c.ClusterFormationFailureHelper] [opensearch-cluster-master-0] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{opensearch-cluster-master-0}{ghk7yaqUTmu3zU4qS3HEEQ}{SGHYqXbrRMS4EsLfPdNDkQ}{172.16.73.110}{172.16.73.110:9300}{dimr}, {opensearch-cluster-master-2}{HwXoAsOuQBescAuj1Iglbw}{IYGWumyjQquN8GOOA8tZZw}{172.16.149.176}{172.16.149.176:9300}{dimr}, {opensearch-cluster-master-1}{I6xUeOjVScuzp4lhdMMgdg}{qIJxNrfXTdG0oCqTYKI2pg}{172.16.100.221}{172.16.100.221:9300}{dimr}]; discovery will continue using [172.16.100.221:9300, 172.16.149.176:9300] from hosts providers and [{opensearch-cluster-master-0}{ghk7yaqUTmu3zU4qS3HEEQ}{SGHYqXbrRMS4EsLfPdNDkQ}{172.16.73.110}{172.16.73.110:9300}{dimr}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
[2021-09-27T12:11:26,841][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [opensearch-cluster-master-0] Exception while retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, AUDIT] (index=.opendistro_security)
org.opensearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
at org.opensearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:203) ~[opensearch-1.0.0.jar:1.0.0]
at org.opensearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:189) ~[opensearch-1.0.0.jar:1.0.0]
at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:72) ~[opensearch-1.0.0.jar:1.0.0]
at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:53) ~[opensearch-1.0.0.jar:1.0.0]
at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:192) [opensearch-1.0.0.jar:1.0.0]
at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:141) [opensearch-index-management-1.0.0.0.jar:1.0.0.0]
at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:190) [opensearch-1.0.0.jar:1.0.0]
at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:99) [opensearch-performance-analyzer-1.0.0.0.jar:1.0.0.0]
at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:190) [opensearch-1.0.0.jar:1.0.0]
at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:234) [opensearch-security-1.0.0.0.jar:1.0.0.0]
at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:154) [opensearch-security-1.0.0.0.jar:1.0.0.0]
at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:190) [opensearch-1.0.0.jar:1.0.0]
at org.opensearch.action.support.TransportAction.execute(TransportAction.java:168) [opensearch-1.0.0.jar:1.0.0]
at org.opensearch.action.support.TransportAction.execute(TransportAction.java:96) [opensearch-1.0.0.jar:1.0.0]
at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:99) [opensearch-1.0.0.jar:1.0.0]
at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:88) [opensearch-1.0.0.jar:1.0.0]
at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:428) [opensearch-1.0.0.jar:1.0.0]
at org.opensearch.client.support.AbstractClient.multiGet(AbstractClient.java:546) [opensearch-1.0.0.jar:1.0.0]
at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.loadAsync(ConfigurationLoaderSecurity7.java:211) [opensearch-security-1.0.0.0.jar:1.0.0.0]
at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.load(ConfigurationLoaderSecurity7.java:102) [opensearch-security-1.0.0.0.jar:1.0.0.0]
at org.opensearch.security.configuration.ConfigurationRepository.getConfigurationsFromIndex(ConfigurationRepository.java:375) [opensearch-security-1.0.0.0.jar:1.0.0.0]
at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration0(ConfigurationRepository.java:321) [opensearch-security-1.0.0.0.jar:1.0.0.0]
at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:306) [opensearch-security-1.0.0.0.jar:1.0.0.0]
at org.opensearch.security.configuration.ConfigurationRepository$1.run(ConfigurationRepository.java:166) [opensearch-security-1.0.0.0.jar:1.0.0.0]
at java.lang.Thread.run(Thread.java:832) [?:?]
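
The ClusterFormationFailureHelper warning above packs several facts into one line: whether cluster.initial_master_nodes is set, and which nodes were discovered at which addresses. The following is a minimal Python sketch for pulling those apart when comparing pods. The function name and the regular expression are assumptions made for illustration; they are not part of OpenSearch or the Helm chart.

```python
import re

# Node descriptors in the log look like
# {name}{node-id}{ephemeral-id}{host}{address}{roles}.
NODE_RE = re.compile(
    r"\{([^{}]+)\}\{[^{}]+\}\{[^{}]+\}\{[^{}]+\}\{([^{}]+)\}\{[^{}]*\}"
)


def summarize_formation_warning(line):
    """Return (initial_master_nodes_empty, {node_name: publish_address})."""
    empty = "[cluster.initial_master_nodes] is empty" in line
    # Only look at the "have discovered [...]" part, which ends at the first ';'.
    discovered = line.split("have discovered", 1)[-1].split(";", 1)[0]
    nodes = {name: addr for name, addr in NODE_RE.findall(discovered)}
    return empty, nodes
```

Applied to the warning above, the first element of the result is True (the setting is empty on this node) and the second maps each of the three discovered nodes to its own 172.16.x.x:9300 address.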

ypogeios added the bug and untriaged labels Sep 27, 2021
ypogeios changed the title [BUG][Chart Name] master not discovered yet, this node has not previously joined a bootstrapped Sep 27, 2021
@mprimeaux
Contributor

mprimeaux commented Sep 27, 2021

@ypogeios We ran into this exact issue and discussed it at length in PR #54. See this comment in particular, and the related reply chain.

@GaneshbabuRamamoorthy

GaneshbabuRamamoorthy commented Sep 27, 2021

Hi @mprimeaux

I am also facing the same issue with the Helm charts. I followed your comment in PR #54; below is how I configured values.yaml,

setting majorVersion to "7" instead of "" and setting the values of cluster.initial_master_nodes and discovery.seed_hosts:

majorVersion: "7"

# Allows you to add any config files in {{ .Values.opensearchHome }}/config
opensearchHome: /usr/share/opensearch
# such as opensearch.yml and log4j2.properties
config:
  opensearch.yml:
    cluster.name: opensearch-cluster

    # Bind to all interfaces because we don't know what IP address Docker will assign to us.
    network.host: 0.0.0.0

    # # minimum_master_nodes need to be explicitly set when bound on a public IP
    # # set to 1 to allow single node clusters
    # discovery.zen.minimum_master_nodes: 1
    cluster.initial_master_nodes: "opensearch-cluster-master-0,opensearch-cluster-master-1,opensearch-cluster-master-2"

    discovery.seed_hosts: "opensearch-cluster-master-headless"
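
The names listed in cluster.initial_master_nodes have to match the node names exactly, and in the logs in this thread the node names are the pod names. StatefulSet pods are named <statefulset-name>-<ordinal>, so the value can be derived rather than typed by hand. A minimal Python sketch follows; the function name is hypothetical, and the StatefulSet name opensearch-cluster-master is an assumption inferred from the pod names shown in this issue.

```python
def initial_master_nodes(statefulset_name, replicas):
    """Build the comma-separated value for cluster.initial_master_nodes.

    StatefulSet pods are named <statefulset-name>-<ordinal>, with ordinals
    starting at 0, so a 3-replica set yields ordinals 0, 1 and 2.
    """
    return ",".join(f"{statefulset_name}-{i}" for i in range(replicas))


# Assumed StatefulSet name, inferred from the pod names in this thread.
value = initial_master_nodes("opensearch-cluster-master", 3)
```

For three replicas this reproduces the string used in the values.yaml excerpt above, which suggests the setting itself is spelled correctly there.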

But I am getting the following in the logs:

master-0 pod logs

[2021-09-27T18:28:30,859][WARN ][o.o.c.c.ClusterFormationFailureHelper] [opensearch-cluster-master-0] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [opensearch-cluster-master-0, opensearch-cluster-master-1, opensearch-cluster-master-2] to bootstrap a cluster: have discovered [{opensearch-cluster-master-0}{QKY3GiZeRlCKYWL2nnbdHg}{9M5jc0puQoKdT09wRGo-Ng}{127.0.0.1}{127.0.0.1:9300}{dimr}]; discovery will continue using [[fd74:ca9b:3a09:868c:172:18:0:474e]:9300, [fd74:ca9b:3a09:868c:172:18:0:42a4]:9300, [fd74:ca9b:3a09:868c:172:18:0:4baa]:9300] from hosts providers and [{opensearch-cluster-master-0}{QKY3GiZeRlCKYWL2nnbdHg}{9M5jc0puQoKdT09wRGo-Ng}{127.0.0.1}{127.0.0.1:9300}{dimr}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
[2021-09-27T18:28:31,009][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-cluster-master-0] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:42a4]:9300]] completed handshake with [{opensearch-cluster-master-1}{qnJjFQZnSxKmbXCQj8xLNQ}{Xgt-CQFmS_qwshwPwJDOKg}{127.0.0.1}{127.0.0.1:9300}{dimr}] but followup connection failed

master-1 pod logs,

[2021-09-27T18:28:31,009][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-cluster-master-0] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:42a4]:9300]] completed handshake with [{opensearch-cluster-master-1}{qnJjFQZnSxKmbXCQj8xLNQ}{Xgt-CQFmS_qwshwPwJDOKg}{127.0.0.1}{127.0.0.1:9300}{dimr}] but followup connection failed
org.opensearch.transport.ConnectTransportException: [opensearch-cluster-master-1][127.0.0.1:9300] handshake failed. unexpected remote node {opensearch-cluster-master-0}{QKY3GiZeRlCKYWL2nnbdHg}{9M5jc0puQoKdT09wRGo-Ng}{127.0.0.1}{127.0.0.1:9300}{dimr}
        at org.opensearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:405) ~[opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListener$4.onResponse(ActionListener.java:170) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:492) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:482) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:67) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleResponse(SecurityInterceptor.java:288) [opensearch-security-1.0.1.0.jar:1.0.1.0]
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1207) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:266) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:260) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:697) [opensearch-1.0.0.jar:1.0.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
[LRB_346_PCAA@k8s-rmp-master-0 opensearch]$ kubectl logs opensearch-cluster-master-1 -f --tail=100
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:697) [opensearch-1.0.0.jar:1.0.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
[2021-09-27T18:29:17,650][ERROR][o.o.s.s.t.SecuritySSLNettyTransport] [opensearch-cluster-master-1] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
        at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
        at sun.security.ssl.TransportContext.fatal(TransportContext.java:369) ~[?:?]
        at sun.security.ssl.TransportContext.fatal(TransportContext.java:312) ~[?:?]
        at sun.security.ssl.TransportContext.fatal(TransportContext.java:307) ~[?:?]
        at sun.security.ssl.SSLTransport.decode(SSLTransport.java:133) ~[?:?]
        at sun.security.ssl.SSLEngineImpl.decode(SSLEngineImpl.java:736) ~[?:?]
        at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:691) ~[?:?]
        at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:506) ~[?:?]
        at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:482) ~[?:?]
        at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:637) ~[?:?]
        at io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:282) ~[netty-handler-4.1.59.Final.jar:4.1.59.Final]

master-2 pod logs,

[2021-09-27T18:38:43,916][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-cluster-master-2] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:42a4]:9300]] completed handshake with [{opensearch-cluster-master-1}{qnJjFQZnSxKmbXCQj8xLNQ}{gG8kpsvpRXWFgjHBVLYBcA}{127.0.0.1}{127.0.0.1:9300}{dimr}] but followup connection failed
org.opensearch.transport.ConnectTransportException: [opensearch-cluster-master-1][127.0.0.1:9300] handshake failed. unexpected remote node {opensearch-cluster-master-2}{IyFMJqhART6N6fYWhvGr5g}{ZsmdzoPdTQmh3cVriGYcjA}{127.0.0.1}{127.0.0.1:9300}{dimr}
        at org.opensearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:405) ~[opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListener$4.onResponse(ActionListener.java:170) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:492) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:482) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:67) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleResponse(SecurityInterceptor.java:288) [opensearch-security-1.0.1.0.jar:1.0.1.0]
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1207) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:266) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:260) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:697) [opensearch-1.0.0.jar:1.0.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
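
In the logs above, all three nodes report the same address, 127.0.0.1:9300, and each "handshake failed. unexpected remote node" message names the connecting node itself as the node it reached. One reading, not confirmed in this thread, is that each node publishes a loopback address, so the follow-up connection to a peer's published address lands back on the local node. A minimal Python sketch for flagging that pattern from node name and address pairs copied out of the logs; the function name is hypothetical.

```python
import ipaddress
from collections import defaultdict


def find_publish_address_problems(nodes):
    """nodes maps node name -> 'host:port' publish address as printed in the logs.

    Returns (shared, loopback): addresses claimed by more than one node, and
    the names of nodes whose publish host is a loopback IP address.
    """
    by_addr = defaultdict(list)
    loopback = []
    for name, addr in sorted(nodes.items()):
        by_addr[addr].append(name)
        # Drop the port; strip the brackets used around IPv6 literals.
        host = addr.rsplit(":", 1)[0].strip("[]")
        try:
            if ipaddress.ip_address(host).is_loopback:
                loopback.append(name)
        except ValueError:
            # Not an IP literal (for example a hostname); nothing to flag here.
            pass
    shared = {a: names for a, names in by_addr.items() if len(names) > 1}
    return shared, loopback
```

With the three nodes from these logs, every node is flagged as loopback and 127.0.0.1:9300 is reported as shared by all of them. With the 172.16.x.x addresses from the original report, nothing is flagged.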

Below is the response I got when I exec into the pod,

[LRB_346_PCAA@k8s-rmp-master-0 opensearch]$ kubectl exec -it opensearch-cluster-master-0 bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
[opensearch@opensearch-cluster-master-0 ~]$ curl -k -u admin:admin https://opensearch-cluster-master-headless:9200
OpenSearch Security not initialized.[opensearch@opensearch-cluster-master-0 ~]$

Please correct me if I am doing anything wrong.

Attached the values.yaml for reference

values.txt

Thanks,
Ganeshbabu R

@mprimeaux
Contributor

@GaneshbabuRamamoorthy Hi there. I am reviewing your values.yaml file. Would you please comment out the imageTag: "1.0.1" attribute, retry, and let me know?

@GaneshbabuRamamoorthy

GaneshbabuRamamoorthy commented Sep 28, 2021

Hi @mprimeaux

Yes, I tried commenting out the imageTag and redeployed.

image: "X.X.X.X:8000/opensearch"
# override image tag, which is .Chart.AppVersion by default
#imageTag: ""
imagePullPolicy: "IfNotPresent"

Below are the responses I got from the pod logs:

master-0 pod logs

[2021-09-28T02:56:52,933][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-cluster-master-0] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:4baa]:9300]] completed handshake with [{opensearch-cluster-master-2}{IyFMJqhART6N6fYWhvGr5g}{MMUWJ6IUSEWouoiGxHcXxQ}{127.0.0.1}{127.0.0.1:9300}{dimr}] but followup connection failed
org.opensearch.transport.ConnectTransportException: [opensearch-cluster-master-2][127.0.0.1:9300] handshake failed. unexpected remote node {opensearch-cluster-master-0}{QKY3GiZeRlCKYWL2nnbdHg}{8TU40F-_RImVPkM5eiMlOg}{127.0.0.1}{127.0.0.1:9300}{dimr}
        at org.opensearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:405) ~[opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListener$4.onResponse(ActionListener.java:170) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:492) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:482) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:67) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleResponse(SecurityInterceptor.java:288) [opensearch-security-1.0.1.0.jar:1.0.1.0]
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1207) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:266) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:260) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:697) [opensearch-1.0.0.jar:1.0.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
[2021-09-28T02:56:52,933][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-cluster-master-0] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:42a4]:9300]] completed handshake with [{opensearch-cluster-master-1}{qnJjFQZnSxKmbXCQj8xLNQ}{piblLGsrQHS4f408zWUpqA}{127.0.0.1}{127.0.0.1:9300}{dimr}] but followup connection failed
org.opensearch.transport.ConnectTransportException: [opensearch-cluster-master-1][127.0.0.1:9300] handshake failed. unexpected remote node {opensearch-cluster-master-0}{QKY3GiZeRlCKYWL2nnbdHg}{8TU40F-_RImVPkM5eiMlOg}{127.0.0.1}{127.0.0.1:9300}{dimr}
        at org.opensearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:405) ~[opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListener$4.onResponse(ActionListener.java:170) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:492) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:482) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:67) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleResponse(SecurityInterceptor.java:288) [opensearch-security-1.0.1.0.jar:1.0.1.0]
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1207) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:266) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:260) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:697) [opensearch-1.0.0.jar:1.0.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
[2021-09-28T02:56:53,778][WARN ][o.o.c.c.ClusterFormationFailureHelper] [opensearch-cluster-master-0] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [opensearch-cluster-master-0, opensearch-cluster-master-1, opensearch-cluster-master-2] to bootstrap a cluster: have discovered [{opensearch-cluster-master-0}{QKY3GiZeRlCKYWL2nnbdHg}{8TU40F-_RImVPkM5eiMlOg}{127.0.0.1}{127.0.0.1:9300}{dimr}]; discovery will continue using [[fd74:ca9b:3a09:868c:172:18:0:474e]:9300, [fd74:ca9b:3a09:868c:172:18:0:42a4]:9300, [fd74:ca9b:3a09:868c:172:18:0:4baa]:9300] from hosts providers and [{opensearch-cluster-master-0}{QKY3GiZeRlCKYWL2nnbdHg}{8TU40F-_RImVPkM5eiMlOg}{127.0.0.1}{127.0.0.1:9300}{dimr}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
[2021-09-28T02:56:53,878][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-cluster-master-0] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:4baa]:9300]] completed handshake with [{opensearch-cluster-master-2}{IyFMJqhART6N6fYWhvGr5g}{MMUWJ6IUSEWouoiGxHcXxQ}{127.0.0.1}{127.0.0.1:9300}{dimr}] but followup connection failed
org.opensearch.transport.ConnectTransportException: [opensearch-cluster-master-2][127.0.0.1:9300] handshake failed. unexpected remote node {opensearch-cluster-master-0}{QKY3GiZeRlCKYWL2nnbdHg}{8TU40F-_RImVPkM5eiMlOg}{127.0.0.1}{127.0.0.1:9300}{dimr}
        at org.opensearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:405) ~[opensearch-1.0.0.jar:1.0.0]

master-1 pod logs

[2021-09-28T03:18:39,618][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-cluster-master-1] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:4baa]:9300]] completed handshake with [{opensearch-cluster-master-2}{IyFMJqhART6N6fYWhvGr5g}{MMUWJ6IUSEWouoiGxHcXxQ}{127.0.0.1}{127.0.0.1:9300}{dimr}] but followup connection failed
org.opensearch.transport.ConnectTransportException: [opensearch-cluster-master-2][127.0.0.1:9300] handshake failed. unexpected remote node {opensearch-cluster-master-1}{qnJjFQZnSxKmbXCQj8xLNQ}{piblLGsrQHS4f408zWUpqA}{127.0.0.1}{127.0.0.1:9300}{dimr}
        at org.opensearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:405) ~[opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListener$4.onResponse(ActionListener.java:170) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:492) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:482) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:67) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleResponse(SecurityInterceptor.java:288) [opensearch-security-1.0.1.0.jar:1.0.1.0]
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1207) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:266) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:260) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:697) [opensearch-1.0.0.jar:1.0.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
[2021-09-28T03:18:40,612][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-cluster-master-1] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:4baa]:9300]] completed handshake with [{opensearch-cluster-master-2}{IyFMJqhART6N6fYWhvGr5g}{MMUWJ6IUSEWouoiGxHcXxQ}{127.0.0.1}{127.0.0.1:9300}{dimr}] but followup connection failed
org.opensearch.transport.ConnectTransportException: [opensearch-cluster-master-2][127.0.0.1:9300] handshake failed. unexpected remote node {opensearch-cluster-master-1}{qnJjFQZnSxKmbXCQj8xLNQ}{piblLGsrQHS4f408zWUpqA}{127.0.0.1}{127.0.0.1:9300}{dimr}
        at org.opensearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:405) ~[opensearch-1.0.0.jar:1.0.0]

master-2 pod logs,

[2021-09-28T03:21:01,309][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-cluster-master-2] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:42a4]:9300]] completed handshake with [{opensearch-cluster-master-1}{qnJjFQZnSxKmbXCQj8xLNQ}{piblLGsrQHS4f408zWUpqA}{127.0.0.1}{127.0.0.1:9300}{dimr}] but followup connection failed
org.opensearch.transport.ConnectTransportException: [opensearch-cluster-master-1][127.0.0.1:9300] handshake failed. unexpected remote node {opensearch-cluster-master-2}{IyFMJqhART6N6fYWhvGr5g}{MMUWJ6IUSEWouoiGxHcXxQ}{127.0.0.1}{127.0.0.1:9300}{dimr}
        at org.opensearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:405) ~[opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListener$4.onResponse(ActionListener.java:170) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:492) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:482) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:67) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleResponse(SecurityInterceptor.java:288) [opensearch-security-1.0.1.0.jar:1.0.1.0]
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1207) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:266) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:260) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:697) [opensearch-1.0.0.jar:1.0.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
[2021-09-28T03:21:01,321][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-cluster-master-2] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:474e]:9300]] completed handshake with [{opensearch-cluster-master-0}{QKY3GiZeRlCKYWL2nnbdHg}{8TU40F-_RImVPkM5eiMlOg}{127.0.0.1}{127.0.0.1:9300}{dimr}] but followup connection failed
org.opensearch.transport.ConnectTransportException: [opensearch-cluster-master-0][127.0.0.1:9300] handshake failed. unexpected remote node {opensearch-cluster-master-2}{IyFMJqhART6N6fYWhvGr5g}{MMUWJ6IUSEWouoiGxHcXxQ}{127.0.0.1}{127.0.0.1:9300}{dimr}
        at org.opensearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:405) ~[opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListener$4.onResponse(ActionListener.java:170) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:492) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.TransportService$5.onResponse(TransportService.java:482) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:67) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleResponse(SecurityInterceptor.java:288) [opensearch-security-1.0.1.0.jar:1.0.1.0]
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1207) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:266) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:260) [opensearch-1.0.0.jar:1.0.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:697) [opensearch-1.0.0.jar:1.0.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
[2021-09-28T03:21:02,028][WARN ][o.o.c.c.ClusterFormationFailureHelper] [opensearch-cluster-master-2] master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [opensearch-cluster-master-0, opensearch-cluster-master-1, opensearch-cluster-master-2] to bootstrap a cluster: have discovered [{opensearch-cluster-master-2}{IyFMJqhART6N6fYWhvGr5g}{MMUWJ6IUSEWouoiGxHcXxQ}{127.0.0.1}{127.0.0.1:9300}{dimr}]; discovery will continue using [[fd74:ca9b:3a09:868c:172:18:0:4baa]:9300, [fd74:ca9b:3a09:868c:172:18:0:474e]:9300, [fd74:ca9b:3a09:868c:172:18:0:42a4]:9300] from hosts providers and [{opensearch-cluster-master-2}{IyFMJqhART6N6fYWhvGr5g}{MMUWJ6IUSEWouoiGxHcXxQ}{127.0.0.1}{127.0.0.1:9300}{dimr}] from last-known cluster state; node term 0, last-accepted version 0 in term 0
[2021-09-28T03:21:02,334][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-cluster-master-2] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:474e]:9300]] completed handshake with [{opensearch-cluster-master-0}{QKY3GiZeRlCKYWL2nnbdHg}{8TU40F-_RImVPkM5eiMlOg}{127.0.0.1}{127.0.0.1:9300}{dimr}] but followup connection failed

Attaching the helm package which I tried with values.yaml

opensearch.zip

Please let me know if I am doing anything wrong.

values-latest.txt

Thanks,
Ganeshbabu R

@GaneshbabuRamamoorthy

Hi @mprimeaux

I have tested the same in my other k8s environment where it was working fine without any issues. I was able to see the cluster health status.

[opensearch@opensearch-cluster-master-0 ~]$ curl -k -u admin:admin https://opensearch-cluster-master:9200/_cluster/health?pretty=true
{
  "cluster_name" : "opensearch-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "discovered_master" : true,
  "active_primary_shards" : 1,
  "active_shards" : 3,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
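
For anyone scripting this check rather than reading the JSON by eye, here is a minimal sketch. It assumes the health response is already available as a string; `is_healthy` and `expected_nodes` are names made up for illustration and are not part of OpenSearch or the chart.

```python
import json

# Health response copied from the curl output above, trimmed to the fields used here.
HEALTH_JSON = """
{
  "cluster_name" : "opensearch-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "discovered_master" : true
}
"""


def is_healthy(raw: str, expected_nodes: int) -> bool:
    """Return True when a master was discovered, status is green, and all nodes joined."""
    health = json.loads(raw)
    return (
        health.get("discovered_master") is True
        and health.get("status") == "green"
        and health.get("number_of_nodes") == expected_nodes
    )


print(is_healthy(HEALTH_JSON, expected_nodes=3))
```

Checking `number_of_nodes` as well as `status` is a deliberate choice in this sketch: a node that never joined would not necessarily turn the status red.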

These are the node details of the k8s environment where it works without any issues:

[root@k8master1 ganesh]# kubectl get nodes -o wide
NAME          STATUS     ROLES    AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION          CONTAINER-RUNTIME
10.69.2.150   Ready      master   131d   v1.17.9   10.69.2.150   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://19.3.11
10.69.2.151   Ready      master   131d   v1.17.9   10.69.2.151   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://19.3.11
10.69.2.152   Ready      master   131d   v1.17.9   10.69.2.152   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://19.3.11
10.69.2.156   Ready      worker   131d   v1.17.9   10.69.2.156   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://19.3.11
10.69.2.157   Ready      worker   131d   v1.17.9   10.69.2.157   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://19.3.11
10.69.2.158   Ready      worker   131d   v1.17.9   10.69.2.158   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://19.3.11
10.69.2.159   NotReady   worker   131d   v1.17.9   10.69.2.159   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://Unknown
10.69.2.160   NotReady   worker   131d   v1.17.9   10.69.2.160   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://19.3.11
10.69.2.161   Ready      worker   131d   v1.17.9   10.69.2.161   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://19.3.11
10.69.2.162   Ready      worker   131d   v1.17.9   10.69.2.162   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://19.3.11
10.69.2.163   Ready      worker   131d   v1.17.9   10.69.2.163   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://19.3.11
10.69.2.164   Ready      worker   131d   v1.17.9   10.69.2.164   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://19.3.11

When I tested the same chart in another k8s environment, an IPv6 cluster, I got an SSL error. Below are that cluster's node details:

[LRB_346_PCAA@k8s-rmp-master-0 kafka]$ sudo kubectl get nodes -o wide
NAME                            STATUS   ROLES    AGE    VERSION   INTERNAL-IP                        EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
k8s-rmp-master-0                Ready    master   503d   v1.20.5   fcff:69:70:0:faf2:1eff:fe89:b8f0   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-master-1                Ready    master   503d   v1.20.5   fcff:69:70:0:faf2:1eff:fe89:cce0   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-master-2                Ready    master   503d   v1.20.5   fcff:69:70:0:faf2:1eff:fe84:a9a0   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-0                Ready    <none>   494d   v1.20.5   fcff:69:70:0:faf2:1eff:fe84:d210   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-1                Ready    <none>   494d   v1.20.5   fcff:69:70:0:faf2:1eff:fe84:af70   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-10               Ready    <none>   370d   v1.20.5   fcff:69:70:0:faf2:1eff:fe65:2f68   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-11               Ready    <none>   341d   v1.20.5   fcff:69:70:0:faf2:1eff:feb1:e0c0   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-12               Ready    <none>   342d   v1.20.5   fcff:69:70:0:faf2:1eff:fe98:79c4   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-13               Ready    <none>   301d   v1.20.5   fcff:69:70::90                     <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-14               Ready    <none>   301d   v1.20.5   fcff:69:70::91                     <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-15               Ready    <none>   301d   v1.20.5   fcff:69:70::92                     <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-16               Ready    <none>   104d   v1.20.5   fcff:69:70::93                     <none>        CentOS Linux 7 (Core)   3.10.0-1160.21.1.el7.x86_64   robin://19.3.6
k8s-rmp-worker-17               Ready    <none>   71d    v1.20.5   fcff:69:70::94                     <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-18               Ready    <none>   69d    v1.20.5   fcff:69:70::95                     <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-2                Ready    <none>   494d   v1.20.5   fcff:69:70:0:faf2:1eff:fe84:a320   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-3                Ready    <none>   494d   v1.20.5   fcff:69:70:0:faf2:1eff:fe84:b2d0   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-4                Ready    <none>   494d   v1.20.5   fcff:69:70:0:faf2:1eff:fe84:bb40   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-5                Ready    <none>   441d   v1.20.5   fcff:69:70:0:faf2:1eff:fe84:aab0   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-6                Ready    <none>   467d   v1.20.5   fcff:69:70:0:faf2:1eff:fe84:b450   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-7                Ready    <none>   456d   v1.20.5   fcff:69:70:0:faf2:1eff:fe8a:cf0    <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-8                Ready    <none>   456d   v1.20.5   fcff:69:70:0:faf2:1eff:fe89:9b70   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6
k8s-rmp-worker-9                Ready    <none>   369d   v1.20.5   fcff:69:70:0:faf2:1eff:fe65:2dc4   <none>        CentOS Linux 7 (Core)   3.10.0-1062.el7.x86_64        robin://19.3.6

Below is the SSL error I am getting in the pod logs of the IPv6 k8s environment:

[2021-09-29T05:20:46,769][WARN ][o.o.d.HandshakingTransportAddressConnector] [opensearch-cluster-master-1] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:4f52]:9300]] completed handshake with [{opensearch-cluster-master-0}{iRiB2GoKTtiXE5uSCsb-pw}{_sJJFLKBToGMK5fK00ugHQ}{127.0.0.1}{127.0.0.1:9300}{dimr}] but followup connection failed
org.opensearch.transport.ConnectTransportException: [opensearch-cluster-master-0][127.0.0.1:9300] handshake failed. unexpected remote node {opensearch-cluster-master-1}{56Z5v568SMuOVmLk4QMhKg}{2VL7vr5rSjyoFdZyg5Pruw}{127.0.0.1}{127.0.0.1:9300}{dimr}

master-1 resolves the IPv6 address, but the follow-up connection to master-0 fails because of the IPv4 address (127.0.0.1) that master-0 reports. master-2 behaves the same way when it tries to talk to master-0.
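
To make this mismatch easier to spot across many log lines, a rough sketch follows. It pulls the probed address and the address the remote node reported out of a `HandshakingTransportAddressConnector` warning and flags a loopback address. The regular expressions are assumptions based only on the log format shown in this thread, and `diagnose` is a hypothetical helper name.

```python
import ipaddress
import re

# Warning line from the master-1 pod log above.
LOG_LINE = (
    "[2021-09-29T05:20:46,769][WARN ][o.o.d.HandshakingTransportAddressConnector] "
    "[opensearch-cluster-master-1] "
    "[connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:4f52]:9300]] "
    "completed handshake with [{opensearch-cluster-master-0}{iRiB2GoKTtiXE5uSCsb-pw}"
    "{_sJJFLKBToGMK5fK00ugHQ}{127.0.0.1}{127.0.0.1:9300}{dimr}] "
    "but followup connection failed"
)

# Address the node dialled, with or without IPv6 brackets (assumed format).
PROBE_RE = re.compile(r"connectToRemoteMasterNode\[\[?([0-9a-fA-F:.]+)\]?:(\d+)\]")
# Node name and the fourth {...} field, which holds the host the remote node reported.
NODE_RE = re.compile(
    r"completed handshake with \[\{([^}]+)\}\{[^}]+\}\{[^}]+\}\{([^}]+)\}"
)


def diagnose(line: str):
    """Return the probed and reported addresses, or None if the line does not match."""
    probe = PROBE_RE.search(line)
    node = NODE_RE.search(line)
    if probe is None or node is None:
        return None
    probed = ipaddress.ip_address(probe.group(1))
    reported = ipaddress.ip_address(node.group(2))
    return {
        "node": node.group(1),
        "probed_version": probed.version,
        "reported_version": reported.version,
        "reported_is_loopback": reported.is_loopback,
    }


print(diagnose(LOG_LINE))
```

A result with `reported_is_loopback` set means the remote node handed back an address that only makes sense inside its own pod, which matches the "unexpected remote node" error in the line after it.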

I also ran netstat inside the master-0 pod. The output is below; it shows the node trying to connect over IPv4:

[opensearch@opensearch-cluster-master-0 ~]$ /tmp/netstat -anp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp6       0      0 :::9600                 :::*                    LISTEN      17/java
tcp6       0      0 :::9200                 :::*                    LISTEN      9/java
tcp6       0      0 :::9650                 :::*                    LISTEN      17/java
tcp6       0      0 :::9300                 :::*                    LISTEN      9/java
tcp6       0      0 127.0.0.1:42684         127.0.0.1:9300          TIME_WAIT   -
tcp6       0      0 127.0.0.1:35322         127.0.0.1:9300          TIME_WAIT   -
tcp6       0      0 127.0.0.1:60987         127.0.0.1:9300          TIME_WAIT   -
tcp6       0      0 127.0.0.1:43450         127.0.0.1:9300          TIME_WAIT   -
tcp6       0      0 127.0.0.1:40486         127.0.0.1:9300          TIME_WAIT   -
tcp6       0      0 127.0.0.1:58494         127.0.0.1:9300          TIME_WAIT   -
tcp6       0      0 127.0.0.1:34366         127.0.0.1:9300          TIME_WAIT   -
tcp6       0      0 127.0.0.1:40803         127.0.0.1:9300          TIME_WAIT   -
tcp6       0      0 127.0.0.1:44544         127.0.0.1:9300          TIME_WAIT   -
tcp6       0      0 fd74:ca9b:3a09:86:33982 fd74:ca9b:3a09:868:9300 TIME_WAIT   -
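
A small sketch for tallying that output instead of counting rows by hand. It assumes the column layout shown above; the IPv6 addresses in this output appear truncated, so the code only compares the peer host against `127.0.0.1` rather than parsing it. `count_transport_peers` is a hypothetical helper name, and the sample is a trimmed copy of the rows above.

```python
from collections import Counter

# Trimmed copy of the netstat output above.
NETSTAT_OUTPUT = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp6       0      0 :::9300                 :::*                    LISTEN      9/java
tcp6       0      0 127.0.0.1:42684         127.0.0.1:9300          TIME_WAIT   -
tcp6       0      0 127.0.0.1:35322         127.0.0.1:9300          TIME_WAIT   -
tcp6       0      0 fd74:ca9b:3a09:86:33982 fd74:ca9b:3a09:868:9300 TIME_WAIT   -
"""


def count_transport_peers(output: str, port: int = 9300) -> Counter:
    """Count connections to the transport port, split into loopback and other peers."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # The foreign address is the fifth column; the port follows the last colon.
        host, _, peer_port = fields[4].rpartition(":")
        if peer_port != str(port):
            continue
        counts["loopback" if host == "127.0.0.1" else "other"] += 1
    return counts


print(count_transport_peers(NETSTAT_OUTPUT))
```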

Are these errors due to the k8s cluster being set up with IPv6?

Is opensearch supported with IPv6?

Please share your thoughts.

Thanks,
Ganeshbabu R

@smlx
Contributor

smlx commented Sep 29, 2021

Is opensearch supported with IPv6?

Not sure about upstream Opensearch, but this chart doesn't. You'll need to use dual stack or IPv4 for now.

@mprimeaux
Contributor

Thanks, @smlx. I was just about to reply with the dual stack option. FWIW, we currently just use single stack (IPv4).

@GaneshbabuRamamoorthy

@smlx @mprimeaux

Thanks for your response.

I then tried deploying the opendistro helm chart in the same IPv6 k8s cluster, and I am getting the responses below in the pod logs.

All pods show a status of Running:

[root@k8s-rmp-master-0 opendistro-es]$ kubectl get pods -w -n elastic
NAME                                                  READY   STATUS    RESTARTS   AGE
elasticsearch-opendistro-es-client-7fbc9b877-h8jjx    1/1     Running   0          8m18s
elasticsearch-opendistro-es-data-0                    1/1     Running   0          8m18s
elasticsearch-opendistro-es-kibana-5c454cb6bc-k6j4t   1/1     Running   0          8m17s
elasticsearch-opendistro-es-master-0                  1/1     Running   0          8m18s

Below are the logs.

data pod logs

[2021-09-30T07:00:19,318][WARN ][o.e.c.c.ClusterFormationFailureHelper] [elasticsearch-opendistro-es-data-0] master not discovered yet: have discovered [{elasticsearch-opendistro-es-data-0}{g9vpjnGhQQSZXm7TlzqHdA}{UNIUsbx3SEaXPduXxEXdIw}{127.0.0.1}{127.0.0.1:9300}{dr}]; discovery will continue using [[fd74:ca9b:3a09:868c:172:18:0:4fce]:9300] from hosts providers and [] from last-known cluster state; node term 0, last-accepted version 0 in term 0
[2021-09-30T07:00:19,550][WARN ][o.e.d.HandshakingTransportAddressConnector] [elasticsearch-opendistro-es-data-0] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:4fce]:9300]] completed handshake with [{elasticsearch-opendistro-es-master-0}{OqYEghRrTByIBtH3cdIulQ}{2P7FkFCdRbiAKWKVRlResw}{127.0.0.1}{127.0.0.1:9300}{mr}] but followup connection failed
org.elasticsearch.transport.ConnectTransportException: [elasticsearch-opendistro-es-master-0][127.0.0.1:9300] handshake failed. unexpected remote node {elasticsearch-opendistro-es-data-0}{g9vpjnGhQQSZXm7TlzqHdA}{UNIUsbx3SEaXPduXxEXdIw}{127.0.0.1}{127.0.0.1:9300}{dr}
        at org.elasticsearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:389) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.action.ActionListener$4.onResponse(ActionListener.java:157) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:476) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:466) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-7.10.2.jar:7.10.2]
        at com.amazon.opendistroforelasticsearch.security.transport.OpenDistroSecurityInterceptor$RestoringTransportResponseHandler.handleResponse(OpenDistroSecurityInterceptor.java:278) [opendistro_security-1.13.1.0.jar:1.13.1.0]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1171) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:253) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:247) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) [elasticsearch-7.10.2.jar:7.10.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
[2021-09-30T07:00:19,565][ERROR][c.a.o.s.s.t.OpenDistroSecuritySSLNettyTransport] [elasticsearch-opendistro-es-data-0] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
        at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
        at sun.security.ssl.TransportContext.fatal(TransportContext.java:369) ~[?:?]

master pod logs,

[2021-09-30T06:49:44,180][INFO ][o.e.h.AbstractHttpServerTransport] [elasticsearch-opendistro-es-master-0] publish_address {127.0.0.1:9200}, bound_addresses {[::]:9200}
[2021-09-30T06:49:44,180][INFO ][o.e.n.Node               ] [elasticsearch-opendistro-es-master-0] started
[2021-09-30T06:49:44,181][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-opendistro-es-master-0] Node started
[2021-09-30T06:49:44,182][INFO ][c.a.o.s.c.ConfigurationRepository] [elasticsearch-opendistro-es-master-0] Will attempt to create index .opendistro_security and default configs if they are absent
[2021-09-30T06:49:44,182][INFO ][c.a.o.s.c.ConfigurationRepository] [elasticsearch-opendistro-es-master-0] Background init thread started. Install default config?: true
[2021-09-30T06:49:44,183][INFO ][c.a.o.s.OpenDistroSecurityPlugin] [elasticsearch-opendistro-es-master-0] 0 Open Distro Security modules loaded so far: []
[2021-09-30T06:49:44,211][INFO ][o.e.g.GatewayService     ] [elasticsearch-opendistro-es-master-0] recovered [0] indices into cluster_state
[2021-09-30T06:49:44,353][INFO ][o.e.c.m.MetadataCreateIndexService] [elasticsearch-opendistro-es-master-0] [.opendistro_security] creating index, cause [api], templates [], shards [1]/[1]
[2021-09-30T06:49:44,371][INFO ][o.e.c.r.a.AllocationService] [elasticsearch-opendistro-es-master-0] Cluster health status changed from [YELLOW] to [RED] (reason: [index [.opendistro_security] created]).
[2021-09-30T06:50:14,471][INFO ][c.a.o.s.c.ConfigurationRepository] [elasticsearch-opendistro-es-master-0] Index .opendistro_security created?: true
[2021-09-30T06:50:14,481][INFO ][c.a.o.s.s.ConfigHelper   ] [elasticsearch-opendistro-es-master-0] Will update 'config' with /usr/share/elasticsearch/plugins/opendistro_security/securityconfig/config.yml and populate it with empty doc if file missing and populateEmptyIfFileMissing=false
[2021-09-30T06:54:43,588][INFO ][c.a.o.j.s.JobSweeper     ] [elasticsearch-opendistro-es-master-0] Running full sweep
[2021-09-30T06:59:43,592][INFO ][c.a.o.j.s.JobSweeper     ] [elasticsearch-opendistro-es-master-0] Running full sweep
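
The first line of the master log above reports `publish_address {127.0.0.1:9200}` alongside `bound_addresses {[::]:9200}`. A minimal sketch for pulling the publish address out of such a line and checking whether it is loopback is shown here. The regular expression is an assumption based on this one log line, and `publish_info` is a hypothetical helper name.

```python
import ipaddress
import re

# Startup line from the master pod log above.
LOG_LINE = (
    "[2021-09-30T06:49:44,180][INFO ][o.e.h.AbstractHttpServerTransport] "
    "[elasticsearch-opendistro-es-master-0] publish_address {127.0.0.1:9200}, "
    "bound_addresses {[::]:9200}"
)

# Host and port inside the braces, with or without IPv6 brackets (assumed format).
PUBLISH_RE = re.compile(r"publish_address \{\[?([0-9a-fA-F:.]+?)\]?:(\d+)\}")


def publish_info(line: str):
    """Return the publish address details, or None if the line has no publish_address."""
    match = PUBLISH_RE.search(line)
    if match is None:
        return None
    address = ipaddress.ip_address(match.group(1))
    return {
        "address": str(address),
        "port": int(match.group(2)),
        "version": address.version,
        "is_loopback": address.is_loopback,
    }


print(publish_info(LOG_LINE))
```

A loopback publish address is what other nodes receive during the handshake, so it is consistent with the `127.0.0.1:9300` failures in the data and client pod logs.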

client pod logs

[2021-09-30T06:49:36,120][DEPRECATION][o.e.d.c.s.Settings       ] [elasticsearch-opendistro-es-client-7fbc9b877-h8jjx] [node.master] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version.
[2021-09-30T06:49:36,422][INFO ][o.e.b.BootstrapChecks    ] [elasticsearch-opendistro-es-client-7fbc9b877-h8jjx] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2021-09-30T06:49:44,911][WARN ][o.e.d.HandshakingTransportAddressConnector] [elasticsearch-opendistro-es-client-7fbc9b877-h8jjx] [connectToRemoteMasterNode[[fd74:ca9b:3a09:868c:172:18:0:4fce]:9300]] completed handshake with [{elasticsearch-opendistro-es-master-0}{OqYEghRrTByIBtH3cdIulQ}{2P7FkFCdRbiAKWKVRlResw}{127.0.0.1}{127.0.0.1:9300}{mr}] but followup connection failed
org.elasticsearch.transport.ConnectTransportException: [elasticsearch-opendistro-es-master-0][127.0.0.1:9300] handshake failed. unexpected remote node {elasticsearch-opendistro-es-client-7fbc9b877-h8jjx}{FJuEhVeiQ7OrP9re1Zrx5A}{A_lQTVXVSQWFQkQOsfj22A}{127.0.0.1}{127.0.0.1:9300}{ir}
        at org.elasticsearch.transport.TransportService.lambda$connectionValidator$5(TransportService.java:389) ~[elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.action.ActionListener$4.onResponse(ActionListener.java:157) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:476) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.transport.TransportService$5.onResponse(TransportService.java:466) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-7.10.2.jar:7.10.2]
        at com.amazon.opendistroforelasticsearch.security.transport.OpenDistroSecurityInterceptor$RestoringTransportResponseHandler.handleResponse(OpenDistroSecurityInterceptor.java:278) [opendistro_security-1.13.1.0.jar:1.13.1.0]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1171) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:253) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.transport.InboundHandler.lambda$handleResponse$1(InboundHandler.java:247) [elasticsearch-7.10.2.jar:7.10.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:684) [elasticsearch-7.10.2.jar:7.10.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
[2021-09-30T06:49:45,002][ERROR][c.a.o.s.s.t.OpenDistroSecuritySSLNettyTransport] [elasticsearch-opendistro-es-client-7fbc9b877-h8jjx] Exception during establishing a SSL connection: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
        at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
        at sun.security.ssl.TransportContext.fatal(TransportContext.java:369) ~[?:?]

I checked the cluster status from inside the master pod; below is the response I got:

[root@elasticsearch-opendistro-es-master-0 elasticsearch]# curl -k -u admin:admin https://elasticsearch-opendistro-es-client-service:9200
Open Distro Security not initialized.[root@elasticsearch-opendistro-es-master-0 elasticsearch]#
[root@elasticsearch-opendistro-es-master-0 elasticsearch]#
[root@elasticsearch-opendistro-es-master-0 elasticsearch]#
[root@elasticsearch-opendistro-es-master-0 elasticsearch]# curl -k -u admin:admin https://elasticsearch-opendistro-es-client-service:9200/_cluster/health?pretty=true
Open Distro Security not initialized.

I am attaching the opendistro-es helm chart that I tried in my IPv6 environment; the same chart works in an IPv4 k8s cluster.

opendistro-es.zip

Does opendistro elasticsearch helm chart not supported IPv6?

Please share your thoughts.

Thanks,
Ganeshbabu R

@smlx
Contributor

smlx commented Sep 30, 2021

Does opendistro elasticsearch helm chart not supported IPv6?

No, it currently does not.

@ashish1099

the majorVersion: 7 fixed it for me :)

@mprimeaux
Contributor

This appears to be an issue with auto-discovery in OpenSearch. Not sure if it is addressed in the latest release of OpenSearch but not an issue with the helm chart IMO.

@sastorsl
Contributor

Something to do with #141 ?

@vkc12uec

vkc12uec commented Sep 20, 2024

the majorVersion: 7 fixed it for me :)

What does this mean? We are using opendistro 7.10.2 and seeing this issue. @ashish1099
