[BUG] Opensearch SSL transport error, master not discovered or elected yet #54

Closed
alborotogarcia opened this issue Sep 20, 2021 · 22 comments
Labels
question Further information is requested

Comments

@alborotogarcia
Contributor

Describe the bug
Can't reproduce the default demo setup on Kubernetes.

To Reproduce
Steps to reproduce the behavior:

  1. Install helm chart with defaults (optional) from https://github.com/opensearch-project/helm-charts
  2. Copy all configuration yaml from /usr/share/opensearch/plugins/opensearch-security/securityconfig to local
  3. Paste contents to securityConfig.config.data file templates
  4. See error
SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
[opensearch-cluster-master-0] master not discovered or elected yet

Expected behavior
Cluster gets GREEN state

Plugins
Please list all plugins currently enabled.

    cluster.name: opensearch-cluster

    # Bind to all interfaces because we don't know what IP address Docker will assign to us.
    network.host: 0.0.0.0

    # # minimum_master_nodes need to be explicitly set when bound on a public IP
    # # set to 1 to allow single node clusters
    discovery.zen.minimum_master_nodes: 1
    plugins:
      security:
        ssl:
          transport:
            pemcert_filepath: esnode.pem
            pemkey_filepath: esnode-key.pem
            pemtrustedcas_filepath: root-ca.pem
            enforce_hostname_verification: false
          http:
            enabled: false
            pemcert_filepath: esnode.pem
            pemkey_filepath: esnode-key.pem
            pemtrustedcas_filepath: root-ca.pem
        allow_unsafe_democertificates: true
        allow_default_init_securityindex: true
        authcz:
          admin_dn:
            - CN=kirk,OU=client,O=client,L=test, C=de
        audit.type: internal_opensearch
        enable_snapshot_restore_privilege: true
        check_snapshot_restore_write_privileges: true
        restapi:
          roles_enabled: ["all_access", "security_rest_api_access"]
        system_indices:
          enabled: true
          indices:
            [
              ".opendistro-alerting-config",
              ".opendistro-alerting-alert*",
              ".opendistro-anomaly-results*",
              ".opendistro-anomaly-detector*",
              ".opendistro-anomaly-checkpoints",
              ".opendistro-anomaly-detection-state",
              ".opendistro-reports-*",
              ".opendistro-notifications-*",
              ".opendistro-notebooks",
              ".opendistro-asynchronous-search-response*",
            ]

Screenshots
If applicable, add screenshots to help explain your problem.

Host/Environment (please complete the following information):

  • OS: [e.g. iOS]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

CEHENKLE assigned and unassigned CEHENKLE Sep 21, 2021
peterzhuamazon transferred this issue from opensearch-project/OpenSearch Sep 21, 2021
@peterzhuamazon
Member

I have never seen this issue before. @DandyDeveloper @TheAlgo, any idea about this issue from @alborotogarcia?
Thanks.

peterzhuamazon added the question (Further information is requested) label Sep 21, 2021
@DandyDeveloper
Collaborator

DandyDeveloper commented Sep 22, 2021

That specific bug can be ignored (Insufficient buffer remaining for AEAD cipher fragment). It's a known thing in Search Guard: https://bugs.openjdk.java.net/browse/JDK-8221218

It shouldn't have any impact on the cluster working.

[opensearch-cluster-master-0] master not discovered or elected yet

Is this actually causing a problem? You mention the cluster being green.

If you are just trying to run a single-node cluster:

    # # minimum_master_nodes need to be explicitly set when bound on a public IP
    # # set to 1 to allow single node clusters
    # discovery.zen.minimum_master_nodes: 1

    # Setting network.host to a non-loopback address enables the annoying bootstrap checks. "Single-node" mode disables them again.
    #discovery.type: single-node

Uncomment these and it'll work.

If not, we need the full log.
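
For reference, a minimal sketch of the single-node variant as a values-file fragment. It assumes the chart accepts the node configuration under a `config` key with an `opensearch.yml` entry; that key layout is an assumption here and is not confirmed anywhere in this thread.

```yaml
# values.yaml (fragment)
# ASSUMPTION: the chart renders `config.opensearch.yml` into the node's
# opensearch.yml. Check the chart's own values.yaml before relying on this.
config:
  opensearch.yml: |
    cluster.name: opensearch-cluster

    # Bind to all interfaces; the pod IP is not known in advance.
    network.host: 0.0.0.0

    # Single-node mode: no master election, and the bootstrap checks
    # triggered by a non-loopback network.host are disabled again.
    discovery.type: single-node
```

This only makes sense for a one-pod deployment. With three master pods, as in the gist referenced in this thread, the multi-node discovery settings apply instead, and any discovery settings the chart injects as environment variables may conflict with `discovery.type: single-node`.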

@alborotogarcia
Contributor Author

I meant the green state as the desired outcome; unfortunately it is not actually reached, as securityconfig doesn't get started.
I've created a gist with the values.yaml and the full trace of the three nodes here. Could you please take a look, @DandyDeveloper?
If there's something else I'm missing, let me know :)

Thanks for the help @peterzhuamazon @DandyDeveloper !

@alborotogarcia
Contributor Author

alborotogarcia commented Sep 22, 2021

FWIW @DandyDeveloper @peterzhuamazon, I forgot to mention: internal users and other added configuration (including LDAP users) do work if they're kept in their volumes and I redeploy the Helm chart one more time without securityConfig.config.data.

@smlx
Contributor

smlx commented Sep 22, 2021

this seems to be the problem

opensearch-cluster-master-0 opensearch java.nio.file.FileSystemException: /usr/share/opensearch/data/nodes/0/.opensearch_temp_file: Read-only file system

@alborotogarcia
Contributor Author

@smlx I see. Since Kubernetes 1.9.6, volumeMounts for secret, configMap, downwardAPI and projected volumes have been read-only by default, as stated in kubernetes/kubernetes#62099. But I don't understand why the default chart template doesn't complain about a read-only filesystem. Is it another UID that starts the process? The current fsGroup is set to 1000, and it is set the same way in #9.
How can this be solved?

@DandyDeveloper
Collaborator

DandyDeveloper commented Sep 24, 2021

@alborotogarcia

I just deployed locally with your exact values. It's working for me, and I'm able to write to that directory.

[opensearch@opensearch-cluster-master-0 ~]$ cd data/
[opensearch@opensearch-cluster-master-0 data]$ ls -l
total 20
-rw-rw-r-- 1 opensearch opensearch    5 Sep 24 01:29 batch_metrics_enabled.conf
-rw-rw-r-- 1 opensearch opensearch    5 Sep 24 01:29 logging_enabled.conf
drwxrwxr-x 3 opensearch opensearch 4096 Sep 24 01:29 nodes
-rw-rw-r-- 1 opensearch opensearch    5 Sep 24 01:29 performance_analyzer_enabled.conf
-rw-rw-r-- 1 opensearch opensearch    5 Sep 24 01:29 rca_enabled.conf
[opensearch@opensearch-cluster-master-0 data]$ ls -l nodes/
total 4
drwxrwxr-x 3 opensearch opensearch 4096 Sep 24 01:39 0
[opensearch@opensearch-cluster-master-0 data]$ ls -l nodes/0
total 4
drwxrwxr-x 2 opensearch opensearch 4096 Sep 24 01:29 _state
-rw-rw-r-- 1 opensearch opensearch    0 Sep 24 01:29 node.lock

Edit: I had a bunch of info here that was redundant and incorrect. I misread volumes :)

What k8s version are you running? I'm running latest in my test cluster here.

@alborotogarcia
Contributor Author

Sorry for the delay @DandyDeveloper, I had some issues with my IdP and had to spend time on them. I am running a 3-node k3s cluster, and yes, I am aware that all config files are needed, otherwise it will complain. I run Longhorn as the storage class, but I suspect that switching to subPath mounts for each file may be less error-prone; as you said earlier, the mount may affect the folder it gets mounted on. Will report back.
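
A rough sketch of what per-file subPath mounts could look like in the pod spec, for anyone following along. The volume and Secret names are hypothetical; only the target directory and the file names come from this thread.

```yaml
# Pod spec fragment (sketch). `security-config` is a hypothetical
# volume/Secret name; substitute whatever the chart actually creates.
containers:
  - name: opensearch
    volumeMounts:
      # One mount per file, so the rest of the securityconfig directory
      # keeps the contents shipped in the image.
      - name: security-config
        mountPath: /usr/share/opensearch/plugins/opensearch-security/securityconfig/config.yml
        subPath: config.yml
        readOnly: true
      - name: security-config
        mountPath: /usr/share/opensearch/plugins/opensearch-security/securityconfig/internal_users.yml
        subPath: internal_users.yml
        readOnly: true
volumes:
  - name: security-config
    secret:
      secretName: security-config
```

One trade-off to keep in mind: a container that mounts a Secret through subPath does not receive updates when the Secret changes, so the pod has to be restarted after editing the configuration.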

@mprimeaux
Contributor

mprimeaux commented Sep 26, 2021

@DandyDeveloper We are running into what I perceive as the same or a similar issue with the 1.0.0 charts, with a config similar to @alborotogarcia's, though we are using Keycloak as our IdP.

Would you mind reviewing the permissions in the /usr/share/opensearch/plugins/opensearch-security folder? It appears the securityconfig is owned by root and not opensearch, which might be the cause.

-rw-r--r-- 1 opensearch opensearch  452868 Jul  8 22:32 saaj-impl-1.5.2.jar
drwxrwsrwt 3 root       opensearch     260 Sep 26 12:24 securityconfig
-rw-r--r-- 1 opensearch opensearch   41203 Jul  8 22:32 slf4j-api-1.7.25.jar
[opensearch@opensearch-cluster-master-0 securityconfig]$ ls -l
total 0
lrwxrwxrwx 1 root opensearch 24 Sep 26 12:24 action_groups.yml -> ..data/action_groups.yml
lrwxrwxrwx 1 root opensearch 16 Sep 26 12:24 audit.yml -> ..data/audit.yml
lrwxrwxrwx 1 root opensearch 17 Sep 26 12:24 config.yml -> ..data/config.yml
lrwxrwxrwx 1 root opensearch 25 Sep 26 12:24 internal_users.yml -> ..data/internal_users.yml
lrwxrwxrwx 1 root opensearch 19 Sep 26 12:24 nodes_dn.yml -> ..data/nodes_dn.yml
lrwxrwxrwx 1 root opensearch 16 Sep 26 12:24 roles.yml -> ..data/roles.yml
lrwxrwxrwx 1 root opensearch 24 Sep 26 12:24 roles_mapping.yml -> ..data/roles_mapping.yml
lrwxrwxrwx 1 root opensearch 18 Sep 26 12:24 tenants.yml -> ..data/tenants.yml
lrwxrwxrwx 1 root opensearch 20 Sep 26 12:24 whitelist.yml -> ..data/whitelist.yml

Not sure if this is the issue but the content of each of the above files looks correct as per these examples.

Of note, we have an older version of the OpenSearch charts that do work using the same values file but with the material difference being this block.

@mprimeaux
Contributor

mprimeaux commented Sep 26, 2021

It seems my previous assumption is incorrect. Applying an older version of the OpenSearch Helm chart with the same values file works even with the same folder and file permissions as above.

-rw-r--r-- 1 opensearch opensearch  452868 Jul  8 22:32 saaj-impl-1.5.2.jar
drwxrwsrwt 3 root       opensearch     260 Sep 26 12:44 securityconfig
-rw-r--r-- 1 opensearch opensearch   41203 Jul  8 22:32 slf4j-api-1.7.25.jar
[opensearch@opensearch-cluster-master-0 securityconfig]$ ls -l
total 0
lrwxrwxrwx 1 root opensearch 24 Sep 26 12:44 action_groups.yml -> ..data/action_groups.yml
lrwxrwxrwx 1 root opensearch 16 Sep 26 12:44 audit.yml -> ..data/audit.yml
lrwxrwxrwx 1 root opensearch 17 Sep 26 12:44 config.yml -> ..data/config.yml
lrwxrwxrwx 1 root opensearch 25 Sep 26 12:44 internal_users.yml -> ..data/internal_users.yml
lrwxrwxrwx 1 root opensearch 19 Sep 26 12:44 nodes_dn.yml -> ..data/nodes_dn.yml
lrwxrwxrwx 1 root opensearch 16 Sep 26 12:44 roles.yml -> ..data/roles.yml
lrwxrwxrwx 1 root opensearch 24 Sep 26 12:44 roles_mapping.yml -> ..data/roles_mapping.yml
lrwxrwxrwx 1 root opensearch 18 Sep 26 12:44 tenants.yml -> ..data/tenants.yml
lrwxrwxrwx 1 root opensearch 20 Sep 26 12:44 whitelist.yml -> ..data/whitelist.yml
[opensearch@opensearch-cluster-master-0 securityconfig]$

I'll continue debugging.

@mprimeaux
Contributor

mprimeaux commented Sep 26, 2021

When using the latest version of the OpenSearch chart with the same values file as above, these are the exceptions we receive, which prevent securityadmin.sh from succeeding:

Error

opensearch [2021-09-26T12:26:49,688][DEBUG][o.o.s.c.ConfigurationRepository] [opensearch-cluster-master-0] Try to load config ...
opensearch [2021-09-26T12:26:49,689][DEBUG][o.o.s.c.ConfigurationRepository] [opensearch-cluster-master-0] security index not exists (yet)
opensearch [2021-09-26T12:26:49,689][ERROR][o.o.s.c.ConfigurationLoaderSecurity7] [opensearch-cluster-master-0] Exception while retrieving configuration for [INTERNALUSERS, ACTIONGROUPS, CONFIG, ROLES, ROLESMAPPING, TENANTS, NODESDN, WHITELIST, AUDIT] (index=.opendistro_security)
opensearch org.opensearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
opensearch     at org.opensearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:203) ~[opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:189) ~[opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:72) ~[opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.action.get.TransportMultiGetAction.doExecute(TransportMultiGetAction.java:53) ~[opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:192) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:141) [opensearch-index-management-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:190) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.security.filter.SecurityFilter.apply0(SecurityFilter.java:234) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.security.filter.SecurityFilter.apply(SecurityFilter.java:154) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:190) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:99) [opensearch-performance-analyzer-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:190) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.action.support.TransportAction.execute(TransportAction.java:168) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.action.support.TransportAction.execute(TransportAction.java:96) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:99) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:88) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:428) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.client.support.AbstractClient.multiGet(AbstractClient.java:546) [opensearch-1.0.0.jar:1.0.0]
opensearch     at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.loadAsync(ConfigurationLoaderSecurity7.java:211) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.security.configuration.ConfigurationLoaderSecurity7.load(ConfigurationLoaderSecurity7.java:102) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.security.configuration.ConfigurationRepository.getConfigurationsFromIndex(ConfigurationRepository.java:375) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration0(ConfigurationRepository.java:321) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.security.configuration.ConfigurationRepository.reloadConfiguration(ConfigurationRepository.java:306) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at org.opensearch.security.configuration.ConfigurationRepository$1.run(ConfigurationRepository.java:166) [opensearch-security-1.0.0.0.jar:1.0.0.0]
opensearch     at java.lang.Thread.run(Thread.java:832) [?:?]

If this turns out to be a different issue than the issue that's the topic of this thread then I'll open a separate issue.

@mprimeaux
Contributor

mprimeaux commented Sep 26, 2021

I believe I found the issue or, at least, a workaround.

It appears the behavior of the majorVersion chart value changed from computing the value 7 to a value of 1 as per PR #21 merge. The workaround (for me, anyway) was to explicitly set the majorVersion in the values file to 7. i.e. majorVersion: 7

If the majorVersion attribute remains at its default of "", then the stateful set computes the env: stanza as:

- name: discovery.zen.minimum_master_nodes
  value: "1"
- name: discovery.zen.ping.unicast.hosts
  value: "opensearch-cluster-master-headless"

...rather than:

- name: cluster.initial_master_nodes
  value: "opensearch-cluster-master-0,opensearch-cluster-master-1,opensearch-cluster-master-2,"
- name: discovery.seed_hosts
  value: "opensearch-cluster-master-headless"

When using "discovery", the failures as per above are present and the security indexes are never created thus resulting in a red cluster status.

I am not very familiar with zen discovery but would likely prefer it so new nodes can discover the cluster state. However, it does not appear to work.

All thoughts and support are welcome.

UPDATE 1: It appears that we should be using discovery.seed_hosts rather than discovery.zen.ping.unicast.hosts as per SettingsBasedSeedHostsProvider.java.

UPDATE 2: I modified the StatefulSet to use discovery.seed_hosts and discovery.seed_providers and it still fails with majorVersion: "". Regardless, the workaround of specifying majorVersion: 7 still succeeds.
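
For clarity, the workaround amounts to a one-line override in the values file. This is only a sketch of what worked in my environment, not a recommended long-term setting.

```yaml
# values.yaml (fragment)
# Workaround from this comment: pin majorVersion instead of leaving the
# default "", so the StatefulSet env: stanza is rendered with
# cluster.initial_master_nodes and discovery.seed_hosts rather than the
# discovery.zen.* settings.
majorVersion: 7
```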

@alborotogarcia
Contributor Author

@mprimeaux @DandyDeveloper I followed your suggestions, and here's what worked for me:

discovery.zen.minimum_master_nodes: 1
discovery.seed_hosts: "opensearch-cluster-master-headless"

and left majorVersion: ""

though I still can't log in with my IdP
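
Put together as a values-file fragment, that looks roughly like the sketch below. Placing the two settings under a `config` key with an `opensearch.yml` entry is an assumption about the chart's layout; the setting names and values are the ones quoted above.

```yaml
# values.yaml (fragment, sketch)
# majorVersion stays at the chart default.
majorVersion: ""

# ASSUMPTION: the chart renders `config.opensearch.yml` into opensearch.yml.
config:
  opensearch.yml: |
    cluster.name: opensearch-cluster
    network.host: 0.0.0.0

    # Settings that worked here:
    discovery.zen.minimum_master_nodes: 1
    discovery.seed_hosts: "opensearch-cluster-master-headless"
```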

@TheAlgo
Member

TheAlgo commented Sep 26, 2021

@mprimeaux @alborotogarcia I did not try out the config and installation as I am away from work for some time. But I am thinking out loud. Can this be something related to the core engine and not the chart? Maybe we might need to look at the security repository to understand more because ideally 7 should not fix the issue as OpenSearch starts with 1.

@mprimeaux
Contributor

mprimeaux commented Sep 26, 2021

@TheAlgo Here is the logic. It appears to be, in part, an issue with the chart logic since we SHOULD be using different discovery env: values.

However, I agree with you that something deeper might be going on and so I will also research the security repository.

Related to the OpenDistro docs, it seems they are stale given the discovery attributes are discovery.zen.ping.unicast.hosts and discovery.seed_hosts as per this in the OpenSearch docs.

@mprimeaux
Contributor

@alborotogarcia Thanks, mate. I will try your suggestion in the above reply.

@TheAlgo
Member

TheAlgo commented Sep 26, 2021

@TheAlgo Here is logic. It appears to be, in part, an issue with the chart logic since we SHOULD be using different discovery env: values.

However, I agree with you that something deeper is likely going on and so I will also research the security repository.

Related to the OpenDistro docs, it seems they are stale given the discovery attributes are discovery.zen.ping.unicast.hosts and discovery.seed_hosts as per this in the OpenSearch docs.

@mprimeaux We need to change the Helm logic for sure. As part of #21 we changed it in one place and did not change the others, which seems to be what is breaking.

Coming to the OpenDistro docs: yes, they are stale, and we should follow the official OpenSearch docs as much as possible.

@mprimeaux
Contributor

mprimeaux commented Sep 26, 2021

It appears the setting discovery.zen.minimum_master_nodes used in the env: stanza in OpenSearch at ZenDiscoveryUnitTests.java is being deprecated.

See the cluster settings logic here. I believe this might be a point of focus for the chart logic.

@alborotogarcia
Contributor Author

@DandyDeveloper Also, the Ingress API needs an upgrade from networking.k8s.io/v1beta1 to networking.k8s.io/v1 on Kubernetes 1.22+.

@mprimeaux
Contributor

@alborotogarcia Coincidentally, I noticed this also. In addition, it seems IngressClassName support was removed from the latest OpenSearch charts when they were migrated from the old repository. The ingress template should add this back, as per Kubernetes 1.18+:

  {{- if .Values.ingress.ingressClassName }}
  ingressClassName: {{ .Values.ingress.ingressClassName | quote }}
  {{- end }}

I will create a new issue and related PR today for the IngressClassName to be supported. But this is unrelated to this current issue.
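
To show where that snippet would sit, here is a rough sketch of an ingress template on the networking.k8s.io/v1 API. The `ingress.enabled` and `ingress.hosts` values and the resource names are hypothetical placeholders, not the chart's actual keys.

```yaml
# templates/ingress.yaml (sketch)
# ASSUMPTION: .Values.ingress.enabled and .Values.ingress.hosts exist;
# the metadata and service names below are placeholders.
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Release.Name }}-opensearch
spec:
  {{- if .Values.ingress.ingressClassName }}
  ingressClassName: {{ .Values.ingress.ingressClassName | quote }}
  {{- end }}
  rules:
    {{- range .Values.ingress.hosts }}
    - host: {{ . | quote }}
      http:
        paths:
          - path: /
            # pathType is required in networking.k8s.io/v1
            pathType: Prefix
            backend:
              # v1 uses service.name / service.port instead of
              # the v1beta1 serviceName / servicePort fields
              service:
                name: {{ $.Release.Name }}-opensearch
                port:
                  number: 9200
    {{- end }}
{{- end }}
```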

@peterzhuamazon
Member

Closing this for now, as it seems to have been resolved by the community.
Please feel free to re-open if you still have questions.

Thanks.

@dss010101

dss010101 commented May 12, 2023

I'm seeing this error when trying to stand up this infrastructure with the latest images: https://github.com/opensearch-project/data-prepper/tree/main/examples/log-ingestion

And using this fluentbit.conf:


[SERVICE]
  Flush                 1
  Log_Level             debug
  parsers_file          parsers.conf
  trace_error           On

[INPUT]
  name                  tail
  tag                   app_svc_logs
  refresh_interval      5
  path                  /var/log/test.log
  path_key              file_name
  read_from_head        true
  multiline.parser      python

[OUTPUT]
  Name opensearch
  Match *
  host opensearch
  tls On

It is not clear to me from this thread what modifications I can make to the docker-compose file or other settings to resolve this. This is the error:

opensearch | [2023-05-12T23:08:01,314][WARN ][o.o.h.AbstractHttpServerTransport] [0248e6704e54] caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/xxx.xxx.xx.x:9200, remoteAddress=/xxx.xxx.xx.x:35794}


8 participants