
[Regression] Request decompression was broken in 2.11 #10802

Closed
kinseii opened this issue Oct 20, 2023 · 18 comments

Labels
bug Something isn't working

Comments

@kinseii commented Oct 20, 2023

Describe the bug

I updated OpenSearch from 2.9.0 to 2.11.0, and binary data appeared in the indexes:

[2023/10/20 09:53:38] [error] [output:opensearch:opensearch.0] HTTP status=400 URI=/_bulk, response:
{"error":{"root_cause":[{"type":"json_parse_exception","reason":"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r, \\n, \\t) is allowed between tokens\n at [Source: (byte[])\"\\u001F�\\u0008\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000��Vmo�6\\u0010��_a�si�]�1\rs\\u001D%\\u0015�7XJ�n)\\u000C��<.\\u0012����4�\\u007F�Qr\\u001Ags\\u0002\\u000CӇ��{���!�\\u00147t8}\\u0018�\\u0018���t�SQ��7(c\r�)jq��������UT5��\\u0001e\\u0019��L\\u0003YFjL��=����{�\r�\\u000Fw\\u001F��t\\u0017_\\u0000ʇ+��؛L|�0��7݀\\u0016�\"; line: 1, column: 2]"}],"type":"json_parse_exception","reason":"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r, \\n, \\t) is allowed between tokens\n at [Source: (byte[])\"\\u001F�\\u0008\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000��Vmo�6\\u0010��_a�si�]�1\rs\\u001D%\\u0015�7XJ�n)\\u000C��<.\\u0012����4�\\u007F�Qr\\u001Ags\\u0002\\u000CӇ��{���!�\\u00147t8}\\u0018�\\u0018���t�SQ��7(c\r�)jq��������UT5��\\u0001e\\u0019��L\\u0003YFjL��=����{�\r�\\u000Fw\\u001F��t\\u0017_\\u0000ʇ+��؛L|�0��7݀\\u0016�\"; line: 1, column: 2]"},"status":400}

We use fluent-bit, which has a compression option (Compress gzip). Turning it off solved the problem. However, we can't permanently disable it, because we have a lot of traffic and need compression to reduce its cost.

Expected behavior
Gzip-compressed requests should work on version 2.11.0, as they did on 2.9.0.

Plugins

opensearch-alerting 2.11.0.0
opensearch-anomaly-detection 2.11.0.0
opensearch-asynchronous-search 2.11.0.0
opensearch-cross-cluster-replication 2.11.0.0
opensearch-custom-codecs 2.11.0.0
opensearch-geospatial 2.11.0.0
opensearch-index-management 2.11.0.0
opensearch-job-scheduler 2.11.0.0
opensearch-knn 2.11.0.0
opensearch-ml 2.11.0.0
opensearch-neural-search 2.11.0.0
opensearch-notifications 2.11.0.0
opensearch-notifications-core 2.11.0.0
opensearch-observability 2.11.0.0
opensearch-performance-analyzer 2.11.0.0
opensearch-reports-scheduler 2.11.0.0
opensearch-security 2.11.0.0
opensearch-security-analytics 2.11.0.0
opensearch-sql 2.11.0.0
repository-s3 2.11.0

Host/Environment (please complete the following information):

  • OS: Azure K8s Service (AKS) v.1.26.6 with Ubuntu nodes v.22.04
  • Version: 2.11.0
kinseii added the bug and untriaged labels Oct 20, 2023
@reta (Collaborator) commented Oct 20, 2023

@kinseii thank you for reporting. @nknize, the only suspect I have at the moment is #9367.

@peternied (Member)

Thanks for filing, @kinseii. Can you check whether there are warnings or errors in the OpenSearch log at the time this occurs and post them here?

Can you provide more information about how to reproduce the issue (request + response with full headers)? With the change that @reta mentioned, OpenSearch will skip decompressing unauthenticated requests to preserve system resources.

You can DM me (@peternied) on our Slack instance if you'd prefer not to share the request data on this public issue.
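
To illustrate the conditional-decompression pattern that change introduced: at the Netty layer, an HttpContentDecompressor subclass can decline to decode a request body. A minimal sketch follows, with a stand-in boolean for the plugin's actual authentication check; the real class is the security plugin's Netty4ConditionalDecompressor, and this is not its exact code:

```
// Sketch of conditional decompression at the Netty layer: returning null
// from newContentDecoder passes the body through untouched, so requests
// that should stay compressed skip the decompression cost.
import io.netty.channel.embedded.EmbeddedChannel;
import io.netty.handler.codec.http.HttpContentDecompressor;

public class ConditionalDecompressor extends HttpContentDecompressor {
    private final boolean decompress; // stand-in for an "is authenticated" check

    public ConditionalDecompressor(boolean decompress) {
        this.decompress = decompress;
    }

    @Override
    protected EmbeddedChannel newContentDecoder(String contentEncoding) throws Exception {
        // null tells Netty's HttpContentDecoder to forward the body as-is.
        return decompress ? super.newContentDecoder(contentEncoding) : null;
    }
}
```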

@reta (Collaborator) commented Oct 20, 2023

I think #10261 would be another suspect as well, thanks @peternied.

@kinseii (Author) commented Oct 20, 2023

I want to clarify that the problem occurs when Merge_Log is enabled in fluent-bit:

    [FILTER]
        Name kubernetes
        Match kube.*
        Buffer_Size 256KB
        Merge_Log On
        Merge_Log_Log_Key log_parsed
        Merge_Log_Trim Off
        Keep_Log On

@kinseii (Author) commented Oct 20, 2023

For some reason, the body of the log field contains binary data (most likely data that was never decompressed).

@kinseii (Author) commented Oct 20, 2023

Here is an example of a normal record; compression on the fluent-bit side is enabled. The log field does not contain any JSON data. Everything works fine.

{
  "_index": "fluent-bit-***************-******-000021",
  "_id": "3eUoT4sB*****************",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2023-10-20T22:15:41.952Z",
    "cluster": "***********-********-*******",
    "environment": "development",
    "time": "2023-10-20T22:15:41.95244023Z",
    "kubernetes": {
      "annotations": {
        "checksum/config": "b2386bce5cde330898627a78e8b113a6c2a51433410a16b01419d090b21cfd54",
        "checksum/luascripts": "7f6f6d0f7c8dd33d93fe7c6a497183a8f217ae310734907e5c30fa7c0c68eecc"
      },
      "pod_id": "25051086-1bf4-41cf********************",
      "container_hash": "cr.fluentbit.io/fluent/fluent-bit@sha256:37eab2f80e4d1ae58b77b9be1662c1a0fc4b771b7006180e30b4581c3418b4f8",
      "labels": {
        "app_kubernetes_io/instance": "fluent-bit",
        "pod-template-generation": "2",
        "app_kubernetes_io/name": "fluent-bit",
        "controller-revision-hash": "66f7c55b4f"
      },
      "container_image": "cr.fluentbit.io/fluent/fluent-bit:2.1.5",
      "docker_id": "ec1bff137a662e9d2589307be16ce06b598190679d99b8c6ea21e03638516221",
      "container_name": "fluent-bit",
      "pod_name": "fluent-bit-2gf2l",
      "host": "aks-stdd8**************-12439882-vmss000007",
      "namespace_name": "service"
    },
    "stream": "stderr",
    "_p": "F",
    "log": ""
  },
  "fields": {
    "@timestamp": [
      "2023-10-20T22:15:41.952Z"
    ],
    "time": [
      "2023-10-20T22:15:41.952Z"
    ]
  },
  "sort": [
    1697840141952
  ]
}

@kinseii (Author) commented Oct 20, 2023

And here's an example of a record with binary data. Compression on the fluent-bit side is enabled (Compress gzip). The log field contains JSON with binary data (most likely compressed data).

{
  "_index": "fluent-bit-***************-*****-000021",
  "_id": "3OUoT4sBp*************",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2023-10-20T22:15:41.952Z",
    "environment": "development",
    "cluster": "****************-***********-***********",
    "log_parsed": {
      "status": 400,
      "error": {
        "root_cause": [
          {
            "type": "json_parse_exception",
            "reason": "Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r, \\n, \\t) is allowed between tokens\n at [Source: (byte[])\"\\u001F�\\u0008\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000��_o�6\\u0010���)\\u0008c��&Q��\\u0004EѬu:\\u0003�[4�\\u001E\\u0016\\u0004\\u0006ER�\\u0016Y�H�m\\u0010��(�qڤE\\u001F+�\\u000C\\u0004��\\u0013���pw\\u0011�ۉ0��jr|;YT�T�'Ǔ\\u0017eݩ������zݚ�[yRm^N��~���j���-_�A�\\u00064�����9��!;��\\u001F\\u0006�?����;�*\\u0003B\\u000F+����:� ��Met���@\"; line: 1, column: 2]"
          }
        ],
        "type": "json_parse_exception",
        "reason": "Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\r, \\n, \\t) is allowed between tokens\n at [Source: (byte[])\"\\u001F�\\u0008\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000��_o�6\\u0010���)\\u0008c��&Q��\\u0004EѬu:\\u0003�[4�\\u001E\\u0016\\u0004\\u0006ER�\\u0016Y�H�m\\u0010��(�qڤE\\u001F+�\\u000C\\u0004��\\u0013���pw\\u0011�ۉ0��jr|;YT�T�'Ǔ\\u0017eݩ������zݚ�[yRm^N��~���j���-_�A�\\u00064�����9��!;��\\u001F\\u0006�?����;�*\\u0003B\\u000F+����:� ��Met���@\"; line: 1, column: 2]"
      }
    },
    "time": "2023-10-20T22:15:41.95243462Z",
    "kubernetes": {
      "annotations": {
        "checksum/config": "b2386bce5cde330898627a78e8b113a6c2a51433410a16b01419d090b21cfd54",
        "checksum/luascripts": "7f6f6d0f7c8dd33d93fe7c6a497183a8f217ae310734907e5c30fa7c0c68eecc"
      },
      "pod_id": "25051086-1bf4-41cf-********************",
      "container_hash": "cr.fluentbit.io/fluent/fluent-bit@sha256:37eab2f80e4d1ae58b77b9be1662c1a0fc4b771b7006180e30b4581c3418b4f8",
      "labels": {
        "app_kubernetes_io/instance": "fluent-bit",
        "pod-template-generation": "2",
        "app_kubernetes_io/name": "fluent-bit",
        "controller-revision-hash": "66f7c55b4f"
      },
      "container_image": "cr.fluentbit.io/fluent/fluent-bit:2.1.5",
      "docker_id": "ec1bff137a662e9d2589307be16ce06b598190679d99b8c6ea21e03638516221",
      "container_name": "fluent-bit",
      "pod_name": "fluent-bit-2gf2l",
      "host": "aks-stdd8**************-12439882-vmss000007",
      "namespace_name": "service"
    },
    "stream": "stderr",
    "_p": "F",
    "log": "{\"error\":{\"root_cause\":[{\"type\":\"json_parse_exception\",\"reason\":\"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\\\r, \\\\n, \\\\t) is allowed between tokens\\n at [Source: (byte[])\\\"\\\\u001F�\\\\u0008\\\\u0000\\\\u0000\\\\u0000\\\\u0000\\\\u0000\\\\u0000��_o�6\\\\u0010���)\\\\u0008c��&Q��\\\\u0004EѬu:\\\\u0003�[4�\\\\u001E\\\\u0016\\\\u0004\\\\u0006ER�\\\\u0016Y�H�m\\\\u0010��(�qڤE\\\\u001F+�\\\\u000C\\\\u0004��\\\\u0013���pw\\\\u0011�ۉ0��jr|;YT�T�'Ǔ\\\\u0017eݩ������zݚ�[yRm^N��~���j���-_�A�\\\\u00064�����9��!;��\\\\u001F\\\\u0006�?����;�*\\\\u0003B\\\\u000F+����:� ��Met���@\\\"; line: 1, column: 2]\"}],\"type\":\"json_parse_exception\",\"reason\":\"Illegal character ((CTRL-CHAR, code 31)): only regular white space (\\\\r, \\\\n, \\\\t) is allowed between tokens\\n at [Source: (byte[])\\\"\\\\u001F�\\\\u0008\\\\u0000\\\\u0000\\\\u0000\\\\u0000\\\\u0000\\\\u0000��_o�6\\\\u0010���)\\\\u0008c��&Q��\\\\u0004EѬu:\\\\u0003�[4�\\\\u001E\\\\u0016\\\\u0004\\\\u0006ER�\\\\u0016Y�H�m\\\\u0010��(�qڤE\\\\u001F+�\\\\u000C\\\\u0004��\\\\u0013���pw\\\\u0011�ۉ0��jr|;YT�T�'Ǔ\\\\u0017eݩ������zݚ�[yRm^N��~���j���-_�A�\\\\u00064�����9��!;��\\\\u001F\\\\u0006�?����;�*\\\\u0003B\\\\u000F+����:� ��Met���@\\\"; line: 1, column: 2]\"},\"status\":400}"
  },
  "fields": {
    "@timestamp": [
      "2023-10-20T22:15:41.952Z"
    ],
    "time": [
      "2023-10-20T22:15:41.952Z"
    ]
  },
  "sort": [
    1697840141952
  ]
}

kinseii changed the title from "[BUG] After update from 2.9.0 to 2.11.0, OpenSearch is unable to work with gzip compressed data anymore" to "[BUG] After update from 2.9.0 to 2.11.0, with fluent-bit compression enabled, binary information (compressed data) in the log field" Oct 20, 2023
@kinseii (Author) commented Oct 20, 2023

I am willing to provide any information you need; please tell me what is required, as I have no idea how compression works in OpenSearch.

@cwperks (Member) commented Oct 23, 2023

Thank you for providing the configuration, @kinseii. I was able to reproduce the issue and found the likely culprit. In 2.11 a change was made to keep unauthenticated request bodies compressed, to avoid the cost of decompressing them. In opensearch-project/security#3418, the decompressor was replaced with a subclass of the Netty HttpContentDecompressor that overrides the content encoding if the request body should remain compressed. The decompressor in that PR adds a @Sharable annotation and uses the same instance across multiple channels, but it should not, since it is a stateful handler.

Thank you for reporting this issue.

Link to PR to address the issue: opensearch-project/security#3583
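
The shape of the fix is to stop sharing the stateful handler and construct one per channel. A minimal sketch of that pattern follows; the initializer wiring here is illustrative, not the plugin's actual pipeline code:

```
// Sketch: because HttpContentDecoder subclasses are stateful, each channel
// must get its own decompressor instance.
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.http.HttpContentDecompressor;

public class HttpChannelInitializer extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        // Wrong (pre-fix): addLast(SHARED_DECOMPRESSOR) with a @Sharable
        // singleton, which let concurrent gzip streams corrupt each other's
        // decoder state. Right (post-fix): a fresh instance per channel.
        ch.pipeline().addLast("decompressor", new HttpContentDecompressor());
    }
}
```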

@dblock (Member) commented Oct 24, 2023

Linking #10747, which looks like the same problem.

@dblock (Member) commented Oct 24, 2023

@cwperks Is there a workaround for users or do they have to wait for 2.12?

@cwperks (Member) commented Oct 24, 2023

@dblock The workaround in 2.11.0 for the Python client is to set http_compress = False when instantiating the client. The documentation site shows the default for this value as True: https://opensearch.org/docs/latest/clients/python-low-level/#connecting-to-opensearch
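
For other clients the idea is the same: turn off request compression wherever it is enabled. As a sketch, with the Java low-level REST client the toggle would look roughly like this (assuming RestClientBuilder.setCompressionEnabled is available in your client version; the endpoint is a placeholder, authentication setup is omitted, and compression is off in this client unless explicitly enabled):

```
// Sketch: disable request body compression on the Java low-level REST client
// until the server-side fix ships.
import org.apache.http.HttpHost;
import org.opensearch.client.RestClient;

public class CompressionWorkaround {
    public static RestClient buildClient() {
        return RestClient.builder(new HttpHost("localhost", 9200, "https"))
                // Send request bodies uncompressed, sidestepping the broken
                // decompression path in 2.11.0.
                .setCompressionEnabled(false)
                .build();
    }
}
```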

peternied changed the title from "[BUG] After update from 2.9.0 to 2.11.0, with fluent-bit compression enabled, binary information (compressed data) in the log field" to "[Regression] Request decompression was broken in 2.11" Oct 24, 2023
@peternied (Member)

I think a fix should be included in a 2.11.1 patch release. @CEHENKLE @bbarani What do you think, and what would the timeline for a patch release like this look like?

cwperks added a commit to opensearch-project/security that referenced this issue Oct 25, 2023
### Description

Resolves an issue with decompression that can lead to concurrent gzipped
requests failing. This removes the `@Sharable` annotation from the
`Netty4ConditionalDecompressor` and creates a new instance of the
decompressor on channel initialization.

`Netty4ConditionalDecompressor` is an `HttpContentDecompressor` which is
a subclass of `HttpContentDecoder` - a stateful handler. Netty docs on
`@Sharable` annotation:
https://netty.io/4.0/api/io/netty/channel/ChannelHandler.Sharable.html

* Category (Enhancement, New feature, Bug fix, Test fix, Refactoring,
Maintenance, Documentation)

Bug fix

### Issues Resolved

- opensearch-project/OpenSearch#10802

### Testing

Tested by running OpenSearch with fluent-bit and Merge_Log on. See the files below, which reproduce the issue behind the linked error.

I opened this PR as draft pending an integration test to validate the
behavior.

`docker-compose.yml`

```
version: '3'
services:
  opensearch: # This is also the hostname of the container within the Docker network (i.e. https://opensearch:9200/)
    image: opensearchproject/opensearch:latest # Specifying the latest available image - modify if you want a specific version
    container_name: opensearch
    environment:
      - cluster.name=opensearch-cluster # Name the cluster
      - node.name=opensearch # Name the node that will run in this container
      - discovery.type=single-node
      - bootstrap.memory_lock=true # Disable JVM heap memory swapping
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # Set min and max JVM heap sizes to at least 50% of system RAM
    ulimits:
      memlock:
        soft: -1 # Set memlock to unlimited (no soft or hard limit)
        hard: -1
      nofile:
        soft: 65536 # Maximum number of open files for the opensearch user - set to at least 65536
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data # Creates volume called opensearch-data1 and mounts it to the container
      # - /Users/craigperkins/Projects/OpenSearch/security/build/distributions/opensearch-security-2.11.0.0-SNAPSHOT.jar:/usr/share/opensearch/plugins/opensearch-security/opensearch-security-2.11.0.0.jar
    ports:
      - 9200:9200 # REST API
      - 9600:9600 # Performance Analyzer
    networks:
      - opensearch-net # All of the containers will join the same Docker bridge network
  fluent-bit:
    image: fluent/fluent-bit
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
    depends_on:
      - opensearch
    networks:
      - opensearch-net

volumes:
  opensearch-data1:
  opensearch-data2:

networks:
  opensearch-net:
```

`fluent-bit.conf`

```
[INPUT]
  Name dummy
  Dummy {"top": {".dotted": "value"}}

[OUTPUT]
  Name es
  Host opensearch
  Port 9200
  HTTP_User admin
  HTTP_Passwd admin
  Replace_Dots On
  Suppress_Type_Name On
  Compress gzip
  tls On
  tls.verify Off
  net.keepalive Off

[FILTER]
  Name kubernetes
  Match kube.*
  Buffer_Size 256KB
  Merge_Log On
  Keep_Log On
```

### Check List
- [ ] New functionality includes testing
- [ ] New functionality has been documented
- [x] Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and
signing off your commits, please check
[here](https://github.com/opensearch-project/OpenSearch/blob/main/CONTRIBUTING.md#developer-certificate-of-origin).

---------

Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Craig Perkins <[email protected]>
Signed-off-by: Peter Nied <[email protected]>
Co-authored-by: Peter Nied <[email protected]>
opensearch-trigger-bot bot pushed backport commits to opensearch-project/security that referenced this issue Oct 25, 2023, repeating the commit message above (cherry picked from commit 499db78)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@peternied (Member)

This issue has been fixed and is waiting on the next release to be available.

@CEHENKLE (Member)

I'll let @bbarani comment on timeline feasibility.

Quick clarifying question for me: is compression on by default?

@bbarani (Member) commented Oct 26, 2023

@peternied @cwperks Can you please confirm whether http_compress = False is the only workaround for users of the Python client? Does this impact only certain clients, or is there a workaround available for other clients as well? Also, based on the discussion above, it looks like the workaround is not an optimal solution either and the issue needs to be fixed on the server side. Am I right? I am trying to understand the severity of this issue to make a call on a patch release.

@peternied (Member)

compression on by default

The compression setting lives in the external HTTP clients. I don't know whether our OpenSearch clients enable it by default, but OpenSearch is called by a wealth of HTTP clients with different default configurations and settings. Setting this issue aside, I would recommend everyone enable compression on their clients by default.

Does this impact only certain clients?

All clients are impacted, including third-party clients. The way to enable/disable compression varies, but the basic change is required in every client that calls the OpenSearch cluster. This general approach is the only way to work around the problem.

workaround is not an optimal solution

Yes, disabling compression is a workaround, but it creates several classes of problems:

  1. Cost: bandwidth usage will go up, which will increase latency; depending on the hosting service, this could have a direct monetary impact.
  2. Rejected payloads: a 99.9 MiB compressed payload would be accepted, but uncompressed it can exceed the default 100 MiB limit and be rejected.
  3. Unrelated deployments: the workaround must be applied on the clients and companion services that call OpenSearch. There are many scenarios where this is non-trivial, and it also needs to be rolled back after the fix to avoid the issues above.

@kinseii (Author) commented Nov 7, 2023

I have to admit that I am amazed at how complex, confusing, and, most importantly, time-consuming the workflow is, especially for a critical issue. I'm very grateful to the team for the product, but I would like to see critical issues resolved a little faster in the future.
