
Client can't receive configurations larger than 4MiB #17

Closed
adriansr opened this issue Sep 23, 2020 · 19 comments
Labels
bug Something isn't working question Further information is requested Team:Elastic-Agent Label for the Agent team

Comments

adriansr commented Sep 23, 2020

When testing with huge pipelines for Filebeat, I get the following error in elastic-agent:

elastic-agent-client got error: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (7608090 vs. 4194304)

This is a problem when working with large parsers from rsa2elk, or enabling a lot of the not-so-big integrations.

packages/snort% du -chs 0.1.0/dataset/log/agent/stream/*
3.3M    0.1.0/dataset/log/agent/stream/stream.yml.hbs
3.3M    0.1.0/dataset/log/agent/stream/tcp.yml.hbs
3.3M    0.1.0/dataset/log/agent/stream/udp.yml.hbs

Can we make this larger, say 16 or 32 MiB, or make it configurable?

The setting would be grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(myMaxSize))
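To make the numbers in the error concrete: gRPC's default receive limit is 4 MiB (the 4194304 in the message), and this policy serializes to 7608090 bytes. A minimal sketch, assuming a hypothetical pre-check helper (not actual Agent code), of validating a serialized config against that default before sending; raising the limit itself would be done on the client via the grpc.MaxCallRecvMsgSize call option mentioned above:

```go
package main

import "fmt"

// gRPC's default maximum receive message size (the 4194304 in the error).
const defaultMaxRecvMsgSize = 4 * 1024 * 1024

// exceedsDefaultLimit reports whether a serialized config of the given
// size would be rejected by a client still using the gRPC default.
func exceedsDefaultLimit(size int) bool {
	return size > defaultMaxRecvMsgSize
}

func main() {
	// Size of the snort policy from the error message in this issue.
	fmt.Println(exceedsDefaultLimit(7608090))     // true: 7608090 > 4194304
	fmt.Println(exceedsDefaultLimit(3*1024*1024)) // false: 3 MiB fits
}
```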

@adriansr adriansr changed the title Client can receive configurations larger than 4MiB Client can't receive configurations larger than 4MiB Sep 23, 2020
@adriansr adriansr added bug Something isn't working question Further information is requested labels Sep 23, 2020
@ph ph added the Team:Elastic-Agent Label for the Agent team label Sep 30, 2020

ph commented Sep 30, 2020

Oh, good catch @adriansr. @blakerouse @michalpristas Do you see any problem with increasing that? Is the data compressed?

@ph ph assigned blakerouse and ruflin and unassigned blakerouse Sep 30, 2020

ph commented Sep 30, 2020

Reassigning to @ruflin. I think this could expose a bigger issue in our system than just the protocol between the Agent and the processes.

blakerouse commented

@ph I don't see why it could not be increased. It seems like it's a general default in gRPC, which has to serve many use cases, so it makes sense that the value is low. Since the gRPC communication is secured through mutual TLS, the only way to connect is with the correct credentials, so I don't see a technical issue with increasing this.


ph commented Sep 30, 2020

@blakerouse thanks, I think we have to check elsewhere before doing that change.


ruflin commented Oct 15, 2020

I would like to push this forward. In general, I think we should have a sensible limit for maximum policy size to keep sanity in the system. It also helps to bubble up these kinds of errors. Since the above error surfaces in gRPC, I wonder what our limit on the Kibana side is? I would expect the limit to matter more there, where the config is sent over the network.

One thing we need is to make sure this error is properly bubbled up to Kibana to give the user good feedback on what kind of error happened. In addition, we could make the max size a configuration option so it can be overridden in edge cases.

@michalpristas @blakerouse @nchaulet

  • Do we have any limits today on the Kibana side?
  • Do we compress the policy sent down?

In parallel we should work with @adriansr to see if there are options to not have to ship down these massive parsers ;-)


ph commented Oct 15, 2020

Just for completeness, as I've mentioned on Slack: could we minify these JS parsers, and could we compress them when we package the integration?


andrewkroh commented Oct 29, 2020

I ran a minifier on the JS files to get an idea of the potential savings. Looks like about 330K.

$ cat pipeline.js | minify --type=js > pipeline-min.js
$ cat liblogparser.js | minify --type=js > liblogparser-min.js
$ gzip -k pipeline.js 
$ gzip -k liblogparser.js
$ cat pipeline.js.gz | base64 > pipeline.js.gz.base64
$ cat liblogparser.js.gz | base64 > liblogparser.js.gz.base64
$ ls -lah
total 20184
drwxr-xr-x@ 13 akroh  staff   416B Oct 29 09:32 .
drwxr-xr-x@  7 akroh  staff   224B Oct 22 09:58 ..
-rw-r--r--@  1 akroh  staff   849B Oct 22 09:58 input.yml
-rw-r--r--   1 akroh  staff    82K Oct 29 09:17 liblogparser-min.js
-rw-r--r--@  1 akroh  staff   114K Oct 22 09:58 liblogparser.js
-rw-r--r--@  1 akroh  staff    21K Oct 22 09:58 liblogparser.js.gz
-rw-r--r--   1 akroh  staff    28K Oct 29 09:32 liblogparser.js.gz.base64
drwxr-xr-x   2 akroh  staff    64B Oct 29 09:15 out
-rw-r--r--   1 akroh  staff   3.2M Oct 29 09:28 pipeline-base64.js
-rw-r--r--   1 akroh  staff   2.1M Oct 29 09:16 pipeline-min.js
-rw-r--r--@  1 akroh  staff   2.4M Oct 22 09:58 pipeline.js
-rw-r--r--@  1 akroh  staff   423K Oct 22 09:58 pipeline.js.gz
-rw-r--r--   1 akroh  staff   565K Oct 29 09:31 pipeline.js.gz.base64


ph commented Oct 29, 2020

@jfsiii Hey, that's the integration I was thinking of yesterday.


ph commented Oct 29, 2020

@andrewkroh Could we gzip them?

winterfell~/tmp/exp(:|✔) % ls -sh
total 440K
 24K liblogparser.js.gz  416K pipeline.js.gz

andrewkroh commented

I see two areas where we could optimize size.

  1. The full script gets included multiple times because it must be part of each stream's configuration. This compounds the size.
  • Offer a way to fetch the asset separately from the config. Configs would no longer need to inline the asset.
  • Allow the script to be included once and reference it via YAML anchors everywhere it's used in the policy.
  2. The script content is large.
  • We could compress it, base64-encode it, and include it in the YAML. But something must be responsible for decompressing it on the client (either the Agent before passing it to the Beat, or the Beat itself).
  • The Agent could compress the whole policy before sending it over gRPC. This would be useful anyway, since policies grow as more integrations are enabled.
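The compress-and-base64 option from point 2 can be sketched with the Go standard library. The helper names here are hypothetical illustrations, not actual Agent or Beat code:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"io"
)

// compressForPolicy gzips a script and base64-encodes it so it can be
// embedded as a plain string inside a YAML policy.
func compressForPolicy(script []byte) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(script); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decompressFromPolicy reverses the encoding on the receiving side
// (the Agent before handing off, or the Beat itself).
func decompressFromPolicy(encoded string) ([]byte, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	script := []byte("var processEvent = function(evt) { /* large JS parser */ };")
	enc, _ := compressForPolicy(script)
	dec, _ := decompressFromPolicy(enc)
	fmt.Println(bytes.Equal(script, dec)) // true: lossless round trip
}
```

The ~2.4M → ~423K gzip numbers above suggest the base64 overhead (about 4/3) is easily recovered by the compression.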

@ph ph unassigned ruflin Oct 29, 2020

ruflin commented Oct 30, 2020

If we switch to the minifier approach now, would things work again as a short term solution?

@andrewkroh

  • 1: Is each stream here using the same input? Could the script be put on the input level?
  • 2: I'm less worried about Agent -> Process communication as it is all local than Fleet -> Agent communication. If we do compression, we should do it already on the Fleet side.

andrewkroh commented

> If we switch to the minifier approach now, would things work again as a short term solution?

No, I don't think the savings from the minified version are enough. For the short term, we can delay shipping the affected packages.

> 1: Is each stream here using the same input? Could the script be put on the input level?

They are different input types so that it can accept file, udp, or tcp logs. But other than the input type, the rest of the configuration attached to the input is duplicated (the same fields, fields_under_root, and processors, which include the large script).


ruflin commented Nov 3, 2020

Unfortunately, at the moment we only support grouping of configs per input type. I wonder if there is a low-hanging-fruit solution for this?

@urso With your input grouping proposal, I assume the above would be possible?


urso commented Nov 3, 2020

> With your input grouping proposal, I assume the above would be possible?

Yes, it could help, as all the settings like fields, fields_under_root, and processors could become shared settings. E.g. the config could become:

- id: ...
  defaults: # however we name it :)
    fields: ...
    fields_under_root: true
    processors:
      ...
  streams:
  - type: logs
    paths: ...
  - type: udp
    ...
  - type: tcp
    ...

But this requires Fleet to actually construct this kind of config.

In case we have loads of redundancy, we might also consider some 'dedup' support, so we can reference configurations that are common to multiple configured inputs. E.g. if we find 2 data streams configured with different namespaces, we might end up with redundant configuration. Besides compression/minification, we could provide common/default config blocks like:

defaults:
  integration_with_big_config:
    setting1: ...
    setting2: ...

inputs:
- ...
  use_defaults: integration_with_big_config
- ...
  use_defaults: integration_with_big_config

All in all, it sounds like the main issue is that the processing chains we push to the Beat are too big/complex. We should try to replace those with Ingest Pipelines instead.


ph commented Nov 3, 2020

I think @urso is onto something.

What he proposes is partially supported by the local vars provider from composable inputs. I say partially because from a package you cannot define provider-related data.

providers:
  local:
    vars:
      foo: bar

inputs:
- ...
  value: ${foo}


ruflin commented Nov 4, 2020

@ph Interesting idea, so the processor content could be used as a variable?

Overall I agree with @urso that long term we must solve the root cause which is shipping down processing rules ...


ph commented Nov 4, 2020

@ruflin What I've described is already working today.

But this requires more long-term thinking, maybe via a custom Fleet provider where you can define namespaces. I was just throwing out ideas.

++ on solving the root cause.

@ph ph removed the v7.12.0 label Nov 5, 2020

ph commented Nov 5, 2020

We have discussed this over Zoom and decided the following:

  • Sending a 4 MiB configuration is way too big.
  • The package may be an outlier.
  • @andrewkroh and the security team will look into either ingest pipeline improvements or out-of-band delivery.
  • We will keep the 4 MiB limit for now.


ph commented Nov 19, 2020

@andrewkroh @adriansr I am going to close this issue; we don't consider this change to be the right solution for this problem. If you turn any of the suggestions above into another proposal, we are happy to discuss it with you.
