
Client can't receive configurations larger than 4MiB #17

Closed
adriansr opened this issue Sep 23, 2020 · 19 comments
Labels
bug Something isn't working question Further information is requested Team:Elastic-Agent Label for the Agent team

Comments

adriansr commented Sep 23, 2020

When testing with huge pipelines for Filebeat, I get the following error in elastic-agent:

elastic-agent-client got error: rpc error: code = ResourceExhausted desc = grpc: received message larger than max (7608090 vs. 4194304)

This is a problem when working with large parsers from rsa2elk, or enabling a lot of the not-so-big integrations.

packages/snort% du -chs 0.1.0/dataset/log/agent/stream/*
3.3M    0.1.0/dataset/log/agent/stream/stream.yml.hbs
3.3M    0.1.0/dataset/log/agent/stream/tcp.yml.hbs
3.3M    0.1.0/dataset/log/agent/stream/udp.yml.hbs

Can we make this larger, say 16 or 32 MiB, or make it configurable?

The setting would be grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(myMaxSize))
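To make the numbers in the error concrete: gRPC's default receive limit is 4 MiB (the 4194304 in the message), and this policy serializes to 7608090 bytes. A minimal sketch, assuming a hypothetical pre-check helper (not actual Agent code), of validating a serialized config against that default before sending; raising the limit itself would be done on the client via the grpc.MaxCallRecvMsgSize call option mentioned above:

```go
package main

import "fmt"

// gRPC's default maximum receive message size (the 4194304 in the error).
const defaultMaxRecvMsgSize = 4 * 1024 * 1024

// exceedsDefaultLimit reports whether a serialized config of the given
// size would be rejected by a client still using the gRPC default.
func exceedsDefaultLimit(size int) bool {
	return size > defaultMaxRecvMsgSize
}

func main() {
	// Size of the snort policy from the error message in this issue.
	fmt.Println(exceedsDefaultLimit(7608090))     // true: 7608090 > 4194304
	fmt.Println(exceedsDefaultLimit(3*1024*1024)) // false: 3 MiB fits
}
```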

@adriansr adriansr changed the title Client can receive configurations larger than 4MiB Client can't receive configurations larger than 4MiB Sep 23, 2020
@adriansr adriansr added bug Something isn't working question Further information is requested labels Sep 23, 2020
@ph ph added the Team:Elastic-Agent Label for the Agent team label Sep 30, 2020

ph commented Sep 30, 2020

Oh, good catch @adriansr. @blakerouse @michalpristas Do you see any problem with increasing that? Is the data compressed?

@ph ph assigned blakerouse and ruflin and unassigned blakerouse Sep 30, 2020

ph commented Sep 30, 2020

Reassigning to @ruflin. I think this could expose a bigger issue in our system than just the protocol between the Agent and the processes.

blakerouse commented

@ph I don't see why it could not be increased. It seems like it's a general default in gRPC, which has to serve many use cases, so it makes sense that the value is low. Since the gRPC communication is secured through mutual TLS, the only way to connect is with the correct credentials, so I don't see a technical issue with increasing this.


ph commented Sep 30, 2020

@blakerouse thanks, I think we have to check elsewhere before doing that change.


ruflin commented Oct 15, 2020

I would like to push this forward. In general, I think we should have a sensible limit for maximum policy size to keep sanity in the system. It also helps to bubble up these kinds of errors. Since the above error surfaces in gRPC, I wonder what our limit on the Kibana side is? I would expect the limit to matter more there, where the config is sent over the network.

One thing we need is to make sure this error is properly bubbled up to Kibana to give the user good feedback on what kind of error happened. In addition, we could make the max size a configuration option so it can be overridden in edge cases.

@michalpristas @blakerouse @nchaulet

  • Do we have any limits today on the Kibana side?
  • Do we compress the policy sent down?

In parallel we should work with @adriansr to see if there are options to not have to ship down these massive parsers ;-)


ph commented Oct 15, 2020

Just for completeness, as I've mentioned on Slack: could we minify these JS parsers, and could we compress them when we package the integration?


andrewkroh commented Oct 29, 2020

I ran a minifier on the JS files to get an idea of the potential savings. Looks like about 330K.

$ cat pipeline.js | minify --type=js > pipeline-min.js
$ cat liblogparser.js | minify --type=js > liblogparser-min.js
$ gzip -k pipeline.js 
$ gzip -k liblogparser.js
$ cat pipeline.js.gz | base64 > pipeline.js.gz.base64
$ cat liblogparser.js.gz | base64 > liblogparser.js.gz.base64
$ ls -lah
total 20184
drwxr-xr-x@ 13 akroh  staff   416B Oct 29 09:32 .
drwxr-xr-x@  7 akroh  staff   224B Oct 22 09:58 ..
-rw-r--r--@  1 akroh  staff   849B Oct 22 09:58 input.yml
-rw-r--r--   1 akroh  staff    82K Oct 29 09:17 liblogparser-min.js
-rw-r--r--@  1 akroh  staff   114K Oct 22 09:58 liblogparser.js
-rw-r--r--@  1 akroh  staff    21K Oct 22 09:58 liblogparser.js.gz
-rw-r--r--   1 akroh  staff    28K Oct 29 09:32 liblogparser.js.gz.base64
drwxr-xr-x   2 akroh  staff    64B Oct 29 09:15 out
-rw-r--r--   1 akroh  staff   3.2M Oct 29 09:28 pipeline-base64.js
-rw-r--r--   1 akroh  staff   2.1M Oct 29 09:16 pipeline-min.js
-rw-r--r--@  1 akroh  staff   2.4M Oct 22 09:58 pipeline.js
-rw-r--r--@  1 akroh  staff   423K Oct 22 09:58 pipeline.js.gz
-rw-r--r--   1 akroh  staff   565K Oct 29 09:31 pipeline.js.gz.base64


ph commented Oct 29, 2020

@jfsiii Hey, that's the integration I was thinking of yesterday.


ph commented Oct 29, 2020

@andrewkroh Could we gzip them?

winterfell~/tmp/exp(:|✔) % ls -sh
total 440K
 24K liblogparser.js.gz  416K pipeline.js.gz

andrewkroh commented

I see two areas where we could optimize size.

  1. The full script gets included multiple times because it must be part of each stream's configuration. This compounds the size.
  • Offer a way to fetch the asset separately from the config. Configs would no longer need to inline the asset.
  • Allow the script to be included once and reference it via YAML anchors everywhere it's used in the policy.
  2. The script content is large.
  • We could compress it, base64-encode it, and include it in the YAML. But something must be responsible for decompressing it on the client (either the Agent before passing it to the Beat, or the Beat itself).
  • The Agent could compress the whole policy before sending it over gRPC. This would be useful anyway, since policies grow as more integrations are enabled.
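The compress-and-base64 option from point 2 can be sketched with the Go standard library. The helper names here are hypothetical illustrations, not actual Agent or Beat code:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"io"
)

// compressForPolicy gzips a script and base64-encodes it so it can be
// embedded as a plain string inside a YAML policy.
func compressForPolicy(script []byte) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(script); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decompressFromPolicy reverses the encoding on the receiving side
// (the Agent before handing off, or the Beat itself).
func decompressFromPolicy(encoded string) ([]byte, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	script := []byte("var processEvent = function(evt) { /* large JS parser */ };")
	enc, _ := compressForPolicy(script)
	dec, _ := decompressFromPolicy(enc)
	fmt.Println(bytes.Equal(script, dec)) // true: lossless round trip
}
```

The ~2.4M → ~423K gzip numbers above suggest the base64 overhead (about 4/3) is easily recovered by the compression.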

@ph ph unassigned ruflin Oct 29, 2020

ruflin commented Oct 30, 2020

If we switch to the minifier approach now, would things work again as a short term solution?

@andrewkroh

  • 1: Is each stream here using the same input? Could the script be put on the input level?
  • 2: I'm less worried about Agent -> Process communication as it is all local than Fleet -> Agent communication. If we do compression, we should do it already on the Fleet side.

andrewkroh commented

> If we switch to the minifier approach now, would things work again as a short term solution?

No, I don't think the savings from the minified version are enough. For the short term, we can delay shipping the affected packages.

> 1: Is each stream here using the same input? Could the script be put on the input level?

They are different input types so that it can accept file, udp, or tcp logs. But other than the input type, the rest of the configuration attached to the input is duplicated (the same fields, fields_under_root, and processors, which include the large script).


ruflin commented Nov 3, 2020

Unfortunately, at the moment we only support grouping of configs per input type. I wonder if there is a low-hanging-fruit solution for this?

@urso With your input grouping proposal, I assume the above would be possible?


urso commented Nov 3, 2020

> With your input grouping proposal, I assume the above would be possible?

Yes, it could help, as all the settings like fields, fields_under_root, and processors could become shared settings. E.g. the config could become:

- id: ...
  defaults: # however we name it :)
    fields: ...
    fields_under_root: true
    processors:
      ...
  streams:
  - type: logs
    paths: ...
  - type: udp
    ...
  - type: tcp
    ...

But this requires Fleet to actually construct this kind of config.

In case we have loads of redundancy, we might also consider some 'dedup' support, so we can reference configurations that are common to multiple configured inputs. E.g. if we find 2 data streams configured with different namespaces, we might end up with redundant configuration. Besides compression/minification, we could provide common/default config blocks like:

defaults:
  integration_with_big_config:
    setting1: ...
    setting2: ...

inputs:
- ...
  use_defaults: integration_with_big_config
- ...
  use_defaults: integration_with_big_config

All in all, it sounds like the main issue is that the processing chains we push to the Beat are too big/complex. We should try to replace those with Ingest Pipelines instead.


ph commented Nov 3, 2020

I think @urso is onto something.

What he proposes is partially supported by the local vars provider from composable inputs. I say partially because from a package you cannot define provider-related data.

providers:
  local:
    vars:
      foo: bar

inputs:
- ...
  value: ${foo}


ruflin commented Nov 4, 2020

@ph Interesting idea, so the processor content could be used as a variable?

Overall I agree with @urso that long term we must solve the root cause which is shipping down processing rules ...


ph commented Nov 4, 2020

@ruflin What I've described is already working today.

But this requires more long-term thinking, maybe via a custom Fleet provider where you can define namespaces. I was just throwing out ideas.

++ on solving the root cause.

@ph ph removed the v7.12.0 label Nov 5, 2020

ph commented Nov 5, 2020

We have discussed this over Zoom and decided the following:

  • Sending a 4 MiB configuration is way too big.
  • The package may be an outlier.
  • @andrewkroh and the security team will look into either ingest pipeline improvements or out-of-band delivery.
  • We will keep the 4 MiB limit for now.


ph commented Nov 19, 2020

@andrewkroh @adriansr I am going to close this issue; we don't consider this change to be the right solution for this problem. If you turn any of the suggestions above into another proposal, we are happy to discuss it with you.
