Control protocol checkin payloads can exceed the gRPC maximum message size when using autodiscovery #2460
Comments
Could we just go with a larger default maximum message size? Since this is an internal gRPC protocol between the components, is there any danger in doing that? I don't think so; it isn't a limit meant to guard against DDoS or anything like that.
We can increase the default size, and it will only have performance implications when the configs we transport are very large. The question with just adjusting the maximum size is determining how big it needs to be. We can make it configurable to minimize complexity here. I'm not actually sure how common this problem will be, so it seems simpler to just increase the max message size and allow configuring it as an escape hatch when the default isn't large enough. Another option would be compressing the messages when we send them; that might help mitigate the problem as well, but with a permanent CPU cost.
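As a rough illustration of the compression option, gRPC's Go client can opt into its registered gzip compressor per connection. This is only a sketch of the idea, not the agent's actual configuration; the helper name is made up.

```go
package sketch

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/encoding/gzip" // importing this package registers the gzip compressor
)

// withGzipCompression is a hypothetical helper showing how a client could ask
// for gzip compression on every RPC over a connection: smaller payloads on the
// wire, at the price of a permanent CPU cost on both ends.
func withGzipCompression() grpc.DialOption {
	return grpc.WithDefaultCallOptions(grpc.UseCompressor(gzip.Name))
}
```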
@cmacknz @blakerouse
We have a support case confirming that setting the max message size does not seem to work or have any effect. As a result, this issue is blocking the rollout of the Agent to these medium-sized K8s clusters at customers. We expect clusters that are much larger, 3K-4K endpoints, and these numbers will only increase over time.
I think the problem here is that we can adjust the maximum message size for the server side (agent) via the agent configuration, but this limit also applies on the client side (Filebeat) where we don't expose it. See https://pkg.go.dev/google.golang.org/grpc#MaxCallRecvMsgSize.
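For illustration, raising the client-side limit would look roughly like the sketch below. The address and the 32 MB value are made up; the point is that this option lives in the client's dial code (e.g. Filebeat's), not in the agent configuration.

```go
package sketch

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// dialControl shows where the client-side receive limit lives. gRPC's default
// maximum receive size is 4 MB, and grpc.MaxCallRecvMsgSize must be set by the
// client itself; the agent's server-side setting does not change it.
func dialControl(addr string) (*grpc.ClientConn, error) {
	return grpc.Dial(
		addr,
		grpc.WithTransportCredentials(insecure.NewCredentials()), // placeholder credentials
		grpc.WithDefaultCallOptions(
			grpc.MaxCallRecvMsgSize(32*1024*1024), // hypothetical 32 MB limit
		),
	)
}
```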
This is a good point and pushes me towards trying to solve this in a way that doesn't involve manual configuration changes each time the agent is deployed on k8s. There's no real way to know how big to make the max message size ahead of time. We could consider changing the protocol to send each configuration unit in a separate message; right now they are each sent as a repeated array in a single message:

```proto
message CheckinExpected {
    // Units is the expected units the component should be running.
    repeated UnitExpected units = 1;
    // Agent info is provided only on first CheckinExpected response to the component.
    CheckinAgentInfo agent_info = 2;
    // Features are the expected feature flags configurations.
    // Added on Elastic Agent v8.7.1.
    Features features = 3;
    // Index of the either current features configuration or new the configuration provided.
    uint64 features_idx = 4;
}
```

@bvader do you have diagnostics or a sample agent policy that was experiencing this that we could use as a reference?
@cmacknz Apologies, I will get these for you.
The diagnostics were shared privately. I can confirm at least one instance contains 525 unique units. The structure of the components.yaml in this case had each unit starting with:

```yaml
units:
  - config:
      datastream:
```

Searching with `rg '\- config:' components.yaml --count` returns 525.
I believe we can get creative and get this working without having to change the protocol. The Elastic Agent only sends a unit configuration when something has changed. I would expect the main issue here is that on startup with a large number of containers, the Elastic Agent is trying to send all the new units at one time, and that one-time send hits the gRPC limit.

In the case that a large number of units are being added to the component (more than the gRPC limit allows), we add the units in increments; they don't all have to be created in one shot. That staggered rollout of the units would allow the Elastic Agent to stay under the gRPC limit.

The same logic can be applied to updating units. If we need to roll out a configuration change that affects all units and would result in every unit getting an updated configuration, that rollout could be staggered so as not to send every configuration for every unit in one shot.

In the very rare chance that the base information for all units without a configuration (which should be very small) is more than we would ever be able to send over the protocol, we could split that into two separate components, each running a set of units, to keep it below the threshold of the gRPC limit.
The protobuf messages all have a Size() method we could use to check how large they are before attempting to send them, which would help with that idea. We would have to be very careful to avoid bugs in client implementations where the staggered rollout leads to an accidentally removed unit. For example, Beats is always looking for units that were present but aren't anymore (see code). Really this is just a different type of protocol change, but it is harder for clients to know about the semantic change because the wire format hasn't actually changed. It might actually be easier to also change the message definitions at the same time, because with either approach each client implementation needs to be thoroughly retested anyway. I think I am still biased towards making the change that solves this permanently and just changing the RPC definition. We know about all the client implementations today, and this change won't get easier as time passes.
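For a rough illustration of that size check, proto.Size from the Go protobuf runtime could be used to group units into batches that stay under a limit. The function and limit below are hypothetical, and a real implementation would still have to respect the contract that a missing unit means a removed unit.

```go
package sketch

import "google.golang.org/protobuf/proto"

// batchBySize groups messages (e.g. the UnitExpected entries of a checkin)
// into batches whose combined encoded size stays under maxBytes, so that no
// single send exceeds the gRPC message limit. Purely illustrative.
func batchBySize(units []proto.Message, maxBytes int) [][]proto.Message {
	var batches [][]proto.Message
	var current []proto.Message
	size := 0
	for _, u := range units {
		s := proto.Size(u)
		if len(current) > 0 && size+s > maxBytes {
			batches = append(batches, current)
			current, size = nil, 0
		}
		current = append(current, u)
		size += s
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}
```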
We always have to be careful of that; that is the contract for how the protocol works. My suggestion would change nothing in the contract or in the protocol. From the standpoint of the component it all works the same. It would be no different than someone adding a single integration at a time over a period of time.
Not true, it's not a protocol change at all and not a change to the component at all. The change would only need to be done in the internals of the Elastic Agent.
This requires the change to happen in every component that Elastic Agent supports, whereas my change allows it to be fixed in the Elastic Agent without having to change the protocol or the contract with the spawned components.
@blakerouse and I spoke about this today and agreed the best solution will be to implement an optional, chunked transfer protocol for sending the checkin payloads between the agent and its components.
This will be conceptually similar to HTTP chunked transfer encoding.
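As a sketch of what the receiving side of such a chunked exchange might look like (the message shape and the `Done` flag below are invented for illustration, not the actual protocol change):

```go
package sketch

// unit stands in for the generated UnitExpected proto type.
type unit struct {
	ID     string
	Config string
}

// expectedChunk is a hypothetical shape for one chunk of a checkin payload:
// a slice of units plus a flag marking the final chunk, much like the
// terminating chunk in HTTP chunked transfer encoding.
type expectedChunk struct {
	Units []unit
	Done  bool
}

// reassemble accumulates chunks until the terminating one arrives and only
// then exposes the complete unit list, so a partial delivery is never
// mistaken for units having been removed.
func reassemble(chunks <-chan expectedChunk) []unit {
	var units []unit
	for c := range chunks {
		units = append(units, c.Units...)
		if c.Done {
			break
		}
	}
	return units
}
```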
My thinking on this problem now is that we may be better off changing how we generate configurations when using autodiscovery. Regardless of whether we change the control protocol, generating these giant configurations is inefficient and will still cause problems with some upcoming changes where we'd like to store these configurations in ConfigMaps or Secrets, which have a fixed size limit.
How do you see us addressing this topic? A kind of lightweight configuration for autodiscovery?
We probably still need to change the protocol, but in a different way. We could add a way to transport an input template and then the list of template variables, rather than having the agent pre-render everything, which is what it does today.
I significantly expanded the description to add more context to the problem, and provide a few more candidate solutions that are able to work in situations where we don't manage input configurations with the control protocol. |
Here is the current plan after investigation and discussion with @cmacknz:
Just wanted to note that this can happen even when running a very low pod count per node. For example, we're currently running 30 pods per node max, but we hit this because there are a bunch of pods that are completed but still not fully cleaned up.
We need to keep some pods around in this terminated phase, e.g. to debug or retry.
Intro
Kubernetes Autodiscovery allows the agent to automatically add and update inputs to its policy as containers, pods, and nodes in a Kubernetes cluster come and go.
This works well, but when autodiscovery is configured such that it will generate inputs for each pod on a node or each node in a cluster, the resulting agent policies can be so large that they cause problems. For example, the officially recommended limit for the number of pods on a node is 110, but individual Kubernetes runtimes can allow more than this. Amazon EKS allows 737 pods per node for the largest node types, and we have seen individual nodes with 400+ pods in support cases.
Problem
We have a recent example of the agent failing this way when a user was attempting to monitor 700+ pods. The agent logs were flooded with errors like:
The biggest problem caused by these large agent policies is that they begin to exceed the default size limits for the messages in the control protocol. This prevents the system from functioning by default.
This problem also affects diagnostics; see #1808. All communication between the agent and its subprocesses is affected by this problem, and it may also affect communication with Fleet Server, since the Fleet checkin payload contains an entry for each input.
Solution 1: Configurable Message Size Limits
There is a way to configure the maximum message size for the Elastic Agent itself (see elastic-agent/elastic-agent.reference.yml, lines 111 to 113 at b60b8b0), but there is a similar limit that must be configured for each client.
Even if we could allow changing the size limits everywhere, this solution is not automatic: it requires users to first experience a failure and then diagnose policy size as the cause.
Solution 2: Chunked gRPC Transfers
For a more transparent solution, we can introduce configuration chunking into the agent control protocol as described in #2460 (comment). This would solve the problem for the agent as it exists today.
However, we are also working to implement a Kubernetes operator for the Elastic Agent, and in that case the agent policy would not be transported using the control protocol but rather stored in a Kubernetes primitive like a ConfigMap or a Secret, both of which have a fixed 1 MB size limit. Changing the control protocol would not solve the problem in the case of an agent Kubernetes operator or other Kubernetes-native technology.
Solution 3: Require Components to Render Input Templates
Yet another alternative is to entirely change the way we generate inputs when using autodiscovery. Today each discovered input is rendered completely in the agent policy as the need for it is discovered. This is convenient as it requires no logic in each input, but problematic because the generated agent policies can be enormous.
For a single example, it is common to generate a filestream input to collect logs from each container on a node using an input like (see the Dynamic Logs Path documentation):
The autodiscovery logic will further expand this configuration to add processors, resulting in an even larger configuration, which is then repeated in the policy hundreds of times:
Rather than having the agent render these inputs from templates itself, we could introduce the concept of a templated input directly into the control protocol.
I am imagining that we introduce a message type that contains a base template, which could be similar to the unrendered agent input in the agent policy (repeated below for clarity), but in the same message type we also include the list of variables to substitute.

This configuration requires us to provide `${kubernetes.pod.name}`, `${kubernetes.pod.uid}`, and `${kubernetes.container.name}` for each pod on the node. A representative set of messages could look like:

This would compress the configuration down to the minimum set of information necessary to transport to each sub-process; however, it would require each component to support templated input rendering in its implementation language. This solution is not as easily generalizable as introducing a chunked control protocol, but it also solves the problem in the case where the agent policy needs to be stored in a size-limited Kubernetes object such as a ConfigMap or a Secret.
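To make the idea concrete, here is a minimal sketch of the kind of rendering each component would need to perform, assuming a made-up variable set and plain `${...}` string substitution; the real message shapes and variable names would need to be designed.

```go
package sketch

import "strings"

// podVars is a hypothetical per-pod variable set that a component would
// substitute into a shared input template, instead of the agent shipping a
// fully rendered input per pod.
type podVars struct {
	PodName, PodUID, ContainerName string
}

// renderInputs expands one input template once per discovered pod. Purely
// illustrative of the "templated input" concept described above.
func renderInputs(template string, pods []podVars) []string {
	rendered := make([]string, 0, len(pods))
	for _, p := range pods {
		r := strings.NewReplacer(
			"${kubernetes.pod.name}", p.PodName,
			"${kubernetes.pod.uid}", p.PodUID,
			"${kubernetes.container.name}", p.ContainerName,
		)
		rendered = append(rendered, r.Replace(template))
	}
	return rendered
}
```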
Solution 4: Use YAML Anchors to Avoid Repetition
Similar to the approach above to avoid duplicating information in the control protocol, we could leave the control protocol as is and attempt to eliminate the duplication using YAML anchors or custom YAML syntax in the agent policy itself.
For an example, see GitLab's documentation on simplifying configurations: https://docs.gitlab.com/ee/ci/yaml/yaml_optimization.html#anchors
Those examples use anchors in combination with map merging to attempt to perform the templating described above using the more advanced parts of the YAML syntax.
This will make the computed agent policy harder to read, and also requires that the YAML features we use be well supported in multiple implementation languages (at least Go and C++).
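As a quick illustration of that multi-language concern, the sketch below checks that anchors and merge keys survive the Go YAML parser; the policy fragment is invented, and a C++ parser would need the same verification.

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

// An invented policy fragment using an anchor plus merge keys to share the
// common parts of an input across many discovered pods.
const policy = `
container_logs_defaults: &defaults
  type: filestream
  data_stream:
    dataset: kubernetes.container_logs

inputs:
  - <<: *defaults
    id: container-logs-pod-a
  - <<: *defaults
    id: container-logs-pod-b
`

func main() {
	// Decode and print the expanded inputs to confirm the Go parser resolves
	// the anchor and merge keys the way the policy author intended.
	var parsed map[string]interface{}
	if err := yaml.Unmarshal([]byte(policy), &parsed); err != nil {
		panic(err)
	}
	fmt.Printf("%v\n", parsed["inputs"])
}
```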
Scope
Provide a recommended solution to this problem to ensure that the Elastic Agent can scale to monitor any size of Kubernetes cluster without arbitrary internal limits. Consider solutions beyond those proposed here, each of which has its own set of pros and cons.
Keep in mind that this problem so far only affects Kubernetes deployments, and solutions that add a fixed resource cost to all use cases (enabling compression in the control protocol, for example) should be avoided if possible.