[Testing] Input reload not working as expected under Elastic-Agent #35178
Comments
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
Do we need this to be an integration test, or to even test it under the Elastic Agent at all? Can this behaviour be isolated to just Filebeat itself? The agent is just configuring Filebeat inputs using essentially the same logic used when configuration file reloading is enabled. Can this be a Filebeat system test, or even just a unit test?
I was going by the issue's description.
I believe it might be possible if we mock the ManagerV2 or create a similar situation. The main difference between running under Agent and reloading from the configuration file is that the Agent can send pieces of configuration at any time and in any order, and that seems to be the root cause of the issues. When reloading from the configuration file, Filebeat reads the whole config and reloads it in "a single step". I've been thinking about the acceptance criteria and the first two are not really feasible:
Here is why:
With all that said, the third one describes well, at a high level, what we should be testing.
Here's an example of a unit test that has complete control over the sequence of messages that would be sent by the agent:
I am not convinced we actually need to test at this level though, and we can probably trigger this bug by just working with the Beat configuration directly. If we replace a filestream input with ID A with a copy of that input with ID B, I think we'd trigger the same behaviour. This is what the agent units are likely doing, and it is the exact behaviour you'd get if you reassigned the agent policy to another one with the same input defined in it: only the ID would change.
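To make the "replace ID A with ID B" scenario concrete, here is a minimal Go sketch. It assumes the failure comes from the new input being started before the old one has released the file, which this thread does not confirm. `pathRegistry` and its methods are hypothetical names for illustration only and are not Filebeat types.

```go
package main

import (
	"fmt"
	"sync"
)

// pathRegistry is a hypothetical stand-in for whatever bookkeeping prevents
// two inputs from harvesting the same file at once. Not a Filebeat type.
type pathRegistry struct {
	mu    sync.Mutex
	owner map[string]string // file path -> ID of the input harvesting it
}

func newPathRegistry() *pathRegistry {
	return &pathRegistry{owner: make(map[string]string)}
}

// Start claims path for inputID. It fails if a different input still owns it.
func (r *pathRegistry) Start(inputID, path string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if cur, ok := r.owner[path]; ok && cur != inputID {
		return fmt.Errorf("input %q cannot start: %s is still harvested by %q", inputID, path, cur)
	}
	r.owner[path] = inputID
	return nil
}

// Stop releases path, but only if inputID is the current owner.
func (r *pathRegistry) Stop(inputID, path string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.owner[path] == inputID {
		delete(r.owner, path)
	}
}
```

Under that assumption, calling Start for B while A still owns /tmp/flog.log returns an error, and the same call succeeds once Stop has run for A. A test that controls the order of those two calls would exercise both outcomes.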
Discussed today and we believe we have isolated the behaviour using only Filebeat itself. @belimawr if we need another round of discussion and brainstorming on this let me know and I'll set it up.
@belimawr don't hesitate to update the issue following your investigation and discussion with Craig.
Yes, we isolated it. The main thing is that a standalone Filebeat doing config reload handles this issue gracefully: it debounces config reloads and retries indefinitely until all inputs have started successfully, hence we don't see the effects of this issue. I'll add a quick brain dump of how to reproduce/see it happening here. Given:
filebeat.yml

    filebeat.inputs:

    filebeat.config:
      inputs:
        enabled: true
        path: config/*.yml
        reload.enabled: true
        reload.period: 10s

    output.file:
      enabled: true

log1.yml

    - type: log
      id: log-input-1
      paths:
        - /tmp/flog.log

log2.yml

    - type: log
      id: log-input-2
      paths:
        - /tmp/flog.log

Start Filebeat with debug logs
Keep replacing the config
Copy one of the configs so Filebeat can use them
Wait for the harvester to start
You should see logs like
Keep replacing the input config
Here is an example, after every command wait for a little while, until you see some logs informing of the reload/start of the new harvester
Eventually (it happens rather quickly) you'll see logs like:
Will there be any observable issue on the output/data harvested?
No, there will not. The config file reload knows how to handle this situation and Filebeat will, eventually, be running all configured inputs successfully.
What about when running under Elastic-Agent?
Well, if there is any issue when applying the new configuration sent by the Elastic-Agent, Filebeat will report the error, which makes the Elastic-Agent report unhealthy.
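The retry half of the standalone reload behaviour described in this comment can be sketched roughly as follows. This is an illustration in Go, not Filebeat's reloader: `startFunc`, `reloadUntilStarted` and `errInputsPending` are hypothetical names, the debounce is left out, and the loop is capped at `maxRounds` so the sketch terminates, whereas the behaviour described above retries without a limit.

```go
package main

import (
	"errors"
	"time"
)

// errInputsPending is returned when the round limit is hit with inputs still failing.
var errInputsPending = errors.New("some inputs never started")

// startFunc tries to start one input and reports failure through its error.
type startFunc func(id string) error

// reloadUntilStarted retries only the inputs that failed in the previous round,
// sleeping for wait between rounds. It returns the number of rounds used.
// maxRounds is an assumption of this sketch; the real reload has no such cap.
func reloadUntilStarted(ids []string, start startFunc, wait time.Duration, maxRounds int) (int, error) {
	pending := ids
	for round := 1; round <= maxRounds; round++ {
		var failed []string
		for _, id := range pending {
			if err := start(id); err != nil {
				failed = append(failed, id)
			}
		}
		if len(failed) == 0 {
			return round, nil
		}
		pending = failed
		time.Sleep(wait)
	}
	return maxRounds, errInputsPending
}
```

With this shape a transient start failure only costs an extra round, which would explain why the standalone reload never surfaces the problem. Under Elastic-Agent, by contrast, the first failed start is reported back as an error.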
I spoke with @belimawr today and we came up with the following proposal to test this:
beats/x-pack/libbeat/management/config.go Lines 12 to 17 in f4374dc
beats/x-pack/libbeat/management/managerV2.go Lines 118 to 125 in f4374dc
It will look something like the insecure configuration used in the existing unit tests: beats/x-pack/libbeat/management/managerV2_test.go Lines 202 to 209 in f4374dc
The test will only need to implement the
This will give us the foundation to write integration tests that directly control configuration changes sent to Beats without having to rely on editing the agent policy in the required way. More importantly, it does not introduce a dependency on the Elastic Agent build system by requiring use of the Elastic Agent integration testing framework, which is still under development.
Related Issue
#33653
Description
Implement integration tests using our new framework that catch the input reload bug before we attempt to implement the fix.
Doing so lets us ensure that the fix we provide correctly covers every workflow.
More details in the related issue.
Acceptance Criteria