Display which integration is causing the agent to become unhealthy #100
Comments
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
@ph is this something that can be handled as part of the V2 control protocol design?
I think there are 3 parts:
@blakerouse and I have been discussing part 1.
Yes, I think this is (or will be) part of the input V2 proposal.
We are looking at the inputs too; we will have to sync with you and @kvch.
@ph should I keep it in 8.2, or should I reconsider it for another release?
@jlind23 Agreed, we need changes in v2 and in the inputs to be able to better report their state; it's a joint effort between the data and control planes.
Will the agent go unhealthy only if the integration is not "installable", or will it also catch an integration that is writing error.message fields because it cannot connect to xyz? (e.g. elastic/integrations#3074)
@philippkahr the goal is to also catch when the integration is failing. But then it will be up to the integration developers to properly report statuses.
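To make "which integration?" answerable, per-unit health has to roll up into the agent's overall status without losing the source. Here is a minimal, self-contained Go sketch of that aggregation idea; every type and name below is made up for illustration and is not the actual elastic-agent or V2 control protocol API:

```go
package main

import "fmt"

// UnitState models per-unit health as discussed in this thread.
// These names are hypothetical, not the real protocol enums.
type UnitState int

const (
	Healthy UnitState = iota
	Degraded
	Failed
)

func (s UnitState) String() string {
	return [...]string{"HEALTHY", "DEGRADED", "FAILED"}[s]
}

// Unit stands in for one piece of a running integration
// (e.g. an input stream), reporting its own state and message.
type Unit struct {
	ID      string
	State   UnitState
	Message string
}

// overall computes the worst state across all units and returns the
// units responsible, so the agent can say which integration is
// unhealthy rather than reporting a bare "unhealthy".
func overall(units []Unit) (UnitState, []Unit) {
	worst := Healthy
	var culprits []Unit
	for _, u := range units {
		if u.State > worst {
			worst = u.State
			culprits = culprits[:0] // a strictly worse state resets the culprit list
		}
		if u.State == worst && worst != Healthy {
			culprits = append(culprits, u)
		}
	}
	return worst, culprits
}

func main() {
	units := []Unit{
		{ID: "log-default", State: Healthy, Message: "Healthy"},
		{ID: "nginx-metrics", State: Failed, Message: "cannot connect to http://localhost:8081"},
	}
	state, culprits := overall(units)
	fmt.Printf("agent: %s\n", state)
	for _, u := range culprits {
		fmt.Printf("  %s: %s (%s)\n", u.ID, u.State, u.Message)
	}
}
```

The per-unit message is what makes status output actionable, which is why the comment above stresses that integration developers must report statuses properly.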
I've got the same abnormal behavior when I change the output from the ES default to a copy of my default with dead letters enabled.
While deploying Agent at Elastic, we have noticed an increasing need for better logging between Agent and Fleet, and within Fleet itself. Please let me know if this is the correct place to put these FRs.
Hi @jlind23, as per the feedback from @kevinlog, we have created test case scenarios for this feature. CC: @joshdover. Thanks!
@prachigupta-qasource Sounds good, thank you! Also note that @muskangulati-qasource and @harshitgupta-qasource are testing this functionality in Fleet as a part of OLM testing efforts. I think it's a good idea for both teams to be aware of the functionality and have a test plan, but we may also be able to work together and avoid duplicating too much effort.
Hi Team, we have executed 2 test cases for this feature under our Fleet test run at the Fleet 8.4.0-BC3 feature test plan and found that it's working fine. Build details:
Thanks!
I pushed the feature-arch-v2 branch to fleet-server with an initial cut of propagating the detailed status information. The corresponding agent-side PR is posted against the agent branch. This should cover the agent/fleet-server side of things for:
@joshdover @blake.rouse @ph let me know if there is anything else that needs to be addressed related to this feature; I can iterate and update things as needed. The .fleet-agent document looks like the following at the moment:
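A purely hypothetical sketch of the kind of per-component detail such a document could carry, based on the states discussed in this thread; the field names and nesting below are assumptions, not the actual .fleet-agent schema:

```json
{
  "agent": { "id": "0e2a..." },
  "last_checkin_status": "DEGRADED",
  "components": [
    {
      "id": "log-default",
      "type": "log",
      "status": "HEALTHY",
      "message": "Healthy",
      "units": [
        { "id": "log-default-unit", "type": "input", "status": "HEALTHY", "message": "Healthy" }
      ]
    },
    {
      "id": "nginx-default",
      "type": "nginx/metrics",
      "status": "DEGRADED",
      "message": "cannot connect to http://localhost:8081",
      "units": []
    }
  ]
}
```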
The fleet-server and elastic-agent work is complete at the moment; the agent posts the new extended health information to the stack. The work is merged to the fleet-server and elastic-agent branches respectively. It's expected that we will iterate on this feature a few more times before release.
When an Elastic Agent becomes unhealthy due to an integration, the only way to understand which integration is causing it is to remove integrations one by one and/or check the logs for a particular error.
Elastic Agent should be able to catch when an integration is failing and must be able to log this within the status command and the diagnostics command.
This is a design task between Elastic Agent Data Plane and Elastic Agent Control Plane.
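Both commands named in the description already exist on the agent; as a usage sketch, an operator would check per-integration state along these lines (invocations only, since the exact output format and subcommands vary by version):

```sh
# Show overall agent health; with this feature it should also surface
# per-component/unit state so the failing integration is named.
elastic-agent status

# Collect a diagnostics bundle that includes component state
# (some versions expose this as `elastic-agent diagnostics collect`).
elastic-agent diagnostics
```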