Propagate java-attacher errors to Kibana #7832

Closed
axw opened this issue Apr 11, 2022 · 11 comments

Comments

@axw
Member

axw commented Apr 11, 2022

When using the java-attacher, an error (e.g. failure to execute java) should be indicated in Kibana somehow. For example, this might be done by setting the status of the APM integration to degraded.

@simitt
Contributor

simitt commented Apr 11, 2022

@joshdover are there any plans to add a more fine-grained health check UI to Fleet where this might fit? I believe @ruflin mentioned some vague ideas in the past for a health state per agent, listing all the processes that are supposed to be running.

@eyalkoren
Contributor

I think that if a policy contains both APM Server and APM Agent configurations (probably only relevant to the Java agent for now, but hopefully relevant to others in the future), we can assume this APM Server is used only for local purposes and simply consider the entire APM integration unhealthy whenever there is an indication that the agent is unhealthy.

@ruflin
Contributor

ruflin commented Apr 11, 2022

@ph @jlind23 @cmacknz Can you chime in on the status and plans on health.

@felixbarny
Member

After APM Server has discovered the Java installation and before it calls the attacher, it should also validate that the Java installation is working as expected.

Currently, APM Server logs this message when invoking the attacher fails: failed to run java attacher: exit status 1.

Checking whether the Java installation works by invoking java -version (and ideally writing the output to the server logs) helps distinguish a general problem with the Java setup from a problem specific to the attacher.
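
A minimal sketch of that pre-flight check, written in Python purely for illustration (APM Server itself is written in Go, and this is not its implementation). The function name check_java, the timeout value, and the example path are all assumptions made for this example:

```python
import subprocess
from typing import Tuple


def check_java(java_path: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run `<java_path> -version` and return (ok, output).

    Hypothetical helper: the name and return shape are illustrative only.
    """
    try:
        proc = subprocess.run(
            [java_path, "-version"],
            stdout=subprocess.PIPE,
            # `java -version` writes its banner to stderr, so merge the streams.
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Binary missing, not executable, or hung: a general Java setup problem.
        return False, f"could not execute {java_path}: {exc}"
    # A non-zero exit code also points at the Java installation, not the attacher.
    return proc.returncode == 0, proc.stdout.strip()


if __name__ == "__main__":
    ok, output = check_java("/usr/bin/java")  # example path, an assumption
    print("java ok" if ok else "java check failed", "-", output)
```

If this check passes and the attacher still exits with status 1, the problem is more likely in the attacher invocation than in the Java installation itself.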

@jackshirazi
Contributor

Tested on Windows: I get the same result, or slightly worse, since it also can't download the requested agent version.

13:24:09.146 elastic_agent.apm_server [elastic_agent.apm_server][error] failed to run java attacher: exit status 1
13:24:09.785 elastic_agent.apm_server [elastic_agent.apm_server][error] Failed to download requested agent version 1.27.1, please double-check your --download-agent-version setting.
13:24:09.824 elastic_agent.apm_server [elastic_agent.apm_server][error] failed to run java attacher: exit status 1

@cmacknz
Member

cmacknz commented Apr 14, 2022

@ph @jlind23 @cmacknz Can you chime in on the status and plans on health.

Improving the agent integration health reporting is tracked under elastic/elastic-agent#100. We are just starting to design what this looks like.

@simitt
Contributor

simitt commented May 30, 2022

Regarding #7832 (comment), it is not yet clear to me whether an integration is also supposed to signal whether the Elastic Agent should try to restart the process when it is reported unhealthy, or whether there will be a more fine-grained indication. A restart by the Elastic Agent would not make sense in the cases described. @cmacknz can you already share more details on what this will look like, or expected timelines for defining the healthcheck work?

@cmacknz
Member

cmacknz commented May 30, 2022

@simitt We have been iterating on the design details. The proposal is Integration Status Health Reporting. It was being reworked a bit last week, but the high-level details are right. I added you to the stakeholder list to make sure you are notified of changes.

The new error reporting mechanism needs to be supported in the agent control protocol. @ph can comment on the timeline for implementing this, but I suspect implementation will start sometime in 8.4.

@simitt
Contributor

simitt commented Jun 3, 2022

@felixbarny given the above conversation, I don't think it makes sense to implement something in the apm-server before the healthcheck endpoint in the Elastic Agent is defined. What do you think?

@felixbarny
Member

Yes, I agree.
FYI @eyalkoren

@axw
Member Author

axw commented Nov 15, 2022

@eyalkoren is looking into splitting the attacher off into its own integration, which would naturally enable surfacing errors. I don't think it makes sense to invest in a lot of changes to Elastic Agent, Fleet, and APM Server in the interim, when we plan to provide a more dedicated integration in the hopefully not too distant future. If needed we can reopen this.

axw closed this as not planned Nov 15, 2022