[Fleet] Add periodic logs for agent status #140014

Closed
joshdover opened this issue Sep 5, 2022 · 19 comments · Fixed by #144037 or #148635
Assignees
Labels
estimate:small Small Estimated Level of Effort Project:FleetScaling QA:Validated Issue has been validated by QA Supportability Improve our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better. Team:Fleet Team label for Observability Data Collection Fleet team v8.6.1

Comments

@joshdover
Contributor

It'd be helpful in troubleshooting cases to have some amount of logging around the status of the Fleet. We currently have no live visibility around this without requesting information from the end-user but we could make this available in the logs.

We can likely re-use the same code path we use to collect this for product telemetry: https://github.com/elastic/kibana/blob/94b1267c389c6cf0451781ad2194c52c4075510c/x-pack/plugins/fleet/server/collectors/agent_collectors.ts/#L22

I'd propose we:

  • Add an info-level log that reports this every 15 minutes
  • Add a debug-level log that reports this every 5 minutes

One thing to consider when implementing this is that it'd be best to minimize queries on this index if possible. For instance, it'd be better not to run the query or make the log.debug call at all if debug logging is disabled.
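A minimal sketch of the proposal above, assuming a logger that can report whether a level is enabled. The `SketchLogger` interface, its `isLevelEnabled` method, `reportStatus`, `startPeriodicStatusLogs`, and the `fetchStatus` callback are all hypothetical names used for illustration, not Kibana's actual API. The `Fleet Usage:` prefix follows the log message quoted later in this thread.

```typescript
// Hypothetical logger shape for illustration only; not Kibana's real Logger type.
type Level = "debug" | "info";

interface SketchLogger {
  isLevelEnabled(level: Level): boolean;
  debug(message: string): void;
  info(message: string): void;
}

const INFO_INTERVAL_MS = 15 * 60 * 1000; // every 15 minutes
const DEBUG_INTERVAL_MS = 5 * 60 * 1000; // every 5 minutes

// Runs the status query and logs the result, but only if `level` is enabled.
// Resolves to true when the query was actually executed.
async function reportStatus(
  logger: SketchLogger,
  level: Level,
  fetchStatus: () => Promise<unknown>
): Promise<boolean> {
  if (!logger.isLevelEnabled(level)) {
    return false; // skip the index query entirely
  }
  const status = await fetchStatus();
  logger[level](`Fleet Usage: ${JSON.stringify(status)}`);
  return true;
}

// Starts both timers and returns a function that stops them.
function startPeriodicStatusLogs(
  logger: SketchLogger,
  fetchStatus: () => Promise<unknown>
): () => void {
  const run = (level: Level): void => {
    // A real implementation would report failures instead of swallowing them.
    reportStatus(logger, level, fetchStatus).catch(() => undefined);
  };
  const infoTimer = setInterval(() => run("info"), INFO_INTERVAL_MS);
  const debugTimer = setInterval(() => run("debug"), DEBUG_INTERVAL_MS);
  return () => {
    clearInterval(infoTimer);
    clearInterval(debugTimer);
  };
}
```

Checking the level before calling `fetchStatus` is what keeps the debug timer from touching the index when debug logging is off.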

@joshdover joshdover added Team:Fleet Team label for Observability Data Collection Fleet team estimate:small Small Estimated Level of Effort labels Sep 5, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@joshdover joshdover added the Supportability Improve our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better. label Sep 5, 2022
@joshdover
Contributor Author

We can probably safely backport this as well to 8.5.x.

@nchaulet
Member

@elastic/kibana-core I tried to look at this but did not find an easy way to do so. Is it possible to know what log level is currently enabled for a plugin?

@pgayvallet
Contributor

pgayvallet commented Oct 26, 2022

Is it possible to know what log level is currently enabled for a plugin?

No, we currently don't have an equivalent to log4j's Logger.isLevelEnabled / Logger.isInfoEnabled

Having it would make sense though, and shouldn't be too complex to implement in theory
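As a rough illustration of what such a check involves, a level test can be reduced to comparing severities. This is a sketch under assumptions: the `SEVERITY` table and the standalone `isLevelEnabled` function are hypothetical and do not reflect how Kibana's logging system is actually implemented.

```typescript
// Hypothetical severity ranking: a higher number means more verbose.
const SEVERITY = {
  fatal: 1,
  error: 2,
  warn: 3,
  info: 4,
  debug: 5,
  trace: 6,
} as const;

type LogLevel = keyof typeof SEVERITY;

// A candidate level is enabled when it is no more verbose than the
// level the logger is configured with.
function isLevelEnabled(configured: LogLevel, candidate: LogLevel): boolean {
  return SEVERITY[candidate] <= SEVERITY[configured];
}
```

With a logger configured at `info`, `isLevelEnabled("info", "debug")` is false, so a caller could skip building an expensive debug message.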

@pgayvallet
Contributor

pgayvallet commented Oct 26, 2022

I created #144002. Will try to take a preliminary look this week

@joshdover
Contributor Author

@nchaulet We should test how necessary this optimization would be to get this out. For instance, how fast would this query be on an index of 250k agent documents? My guess is it's actually quite fast and very easily cacheable.
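If caching did turn out to be useful, one simple option would be a time-based cache around the query. The sketch below is an assumption-laden illustration: `cacheWithTtl` and its injectable `now` clock are hypothetical helpers, not something that exists in Fleet.

```typescript
// Wraps `fetch` so that repeated calls within `ttlMs` reuse the last result.
// `now` is injectable so the behavior can be checked without real timers.
function cacheWithTtl<T>(
  fetch: () => T,
  ttlMs: number,
  now: () => number = Date.now
): () => T {
  let cached: { value: T; at: number } | undefined;
  return () => {
    const t = now();
    if (cached === undefined || t - cached.at >= ttlMs) {
      cached = { value: fetch(), at: t };
    }
    return cached.value;
  };
}
```

If `fetch` returns a promise, the promise itself is cached, so concurrent callers within the window share a single in-flight query.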

@nchaulet
Member

@joshdover Looking at the trace for agent status (it's the same ES queries here) for 300k agents this seems pretty fast and maybe we do not have to do that optimization.
Screen Shot 2022-10-26 at 8 22 50 AM

@pgayvallet
Contributor

FWIW the PR is ready: #144033, so you may be able to use it very soon if you need to.

@kpollich
Member

kpollich commented Nov 7, 2022

Validation steps for QA:

  1. Verify that every 15 minutes, a "Fleet Usage" message is logged to the INFO log level by running agents
  2. Verify that, when debug log level is enabled in agent settings, the same "Fleet Usage" message is logged every 5 minutes instead

@ghost

ghost commented Nov 9, 2022

Hi @nchaulet,

We have re-validated this issue on the latest 8.6.0 SNAPSHOT Kibana Staging environment

Build details:

Version: 8.6.0 SNAPSHOT
Build: 58038
Commit: b02d976a3b8466ae4e0b1ec9840175dce1790719

Observations:

  • We have observed for debug level agent logs, logs are generated periodically after 5 minutes.

image

  • Further, we haven't observed any "Fleet Usage" message under
  1. Debug log level every 5 minutes.
  2. Info log level every 15 minutes.

Further, CC: @kpollich, we have a few queries:

  1. Could you please confirm whether we are missing any action needed to generate the "Fleet Usage" message?

I'd propose we:

  • Add an info-level log that reports this every 15 minutes

We have observed the Info level agent logs as usual.
But, we haven't observed any periodically generating logs after 15 minutes for Info level agent logs.

  2. Could you please confirm whether there are any particular Info level agent logs that we should expect to be generated periodically every 15 minutes?

Please let us know if we are missing anything.

Thanks!

@kpollich
Member

@nchaulet could you take a look at the above? Seems like we're not getting the Fleet Usage: ... logs as expected here.

@nchaulet
Member

Hi @prachigupta-qasource, it looks like I should have given more info on how to test this and what the expected behavior is.

That task adds new Kibana logs (not agent or Fleet Server logs). You should see something like this in your Kibana logs:

[2022-11-23T10:17:10.934-05:00][INFO ][plugins.fleet] Fleet Usage: {"agents_enabled":true,"agents":{"total_enrolled":3,"healthy":1,"unhealthy":1,"offline":1,"total_all_statuses":5,"updating":0},"fleet_server":{"total_enrolled":1,"healthy":1,"unhealthy":0,"offline":0,"updating":0,"total_all_statuses":1,"num_host_urls":2}}

Does it make sense to you?
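When checking logs for this message, the JSON payload after the `Fleet Usage:` prefix can be pulled out of a line with a few lines of code. This is only a sketch: `parseFleetUsage` is a hypothetical helper, and it assumes the payload is the last thing on the line, as in the example above.

```typescript
const MARKER = "Fleet Usage: ";

// Returns the parsed JSON payload, or null if the line is not a
// "Fleet Usage" line or its payload is not valid JSON.
function parseFleetUsage(line: string): Record<string, any> | null {
  const start = line.indexOf(MARKER);
  if (start === -1) {
    return null;
  }
  try {
    return JSON.parse(line.slice(start + MARKER.length));
  } catch {
    return null; // payload was truncated or malformed
  }
}
```

For the line quoted above, the returned object exposes the counters directly, for example `agents.total_enrolled` and `fleet_server.num_host_urls`.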

@ghost

ghost commented Nov 30, 2022

Hi @nchaulet,

We are blocked from testing this feature due to #146656.

We will test this feature once the blocker issue is fixed.

Thanks!

@ghost

ghost commented Jan 3, 2023

Hi @nchaulet,

We have re-validated this issue on the latest 8.6.0 SNAPSHOT Kibana Staging environment and found the below observations.

Observations:

  • Info level Kibana logs are generated periodically after 15 minutes on Stream page.

Screenshots:

image

Build details:

Version: 8.6.0 SNAPSHOT
Build: 58836
Commit: c735e9fc6fdf0221cc3134b5fe110e9ae4a0effb

Query

image

image

  • On changing the agent logging level to Debug, we still observe only Info level logs for the Kibana.log event.dataset, with Fleet Usage recorded every 15 minutes.

image

Verify that, when debug log level is enabled in agent settings, the same "Fleet Usage" message is logged every 5 minutes instead

So @kpollich, we wanted to confirm whether new debug logs will be generated every 5 minutes for Kibana.log.

Please let us know if we are missing anything.

Thanks!

@nchaulet
Member

nchaulet commented Jan 3, 2023

Hi @prachigupta-qasource

The debug logs should switch to a 5-minute interval when you set the log level of the Kibana Fleet plugin, not the agent log level. Let me know if that makes it clearer.
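For reference, one way to raise the log level for a single Kibana plugin is a logger override in kibana.yml. The fragment below is a sketch, assuming the logger name matches the `plugins.fleet` context shown in the Kibana log line earlier in this thread:

```yaml
# kibana.yml: enable debug logging for the Fleet plugin only
# (assumes the logger context is "plugins.fleet")
logging.loggers:
  - name: plugins.fleet
    level: debug
```

Scoping the override to one logger avoids the volume that a root-level debug setting would produce.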

@ghost

ghost commented Jan 4, 2023

Hi @nchaulet,

Thank you for the feedback and looking into our queries.

We have added xpack logging.root.level: debug in the kibana.yml on the latest 8.6.0 SNAPSHOT Kibana Staging environment and found the below observations.

Observations:

What's working:

  • Debug level Kibana logs are generated periodically after 15 minutes on Stream page.

What's not working:

  • Debug level Kibana logs are not generated periodically after 5 minutes on Stream page.

Screenshots:

image

Build Details:

Version: 8.6.0 SNAPSHOT
Build: 58836
Commit: c735e9fc6fdf0221cc3134b5fe110e9ae4a0effb

Please let us know if we are missing anything.

Thanks!

@ghost

ghost commented Jan 4, 2023

Hi @nchaulet,

We have created the two test cases below for this feature under our Fleet Test Suite:

Please review the test cases and let us know if we are missing anything.

Thanks!

@ghost

ghost commented Jan 9, 2023

Hi @nchaulet,

We have updated our detailed observations related to the collection of debug level logs in comment #140014 (comment)

Further, as per the observation shared above, we are blocked from testing test case:
[C168553]: Validate that debug level Kibana logs are generated periodically after 5 minutes on the Stream page

Could you please look into it so that we can continue the above testing? Please let me know if we are missing anything.

Thanks!

@ghost

ghost commented Jan 17, 2023

Hi @nchaulet,

We have re-validated this ticket after the fixes in #148635 on the latest 8.7.0 SNAPSHOT Kibana Staging environment and found the below observations.

Observations:

  • Info level Kibana logs are generated periodically after 15 minutes on Stream page.
  • Debug level Kibana logs are generated periodically after 5 minutes on Stream page.

Screenshot:

image

Build details:

Version: 8.7.0 SNAPSHOT
Build: 59793
Commit: 832a128e2aae7d3a61a6df48779fe03126b30308

Hence, marking this ticket as QA: Validated.

Thanks!

@ghost ghost added QA:Validated Issue has been validated by QA and removed QA:Needs Validation Issue needs to be validated by QA labels Jan 17, 2023
5 participants