[Fleet] Fix agent memory query #174458
Comments
Pinging @elastic/fleet (Team:Fleet)
As mentioned in elastic/elastic-agent#4005 (comment), the …
👍 FYI I reported a quick comparison of the …
We are missing the monitoring components.
@cmacknz Were you able to confirm that these are omitted?
Context
Following the investigation carried out for https://github.com/elastic/sdh-beats/issues/4209, the agent memory reported in Fleet's agent table and agent details appears to be about 3-4 times lower than its actual value. One comparison point is the memory reported by running `systemctl status elastic-agent`.

The first round of analysis (see details below) suggests that the current query used to calculate the total memory for the agent incorrectly aggregates separate Beat instances together.
Furthermore, the agent memory displayed in the `[Elastic Agent] Agent metrics` dashboard appears to be similarly undervalued (which is the original issue raised by https://github.com/elastic/sdh-beats/issues/4209). Since the query should be very similar, this should be fixed as well.

It is likely that the agent CPU, which is calculated from the same query, should also be corrected. Note that this metric has also been reported to have unrealistic values (https://github.com/elastic/sdh-beats/issues/3834) and there is an ongoing effort to document how it works (elastic/elastic-agent#4005). It would make sense to do the same for agent memory (either as part of this issue or a follow-up documentation issue).
Details
Steps to reproduce
Compare the agent memory shown in Fleet with the memory reported by `systemctl` (from Multipass shell):

Analysis
The issue seems to arise from the query used to calculate the agent's memory and CPU. This query computes, for each agent, two values called `memory_size_byte_avg` and `cpu_avg`. In plain words, this query aggregates over the processes of the Elastic Agent (elastic-agent, filebeat and metricbeat), takes the average of `system.process.memory.size` for each process, and then sums these averages together.
The problem is that `elastic_agent.process` is not unique per Beat instance, so the per-process average of `system.process.memory.size` mixes samples from several distinct instances, and the sum of those averages undercounts the real total. For example, with a setup as described in the steps above, running `sudo elastic-agent status --output=full` shows that the `system` integration and monitoring run 3 metricbeat instances (`system/metrics-default`, `http/metrics-monitoring`, `beat/metrics-monitoring`) and 2 filebeat instances (`filestream-monitoring`, `log-default`), so averaging all "metricbeat" documents together yields roughly the footprint of a single instance rather than three, which is consistent with the observed 3-4x undercount:

Output of elastic-agent status --output=full
See also this comment for added context and details.
It is possible (and helpful) to play with the query in the Console in order to tweak the aggregation. Here is a simplified version (memory only):
Agent memory query
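Roughly, the aggregation looks like the sketch below (the index pattern `metrics-elastic_agent.*`, the `elastic_agent.id` filter and the time range are illustrative assumptions, not necessarily what Fleet uses; the per-process average and the sum of the averages mirror the description above):

```
GET metrics-elastic_agent.*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        // illustrative filters: restrict to one agent and a recent time window
        { "term": { "elastic_agent.id": "<agent-id>" } },
        { "range": { "@timestamp": { "gte": "now-5m" } } }
      ]
    }
  },
  "aggs": {
    "processes": {
      // one bucket per elastic_agent.process value (elastic-agent, filebeat, metricbeat)
      "terms": { "field": "elastic_agent.process" },
      "aggs": {
        "memory_size_byte_avg": {
          "avg": { "field": "system.process.memory.size" }
        }
      }
    },
    "memory_size_byte_total": {
      // sum of the per-process averages
      "sum_bucket": { "buckets_path": "processes>memory_size_byte_avg" }
    }
  }
}
```

Because every metricbeat instance reports the same `elastic_agent.process` value, `memory_size_byte_avg` ends up close to a single instance's footprint and the summed total underestimates real usage; this simplified form makes it easy to experiment in the Console with alternative groupings (e.g. splitting the buckets by a field that identifies each instance).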
Acceptance criteria
Tasks