Profiles collected by diagnostics should be per component (or process) not per unit #2141

Closed
cmacknz opened this issue Jan 18, 2023 · 2 comments · Fixed by #3118
Labels
bug Something isn't working Team:Elastic-Agent Label for the Agent team

Comments

@cmacknz
Member

cmacknz commented Jan 18, 2023

We are duplicating the profiles per unit, even though profiles apply to the whole process and should likely be collected at the component level. For example, the diagnostics for a log component implemented by Filebeat look like the following, where the per-unit profiles add no value:

tree diagnostics/components/log-default/
├── log-default
│   ├── allocs.txt
│   ├── beat-rendered-config.yml
│   ├── block.txt
│   ├── goroutine.txt
│   ├── heap.txt
│   ├── mutex.txt
│   └── threadcreate.txt
└── logfile-system-b856919c-ade6-4a47-9e0c-28d4822b7ada
    ├── allocs.txt
    ├── beat-rendered-config.yml
    ├── block.txt
    ├── goroutine.txt
    ├── heap.txt
    ├── mutex.txt
    └── threadcreate.txt
@cmacknz cmacknz added bug Something isn't working Team:Elastic-Agent Label for the Agent team labels Jan 18, 2023
@fearful-symmetry
Contributor

I ran into this while working on #2140. A series of implementation and API issues leads to this:

  1. In the implementation, the elastic-agent-client runs every diagnostic callback registered at the client level (including the pprof diagnostics mentioned above) once per unit.
  2. In the protobuf API between the agent CLI and the agent server, the actual API call is DiagnosticUnits, which operates at the level of individual units; there is no component-level call such as DiagnosticComponents.
  3. In the protobuf API between the agent and the components, the diagnostic request is collapsed into a single ActionRequest_DIAGNOSTICS type.

So I think the correct fix is to add a DiagnosticComponents API call, and then extend the ActionRequest struct in some way so the client can differentiate between unit-level and component-level diagnostics. At that point we can properly fix the first issue, client-level callback registration.

@cmacknz
Member Author

cmacknz commented Jul 13, 2023

Yeah, I think the diagnostics API should have accounted for component diagnostics from the start, but since it didn't, the next best option is a DiagnosticComponents API call.

We need a way to support process-global diagnostics like CPU and heap profiles, and this is the only way that makes sense long term.
