Add CPU profiling and component-level diagnostics #3118

Merged: 10 commits, Jul 28, 2023

fearful-symmetry
Contributor

What does this PR do?

Follow-up to elastic/elastic-agent-client#80

Part of #2140 and #2141

NOTE: Merging this PR does not close the two issues above. Before they can be closed, we also need to update elastic-agent-client in beats itself.

This PR accomplishes a few things:

  • Changes the agent control protocol to add a DiagnosticComponentsRequest call and an AdditionalDiagnosticRequest request field. This lets us request that an optional CPU profile be added to the diagnostic bundle.
  • Makes a number of changes to the coordinator, diagnostics command, and manager to support component-level diagnostics.
  • Adds an integration test.
  • Increases the timeout for diagnostic requests.
  • Restructures the .zip files that elastic-agent produces, so component-level diagnostics are placed in the components/ directory and not copied per unit.
  • Adds a CPU profile diag hook for the agent itself.
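
To illustrate the archive restructuring, here is a minimal sketch of writing each component-level result once under components/<component-id>/ instead of once per unit. The componentDiag type, the helper names, the component IDs, and the exact path layout below components/ are assumptions made for illustration; they are not taken from this PR's code.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"path"
)

// componentDiag is a hypothetical container for one component-level result.
type componentDiag struct {
	ComponentID string
	Filename    string
	Content     []byte
}

// buildArchive writes each result exactly once under components/<id>/<file>.
func buildArchive(diags []componentDiag) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, d := range diags {
		name := path.Join("components", d.ComponentID, d.Filename)
		w, err := zw.Create(name)
		if err != nil {
			return nil, fmt.Errorf("creating %s: %w", name, err)
		}
		if _, err := w.Write(d.Content); err != nil {
			return nil, fmt.Errorf("writing %s: %w", name, err)
		}
	}
	// Close flushes the central directory; the archive is invalid without it.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listEntries returns the entry names of a zip archive in stored order.
func listEntries(archive []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(zr.File))
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

// runDemo builds a small archive with placeholder content and lists it.
func runDemo() ([]string, error) {
	archive, err := buildArchive([]componentDiag{
		{ComponentID: "filestream-default", Filename: "cpu.pprof", Content: []byte("placeholder")},
		{ComponentID: "log-default", Filename: "cpu.pprof", Content: []byte("placeholder")},
	})
	if err != nil {
		return nil, err
	}
	return listEntries(archive)
}

func main() {
	names, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```

Keying the path on the component ID means a component with several units still contributes a single copy of its profile to the bundle.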

Note that the code path for component diagnostics differs from the normal unit-level diagnostics in the following ways:

  • The manager's PerformComponentDiagnostics method returns an error value; the corresponding unit-level diagnostic method does not.
  • The component-level diagnostic aggregator works in parallel, spawning a goroutine for every component that requires diagnostics. Because CPU profiles run for a set period of time, this is needed so that a diagnostic request doesn't take 3 minutes.
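
The parallel aggregation can be sketched roughly as below: one goroutine per component, results kept in request order, and per-component failures joined into a single returned error. The names collectParallel, diagFunc, componentResult, and fakeCollect are hypothetical, and the component IDs and 20 ms delay are stand-ins for real profile durations. This is not the aggregator from the PR.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// componentResult is a hypothetical per-component outcome.
type componentResult struct {
	ComponentID string
	Data        []byte
	Err         error
}

// diagFunc is a hypothetical hook that collects diagnostics for one component.
type diagFunc func(ctx context.Context, componentID string) ([]byte, error)

// collectParallel runs collect for every component in its own goroutine, so
// the total wait is roughly the slowest component rather than the sum.
func collectParallel(ctx context.Context, ids []string, collect diagFunc) ([]componentResult, error) {
	results := make([]componentResult, len(ids))
	var wg sync.WaitGroup
	for i, id := range ids {
		wg.Add(1)
		go func(i int, id string) {
			defer wg.Done()
			data, err := collect(ctx, id)
			// Each goroutine writes only its own slot, so no mutex is needed.
			results[i] = componentResult{ComponentID: id, Data: data, Err: err}
		}(i, id)
	}
	wg.Wait()

	var errs []error
	for _, r := range results {
		if r.Err != nil {
			errs = append(errs, fmt.Errorf("component %s: %w", r.ComponentID, r.Err))
		}
	}
	// errors.Join returns nil when errs is empty (requires Go 1.20+).
	return results, errors.Join(errs...)
}

// fakeCollect simulates a timed collection; the "broken" ID always fails.
func fakeCollect(ctx context.Context, id string) ([]byte, error) {
	select {
	case <-time.After(20 * time.Millisecond):
	case <-ctx.Done():
		return nil, ctx.Err()
	}
	if id == "broken" {
		return nil, errors.New("component did not respond")
	}
	return []byte("profile for " + id), nil
}

// runDemo collects from three components, one of which fails.
func runDemo() ([]componentResult, error) {
	ids := []string{"filestream-default", "broken", "log-default"}
	return collectParallel(context.Background(), ids, fakeCollect)
}

func main() {
	results, err := runDemo()
	for _, r := range results {
		fmt.Printf("%s: %d bytes, err=%v\n", r.ComponentID, len(r.Data), r.Err)
	}
	fmt.Println("aggregate error:", err)
}
```

Returning both the partial results and a joined error lets the caller still write whatever was collected while reporting which components failed.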

Why is it important?

We want CPU profiling in diagnostics

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test this PR locally

  • Pull down the branch and build as you normally would
  • Run elastic-agent diagnostics -p

@fearful-symmetry fearful-symmetry added the Team:Elastic-Agent label Jul 24, 2023
@fearful-symmetry fearful-symmetry requested a review from a team as a code owner July 24, 2023 17:16
@fearful-symmetry fearful-symmetry self-assigned this Jul 24, 2023
@elasticmachine
Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@mergify
mergify bot commented Jul 24, 2023

This pull request does not have a backport label. Could you fix it @fearful-symmetry? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@elasticmachine
elasticmachine commented Jul 24, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-07-28T17:17:35.486+0000

  • Duration: 27 min 7 sec

Test stats 🧪

Test Results
Failed 0
Passed 6153
Skipped 31
Total 6184

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
elasticmachine commented Jul 24, 2023

🌐 Coverage report

Name          Metrics % (covered/total)   Diff
Packages      98.718% (77/78)             👍
Files         68.75% (187/272)            👍 0.093
Classes       67.659% (341/504)           👎 -0.349
Methods       54.064% (1071/1981)         👎 -0.266
Lines         40.262% (12296/30540)       👎 -0.113
Conditionals  100.0% (0/0)                💚

@blakerouse blakerouse left a comment

Change looks good.

A couple of comments. I believe the one about stopping the CPU profile on context cancellation needs to be fixed before merge.
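
For context, the concern about context cancellation can be sketched as follows: a timed CPU profile should stop as soon as the request context is cancelled, so profiling is not left running after the caller has gone away. This is a minimal sketch using the standard runtime/pprof package. The function names cpuProfile and runDemo and the durations are assumptions for illustration, not the hook added in this PR.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"runtime/pprof"
	"time"
)

// cpuProfile collects a CPU profile for up to d, stopping early if ctx is
// cancelled. On cancellation it returns ctx.Err() and discards the profile.
func cpuProfile(ctx context.Context, d time.Duration) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, fmt.Errorf("starting CPU profile: %w", err)
	}

	timer := time.NewTimer(d)
	defer timer.Stop()

	var ctxErr error
	select {
	case <-timer.C:
	case <-ctx.Done():
		ctxErr = ctx.Err()
	}

	// StopCPUProfile must run on both paths, and before buf is read,
	// because it flushes the buffered profile data to the writer.
	pprof.StopCPUProfile()
	if ctxErr != nil {
		return nil, ctxErr
	}
	return buf.Bytes(), nil
}

// runDemo takes one short profile, then one with an already-cancelled context.
// It returns the size of the first profile and the error from the second.
func runDemo() (profileLen int, cancelErr error, err error) {
	data, err := cpuProfile(context.Background(), 50*time.Millisecond)
	if err != nil {
		return 0, nil, err
	}
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, cancelErr = cpuProfile(ctx, time.Minute)
	return len(data), cancelErr, nil
}

func main() {
	n, cancelErr, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("profile bytes:", n)
	fmt.Println("cancelled run returned:", cancelErr)
}
```

Whether a cancelled request should return the partial profile or an error is a design choice; this sketch returns the error.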

Review threads:
internal/pkg/diagnostics/diagnostics.go (outdated, resolved)
internal/pkg/diagnostics/diagnostics.go (resolved)
pkg/control/v2/server/server.go (resolved)
pkg/control/v2/server/server.go (outdated, resolved)
@blakerouse blakerouse left a comment

Updates look great!

Labels
backport-skip Team:Elastic-Agent Label for the Agent team
Projects
None yet
3 participants