-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
command/operator_debug: add pprof interval #11938
command/operator_debug: add pprof interval #11938
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @danishprakash! Overall this is looking pretty good. I've left some comments, but I'm also going to rope in @davemay99 for a review as he's kind of the internal "owner" of this feature.
27b5d0b
to
b5840d7
Compare
b5840d7
to
f2b7865
Compare
f2b7865
to
107f477
Compare
107f477
to
e36f1ca
Compare
e36f1ca
to
ef538ab
Compare
Hey @danishprakash Just checking in to see if you're still interested in working on this. We think this'd be a great change and would love to see this get over the hump 😄 |
Hey @Amier3, I missed following up on this, I've pushed some updates, will try to close this soon :) Thanks for the heads up. |
Signed-off-by: danishprakash <[email protected]>
Signed-off-by: danishprakash <[email protected]>
Signed-off-by: danishprakash <[email protected]>
a4a0da4
to
215c80a
Compare
215c80a
to
c1f41c2
Compare
c1f41c2
to
e7c0317
Compare
e7c0317
to
23f26fb
Compare
23f26fb
to
69a473f
Compare
We introduced a `pprof-interval` argument to `operator debug` in #11938, and unfortunately this has resulted in a lot of test flakes. The actual command in use is mostly fine (although I've fixed some quirks here), so what's really happened is that the change has revealed some existing issues in the tests. Summary of changes: * Make first pprof collection synchronous to preserve the existing behavior for the common case where the pprof interval matches the duration. * Clamp `operator debug` pprof timing to that of the command. The `pprof-duration` should be no more than `duration` and the `pprof-interval` should be no more than `pprof-duration`. Clamp the values rather than throwing errors, which could change the commands that existing users might already have in debugging scripts * Testing: remove test parallelism The `operator debug` tests that stand up servers can't be run in parallel, because we don't have a way of canceling the API calls for pprof. The agent will still be running the last pprof when we exit, and that breaks the next test that talks to that same agent. (Because you can only run one pprof at a time on any process!) We could split off each subtest into its own server, but this test suite is already very slow. In future work we should fix this "for real" by making the API call cancelable. * Testing: assert against unexpected errors in `operator debug` tests. If we assert there are no unexpected error outputs, it's easier for the developer to debug when something is going wrong with the tests because the error output will be presented as a failing test, rather than just a failing exit code check. Or worse, no failing exit code check! This also forces us to be explicit about which tests will return 0 exit codes but still emit (presumably ignorable) error outputs. Additional minor bug fixes (mostly in tests) and test refactorings: * Fix text alignment on pprof Duration in `operator debug` output * Remove "done" channel from `operator debug` event stream test. The goroutine we're blocking for here already tells us it's done by sending a value, so block on that instead of an extraneous channel * Event stream test timer should start at current time, not zero * Remove noise from `operator debug` test log output. The `t.Logf` calls already are picked out from the rest of the test output by being prefixed with the filename. * Remove explicit pprof args so we use the defaults clamped from duration/interval
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Closes #11514