Remote agent pprof endpoints #6841
Conversation
Force-pushed from 4d58bb3 to 5a2eec2.
Force-pushed from b7924d5 to 9358f64.
Code looks great! Mostly docs/wording/style comments that are not blockers.
This is quite meaty - great thinking through so many cases and conditions. I have many stylistic nitpicks though.
I'd be curious whether we have considered using a streaming RPC approach, with command/agent/profile effectively invoking the pprof handlers - we could have a wrapper RequestHandler that streams results to the httpserver directly. The logging endpoints might be a pattern to follow here. Doing so would let us keep parity with the pprof endpoint handling and avoid loading the entire profile into memory. I don't have a sense of how big the profiles would be (I guess the memory-related ones can be very large in a busy cluster).
Force-pushed from df82bd4 to 92b5140.
lgtm - thanks.
@notnoop I've been doing some testing on how large the profiles can get on a busy server. On a t2.2xl server with 27/31 GB utilized (all pending Nomad jobs) I've gotten the following results. Trace is by far the largest and grows with the duration of the request. I'm wondering if that's small enough to ease your concerns around streaming, or if it's something we should still consider doing in the near term. cc @schmichael

- trace of 55 seconds -> 72 MB profile
- goroutines -> 8.8k
- heap -> 97k
- profile -> 9k
Force-pushed from 92b5140 to 5517743.
- wip, agent endpoint and client endpoint for pprof profiles; agent endpoint test
- Return RPC errors for profile requests; set up remote forwarding to target leader or server ID for profile requests; server forwarding, endpoint tests
- rename implementation method
- m -> a receiver name, return coded errors, fix query
- tidy up, add comments; clean up seconds param assignment
- helper func to return serverPart based off of serverID
- prevent region forwarding loop, backfill tests; fix failing test
- Passes the agent enable_debug config into the Nomad server and client configs. This allows RPC endpoints more granular control over whether they should be enabled, in combination with ACLs; enable debug on client test
- fix test expectation; test wrapNonJSON
- Address PR feedback; rename profile package to pprof to more accurately describe its purpose. Adds gc param for heap lookup profiles.
- comment why we ignore errors parsing params
Force-pushed from 5517743 to a58b8a5.
@drewbailey Can you create an issue for streaming traces (and make sure it has a link to the discussion here)? As discussed it's not something I think we need to prioritize, but it might make a good starter issue for someone wanting to learn the RPC internals. Or in the future if there's a tool that expects tracing to be a stream, it'd be important to update our implementation.
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
This PR adds server and client RPC endpoints to allow operators to generate pprof reports for any given node or server, as long as they have the proper ACL privileges.
A new HTTP endpoint, /v1/agent/pprof/, acts like the typical golang pprof endpoints (https://golang.org/pkg/net/http/pprof/) but forwards the request to remote nodes.
TODO