cmd/pprofessor: introduce pprof-based tool #4017
pprofessor is based on the pprof tool, using an alternative "Fetcher" implementation that queries Elasticsearch to aggregate profile samples recorded by APM Server.

I haven't gone through all of the pprof flags to make sure they work, so some may not make sense. There are a couple of flags added specifically for our use case:

* -service, which defines the service name to filter on.
* -start, which defines the start timestamp from which to start aggregating. If specified along with -duration, the latter controls the end time. If only -start is specified, then the end time is now. If only -duration is specified, then the end time is now and the start time is duration seconds before now.

We record the number of profiles and documents aggregated in profile comments. All metrics (currently: cpu, inuse/allocated heap objects/space) are aggregated into a single profile.

To aggregate duration correctly, we introduce the `profile.id` field, which is a unique ID generated per profile, shared by all sample docs derived from a profile.
This doesn't seem to work for me with the 7.8 stack:
A few other notes/questions:
@jalvz thanks for the review!
That's probably because there's a composite aggregation on
I can add a basic README. Until then: my intention was for the tool to be used with the same (branch) stack & server version. Otherwise, no additional dependencies apart from pprof's (which includes Graphviz).
I'll set "apm-server" as the default.
That's one of the standard
This is part of the pprof tool: the "source" (i.e. Elasticsearch URL in this case) is a positional argument.
Sure, I'll remove it until it's needed.
Are you referring to the wording in
The reason for having struct types for responses is because we need to operate on the results. The requests are just JSON marshalled, so it doesn't really matter if they're structs or maps. I did what's simplest.
-1 on separate package, see https://dave.cheney.net/practical-go/presentations/qcon-china.html#_consider_fewer_larger_packages
If you think it's really worthwhile I can create a separate file, but I don't think it is. The types are used in exactly one place.
Don't follow. Did
Hmm, I actually had to apt-install it...
Looking at the error message again, maybe I introduced a bug in between testing and proposing this PR. I'll look into it and come back.
Sorry, what I meant was "no additional dependencies other than pprof's dependencies, among which is Graphviz" -- not that it's included within the pprof binary.
This handles the case where there's no `profile.id` keyword field, e.g. because the template hasn't been overwritten.
Update to match the wording for "profile.duration".
I hadn't tested under the scenario where there was an existing index/template without the I've added a README and (hopefully) clarified the meaning of
thanks for the readme and changes!
Merging without a changelog entry, as this is not intended for end users.
* cmd/pprofessor: introduce pprof-based tool

  pprofessor is based on the pprof tool, using an alternative "Fetcher" implementation that queries Elasticsearch to aggregate profile samples recorded by APM Server.

  I haven't gone through all of the pprof flags to make sure they work, so some may not make sense. There are a couple of flags added specifically for our use case:

  * -service, which defines the service name to filter on.
  * -start, which defines the start timestamp from which to start aggregating. If specified along with -duration, the latter controls the end time. If only -start is specified, then the end time is now. If only -duration is specified, then the end time is now and the start time is duration seconds before now.

  We record the number of profiles and documents aggregated in profile comments. All metrics (currently: cpu, inuse/allocated heap objects/space) are aggregated into a single profile.

  To aggregate duration correctly, we introduce the `profile.id` field, which is a unique ID generated per profile, shared by all sample docs derived from a profile.

* cmd/pprofessor: set default service name
* cmd/pprofessor: remove -tls_ca handling
* cmd/pprofessor: rename Fetcher src param
* cmd/pprofessor: don't set nil "after" in composite

  This handles the case where there's no `profile.id` keyword field, e.g. because the template hasn't been overwritten.

* cmd/pprofessor: add minimal README
* model/profile: update wording for "profile.id"

  Update to match the wording for "profile.duration".
Motivation/summary
pprofessor is based on the pprof tool, using an alternative
"Fetcher" implementation that queries Elasticsearch to
aggregate continuous profiling samples recorded by APM Server.
I haven't gone through all of the pprof flags to make sure
they work, so some may not make sense. There are a couple of
flags added specifically for our use case:
- `-service`, which defines the service name to filter on.
- `-start`, which defines the start timestamp (or date math) from which to start aggregating. If specified along with `-seconds`, the latter controls the end time. If only `-start` is specified, then the end time is now. If only `-seconds` is specified, then the end time is now and the start time is duration seconds before now.
We record the number of profiles and documents aggregated in
profile comments. All metrics (currently: cpu, inuse/allocated
heap objects/space) are aggregated into a single profile.
To aggregate duration correctly, we introduce the `profile.id` field, which is a unique ID generated per profile, shared by all sample docs derived from a profile.
No automated tests for now, and comes with no warranty or support.
Note that there is a known issue with how the Go Agent reports "alloc"
samples: it reports ever-increasing counters, rather than deltas. We
need to change this, so we can visualise the allocations within a defined
time range. See elastic/apm-agent-go#708.
Checklist
- [ ] I have updated CHANGELOG.asciidoc

I have considered changes for:
- [ ] documentation
- [ ] logging (add log lines, choose appropriate log selector, etc.)
- [ ] metrics and monitoring (create issue for Kibana team to add metrics to visualizations, e.g. Kibana#44001)
- [ ] telemetry
- [ ] Elasticsearch Service (https://cloud.elastic.co)
- [ ] Elastic Cloud Enterprise (https://www.elastic.co/products/ece)
- [ ] Elastic Cloud on Kubernetes (https://www.elastic.co/elastic-cloud-kubernetes)

How to test these changes
apm-server -E apm-server.instrumentation.enabled=true -E apm-server.instrumentation.profiling.cpu.enabled=true -E apm-server.instrumentation.profiling.cpu.interval=10s -E apm-server.instrumentation.profiling.heap.enabled=true
go run ./cmd/pprofessor -start=now-1h -service=apm-server --http=:6060 http://admin:changeme@localhost:9200
Demo
Related issues
Closes #3828