Experimental self-profiling #2839
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master    #2839   +/-   ##
=======================================
  Coverage   78.47%   78.47%
=======================================
  Files          89       89
  Lines        4711     4711
=======================================
  Hits         3697     3697
  Misses       1014     1014
Would you mind creating issues for the TODO comments and linking to them?
I am missing some integration tests. We usually have an integration test covering the incoming payload being processed and transformed. You can have a look at some existing integration tests where we capture the event before calling the libbeat publisher.
We also usually add a test to the system_tests, checking that the actual ingestion works as expected (the ES template and the ingested event fit together, etc.).
Do the agent devs have a way to get the spec for how the payload has to look? Usually they pull the JSON spec and test against it, so they don't have to actually run the APM Server in their integration tests.
👍 Looking into this now.
Yes, it's protobuf-encoded: https://github.com/google/pprof/blob/master/proto/profile.proto. If/when we open this feature up to applications as well as the server, this will be better documented.
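For anyone curious, a minimal sketch (not code from this PR) of decoding such a payload with the github.com/google/pprof/profile package; the input file name is hypothetical:

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/google/pprof/profile"
)

func main() {
	f, err := os.Open("cpu.pprof") // hypothetical input file
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// profile.Parse accepts both gzip-compressed and raw protobuf profiles.
	p, err := profile.Parse(f)
	if err != nil {
		log.Fatal(err)
	}

	// Each sample carries one value per sample type (e.g. cpu/nanoseconds)
	// and a call stack of locations.
	for _, st := range p.SampleType {
		fmt.Printf("sample type: %s/%s\n", st.Type, st.Unit)
	}
	for _, s := range p.Sample {
		fmt.Println(s.Value)
	}
}
```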
@simitt I've added integration tests. I'll try and have a look at system tests next week.
"profile": { | ||
"cpu.ns": 20000000, | ||
"duration": 5100778105, | ||
"id": "d58847c89c275eba", |
Shouldn't the id of a profile be unique, and the timestamp usually differ per profile? Is it possible to extract 2 or 3 unique profiles for the example?
A profile is just a collection of samples, and each document corresponds to one sample. I added the profile.id field so that we could group all samples by the profile that they came from.
I've since removed this field for a few reasons:
- It's not needed right now
- We can potentially use the @timestamp field instead (all samples from a profile will have the same timestamp value)
- The field won't be very meaningful if we start performing additional aggregation of samples in the server
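To illustrate the @timestamp point above, here is a minimal sketch, assuming a hypothetical sampleDoc type (not this PR's actual model), of how every sample document from one profile would share the profile's own timestamp and so remain groupable:

```go
package main

import (
	"fmt"
	"time"

	"github.com/google/pprof/profile"
)

// sampleDoc is a hypothetical stand-in for the per-sample document
// the server indexes; it is not the PR's actual data model.
type sampleDoc struct {
	Timestamp time.Time // indexed as @timestamp
	Values    []int64
}

// docsForProfile gives every sample document the same timestamp, taken
// from the profile itself, so documents can be grouped by @timestamp.
func docsForProfile(p *profile.Profile) []sampleDoc {
	ts := time.Unix(0, p.TimeNanos)
	docs := make([]sampleDoc, 0, len(p.Sample))
	for _, s := range p.Sample {
		docs = append(docs, sampleDoc{Timestamp: ts, Values: s.Value})
	}
	return docs
}

func main() {
	p := &profile.Profile{
		TimeNanos: time.Now().UnixNano(),
		Sample:    []*profile.Sample{{Value: []int64{10000}}, {Value: []int64{20000}}},
	}
	for _, d := range docsForProfile(p) {
		fmt.Println(d.Timestamp, d.Values)
	}
}
```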
Jenkins run the tests please
I added a couple of system tests that show that, with appropriate config, the server will profile its own CPU and heap usage. While I was doing that I found an issue :) I'd missed adding "profile" as an event type in index management (see the idxmgmt commit below).
Thanks a bunch for bringing this to the server!
* vendor: add github.com/google/pprof/profile
* model/profile: add profile data model
* config: config for profiling the server
* beater: add profiling endpoint
* vendor: add github.com/OneOfOne/xxhash (already being used by Beats)
* model: fix fields
* beater/api/profile: address review comments
* beater/config: split InstrumentationConfig out
* decoder: add LimitedReader
* beater/api/profile: use decoder.LimitedReader
* Add integration test
* idxmgmt: add "profile" event index
* tests/system: add system tests for profiling
* model/profile: remove profile.id (not needed for now, and would get in the way of later storage optimisations such as aggregating multiple profiles)
This PR adds experimental support for continuously profiling the server itself, and recording the data in Elasticsearch.
By default the feature is disabled, and must be explicitly enabled through the config file. There is a new /intake/v2/profile endpoint, which is enabled when CPU or heap profiling is enabled.

We emit an event per profile sample, roughly as described in the proposal document. Some of the field names have changed. Each document includes a hash of the function names in the call stack. We use https://github.com/OneOfOne/xxhash for hashing, which is both faster and of higher quality than the FNV-1a algorithm used in the prototype.
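For illustration, a sketch of what enabling this in apm-server.yml might look like; the exact keys below are my assumption and are not confirmed by this PR:

```yaml
# Hypothetical config sketch; key names are assumptions, not taken from the PR.
apm-server:
  instrumentation:
    enabled: true
    profiling:
      cpu:
        enabled: true   # periodically capture CPU profiles
        interval: 60s
        duration: 10s
      heap:
        enabled: true   # periodically capture heap profiles
        interval: 60s
```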
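And a hedged sketch of the stack-hashing idea, assuming a hypothetical stackHash helper (illustrative only, not the server's actual code):

```go
package main

import (
	"fmt"

	"github.com/OneOfOne/xxhash"
)

// stackHash returns a 64-bit xxhash of the function names in a call
// stack, so samples with identical stacks hash to the same value.
func stackHash(functionNames []string) uint64 {
	h := xxhash.New64()
	for _, name := range functionNames {
		h.Write([]byte(name))
	}
	return h.Sum64()
}

func main() {
	// Hypothetical call stack, outermost frame first.
	stack := []string{"runtime.main", "main.main", "main.work"}
	fmt.Printf("%x\n", stackHash(stack))
}
```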
TODO: