
Experimental self-profiling #2839

Merged
merged 14 commits into from
Nov 29, 2019

Conversation

axw
Member

@axw axw commented Oct 22, 2019

This PR adds experimental support for continuously profiling the server itself, and recording the data in Elasticsearch.

By default the feature is disabled, and must be explicitly enabled through the config file. A new /intake/v2/profile endpoint is registered when CPU or heap profiling is enabled.
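As a rough sketch of what enabling this looks like, something along these lines goes in the config file. The key names below are assumptions for illustration, not confirmed from this PR's diff:

```yaml
# Hypothetical apm-server.yml fragment -- key names are assumptions.
apm-server:
  instrumentation:
    profiling:
      cpu:
        enabled: true     # continuously gather CPU profiles of the server
        interval: 60s     # how often a CPU profile is gathered
        duration: 10s     # length of each CPU profile
      heap:
        enabled: true     # periodically record heap profiles
        interval: 60s
```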

We emit an event per profile sample, roughly as described in the proposal document. Some of the field names have changed. Each document includes a hash of the function names in the call stack. We use https://github.com/OneOfOne/xxhash for hashing, which is both faster and of higher quality than the FNV1a algorithm used in the prototype.
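The stack-hashing idea can be sketched as below. For portability this sketch uses the stdlib's FNV-1a; the PR itself swaps in github.com/OneOfOne/xxhash, which has an analogous streaming API (New64/WriteString/Sum64). Function names and the separator choice here are illustrative, not taken from the PR:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// stackHash returns a hash over the function names in a call stack,
// so identical stacks across samples map to the same identifier.
// The PR uses xxhash instead of FNV-1a; the structure is the same.
func stackHash(funcs []string) uint64 {
	h := fnv.New64a()
	for _, name := range funcs {
		h.Write([]byte(name))
		h.Write([]byte{0}) // separator so {"ab","c"} != {"a","bc"}
	}
	return h.Sum64()
}

func main() {
	stack := []string{"runtime.main", "main.main", "main.work"}
	fmt.Printf("%x\n", stackHash(stack))
}
```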

TODO:

  • complete and merge the Go agent branch
  • disable the profile endpoint by default
  • define response format for profile endpoint
  • generate a unique ID for each profile, so we can obtain samples for a specific profile
  • add tests

@axw axw force-pushed the server-profiling branch 3 times, most recently from 630c4a3 to dc31cca Compare November 21, 2019 03:38
@axw axw changed the base branch from 7.x to master November 21, 2019 03:39
@axw axw force-pushed the server-profiling branch 2 times, most recently from a77ebf9 to c0dfd59 Compare November 21, 2019 04:10
@axw axw marked this pull request as ready for review November 21, 2019 04:15
@axw axw force-pushed the server-profiling branch 6 times, most recently from 77f5824 to f7b934d Compare November 21, 2019 07:09
@codecov-io

codecov-io commented Nov 21, 2019

Codecov Report

Merging #2839 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #2839   +/-   ##
=======================================
  Coverage   78.47%   78.47%           
=======================================
  Files          89       89           
  Lines        4711     4711           
=======================================
  Hits         3697     3697           
  Misses       1014     1014

Contributor

@simitt simitt left a comment


Would you mind creating issues for the TODO comments and linking to them?

Some integration tests are missing. We usually have integration tests covering how the incoming payload is processed and transformed; you can look at existing integration tests where we capture the event before calling the libbeat publisher.
We also usually add a test to the system_tests, checking that actual ingestion works as expected (the ES template and the ingested event fit together, etc.).

Do the agent devs have a way to get the spec for how the payload has to look? Usually they pull the JSON spec and test against it, so they don't have to actually run the APM Server in their integration tests.

beater/config/config.go Outdated Show resolved Hide resolved
model/profile/_meta/fields.yml Outdated Show resolved Hide resolved
model/profile/_meta/fields.yml Show resolved Hide resolved
beater/api/profile/handler.go Outdated Show resolved Hide resolved
beater/api/profile/handler.go Outdated Show resolved Hide resolved
beater/api/profile/handler.go Outdated Show resolved Hide resolved
beater/api/profile/handler.go Outdated Show resolved Hide resolved
beater/api/profile/handler.go Outdated Show resolved Hide resolved
model/profile/_meta/fields.yml Show resolved Hide resolved
@axw
Member Author

axw commented Nov 22, 2019

Would you mind creating issues for the TODO comments and linking to them?

#2954
#2955

Some integration tests are missing. We usually have integration tests covering how the incoming payload is processed and transformed; you can look at existing integration tests where we capture the event before calling the libbeat publisher.
We also usually add a test to the system_tests, checking that actual ingestion works as expected (the ES template and the ingested event fit together, etc.).

👍 Looking into this now.

Do the agent devs have a way to get the spec for how the payload has to look? Usually they pull the JSON spec and test against it, so they don't have to actually run the APM Server in their integration tests.

Yes, it's protobuf encoded: https://github.com/google/pprof/blob/master/proto/profile.proto. If/when we open this feature up to applications as well as the server, then this will be better documented.

@axw
Member Author

axw commented Nov 22, 2019

@simitt I've added integration tests. I'll try and have a look at system tests next week.

```json
"profile": {
  "cpu.ns": 20000000,
  "duration": 5100778105,
  "id": "d58847c89c275eba",
```
Contributor

Shouldn't the id of a profile be unique, and the timestamp usually differ per profile? Is it possible to extract 2 or 3 unique profiles per example?

Member Author

A profile is just a collection of samples, and each document corresponds to one sample. I added the profile.id field so that we could group all samples by the profile that they came from.

I've since removed this field for a few reasons:

  • It's not needed right now
  • We can potentially use the @timestamp field instead (all samples from a profile will have the same timestamp value)
  • The field won't be very meaningful if we start performing additional aggregation of samples in the server
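Following the reasoning above, all samples belonging to one profile could be fetched by filtering on their shared timestamp. A hedged sketch of such an Elasticsearch query; the processor.event field name and the example timestamp are assumptions for illustration:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "processor.event": "profile" } },
        { "term": { "@timestamp": "2019-11-27T03:32:00.000Z" } }
      ]
    }
  }
}
```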

@axw axw force-pushed the server-profiling branch from 0fc203d to 8c44037 Compare November 27, 2019 03:32
@axw
Member Author

axw commented Nov 27, 2019

Jenkins run the tests please

@axw
Member Author

axw commented Nov 27, 2019

We also usually add a test to the system_tests, checking that the actual ingestion works as expected (ES template + ingested event fit together, ..)

I added a couple of system tests that show that with appropriate config, the server will profile its own CPU and heap usage.

While I was doing that I found that there was an issue :) I'd missed adding "profile" as an event type to the idxmgmt package.

@simitt
Contributor

simitt commented Nov 27, 2019

Thanks a bunch for bringing this to the server!

@axw axw force-pushed the server-profiling branch from be3eec8 to ab5e8d9 Compare November 27, 2019 12:02
@axw axw force-pushed the server-profiling branch from a64bb70 to 270266e Compare November 29, 2019 08:57
@axw axw force-pushed the server-profiling branch from 270266e to c600819 Compare November 29, 2019 09:54
@axw axw merged commit 2b1f9e9 into elastic:master Nov 29, 2019
@axw axw deleted the server-profiling branch November 29, 2019 11:27
graphaelli pushed a commit to graphaelli/apm-server that referenced this pull request Dec 11, 2019
* vendor: add github.com/google/pprof/profile

* model/profile: add profile data model

* config: config for profiling the server

* beater: add profiling endpoint

* vendor: add github.com/OneOfOne/xxhash

Already being used by Beats.

* model: fix fields

* beater/api/profile: address review comments

* beater/config: split InstrumentationConfig out

* decoder: add LimitedReader

* beater/api/profile: use decoder.LimitedReader

* Add integration test

* idxmgmt: add "profile" event index

* tests/system: add system tests for profiling

* model/profile: remove profile.id

Not needed for now, and will get in the way of storage
optimisations (aggregating multiple profiles) later.
graphaelli added a commit that referenced this pull request Dec 11, 2019