Rich support for labeled samples #3830
Replies: 1 comment
-
Thinking about this slightly more, the idea of a PromQL-inspired function for inserting the virtual stack frames is growing on me. Something like I also have a use case where I insert multiple virtual frames. (A practical example would be sub-categories. Your |
-
I have a general use case for labeling individual stacks/samples (in addition to labeling entire profiles, which Pyroscope already supports).
For example, I want to annotate the category of a sample from its stack frames. Example categories may include things like metrics, garbage collection, authentication, http dispatch, etc.
Let's assume, for the sake of discussion and for simplicity, that labels are created outside the purview of Pyroscope. We may want to assume they are defined on the pprof Sample message's label field.
There are a few goals I'd like to accomplish with sample labels:
A solution to this problem - which I've already implemented - is to insert virtual frames at the base of stacktraces denoting the label. I then publish a variant profile (e.g. `cpu_categorized`) where the flamegraphs are naturally aggregated by label.
Here's a partial screenshot of what this looks like for a JVM application.
I cannot overstate the utility of the labeled profiles. The flamegraphs clearly show things like which logical functionality is consuming resources without people having to expend cognitive effort to map stacks back to logical role. You can look at the evolution of a label over time and see if things are getting faster or slower.
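The virtual-frame approach described above can be sketched roughly as follows. This is only an illustration, not the actual implementation: stacks are modeled as plain tuples of frame names ordered from root to leaf, and the `CATEGORY_MARKERS` table, the `categorize` rule, and all frame names are hypothetical.

```python
from collections import Counter

# Hypothetical rule table: a frame name that identifies a category.
CATEGORY_MARKERS = {
    "gc_collect": "garbage collection",
    "check_token": "authentication",
    "emit_metric": "metrics",
}

def categorize(stack):
    """Return the category of the first frame matching a marker, else 'other'."""
    for frame in stack:
        if frame in CATEGORY_MARKERS:
            return CATEGORY_MARKERS[frame]
    return "other"

def add_virtual_base_frame(samples):
    """Prepend a '[category]' frame at the base (root) of every stack."""
    out = Counter()
    for stack, value in samples.items():
        out[(f"[{categorize(stack)}]",) + stack] += value
    return out

# Toy profile: stack (root first) -> sample value.
samples = {
    ("main", "serve", "check_token"): 30,
    ("main", "serve", "emit_metric"): 10,
    ("main", "gc_collect"): 5,
}
categorized = add_virtual_base_frame(samples)
```

Because every stack now starts with its category, a flamegraph built from `categorized` groups samples by category at the first level without any change to the renderer.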
While virtual stack frames do work as a solution to this problem (they certainly solve the visualization aspect nicely), they don't work well with Pyroscope's current storage backend.
Each virtual frame / label effectively constitutes a copy of every stacktrace. That's because stacktraces are stored as a tree in order to achieve deduplication. Inserting new virtual frames at the base of the stack effectively copies the entire tree and doubles the tree node count. This in turn leads to some pathological behavior of Pyroscope. (@kolesnikovae and I debugged this privately.) Virtual frames in the stored stacktraces seem to break fundamental assumptions about how to store stacktraces efficiently.
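To make the duplication concrete, here is a toy model of tree-based stacktrace deduplication. It is an assumption-laden sketch and not Pyroscope's actual storage format: the tree is a nested dict keyed by frame name, and the frame and label names are made up.

```python
def count_trie_nodes(stacks):
    """Count nodes in a prefix tree built from root-to-leaf stacks (root excluded)."""
    root = {}
    count = 0
    for stack in stacks:
        node = root
        for frame in stack:
            if frame not in node:
                node[frame] = {}
                count += 1
            node = node[frame]
    return count

# Three stacks that share the prefix main/serve/handle.
plain = [
    ("main", "serve", "handle", "check_token"),
    ("main", "serve", "handle", "emit_metric"),
    ("main", "serve", "handle", "render"),
]
# The same stacks with a different virtual frame inserted at the base of each.
labeled = [
    ("[authentication]",) + plain[0],
    ("[metrics]",) + plain[1],
    ("[http dispatch]",) + plain[2],
]

plain_nodes = count_trie_nodes(plain)      # shared prefix is stored once
labeled_nodes = count_trie_nodes(labeled)  # each label gets its own copy of the prefix
```

In this toy example the shared prefix stops being shared once the base frames differ, so the labeled tree needs a separate subtree per label.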
@kolesnikovae encouraged me to start a discussion about better technical solutions to the problem. So here I am.
One idea that @kolesnikovae had was to somehow associate labels with stacks and then have a way for the Pyroscope UI to pull out labels into flamegraphs, similar to what I'm doing with virtual base frames. But it would be done dynamically. The query language could also be extended to allow querying for stacks having labels, enabling you to investigate all stacks with a given label.
With today's Pyroscope, you could logically break apart 1 profile into N profiles, each with a different profile label value. This allows you to query individual labels. And without a label filter, samples would aggregate into the original profile. But it stops short of grouping labels together in the flamegraph (via virtual base stack frame insertion). (I intend to try this experiment to work around stacktrace storage performance issues.)
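A rough sketch of that split, under the assumption that each sample already carries a single category value (the tuple layout and function names here are hypothetical, and no Pyroscope API is involved):

```python
from collections import Counter, defaultdict

def split_by_label(samples):
    """Split one profile into one profile per label value.

    samples: iterable of (stack, category, value) tuples.
    Returns {category: Counter mapping stack -> value}.
    """
    profiles = defaultdict(Counter)
    for stack, category, value in samples:
        profiles[category][stack] += value
    return dict(profiles)

def merge(profiles):
    """Aggregate all per-label profiles back together (no label filter)."""
    total = Counter()
    for profile in profiles.values():
        total.update(profile)  # Counter.update adds values for matching stacks
    return total

samples = [
    (("main", "serve", "handle"), "authentication", 30),
    (("main", "serve", "handle"), "metrics", 10),
    (("main", "gc_collect"), "garbage collection", 5),
]
profiles = split_by_label(samples)
merged = merge(profiles)
```

Each entry of `profiles` can be queried on its own, and `merged` matches what the unsplit profile would have contained. The stacks themselves are unchanged, so the stored trees keep their shared prefixes.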
An interesting idea would be a UI or query language extension that allows you to turn profile label values into virtual base stack frames. These would naturally aggregate in the flamegraph, allowing you to see the contribution of each, e.g. `process_cpu:cpu:nanoseconds:cpu:nanoseconds{service_name="foo"} by (category)`, where the new `by (category)` syntax would insert a virtual base frame for the profile's `category` label value.
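What such a grouping might do at query time can be sketched as below. This is purely illustrative: the `by (category)` syntax is only a proposal in this post, and the `group_by_label` function, the node layout, and the sample data are all assumptions.

```python
def group_by_label(profiles):
    """Build a flamegraph tree whose first level is one virtual frame per label value.

    profiles: {label_value: {stack_tuple: value}}, stacks ordered root to leaf.
    Each node has the shape {"total": int, "children": {frame_name: node}}.
    """
    root = {"total": 0, "children": {}}
    for label_value, stacks in profiles.items():
        for stack, value in stacks.items():
            node = root
            node["total"] += value
            # The label value acts as a virtual base frame; nothing is stored.
            for frame in (label_value,) + tuple(stack):
                node = node["children"].setdefault(
                    frame, {"total": 0, "children": {}}
                )
                node["total"] += value
    return root

# Hypothetical per-label profiles, e.g. the result of selecting by category.
profiles = {
    "authentication": {("main", "serve"): 30},
    "metrics": {("main", "serve"): 10, ("main", "flush"): 5},
}
tree = group_by_label(profiles)
```

The virtual frames exist only in the rendered tree, so the stored stacktraces would keep their original shape and deduplication.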