Configure stack output order #84

aparamon · 2019-01-28T11:10:49Z

With #74 it is now possible to output stacks for visualization with flamegraph.pl or e.g. Speedscope. The latter respects stack order in the input file ("Time Order") by default.

PR #76 is an early attempt to introduce logical stack order: stacks are output in the order they were first encountered. Often it is a gross improvement over alphanumeric (as in flamegraph.pl) or frequency (as in Speedscope's "Left Heavy") sorting, as the logical flow of program is better preserved.

It is now proposed to implement a general algorithm and set of options to specify desired stack output order. Below is prototype code that post-processes py-spy stacks (with #74, #76 enabled). The implementation uses different (dict-of-dicts tree) data structure but should be ~ as efficient as currently-used HashMap of stack-hash->count.

stackshuffle.py

import sys
from collections import defaultdict

class StackTree:
    def __init__(self):
        self.subtree = defaultdict(StackTree)
        self.total = 0

root = StackTree()

# Read stacks

for line in sys.stdin:
    *stack, n = line.split(';')
    n = int(n)
    tree = root
    for item in (*stack, None):
        tree = tree.subtree[item]
        tree.total += n
    root.total += n

# Consistency checks

def check_total(tree):
    assert tree.total == sum(subtree.total for subtree in tree.subtree.values())
    for item, subtree in tree.subtree.items():
        if item is not None:
            # recurse
            check_total(subtree)
        else:
            # terminator
            assert len(subtree.subtree) == 0
            assert subtree.total > 0

check_total(root)

# Sorted output

def getstacks(tree, key):
    for item, subtree in sorted(tree.subtree.items(), key=key):
        if item is not None:
            # recurse
            for substack, n in getstacks(subtree, key):
                yield (item, *substack), n
        else:
            # terminator
            yield (), subtree.total

if len(sys.argv) > 1 and sys.argv[1] == '--heavy':
    # left heavy
    key = lambda kv: -kv[1].total
elif len(sys.argv) > 1 and sys.argv[1] == '--alnum':
    # alphanumeric
    key = lambda kv: kv[0] or ''
else:
    # preserve natural occurence order
    key = lambda kv: 0

for stack, n in getstacks(root, key):
    print('{}; {}'.format(';'.join(stack), n))

The profiled code is basically pytest case with the following logic:

for f in test_examples:
    session.put('http://server/v1/data/{}'.format(f),
                files={'file': open(f)})
for f in test_examples:
    session.get('http://server/v1/slice/{}'.format(f))
    session.get('http://server/v1/spec/{}'.format(f))

To compare different sorting schemes, please drag-and-drop into https://www.speedscope.app:

stacks.txt (stack order by first occurrence, Output stacks in order they were encountered #76):
Stack order is mostly logical: getfixutevalue() followed by put() and then get(). Only in the end slice and spec stacks become "fragmented" (can be fixed by stackshuffle.py, see time.txt below).
alnum.txt (python stackshuffle.py --alnum < stacks.txt > alnum.txt):
Alphanumeric sorting, similar to what flamegraph.pl does.
Order illogical, e.g. get() before put(); but useful for locating particular function quickly.
heavy.txt (python stackshuffle.py --heavy < stacks.txt > heavy.txt):
Frequency descending sort order, similar to Speedscope's "Left Heavy".
Useful for quickly seeing most significant time eaters, but illogical (get() before put(), spec before slice, getfixturevalue() in the end).
time.txt (python stackshuffle.py < stacks.txt > time.txt):
Program logical flow preserved perfectly :-)

Attached prototype stackshuffle.py implements the latter 3 options (different only by specification of sort function key).

I'm volunteering to integrate into py-spy! (after #74 lands in master)

The text was updated successfully, but these errors were encountered:

aparamon mentioned this issue Jan 28, 2019

Aggregation in default "time-order" view jlfwong/speedscope#195

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Configure stack output order #84

Configure stack output order #84

aparamon commented Jan 28, 2019 •

edited

Loading

Configure stack output order #84

Configure stack output order #84

Comments

aparamon commented Jan 28, 2019 • edited Loading

aparamon commented Jan 28, 2019 •

edited

Loading