Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add speedscope renderer #160

Merged
merged 38 commits into from
Nov 14, 2021
Merged

Conversation

goxberry
Copy link
Contributor

This pull request adds a renderer that outputs JSON conforming to speedscope's JSON schema; in particular, speedscope's "evented" format is used.

This implementation is very rough, and intended as a request for comment. I expect to revise this pull request based on feedback from the project maintainer(s) to better conform with project style.

Closes #89.

Copy link
Owner

@joerick joerick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @goxberry ! This could be a nice addition. I haven't had time for a full review, but here are some thoughts.

Comment on lines 35 to 43
def encode_speedscope_frame(sframe: SpeedscopeFrame) -> str:
"""Returns a string encoding a SpeedscopeFrame as a JSON object."""

property_decls: list[str] = []
property_decls.append('"name": %s' % encode_str(sframe.name))
property_decls.append('"file": %s' % encode_str(sframe.file))
property_decls.append('"line": %d' % sframe.line)

return "{%s}" % ",".join(property_decls)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing jsonmodule.py uses this structure to encode JSON due to limits on the depth of objects that the built-in json module can't handle. This code would be neater just using json.dumps.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there are a few other places where the code is building json as strings. The thing to try here is to add your renderer to the overflow test - test/test_overflow.py - if that passes, and you can use the built-in json module, that would be preferable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make the code neater, I have:

  • added JSON encoders for SpeedscopeEvent and SpeedscopeFrame in beb07eac and 9d32a191, respectively
  • replaced the encode_speedscope_event and encode_speedscope_frame calls with calls to json.dumps in 582925f5 and a78ee717, respectively
  • deleted the encode_speedscope_event and encode_speedscope_frame functions in 7a4f2e8b and 4a9ccfd4, respectively

So far, the overflow test passes. I think converting other parts of SpeedscopeRenderer to use json.dumps should be straightforward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of porting the rest of the SpeedscopeRenderer implementation to use json.dumps, I ended up replacing the NamedTuple types with dataclasses, because those are easier to serialize to JSON via the __dict__ field. This work should be done as of e66984e4.

pyinstrument/renderers/speedscope.py Outdated Show resolved Hide resolved
pyinstrument/renderers/speedscope.py Outdated Show resolved Hide resolved
pyinstrument/renderers/speedscope.py Outdated Show resolved Hide resolved
pyinstrument/renderers/speedscope.py Outdated Show resolved Hide resolved
@goxberry
Copy link
Contributor Author

I think this PR is probably ready for a second review.

Copy link
Owner

@joerick joerick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant, this is looking really good, @goxberry! I appreciate the effort you've put into this :)

Would you be able to add the Speedscope renderer to the overflow test and write a simple test for the speedscope output?

I'm thinking something like:

  • test starts profiler
  • test calls function a
    • function a calls function b
      • function b sleeps (you can use 'fake time' for this, see other tests)
      • function b returns
    • function a returns
  • test stops profiler
  • test renders speedscope JSON
  • test loads speedscope JSON
  • test checks a few things about the JSON, I suppose, a few things about events and metadata. I'm not sure of the specifics, perhaps the events should be [open, open, close, close]?

@goxberry goxberry requested a review from joerick November 5, 2021 20:01
@goxberry
Copy link
Contributor Author

goxberry commented Nov 5, 2021

@joerick Thanks! And thank you for your review and for writing this profiler! Your design has made it straightforward to add this new feature. As you've suggested, I've added SpeedscopeRenderer to the tests in test/test_overflow.py and test/test_profiler.py. Because the functions long_function_a and long_function_b within test/test_profiler.py call the sleep function, calling both long_function_a and long_function_b suffices to generate an event timeline with nested function calls, so I elected to make the test_speedscope_output test similar to the other test_*_output tests for simplicity.

As a result of testing, I noticed something peculiar in SpeedscopeRenderer and wanted to get your insight about what might be happening. When I run test_profiler.test_speedscope_output in a debugger and:

  • set a breakpoint at test/test_profiler.py:149 and continue until I reach it
  • step into pyinstrument/renderers/speedscope.py
  • set another breakpoint at pyinstrument/renderers/speedscopy.py:220 and continue until I reach it

I notice that the value of session.duration at that line is less than the value of self._event_time, and I'm not sure why session.duration < self._event_time; I would have expected session.duration >= self._event_time.

You should be able to reproduce this phenomenon yourself by running in a terminal:

> pytest --trace ${PATH_TO_PYINSTRUMENT_REPO}/test/test_profiler.py -k "test_speedscope_output"
(Pdb) b 149
(Pdb) c
(Pdb) s    # should step into speedscope.py.__init__() here
(Pdb) b 220
(Pdb) c
(Pdb) p session.duration     # on my machine, this command prints the value 0.00041103363037109375
(Pdb) p self._event_time      # on my machine, this command prints 0.75, as expected, within a tolerance of 0.3

where the first > denotes a shell prompt, and the (Pdb) denotes a pdb debugger prompt.

Do you have any insight as to why this behavior might occur?

@joerick
Copy link
Owner

joerick commented Nov 6, 2021

Hi @goxberry! Yes, I can offer an explanation. The session.duration is computed by calling time.time() before and after profiling. But we are using 'fake time', which is defined here:

@contextlib.contextmanager
def fake_time(fake_clock=None):
fake_clock = fake_clock or FakeClock()
stack_sampler.get_stack_sampler().timer_func = fake_clock.get_time
try:
with mock.patch("time.sleep", new=fake_clock.sleep):
yield fake_clock
finally:
stack_sampler.get_stack_sampler().timer_func = None

This monkeypatches time.sleep and some internal machinery, but not time.time. So in the reality the session duration is correct, very little time has passed, but the test has a simulated reality, where time has passed!

If this is causing a problem, you could edit the fake_time function to also monkeypatch time.time (returning fake_clock.get_time should do the trick). Or we can just ignore the difference :)

@goxberry
Copy link
Contributor Author

goxberry commented Nov 7, 2021

Thanks for the explanation!

If this is causing a problem, you could edit the fake_time function to also monkeypatch time.time (returning fake_clock.get_time should do the trick). Or we can just ignore the difference :)

In the interest of pragmatism, let's ignore the difference. I'll revert the change in SpeedscopeRenderer regarding session.duration vs self._event_time and remove the assertion checking the endValue field in the Speedscope output.

Copy link
Owner

@joerick joerick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @goxberry! Thanks for adding the test. I have a couple requests :)

test/test_profiler.py Outdated Show resolved Hide resolved
test/test_profiler.py Outdated Show resolved Hide resolved
test/test_profiler.py Outdated Show resolved Hide resolved
test/test_profiler.py Outdated Show resolved Hide resolved
@goxberry goxberry requested a review from joerick November 8, 2021 03:41
This commit starts the implementation of a speedscope renderer by:

* copying the existing JSONRenderer to a new SpeedscopeRenderer class
  in `speedscope.py` of the `renderers` directory

* adding an import hook to `__init__.py` of the `renderers` directory

* adding a `speedscope` option to the `-r` flag at the command line
This commit replaces the SpeedscopeRenderer implementation copied from
JSONRenderer with an implementation of an "evented" speedscope JSON
profile output file writer.

The file format is documented in a few different ways, listed below:

* the speedscope GitHub repo wiki:
  https://github.com/jlfwong/speedscope/wiki/Importing-from-custom-sources

* the speedscope JSON schema:
  https://www.speedscope.app/file-format-schema.json

* a Typescript file documenting the schema
  https://github.com/jlfwong/speedscope/blob/master/src/lib/file-format-spec.ts

* a simple example
  https://github.com/jlfwong/speedscope/blob/100578c536a3afab39fb6803d28913d12eac29c5/sample/profiles/speedscope/0.0.1/simple.speedscope.json

The style of the implementation needs obvious work; this commit is
intended as a starting point for a pull request with the expectation
that upstream maintainer feedback will be needed to fix any style or
documentation issues.
This commit deletes the jsonrenderer comment in the SpeedscopeRenderer
implementation; the source file is obviously not named jsonrenderer.
This commit converts SpeedscopeFrame from a non-class-style namedtuple
to a class-style namedtuple for readability.
This commit uncomments the SpeedscopeEventType enumeration class and
uses it in the SpeedscopeRenderer implementation.
This commit converts the SpeedscopeEvent namedtuple from
non-class-type to class-type to make it more self-documenting and to
conform with project guidelines regarding type hints.
This commit revises the docstring for SpeedscopeRenderer.render_frame
by deleting text that no longer applies to the implementation of this
method.
This commit uses the previously-unused profile_name object within
SpeedscopeRenderer.render.
This commit adds a JSON encoder class for the SpeedscopeEvent
namedtuple in order to stand up a SpeedscopeRenderer implementation
based on the Python json module.
This commit renames the _total_time field to _event_time in order to
clarify its purpose in the SpeedscopeRenderer class.
This commit replaces calls of the encode_speedscope_frame function
with json.dumps calls.
This commit deletes the encode_speedscope_event function because it is
no longer needed, and has been replaced with calls to json.dumps and
SpeedscopeEventEncoder.
This commit adds a JSON encoder for the SpeedscopeFrame class to try
to stand up a SpeedscopeRenderer implementation that uses the json
module.
This commit replaces calls of encode_frame with calls of json.dumps.
This commit deletes the encode_frame function because it is no longer
used.
This commit adds type hints to the Speedscope-related JSON encoders.
This commit deletes the import of `collections.namedtuple` because the
speedscope renderer now uses class-style namedtuple definitions.
This commit corrects the type hints by making each field a union type
with None, because `pyright` does not detect that `frame` cannot be
`None` where `SpeedscopeFrame.__init__` is invoked.
This commit removes type hints from `SpeedscopeEventType` in order to
silence some `pyright` errors regarding incorrect types when using
`str` as the type hint for each value.
This commit changes the SpeedscopeFrame and SpeedscopeEvent types from
class-style namedtuples to dataclasses because I couldn't figure out
how to serialize namedtuples to JSON objects via:

* subclassing json.JSONEncoder

* defining a default method

Attempting to return a dictionary using the data in each namedtuple
did not seem to yield a string containing a JSON object; instead, a
string containing a JSON array was returned.

Changing the namedtuple types to dataclasses and leveraging the
__dict__ dunder field, in concert with subclassing json.JSONEncoder
and defining a default method, yielded the desired result, although
memory usage will increase slightly.
This commit adds a SpeedscopeProfile class that stores the data
corresponding to speedscope "profile" objects, and adds a
SpeedscopeProfileEncoder class to serialize that class to JSON. The
encoder classes will be consolidated in a later commit.
This commit deletes the commented-out dead code used by the pure
string approach to serializing speedscope profile objects.
This commit deletes an orphaned LIFO iteration order comment about
dictionaries in Python 3.7+.
This commit adds a SpeedscopeFile data class and a SpeedscopeEncoder
class to serialize to JSON the SpeedscopeFrame, SpeedscopeEvent,
SpeedscopeProfile, SpeedscopeEventType, and SpeedscopeFile data
classes.
This commit revises the SpeedscopeRenderer class by removing a lot of
dead code and updating docstrings and comments in the class and its
auxiliary classes.
This commit deletes the processor.aggregate_repeated_calls method from
the list of default processors returned by SpeedscopeRenderer because
speedscope is a timeline-based format, and aggregating repeated calls
fouls up a timeline view.
This commit updates a code comment within SpeedscopeRenderer.render
discussing why the frame list is constructed as it is.
This commit fixes some pyright errors in
pyinstrument/renderers/speedscope.py.
This commit replaces the for loop that builds up the speedscope frame
list with a list comprehension.
This commit adds SpeedscopeRenderer and the `-r speedscope` flag to
the pyinstrument documentation.
This commit removes the dataclass arguments from the SpeedscopeEvent
class because this class does not need to be hashable.
This commit modifies the display title of a speedscope profile
exported from pyinstrument to include the timestamp of when the
profile was generated (which also happens to be argument to
`--load-prev` needed to render output in other formats).
This commit changes the SpeedscopeRenderer code to comply with project
style guidelines regarding code formatting with black and isort.
This commit adds SpeedscopeRenderer to the overflow test in
pyinstrument's test suite.
This commit fixes an apparent inconsistency in the
profiles[0].endValue field of the Speedscope output from
SpeedscopeRenderer. In unit testing, the value of session.duration
(within a call to SpeedscopeRenderer.render) may not be equal to the
event time of the last event in the profiles[0].events field of the
Speedscope output. In fact, in the test_speedscope_output test within
test_profiler.py, the value of session duration was approximately
0.0047s (in local testing on my laptop), whereas the time value of the
last event generated by the profile in that test should be 0.75s +/-
0.3s.

To correct this inconsistency, the end value of profile is set equal
to the event time of the last event.
This commit adds a test to test_profiler that tests the JSON output
emitted by SpeedscopeRenderer against known properties it should have.
This commit:

* Removes, in `test/test_profiler.py`, the test of the value of the
  `"profiles[0].endValue"` field in the Speedscope JSON output. The
  difference between the value of `profiles[0].endValue` (equal to
  `session.duration` from the `session` argument passed to
  `SpeedscopeRenderer.render`) and the time of the last event can be
  attributed to the `fake_time` context manager used as a mock timer in
  the profiler/renderer tests.

* Changes the value assigned to the `profiles[0].endValue` field via
  the last positional argument to `SpeedscopeProfile.__init__` from
  `self._event_time` (which, at that point in
  `SpeedscopeRenderer.render` equals the last event time) back to
  `session.duration`. This change is made because the discrepancy
  between the values of `session.duration` and `self._event_time` can
  be attributed to the `fake_time` context manager used as a mock
  timer in the profiler/renderer tests.
This commit aims to make the `test_speedscope_output` test in
`test/test_profiler.py` less wordy and more readable by:

* deleting message arguments to assertions

* replacing local variables with literals or expressions, because many
  of these variables were motivated by keeping line length low in
  message arguments passed to assertions

* removing tolerances in the pytest.approx calls because CI jitter
  should not affect timings, and all floating point numbers involved
  are exactly representable per the IEEE-754 standard
@goxberry
Copy link
Contributor Author

goxberry commented Nov 8, 2021

Rebased on main.

Copy link
Owner

@joerick joerick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@RyannDaGreat
Copy link

Whoah :O Thank you guys so much!! This is so cool!! :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: Add support for speedscope
3 participants