-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ NEW: nb_merge_streams
configuration
#364
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
"""A Sphinx post-transform, to convert notebook outpus to AST nodes.""" | ||
import os | ||
import re | ||
from abc import ABC, abstractmethod | ||
from typing import List, Optional | ||
from unittest import mock | ||
|
@@ -91,6 +92,58 @@ def load_renderer(name: str) -> "CellOutputRendererBase": | |
raise MystNbEntryPointError(f"No Entry Point found for myst_nb.mime_render:{name}") | ||
|
||
|
||
RGX_CARRIAGERETURN = re.compile(r".*\r(?=[^\n])") | ||
RGX_BACKSPACE = re.compile(r"[^\n]\b") | ||
|
||
|
||
def coalesce_streams(outputs: List[NotebookNode]) -> List[NotebookNode]: | ||
"""Merge all stream outputs with shared names into single streams. | ||
|
||
This ensure deterministic outputs. | ||
|
||
Adapted from: | ||
https://github.com/computationalmodelling/nbval/blob/master/nbval/plugin.py. | ||
""" | ||
if not outputs: | ||
return [] | ||
|
||
new_outputs = [] | ||
streams = {} | ||
for output in outputs: | ||
if output["output_type"] == "stream": | ||
if output["name"] in streams: | ||
streams[output["name"]]["text"] += output["text"] | ||
else: | ||
new_outputs.append(output) | ||
streams[output["name"]] = output | ||
else: | ||
new_outputs.append(output) | ||
|
||
# process \r and \b characters | ||
for output in streams.values(): | ||
old = output["text"] | ||
while len(output["text"]) < len(old): | ||
old = output["text"] | ||
# Cancel out anything-but-newline followed by backspace | ||
output["text"] = RGX_BACKSPACE.sub("", output["text"]) | ||
# Replace all carriage returns not followed by newline | ||
output["text"] = RGX_CARRIAGERETURN.sub("", output["text"]) | ||
|
||
# We also want to ensure stdout and stderr are always in the same consecutive order, | ||
# because they are asynchronous, so order isn't guaranteed. | ||
for i, output in enumerate(new_outputs): | ||
if output["output_type"] == "stream" and output["name"] == "stderr": | ||
if ( | ||
len(new_outputs) >= i + 2 | ||
and new_outputs[i + 1]["output_type"] == "stream" | ||
and new_outputs[i + 1]["name"] == "stdout" | ||
): | ||
stdout = new_outputs.pop(i + 1) | ||
new_outputs.insert(i, stdout) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So I understand correctly, is this correct:
and if that is correct, maybe I'm missing something but could we simplify with something like: sorted(new_outputs, lambda output: output.get("name")) ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @choldgraf I think it's not what you describe (although I might have misread it).
Looking at the description of nbval, I think its purpose is different from publishing notebooks, and therefore the logic it applies might not be optimal for myst-nb or vice versa. Speaking of publication purposes, reordering streams may be undesirable: imagine demonstrating code that produces a warning halfway through. On the other hand, coalescing same stream types may be a good idea without even giving the users an option to configure this. Or is there a use case for having two independent text outputs in a row after a single input? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
except, if you write to anyhow, I have no capacity left to devote to this, so I'll leave you guys to propose new PRs There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point @akhmerov - I feel like we should see how folks use this feature and see if there are any natural points to improve the ux over time 👍 |
||
|
||
return new_outputs | ||
|
||
|
||
class CellOutputsToNodes(SphinxPostTransform): | ||
"""Use the builder context to transform a CellOutputNode into Sphinx nodes.""" | ||
|
||
|
@@ -108,6 +161,8 @@ def run(self): | |
renderer_cls = load_renderer(node.renderer) | ||
renderers[node.renderer] = renderer_cls | ||
renderer = renderer_cls(self.document, node, abs_dir) | ||
if self.config.nb_merge_streams: | ||
node._outputs = coalesce_streams(node.outputs) | ||
output_nodes = renderer.cell_output_to_nodes(self.env.nb_render_priority) | ||
node.replace_self(output_nodes) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"source": [ | ||
"import sys\n", | ||
"print('stdout1', file=sys.stdout)\n", | ||
"print('stdout2', file=sys.stdout)\n", | ||
"print('stderr1', file=sys.stderr)\n", | ||
"print('stderr2', file=sys.stderr)\n", | ||
"print('stdout3', file=sys.stdout)\n", | ||
"print('stderr3', file=sys.stderr)\n", | ||
"1" | ||
], | ||
"outputs": [ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here all the stdout/stderr are all mixed up |
||
{ | ||
"output_type": "stream", | ||
"name": "stdout", | ||
"text": [ | ||
"stdout1\n", | ||
"stdout2\n" | ||
] | ||
}, | ||
{ | ||
"output_type": "stream", | ||
"name": "stderr", | ||
"text": [ | ||
"stderr1\n", | ||
"stderr2\n" | ||
] | ||
}, | ||
{ | ||
"output_type": "stream", | ||
"name": "stdout", | ||
"text": [ | ||
"stdout3\n" | ||
] | ||
}, | ||
{ | ||
"output_type": "stream", | ||
"name": "stderr", | ||
"text": [ | ||
"stderr3\n" | ||
] | ||
}, | ||
{ | ||
"output_type": "execute_result", | ||
"data": { | ||
"text/plain": [ | ||
"1" | ||
] | ||
}, | ||
"metadata": {}, | ||
"execution_count": 1 | ||
} | ||
], | ||
"metadata": {} | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.6.1" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
<document source="merge_streams"> | ||
<CellNode cell_type="code" classes="cell"> | ||
<CellInputNode classes="cell_input"> | ||
<literal_block language="ipython3" linenos="False" xml:space="preserve"> | ||
import sys | ||
print('stdout1', file=sys.stdout) | ||
print('stdout2', file=sys.stdout) | ||
print('stderr1', file=sys.stderr) | ||
print('stderr2', file=sys.stderr) | ||
print('stdout3', file=sys.stdout) | ||
print('stderr3', file=sys.stderr) | ||
1 | ||
<CellOutputNode classes="cell_output"> | ||
<literal_block classes="output stream" language="myst-ansi" linenos="False" xml:space="preserve"> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. here they are all in single blocks There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah - thanks @chrisjsewell that's helpful. So the order will always be the same (so long as the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes the order of stdout / stderr, individually, are guaranteed, so they can be group merged. the order of stdout vs stderr is not guaranteed (they are independent streams), so stderr is placed always after stdout |
||
stdout1 | ||
stdout2 | ||
stdout3 | ||
<literal_block classes="output stderr" language="myst-ansi" linenos="False" xml:space="preserve"> | ||
stderr1 | ||
stderr2 | ||
stderr3 | ||
<literal_block classes="output text_plain" language="myst-ansi" linenos="False" xml:space="preserve"> | ||
1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This explanation is perhaps worth to clarify further. Which aspect of the outputs becomes deterministic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to make a PR with the change, and I'll merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, perhaps 5 minutes left to review of a new config option is too short. Maybe something to consider for the future.
Also I can't make this PR exactly because I don't understand what result does merging of the streams achieve. Given that I'm somewhat familiar with nbformat and the involved stack, this is possibly true for many others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result is "documented" by the test and it output (see PR files changed).
It is a common pattern, already used in places like: https://github.com/computationalmodelling/nbval/blob/master/nbval/plugin.py, and was already planned in jupyter-book/jupyter-book#973
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I understand the purpose of the feature now. I'll trust your judgement call that looking at the source and the test is good enough as far as documentation is concerned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeh, I'm certainly open to improved documentation, and we will probably iterate on that more in jupyter-book/jupyter-book#1448 (comment)
The results, e.g. here https://sqla-tutorials-nb.readthedocs.io/en/latest/metadata.html#emitting-ddl-to-the-database, speak for themselves though: before all the output lines were in separate
div
boxes 😄There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @chrisjsewell -- this is a great improvement 👍
Thanks @akhmerov for your comments -- I am also not sure what
This ensures deterministic outputs.
means here.Just my
2cents
-- I find documentation in tests a sub-optimal form of documentation as it requires reading tests, which typically means you need to understand the test infrastructure with is non-trivial in many cases and (for me) is cognitive overhead :-).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its deterministic because, instead of having a random (non-deterministic) number of stdout/stderr outputs, you have only a maximum of one for each. As explained in jupyter-book/jupyter-book#973
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this could be documented more clearly for end-users - if we think that this is a useful feature, and if it is turned off by default, then we should have a section that is easily discoverable for a user that would benefit from this feature.
For example, we could create a parent category of
## `stdout`/`stderr`
around here, then create a nested section like:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I'll just open a PR to propose these changes since this one is already merged.