Jupyter Notebooks offer a great interactive environment for programming, for teaching or for doing research.
Notebooks are complex objects as they combine the user input (text or code cells) with the outputs of the user code, which may be much bigger.
This causes difficulties for version control and for manual editing of the documents (outside of the classical notebook editors).
We propose a format that represents Jupyter Notebooks as percent scripts with outputs, building on the experience that we gathered in Jupytext with the percent script format (which has no outputs).
The purpose of this proposal is to
- set a few guidelines for a possible implementation
- identify possible sponsors for the project (I would like to get funding for working on this)
- identify technical correspondents to ease future integration with the main notebook editors (i.e. Jupyter, VS Code, PyCharm Professional)
- The percent format with outputs works for all up-to-date notebooks (cell ids are required i.e. notebook format should be at least 4.5)
- All output types are supported
- The implementation is done in Python, but the format works for any language (we only need to know the single-line comment character)
- Enough examples are provided to allow easy re-implementation in other languages (e.g. TypeScript for VS Code or Jupyter Lab)
- This format is implemented either in Jupytext, or in a standalone library. The sponsor chooses the licence and organisation.
Sometimes cell.outputs = [], i.e. there are no outputs. In that case we would simply write the code cell as a standard percent cell, e.g.
# %% e57e3703-0a89-4dd7-b906-2304c8d32df7
x = 5
Note that e57e3703-0a89-4dd7-b906-2304c8d32df7 is the cell id. The user will be able to edit the cell ids in the text notebook, and we also plan to provide a utility that will programmatically rename cell ids to more friendly names like unnamed_code_cell_1.
A sample notebook with no output is available here: 00_no_output.py
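As an illustration, the snippet below sketches how a code cell with no outputs could be written as a percent cell, and how a renaming utility might work. It is not the Jupytext implementation: the function names code_cell_to_percent and rename_cell_ids, the rule that only UUID-like ids are renamed, and the generalisation of unnamed_code_cell_1 to other cell types are assumptions made for this example. Cells are plain dictionaries shaped like nbformat 4.5 cells.

```python
import re

# Assumption: only ids that look like full UUIDs (as in the examples above) are renamed.
UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def code_cell_to_percent(cell):
    """Render a code cell without outputs as '# %% <cell id>' followed by its source."""
    source = cell["source"]
    if isinstance(source, list):  # the ipynb file may store the source as a list of lines
        source = "".join(source)
    return f"# %% {cell['id']}\n{source}\n"


def rename_cell_ids(cells):
    """Rename UUID-like ids in place and return the old-to-new mapping."""
    taken = {cell["id"] for cell in cells}
    counters = {}
    mapping = {}
    for cell in cells:
        if not UUID_RE.match(cell["id"]):
            continue  # keep ids that the user has already edited
        kind = cell["cell_type"]
        while True:  # skip friendly names that are already in use
            counters[kind] = counters.get(kind, 0) + 1
            new_id = f"unnamed_{kind}_cell_{counters[kind]}"
            if new_id not in taken:
                break
        taken.add(new_id)
        mapping[cell["id"]] = new_id
        cell["id"] = new_id
    return mapping


cell = {
    "id": "e57e3703-0a89-4dd7-b906-2304c8d32df7",
    "cell_type": "code",
    "source": "x = 5",
    "outputs": [],
}
print(code_cell_to_percent(cell), end="")
```

The friendly names only use letters, digits and underscores, so they remain valid cell ids under the 4.5 notebook format.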
Short text outputs should be inlined in the text notebook as in 01_simple_output.py
# %% aca6afe8-eb2f-45c7-bd16-35e48e1f43e4
1 + 1
# «1» 2
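The inline form is simple enough to be written and read back with a regular expression. The sketch below assumes the layout shown in the example above (the comment character, the execution count between « and », then the text/plain value). The names render_inline_output and parse_inline_output are hypothetical and not part of Jupytext.

```python
import re

# Matches lines such as '# «1» 2' and captures the execution count and the text.
INLINE_RE = re.compile(r"^# «(\d+)» (.*)$")


def render_inline_output(execution_count, text):
    """Return the one-line comment that carries a short text/plain output."""
    return f"# «{execution_count}» {text}"


def parse_inline_output(line):
    """Return (execution_count, text) for an inline output line, or None otherwise."""
    match = INLINE_RE.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


print(render_inline_output(1, "2"))
print(parse_inline_output("# «1» 2"))
```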
Long or multiline text outputs are exported to text files like in 01b_long_output.py
# %% aca6afe8-eb2f-45c7-bd16-35e48e1f43e4
"very " * 55 + "long text"
# «1»
# data:
# text/plain: aca6afe8-eb2f-45c7-bd16-35e48e1f43e4_0.txt
And the content of the txt file is simply the output of the command:
'very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very long text'
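A possible way to choose between the inline form and an external file is sketched below. The 80-character threshold (MAX_INLINE_CHARS) and the function name render_text_output are assumptions, since the proposal does not fix what counts as a "short" output. The file name follows the {cell_id}_{output_index}.txt pattern visible in the example above.

```python
from pathlib import Path

MAX_INLINE_CHARS = 80  # hypothetical threshold, not specified by the proposal


def render_text_output(cell_id, output_index, execution_count, text, output_dir):
    """Return the comment lines for a text/plain output.

    Short single-line outputs are inlined. Long or multiline outputs are
    written to a text file in output_dir and referenced by file name.
    """
    if "\n" not in text and len(text) <= MAX_INLINE_CHARS:
        return [f"# «{execution_count}» {text}"]
    file_name = f"{cell_id}_{output_index}.txt"
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / file_name).write_text(text, encoding="utf-8")
    return [f"# «{execution_count}»", "# data:", f"# text/plain: {file_name}"]
```

For example, render_text_output("aca6afe8-eb2f-45c7-bd16-35e48e1f43e4", 0, 1, "2", "outputs") returns the single inline line, while a 284-character string is written to outputs/aca6afe8-eb2f-45c7-bd16-35e48e1f43e4_0.txt.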
Markdown outputs are exported to .md files like here in 02_markdown.py
# %% 03220820-56d9-4e56-a8c7-90244408ea8e
from IPython.display import Markdown
Markdown("**bold**")
# «1»
# data:
# text/markdown: 03220820-56d9-4e56-a8c7-90244408ea8e_0.md
# text/plain: 03220820-56d9-4e56-a8c7-90244408ea8e_0.txt
We store HTML, PNG and JSON outputs in files with the expected extension.
The 06_matplotlib.py notebook is an example that has PNG outputs.
The 07_plotly.py notebook is an example that has HTML, JSON and PNG outputs.
We have tested Altair plots and their HTML output in 08_altair.py.
Note that our examples also include ipywidgets and Javascript outputs.
Streams (bash commands) and Exceptions will also be supported, see the other examples.
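The mapping from MIME types to output files can be sketched as follows. The table covers the types named above. The fallback through mimetypes.guess_extension and the ".bin" default, as well as the names output_file_name and payload_bytes, are assumptions made for this example. The sketch relies on the fact that ipynb files store PNG data as base64 text and JSON outputs as objects.

```python
import base64
import json
import mimetypes

# Extensions for the MIME types mentioned in this proposal.
MIME_EXTENSIONS = {
    "text/plain": ".txt",
    "text/markdown": ".md",
    "text/html": ".html",
    "image/png": ".png",
    "application/json": ".json",
}


def output_file_name(cell_id, output_index, mime_type):
    """Build '{cell_id}_{output_index}{extension}' for one MIME entry of an output."""
    extension = (
        MIME_EXTENSIONS.get(mime_type)
        or mimetypes.guess_extension(mime_type)  # assumption: fallback for other types
        or ".bin"
    )
    return f"{cell_id}_{output_index}{extension}"


def payload_bytes(mime_type, data):
    """Convert the value stored in the ipynb file into the bytes written to disk."""
    if mime_type == "image/png":
        return base64.b64decode(data)  # PNG data is stored as base64 text
    if isinstance(data, list):  # multiline strings may be stored as a list of lines
        data = "".join(data)
    if not isinstance(data, str):  # JSON outputs are stored as objects
        data = json.dumps(data, indent=1, sort_keys=True)
    return data.encode("utf-8")


print(output_file_name("03220820-56d9-4e56-a8c7-90244408ea8e", 0, "text/markdown"))
```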
This text format makes changes easier to read. For instance, a commit that updates a matplotlib plot is hard to read on the ipynb file, but becomes readable on the text-plus-outputs representation of the notebook.
All output types must be supported, otherwise the notebook cannot be trusted. In the prototype I have dropped some data, like metadata or output_type, when it takes its default value (respectively {} and "execute_result").
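Dropping default values is only safe if they can be restored when the notebook is read back. The following sketch shows one way to do this round trip. It is not the prototype's code: compact_output and restore_output are hypothetical names, and the choice to restore metadata only for execute_result and display_data outputs is an assumption based on the fact that stream outputs carry no metadata field in the notebook format.

```python
# Default values mentioned above: an empty metadata dict and "execute_result".
OUTPUT_DEFAULTS = {"metadata": {}, "output_type": "execute_result"}


def compact_output(output):
    """Drop the fields of an output that are equal to their default value."""
    return {
        key: value
        for key, value in output.items()
        if not (key in OUTPUT_DEFAULTS and OUTPUT_DEFAULTS[key] == value)
    }


def restore_output(compact):
    """Put back the defaults that compact_output removed."""
    output = dict(compact)
    output.setdefault("output_type", "execute_result")
    # Assumption: only these two output types have a metadata field.
    if output["output_type"] in ("execute_result", "display_data"):
        output.setdefault("metadata", {})
    return output


result = {
    "output_type": "execute_result",
    "execution_count": 1,
    "data": {"text/plain": "2"},
    "metadata": {},
}
print(compact_output(result))
print(restore_output(compact_output(result)) == result)
```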
Maybe we could give an option to export the execution count to a dedicated file to avoid VCS changes in the notebook itself when the notebook is partially re-run.
Cell attachments are not outputs, but they should be supported (maybe in another folder, {notebook_name}_attachments, to avoid conflicts with outputs).
The active-py and active-ipynb cell tags of Jupytext, which let the user decide whether a code cell should be commented out in the .py file, should be supported. Bash commands like !echo should also be commented out, as they are in Jupytext.
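To make the second point concrete, here is a simplified sketch of commenting out bash commands so that the .py file remains valid Python. It does not call Jupytext and does not reproduce its logic: comment_bash_commands is a hypothetical helper that only looks at lines starting with "!", and it would wrongly comment such a line inside a multiline string.

```python
def comment_bash_commands(source, comment="#"):
    """Prefix every line that starts with '!' with the comment character."""
    commented = []
    for line in source.split("\n"):
        stripped = line.lstrip()
        if stripped.startswith("!"):
            indent = line[: len(line) - len(stripped)]  # preserve the indentation
            commented.append(f"{indent}{comment} {stripped}")
        else:
            commented.append(line)
    return "\n".join(commented)


print(comment_bash_commands("!echo hello\nx = 1"))
```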
The number of blank lines introduced after code cells should depend on whether the last command was an import or a function definition (currently the case in the Jupytext percent format), so that notebooks can pass flake8 checks.
Files that correspond to outputs that are no longer in the notebook should be deleted when the notebook is saved (and the folder deleted when the notebook has no outputs).
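The clean-up step could look like the sketch below. The function name remove_orphan_outputs and the idea of passing the set of file names still referenced by the notebook are assumptions for this example. The folder is removed only once it is empty.

```python
from pathlib import Path


def remove_orphan_outputs(output_dir, referenced):
    """Delete the files in output_dir whose names are not in referenced.

    Returns the sorted names of the deleted files. The folder itself is
    removed when nothing is left in it.
    """
    output_dir = Path(output_dir)
    if not output_dir.is_dir():
        return []
    removed = []
    for path in sorted(output_dir.iterdir()):
        if path.is_file() and path.name not in referenced:
            path.unlink()
            removed.append(path.name)
    if not any(output_dir.iterdir()):
        output_dir.rmdir()
    return removed
```

A save routine would first collect the file names it has just written, then call this function with that set.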
If you would like to help with the project and contribute either funding or support, please contact me. You will find my email on my GitHub account.