Provide a simplified Markdown format for declaring Jupyter cells with results #8598

Open
sergei-mironov opened this issue Feb 5, 2023 · 11 comments

sergei-mironov commented Feb 5, 2023

Currently, the Pandoc manual requires users to wrap code sections in divs (either fenced or native) so that Pandoc recognizes them as code/result cells when converting Markdown documents to the Jupyter Notebook format.

Unfortunately, (a) not all Markdown renderers support fenced divs, and (b) the resulting documents look overly complex.

Let me suggest adding a simplified mode as well, in which result sections simply follow code sections, possibly with a simple tag set on them.

So, instead of:

:::::: {.cell .code execution_count=1}
```python
print("why so hard")
```
::: {.output .stream .stdout}
```result
why so hard
```
:::
::::::

...Pandoc should recognize a single Jupyter cell with a result section in the following document:

``` python
print("simple!")
```
``` result
simple!
```
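To make the proposal concrete, here is a minimal Python sketch of the pairing rule. It is not Pandoc's actual reader (which is written in Haskell), and it rests on assumptions: a fenced block becomes a Jupyter code cell only when a fence whose info string is `result` immediately follows it, the result text maps to a single `stdout` stream output, and everything else stays Markdown. `split_blocks` and `to_cells` are hypothetical names, the output is nbformat-v4-style cell dicts, and the fence parsing is much simpler than CommonMark's.

```python
import json
import re

# Opening backtick fence: three or more backticks, then an optional info string.
FENCE_RE = re.compile(r"^(`{3,})\s*([^`]*)$")


def split_blocks(text):
    """Split Markdown into ("text", None, body) and ("fence", info, body) blocks.

    Deliberately simplified: only backtick fences are recognized, and
    whitespace-only text between blocks is dropped.
    """
    blocks, prose = [], []

    def flush():
        if any(line.strip() for line in prose):
            blocks.append(("text", None, "\n".join(prose).strip("\n")))
        prose.clear()

    lines = text.splitlines()
    i = 0
    while i < len(lines):
        m = FENCE_RE.match(lines[i])
        if not m:
            prose.append(lines[i])
            i += 1
            continue
        flush()
        fence, info = m.group(1), m.group(2).strip()
        body = []
        i += 1
        # A closing fence consists only of backticks, at least as many as the opener.
        while i < len(lines):
            closing = lines[i].strip()
            if closing.startswith(fence) and set(closing) == {"`"}:
                break
            body.append(lines[i])
            i += 1
        i += 1  # skip the closing fence
        blocks.append(("fence", info, "\n".join(body)))
    flush()
    return blocks


def to_cells(text):
    """Map the proposed simplified syntax onto nbformat-v4-style cell dicts.

    A code fence counts as a Jupyter cell only if a `result` fence follows it
    immediately; every other block stays Markdown.
    """
    cells, pending_md = [], []

    def flush_md():
        if pending_md:
            cells.append({"cell_type": "markdown", "metadata": {},
                          "source": "\n\n".join(pending_md)})
            pending_md.clear()

    blocks = split_blocks(text)
    k = 0
    while k < len(blocks):
        kind, info, body = blocks[k]
        nxt = blocks[k + 1] if k + 1 < len(blocks) else None
        if (kind == "fence" and info != "result"
                and nxt is not None and nxt[0] == "fence" and nxt[1] == "result"):
            flush_md()
            cells.append({
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "source": body,
                # Plain-text results are treated as a stdout stream output.
                "outputs": [{"output_type": "stream", "name": "stdout",
                             "text": nxt[2] + "\n"}],
            })
            k += 2
            continue
        # Unpaired fences are written back verbatim as Markdown.
        pending_md.append(body if kind == "text" else f"```{info}\n{body}\n```")
        k += 1
    flush_md()
    return cells


if __name__ == "__main__":
    doc = 'Intro\n\n``` python\nprint("simple!")\n```\n``` result\nsimple!\n```\n'
    print(json.dumps(to_cells(doc), indent=1))
```

Because an unpaired code fence is written back as Markdown, an ordinary Python example without a following `result` block stays a plain code block; only the code-plus-result pair turns into a cell.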

Note that there are projects (e.g. LitREPL) that already support interactive, Jupyter-style evaluation of Markdown sections written as described above.

sergei-mironov changed the title from "Provide a simplified Markdown format for declaring Jupyter cells **with results**" to "Provide a simplified Markdown format for declaring Jupyter cells with results" on Feb 5, 2023
sergei-mironov (Author) commented:

Related #5992

jgm (Owner) commented Feb 5, 2023

Since I'm not a Jupyter user, I'd welcome more feedback from others on this proposal.
Obviously we don't want to make it impossible to have a Python code example that isn't a code cell, but maybe the immediately following result cell is enough to make it unambiguous.

alerque (Contributor) commented Feb 5, 2023

Would an attribute (say a .cell class) on the code block be serviceable here?

``` python {.cell}
print("with attrs")
```
``` {.result}
with attrs
```

jgm (Owner) commented Feb 5, 2023

That might be incompatible with the goal of making the syntax available to non-pandoc markdown variants.

sergei-mironov (Author) commented Feb 10, 2023

I have sketched a test.md file that uses three types of cell declaration syntax:

  1. The one originally suggested in this issue (V1)
  2. The python { .cell } syntax suggested by @alerque (V2)
  3. Additionally, the {.python .cell } syntax, which resembles the syntax supported by another popular tool, Codebraid (V3)
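All three variants could, in principle, be recognized by one small info-string parser. The sketch below is illustrative only: `parse_info`, `is_result`, and `is_marked_cell` are hypothetical helpers, only `.class` attributes are handled (ids and key=value pairs are ignored), and it assumes that in the `{.python .cell}` form the first class names the language.

```python
import re

# An optional bare language word, followed by an optional {...} attribute block.
INFO_RE = re.compile(r"^\s*([^\s{]+)?\s*(?:\{([^}]*)\})?\s*$")


def parse_info(info):
    """Return (language, classes) for a fence info string.

    Handles `python`, `python {.cell}` and `{.python .cell}`.
    """
    m = INFO_RE.match(info)
    if m is None:
        raise ValueError(f"unrecognized info string: {info!r}")
    lang, attrs = m.group(1), m.group(2) or ""
    classes = [tok[1:] for tok in attrs.split() if tok.startswith(".")]
    if lang is None and classes:
        # Assumption: in the {.python .cell} form the first class is the language.
        lang, classes = classes[0], classes[1:]
    return lang, set(classes)


def is_result(info):
    """True for both `result` (V1) and `{.result}` (V2) result fences."""
    lang, classes = parse_info(info)
    return lang == "result" or "result" in classes


def is_marked_cell(info):
    """True for the V2/V3 forms that carry an explicit .cell class."""
    _, classes = parse_info(info)
    return "cell" in classes
```

Under V1 no class is needed, because the pairing with a following result block is what marks the cell; V2 and V3 make the marking explicit and differ only in where the language name sits.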

Then I tried to display the file using the following renderers:

  1. The GitHub online renderer
  2. Chromium with the Markdown Viewer plugin (chromium test.md)
  3. The Pandoc-to-HTML converter (pandoc test.md --metadata title=test -s -t slidy -o test.html && firefox test.html), using Pandoc version 2.17.1.1

Here are the results (C: correct; H: Python code is highlighted; B: bad, non-code markup is visible):

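| Syntax | GitHub online | Chromium-based | Pandoc-to-HTML |
|--------|---------------|----------------|----------------|
| V1     | C, H          | C, H           | C, H           |
| V2     | C, H          | C, H           | B              |
| V3     | C             | C              | C, H           |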

Based on this, I would note that the V2 syntax suggested by @alerque would probably require some additional work in Pandoc in general; it is not only about conversion to the Jupyter format.
As a third option, I suggest considering support for the Codebraid-style syntax ({.python .cell}) rather than python {.cell}.
I would be glad to see any of the above formats supported.

jgm (Owner) commented Feb 10, 2023

@fperez any thoughts on this?

fperez commented Feb 10, 2023

My first thought is that the concept of output in Jupyter goes well beyond plaintext. Our output objects are "mime bundles" (JSON dicts keyed by mimetype, with base64-encoded values where necessary). You can see e.g. R code that supports these.

I'm not deep in the weeds of our format machinery right now, but this is the kind of thing that would make for a good discussion at the upcoming Notebook format workshop, if only to have an informed conversation between Jupyter devs and the Pandoc team.

Pinging @rowanc1 @stevejpurves @agoose77 in case any of them has further thoughts as our MyST work progresses.

jgm (Owner) commented Feb 10, 2023

It's a good point that we cannot assume that the output will be representable by a code block. But I wonder if that's a problem with this proposal. Pandoc already has ways of representing arbitrary code and output cells in its markdown format. The proposal here is to provide an alternative, simplified markdown syntax in addition to the current, more verbose markdown syntax (which can represent arbitrary code and output cells). If the alternative syntax can only represent simple output cells, that isn't necessarily a problem.
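One way to read this point in code: a converter that writes the simplified syntax could check each notebook cell and fall back to the existing div syntax whenever its outputs are not plain text. The Python sketch below works on nbformat v4 output dicts and assumes, for illustration only, that a cell qualifies if it has at most one output and that output is a stream or a mime bundle containing nothing but `text/plain`. `simple_result_text` and `can_use_simple_syntax` are hypothetical names.

```python
def simple_result_text(output):
    """Return plain text suitable for a simplified `result` fence, or None.

    Assumption for this sketch: only plain-text outputs can use the simplified
    syntax; anything else needs the verbose div form.
    """
    kind = output.get("output_type")
    if kind == "stream":
        text = output["text"]
    elif kind in ("execute_result", "display_data"):
        data = output.get("data", {})
        if set(data) != {"text/plain"}:
            return None  # rich mime bundle (images, HTML, LaTeX, ...)
        text = data["text/plain"]
    else:
        return None  # e.g. "error" outputs
    # nbformat allows multiline strings to be stored as a list of lines.
    return "".join(text) if isinstance(text, list) else text


def can_use_simple_syntax(cell):
    """True if the code cell has at most one plain-text output."""
    outputs = cell.get("outputs", [])
    return len(outputs) <= 1 and all(
        simple_result_text(o) is not None for o in outputs
    )
```

Cells that fail the check would keep the current `::: {.cell .code}` representation, so nothing that is representable today is lost.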

rowanc1 commented Feb 10, 2023

In my experience with Jupyter, most of the time outputs are (1) rich mime bundles (images, interactive figures, LaTeX, ANSI, maps, etc.) and (2) a list (i.e. multiple outputs from a single code execution). When people represent notebooks as Markdown (e.g. in MyST, with jupytext, or with Quarto/Rmd), the outputs are not stored because of this complexity, which is usually at odds with a human-written format like Markdown.

I suppose because of that I am having a hard time wrapping my head around the use case here, but if it is just an optional simplification of what already exists and makes it work in more renderers: 👍

sergei-mironov (Author) commented Feb 11, 2023

> I suppose because of that I am having a hard time wrapping my head around the use case here, but if it is just an optional simplification of what already exists and make it work in more renderers: +1

This suggestion is primarily intended to support text-terminal-based software development. Tools like Codebraid or LitREPL are designed with this goal in mind and can serve as a limited but lightweight Jupyter alternative. In this scenario, conversion to the Jupyter format is needed for sharing sketches or demonstrations with external teams at the end of a development cycle.

Of course, in text mode we don't aim to render things like interactive figures, and we can assume only limited support for rich outputs. Note, however (although this is probably beyond the scope of this discussion), that some terminals can display images even in text mode. For example, this fork of St can display a picture sent from a tmux session running on a remote server over SSH. See also this Vim plugin project as a working proof of the whole concept.

fperez commented Feb 12, 2023

I agree with @rowanc1: if it helps simplify existing workflows, I don't see any reason why not. And you could imagine, with good tooling support, also storing rich outputs in a sidecar file with convenient linkage, so the main doc remains plaintext while binary results are stored in the sidecar. Something along the lines of

```python
f(x)
```
```result
::sidecar:my-uuid.png
```

or similar. Lots of details would need to be worked out for something like this, but it would allow a human, terminal-based workflow to still represent richer documents via plaintext.
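A rough Python sketch of how the sidecar idea might work, assuming the hypothetical `::sidecar:<name>.png` reference syntax from the example above, UUID-based file names, and only `image/png` payloads (which nbformat mime bundles store base64-encoded). `write_sidecar` and `read_sidecar` are illustrative names, not part of any existing tool.

```python
import base64
import re
import uuid
from pathlib import Path

# Hypothetical reference syntax, following the "::sidecar:my-uuid.png" idea above.
SIDECAR_RE = re.compile(r"^::sidecar:([0-9a-f-]+\.png)$")


def write_sidecar(output, sidecar_dir):
    """Store an output's image/png payload in a sidecar file and return the
    plain-text `result` fence that references it."""
    payload = output["data"]["image/png"]
    if isinstance(payload, list):  # nbformat may split long strings into lines
        payload = "".join(payload)
    name = f"{uuid.uuid4()}.png"
    path = Path(sidecar_dir) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(base64.b64decode(payload))
    return f"```result\n::sidecar:{name}\n```"


def read_sidecar(result_body, sidecar_dir):
    """Turn a sidecar reference back into a display_data output, or None."""
    m = SIDECAR_RE.match(result_body.strip())
    if m is None:
        return None  # an ordinary plain-text result
    raw = (Path(sidecar_dir) / m.group(1)).read_bytes()
    return {
        "output_type": "display_data",
        "metadata": {},
        "data": {"image/png": base64.b64encode(raw).decode("ascii")},
    }
```

The round trip keeps the Markdown file plaintext: only the reference line lives in the document, and converting to a notebook re-embeds the image bytes. Many details (other mimetypes, multiple outputs, cleanup of stale files) are left open here.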

Again - these are some of the conversations that I'm sure will take place at the notebook format workshop, I hope some of you have a chance to participate (sadly I can't).
