Support codebraid #545

Open
teucer opened this issue Jun 22, 2020 · 6 comments


@teucer

teucer commented Jun 22, 2020

codebraid is a Python program that enables executable code in Pandoc Markdown documents.

It is similar to rmarkdown and claims certain advantages over it.

It would be beneficial to support it as well.

@mwouts
Owner

mwouts commented Jun 22, 2020

Hello @teucer, that is an interesting suggestion! And yes, I would be happy to take a PR that integrates codebraid with jupytext.

The first question I'll ask you is: how safe is the round trip from a codebraid document to a Jupyter Notebook? Is the conversion implemented in codebraid? I am asking because codebraid documents seem to be a bit different from notebooks (for instance, multiple languages are allowed, and code may not be executed with Jupyter kernels, etc...)

If the answer is positive, then it should not be too difficult to plug codebraid into jupytext. You could have a look at how the md:myst and md:pandoc formats are implemented, using external tools like myst-parser or pandoc.

@teucer
Author

teucer commented Jun 22, 2020

  • ipynb -> codebraid: codebraid uses a special attribute syntax, e.g. {.python .cb.run copy=part1+part2 session=copied show=code+stdout:raw example=true}. One could use a pandoc filter to do the conversion. The cell-level metadata can be leveraged for the "key=value" pairs.
  • codebraid -> ipynb: the plain conversion with pandoc does not work, so one probably needs to create another pandoc filter.
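
To make the first bullet concrete, here is a minimal sketch of turning cell-level metadata into a codebraid attribute string. The function name `codebraid_attributes`, the flat metadata dict, and the quoting rule are assumptions made for illustration; neither codebraid nor jupytext is known to expose such a helper.

```python
import re

# Characters assumed safe in an unquoted attribute value; this set is taken
# from the values in the example above (part1+part2, code+stdout:raw, true).
BARE_VALUE = re.compile(r"[\w.+:\-]+")


def codebraid_attributes(language, command="nb", metadata=None):
    """Build an attribute string such as '{.python .cb.run session=copied}'.

    `metadata` is a hypothetical flat dict of notebook cell metadata that
    should become "key=value" pairs on the code block.
    """
    parts = [f".{language}", f".cb.{command}"]
    for key, value in (metadata or {}).items():
        # Booleans are written in lowercase, as in example=true.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        if not BARE_VALUE.fullmatch(text):
            if '"' in text:
                raise ValueError(f"cannot quote value for {key!r}: {text!r}")
            # Assumption: anything else is wrapped in double quotes.
            text = f'"{text}"'
        parts.append(f"{key}={text}")
    return "{" + " ".join(parts) + "}"
```

Called with `"python"`, `"run"` and the four keys from the example, this reproduces the attribute string quoted in the bullet. Which metadata keys should be forwarded, and under which names, remains an open design question.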

@timothymillar

timothymillar commented Jun 27, 2020

Codebraid just uses regular pandoc markdown, but a user can specify different methods of running a code block using attributes.
It provides a "notebook mode" using the attribute .cb.nb which would make a sensible default for conversion between other notebook formats.
You can also specify the use of a jupyter kernel in the first code block using jupyter_kernel=....

The codebraid repo doesn't have examples with pandoc divs for cells, but based on some quick testing it works fine (presumably passed through to pandoc).
So a straightforward codebraid backend could just be a slight variation of the pandoc backend.

e.g. this:

::: {.cell .markdown}
# A quick insight at world population
:::

::: {.cell .code}
\``` {.python}
import pandas as pd
import wbdata as wb

pd.options.display.max_rows = 6
pd.options.display.max_columns = 20
\```
:::

::: {.cell .markdown}
Corresponding indicator is found using search method - or, directly,
the World Bank site.
:::

::: {.cell .code}
\``` {.python}
wb.search_indicators('Population, total')  # SP.POP.TOTL
# wb.search_indicators('area')
# => https://data.worldbank.org/indicator is easier to use
\```
:::

becomes this:

::: {.cell .markdown}
# A quick insight at world population
:::

::: {.cell .code}
\``` {.python .cb.nb}
import pandas as pd
import wbdata as wb

pd.options.display.max_rows = 6
pd.options.display.max_columns = 20
\```
:::

::: {.cell .markdown}
Corresponding indicator is found using search method - or, directly,
the World Bank site.
:::

::: {.cell .code}
\``` {.python .cb.nb}
wb.search_indicators('Population, total')  # SP.POP.TOTL
# wb.search_indicators('area')
# => https://data.worldbank.org/indicator is easier to use
\```
:::

If a Jupyter kernel spec is defined in the metadata, it could also be specified in the first code block:

::: {.cell .markdown}
# A quick insight at world population
:::

::: {.cell .code}
\``` {.python .cb.nb jupyter_kernel=python3}
import pandas as pd
import wbdata as wb

pd.options.display.max_rows = 6
pd.options.display.max_columns = 20
\```
:::

::: {.cell .markdown}
Corresponding indicator is found using search method - or, directly,
the World Bank site.
:::

::: {.cell .code}
\``` {.python .cb.nb}
wb.search_indicators('Population, total')  # SP.POP.TOTL
# wb.search_indicators('area')
# => https://data.worldbank.org/indicator is easier to use
\```
:::

@timothymillar

I have written a small pandoc filter to convert a pandoc AST to a codebraid "notebook".
This adds .cb.nb to each code block and, if a Jupyter kernel is defined in the metadata (in the style of jupytext), it adds jupyter_kernel=<name> to the first code block.

I have briefly tested it on a jupytext markdown file and an ipynb and it seems to work fine with both.

Converting directly from an ipynb using the script obviously creates the same verbose pandoc-markdown output as jupytext (with the additional code block data).
This output can also be converted back to an ipynb using pandoc.
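
For readers who want to see the shape of such a filter, below is a minimal sketch that works directly on pandoc's JSON AST using only the standard library. It is not the filter linked above. The function names are hypothetical, and the metadata path `jupyter.kernelspec.name` is an assumption about how a jupytext-style header appears in the AST.

```python
def kernel_name(meta):
    """Return the kernel name from jupytext-style metadata, or None.

    Assumes the name is stored under jupyter -> kernelspec -> name.
    """
    node = meta.get("jupyter")
    for key in ("kernelspec", "name"):
        if not node or node.get("t") != "MetaMap":
            return None
        node = node["c"].get(key)
    if not node:
        return None
    if node["t"] == "MetaString":
        return node["c"]
    if node["t"] == "MetaInlines":
        return "".join(i["c"] for i in node["c"] if i["t"] == "Str")
    return None


def code_blocks(node):
    """Yield every CodeBlock element below `node`, in document order."""
    if isinstance(node, list):
        for item in node:
            yield from code_blocks(item)
    elif isinstance(node, dict):
        if node.get("t") == "CodeBlock":
            yield node
        else:
            yield from code_blocks(node.get("c"))


def to_codebraid_notebook(doc):
    """Add the cb.nb class to each code block of a pandoc JSON document.

    The first code block also receives jupyter_kernel=<name> when a kernel
    name is found in the metadata. The document is modified in place.
    """
    kernel = kernel_name(doc.get("meta", {}))
    for index, block in enumerate(code_blocks(doc["blocks"])):
        (_ident, classes, keyvals), _text = block["c"]
        if "cb.nb" not in classes:
            classes.append("cb.nb")
        has_kernel = any(key == "jupyter_kernel" for key, _ in keyvals)
        if index == 0 and kernel and not has_kernel:
            keyvals.append(["jupyter_kernel", kernel])
    return doc
```

Pandoc JSON filters read the document from stdin and write it to stdout, so wrapping `to_codebraid_notebook` between `json.load(sys.stdin)` and `json.dump(..., sys.stdout)` would let it run via `pandoc --filter`.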

@mwouts
Owner

mwouts commented Jun 28, 2020

That's great, thank you @timothymillar!

The next step will be to add the codebraid format to Jupytext. For this, you will have to tell me how to convert the codebraid document to a Jupyter notebook, and back. For the pandoc format we use md_to_notebook and notebook_to_md in jupytext/pandoc.py, do you think you could implement similar functions? Do you think these functions and the filter should belong to jupytext, or maybe rather to codebraid (cc @gpoore)?

> I have briefly tested it on a jupytext markdown file and an ipynb and it seems to work fine with both.

That's a good start! The next step will be to test on our collection of test notebooks, see e.g.

@requires_pandoc
@pytest.mark.parametrize(
    "nb_file",
    list_notebooks("ipynb", skip="(functional|Notebook with|flavors|invalid|305)"),
)
def test_ipynb_to_pandoc(nb_file, no_jupytext_version_number):
    assert_conversion_same_as_mirror(nb_file, "md:pandoc", "ipynb_to_pandoc")

@timothymillar

@mwouts I forgot to link the related codebraid issue.

I personally think integration with codebraid could be really nice, something like codebraid notebook <pandoc-compatible-file> ....
But it may be the antithesis of what @gpoore is aiming for.

I'm not sure that a specific codebraid backend for jupytext would add much in terms of jupytext's goals.
The pandoc backend is compatible with codebraid; it simply lacks the codebraid-specific classes that tell codebraid to run code blocks and/or echo their results.
So the pandoc output can be built with codebraid, but the result will be identical to building with pandoc.

I initially thought it would be a big convenience to be able to convert a script directly to pandoc markdown with codebraid classes (my main use case), but this can be achieved with a one-liner (using the filter):

jupytext script.py --to pandoc -o - | pandoc --filter cbnb.filter.py --to markdown ...

Currently it seems to be possible to convert a 'regular' pandoc/codebraid markdown file (i.e. no divs) to ipynb with jupytext via GitHub-flavored markdown, but this loses metadata and may not work well with more complex examples:

pandoc codebraid.md --to gfm | jupytext - --to ipynb ...

If anything, an additional 'simplified' pandoc backend might be useful for jupytext users.
This would allow for conversion to/from pandoc markdown without the cell divs (::: {.cell .code}), which are not commonly used.
It would improve jupytext interoperability with many pandoc-based tools, including codebraid.
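
As a rough illustration of what the output of such a 'simplified' format might look like, the sketch below removes the cell div wrappers from pandoc markdown at the text level. It is only a sketch: `strip_cell_divs` is a hypothetical helper, not part of jupytext or pandoc, and it assumes div and fence markers start at the beginning of a line.

```python
import re

# An opening fenced div whose first class is .cell, e.g. "::: {.cell .code}".
CELL_OPEN = re.compile(r"^:::+\s*\{\.cell(?=[\s}]).*\}\s*$")
# Any other opening fenced div, and a closing div marker.
DIV_OPEN = re.compile(r"^:::+\s*\S")
DIV_CLOSE = re.compile(r"^:::+\s*$")
# Start of a fenced code block (three or more backticks or tildes).
FENCE = re.compile(r"^(`{3,}|~{3,})")


def strip_cell_divs(markdown):
    """Drop cell div markers while keeping their contents untouched."""
    out = []
    stack = []    # True for an open cell div, False for any other div
    fence = None  # opening marker of the code block we are inside, if any
    for line in markdown.splitlines():
        if fence is not None:
            # Inside a code block: copy verbatim until the closing fence.
            out.append(line)
            stripped = line.strip()
            if stripped.startswith(fence) and set(stripped) == {fence[0]}:
                fence = None
            continue
        match = FENCE.match(line)
        if match:
            fence = match.group(1)
            out.append(line)
        elif CELL_OPEN.match(line):
            stack.append(True)
        elif DIV_CLOSE.match(line):
            if stack and stack.pop():
                continue  # closing marker of a cell div: drop it
            out.append(line)
        elif DIV_OPEN.match(line):
            stack.append(False)
            out.append(line)
        else:
            out.append(line)
    return "\n".join(out)
```

Dropping the divs also drops the explicit cell boundaries, so a reader for this format would have to infer cells again, for example by starting a new cell at each code block.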

@teucer seemed to be suggesting a more specific translation of cell-level metadata between formats, so I'd be interested to hear more detail on that.
