Codebraid is a Python program that enables executable code in Pandoc Markdown documents. Using Codebraid can be as simple as adding a class to your code blocks' attributes, and then running `codebraid` rather than `pandoc` to convert your document from Markdown to another format. `codebraid` supports almost all of `pandoc`'s options and passes them to `pandoc` internally. See Codebraid Preview for VS Code for editor support. See the Codebraid website for additional examples and documentation.
Codebraid provides two options for executing code. It includes a built-in code execution system that currently supports Python 3.7+, Julia, Rust, R, Bash, JavaScript, GAP and SageMath. Code can also be executed using Jupyter kernels, with support for rich output like plots.
Development: https://github.com/gpoore/codebraid
Citing Codebraid: "Codebraid: Live Code in Pandoc Markdown", Geoffrey M. Poore, Proceedings of the 18th Python in Science Conference, 2019, 54-61.
View example HTML output, or see the Markdown source or raw HTML (the Python and Rust examples demonstrate more advanced features at the end):
- Python example [Pandoc Markdown source] [raw HTML]
- Jupyter example [Pandoc Markdown source] [raw HTML]
- Rust example [Pandoc Markdown source] [raw HTML]
- Julia example [Pandoc Markdown source] [raw HTML]
- R example [Pandoc Markdown source] [raw HTML]
- Bash example [Pandoc Markdown source] [raw HTML]
- JavaScript example [Pandoc Markdown source] [raw HTML]
- GAP example [Pandoc Markdown source] [raw HTML]
Markdown source `test.md`:
```{.python .cb-run}
var = 'Hello from Python!'
var += ' $2^8 = {}$'.format(2**8)
```
```{.python .cb-run}
print(var)
```
Run `codebraid` (to save the output, add something like `-o test_out.md`, and add `--overwrite` if it already exists):
codebraid pandoc --from markdown --to markdown test.md
Output:
Hello from Python! $2^8 = 256$
As this example illustrates, variables persist between code blocks; by default, code is executed within a single session. Code output is also cached by default so that code is only re-executed when modified.
| | Codebraid | Jupyter Notebook | knitr | Pweave |
|---|---|---|---|---|
| multiple programming languages per document | ✓ | ✓* | ✓† | ✓* |
| multiple independent sessions per language | ✓ | | | |
| inline code execution within paragraphs | ✓ | | ✓ | ✓ |
| no out-of-order code execution | ✓ | | ✓‡ | ✓ |
| no markdown preprocessor or custom syntax | ✓ | ✓ | | |
| minimal diffs for easy version control | ✓ | | ✓ | ✓ |
| insert code output anywhere in a document | ✓ | | ✓ | |
| can divide code into incomplete snippets | ✓ | | ✓ | ✓ |
| support for literate programming | ✓ | | ✓ | |
| compatible with any text editor | ✓ | | ✓ | ✓ |
* One primary language from the Jupyter kernel. The IPython kernel supports additional languages via `%%script` magics. There is no continuity between `%%script` cells, because each cell is executed in a separate process. Some magics, such as those provided by PyJulia and rpy2, provide more advanced capabilities.
† knitr only provides continuity between code chunks for R, and more recently
Python and Julia. Code chunks in other languages are executed individually
in separate processes.
‡ Out-of-order execution is possible with R Markdown notebooks.
The table above summarizes Codebraid features in comparison with Jupyter notebooks (without extensions), knitr (R Markdown), and Pweave, emphasizing Codebraid's unique features. Here are some additional points to consider:
Jupyter notebooks — Notebooks have a dedicated, browser-based graphical user interface. Jupyter kernels typically allow the code in a cell to be executed without re-executing any preceding code, providing superior interactivity. Codebraid has advantages for projects that are more focused on creating a document than on exploratory programming.
knitr — R Markdown documents have a dedicated user interface in R Studio. knitr provides superior support for R, as well as significant Python and Julia support that includes R integration. Codebraid offers continuity between code chunks for all supported languages, as well as multiple independent sessions per language. It also provides unique options for displaying code and its output.
Easy debugging — By default, stderr is shown automatically in the document
whenever there is an error, right next to the code that caused it. It is also
possible to monitor code output in real time during execution via
`--live-output`.
Simple language support — Codebraid supports Jupyter kernels. It also has a
built-in system for executing code. Adding support for a new language with
this system can take only a few minutes. Just create a config file that tells
Codebraid which program to run, which file extension to use, and how to write
to stdout and stderr. See `languages/` for examples.
No preprocessor — Unlike many approaches to making code in Markdown executable, Codebraid is not a preprocessor. Rather, Codebraid acts on the abstract syntax tree (AST) that Pandoc generates when parsing a document. Preprocessors often fail to disable commented-out code blocks because the preprocessor doesn't recognize Markdown comments. Preprocessors can also fail due to the finer points of Markdown parsing. None of this is an issue for Codebraid, because Pandoc does the Markdown parsing.
No custom syntax — Codebraid introduces no additional Markdown syntax. Making a code block or inline code executable uses Pandoc's existing syntax for defining code attributes.
Installation: `pip3 install codebraid` or `pip install codebraid`
Manual installation: `python3 setup.py install` or `python setup.py install`
Requirements:
- Pandoc 2.4+ (2.17.1.1+ recommended for `commonmark_x`).
- Python 3.7+ with `setuptools`, and `bespon` 0.6 (`bespon` installation is typically managed by `pip`/`setup.py`)
- For Jupyter support, `jupyter_client` and language kernels
- For YAML metadata support, `ruamel.yaml` (can be `ruamel_yaml` for Anaconda installations)
Simply run `codebraid pandoc <normal pandoc options>`. Codebraid currently supports Pandoc Markdown (`--from markdown`) and CommonMark with Pandoc extensions (`--from commonmark_x`) as input formats.
Note that `--overwrite` is required to overwrite existing files. If you are using a defaults file, `--from`, `--to`, and `--output` must be given explicitly and cannot be inherited from the defaults file. If you are using a defaults file and converting to a standalone Pandoc Markdown document, `--standalone` should be given explicitly rather than being inherited from the defaults file.
`codebraid` should typically be run in the same directory as the document, so that the default working directory for code is the document directory.
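The invocation rules above can be collected into a small wrapper. This is only a sketch: `build_codebraid_command` and `run_codebraid` are hypothetical helper names, not part of Codebraid. It passes `--from`, `--to`, and `--output` explicitly, adds `--overwrite` only on request, and runs the command from the document's own directory.

```python
import subprocess
from pathlib import Path


def build_codebraid_command(source, from_format="markdown", to_format="markdown",
                            output=None, overwrite=False):
    """Build an argument list for `codebraid pandoc` (hypothetical helper)."""
    cmd = ["codebraid", "pandoc", "--from", from_format, "--to", to_format]
    if output is not None:
        cmd += ["--output", output]
        if overwrite:
            # Needed when the output file already exists.
            cmd.append("--overwrite")
    # Only the file name is passed, because the command is run from the
    # document's directory.
    cmd.append(Path(source).name)
    return cmd


def run_codebraid(source, **options):
    """Run codebraid with the document directory as the working directory."""
    source = Path(source).resolve()
    cmd = build_codebraid_command(source, **options)
    return subprocess.run(cmd, cwd=source.parent, check=True)


print(build_codebraid_command("docs/test.md", output="test_out.md", overwrite=True))
```

Calling `run_codebraid("docs/test.md", output="test_out.md", overwrite=True)` requires `codebraid` to be installed and on the `PATH`; relative output paths are then resolved against the document directory.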
If you are converting from Pandoc Markdown to Pandoc Markdown with `--standalone` (basically using `codebraid` to preprocess Markdown documents), note that the following YAML metadata fields and command-line options are ignored in that situation:
- `header-includes` and `--include-in-header`
- `include-before` and `--include-before-body`
- `include-after` and `--include-after-body`
- `toc`/`table-of-contents` and `--toc`/`--table-of-contents`
This is typically what you want. Usually, "includes" and a table of contents are desired in a final output format like HTML or PDF, not in a Pandoc Markdown file. In the rare cases where "includes" and a table of contents are needed in Markdown documents, this can be accomplished by piping the output of `codebraid` through `pandoc`.
- `--live-output`: Show code output (stdout and stderr) live in the terminal during code execution. For Jupyter kernels, also show errors and a summary of rich output. Output still appears in the document as normal.

  Individual sessions can override this by setting `live_output=false` in the document.
- `--no-execute`: Disables code execution. Only use available cached output.
- `--only-code-output={format}`: Write code output in JSON Lines format to stdout as soon as it is available, and do not create a document.

  This is intended for use with Codebraid Preview, so that document previews can be updated during code execution. Currently, the only supported format is `codebraid_preview`. One JSON data object followed by a newline is written to stdout for each code chunk. In some cases, the data for a chunk will be resent later if the data relevant for a chunk changes (for example, if code execution fails after the first chunk runs, but in such a way that an error message needs to be attached to the first chunk). Data for a chunk is sent as soon as it is available from code processing, from cache, or from code execution (as soon as the chunk completes, typically before the session completes). Additional JSON data may be sent to provide tracking of code execution progress or information such as metadata. The JSON data provided for format `codebraid_preview` may change between minor versions.
By default, code output is cached, and code is only re-executed when it is modified. The default cache location is a `_codebraid` directory in the working directory (directory where `codebraid` is run, typically the document directory). This can be modified using `--cache-dir`. Multiple documents can share a single cache location. A cache directory can be synced between different operating systems (such as Windows and Linux) while retaining full functionality so long as documents are in equivalent locations under the user's home directory (as resolved by `os.path.expanduser()`).
When multiple documents share the same cache location, each document will automatically clean up its own unused, outdated files. However, if a document is deleted or renamed, it may leave behind unused files in the cache, so it may be worth manually deleting and regenerating the cache in those circumstances. Future cache enhancements should be able to detect all unused files, making this unnecessary.
If you are working with external data that changes, you should run `codebraid` with `--no-cache` or delete the cache as necessary to prevent the cache from becoming out of sync with your data. Future releases will allow external dependencies to be specified so that caching will work correctly in these situations.
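To illustrate why a cache keyed on code can fall out of sync with external data, here is a toy content-hash cache. It is a sketch under assumptions, not Codebraid's actual cache format: `cached_run`, the JSON file layout, and the use of SHA-256 are all invented for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_run(code, cache_dir, execute):
    """Return output for `code`, calling `execute` only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # The key depends only on the code text, so edits to the code invalidate
    # the entry, but changes to files the code reads do not.
    key = hashlib.sha256(code.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))["stdout"]
    stdout = execute(code)
    cache_file.write_text(json.dumps({"stdout": stdout}), encoding="utf-8")
    return stdout


calls = []


def fake_execute(code):
    """Stand-in for real code execution; records each call."""
    calls.append(code)
    return code.upper()


with tempfile.TemporaryDirectory() as tmp:
    cached_run("print(1)", tmp, fake_execute)  # cache miss: executed
    cached_run("print(1)", tmp, fake_execute)  # unchanged: served from cache
    cached_run("print(2)", tmp, fake_execute)  # modified: executed again

print(len(calls))  # 2
```

Because the key covers only the code, a chunk that reads a changed data file would still hit the cache in this model, which is the situation where `--no-cache` or deleting the cache is needed.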
Some document-wide settings can be given in the Markdown YAML metadata. Codebraid settings must be under either a `codebraid` or `codebraid_` key in the metadata. Pandoc will ignore `codebraid_` so it will not be available to filters; this distinction should not typically be important.
To use Jupyter kernels automatically for all sessions, simply set `jupyter: true`. For example,
---
codebraid:
  jupyter: true
---
It is also possible to set a default kernel and/or default timeout. For example,
---
codebraid:
  jupyter:
    kernel: python3
    timeout: 120
---
A Jupyter kernel and/or timeout can still be set in the first code chunk for a given session, and will override the document-wide default.
It is also possible to set `live_output: <bool>` in the metadata.
Additional metadata settings will be added in future releases.
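The precedence between document-wide metadata and first-chunk settings can be modeled in a few lines. This is an illustrative sketch only: `resolve_jupyter` is a hypothetical function, the metadata is assumed to be already parsed into a dict, and the fallback of 60 seconds is taken from the documented `jupyter_timeout` default.

```python
DEFAULT_TIMEOUT = 60  # seconds; documented default for jupyter_timeout


def resolve_jupyter(metadata, chunk_options):
    """Combine document metadata with first-chunk overrides (illustrative)."""
    settings = metadata.get("codebraid") or metadata.get("codebraid_") or {}
    jupyter = settings.get("jupyter", False)
    # `jupyter: true` enables kernels with no explicit defaults;
    # a mapping supplies a default kernel and/or timeout.
    doc_defaults = jupyter if isinstance(jupyter, dict) else {}
    return {
        "enabled": bool(jupyter) or "jupyter_kernel" in chunk_options,
        # None means no kernel was named in the metadata or the chunk.
        "kernel": chunk_options.get("jupyter_kernel", doc_defaults.get("kernel")),
        "timeout": int(chunk_options.get(
            "jupyter_timeout", doc_defaults.get("timeout", DEFAULT_TIMEOUT))),
    }


metadata = {"codebraid": {"jupyter": {"kernel": "python3", "timeout": 120}}}
print(resolve_jupyter(metadata, {}))
print(resolve_jupyter(metadata, {"jupyter_timeout": "30"}))
print(resolve_jupyter({"codebraid": {"jupyter": True}}, {}))
```

Values set in the first chunk of a session win over the document-wide defaults, matching the override rule described above.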
Code is made executable by adding a Codebraid class to its Pandoc attributes. For example, `` `code`{.python} `` becomes `` `code`{.python .cb-run} ``.
When code is executed, the output will depend on whether the built-in code execution system or a Jupyter kernel is used.
When code is executed with the built-in system, the output is equivalent to collecting all code for each session of each language, saving it to a file, and then executing it (with an added compile step for some languages). For example, running Python code is equivalent to saving it to `file.py` and then running `python file.py`, while running R code is equivalent to saving it to `file.R` and then running `Rscript file.R`. Code is not executed as it would be in an interactive session (like running `python` or `R` at the command prompt). As a result, some output that would be present in an interactive session is absent. For example, in interactive sessions for some languages, simply entering a variable returns a string representation without explicit printing, and plotting opens a separate image window or displays an image inline. Such output is absent in Codebraid unless it is also produced when code is executed as a script rather than in an interactive session. The `.cb-expr` command is provided for when an inline string representation of a variable is desired.
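The "save to a file and run it" model described above can be sketched as follows. This is a simplification under assumptions: `run_session` is a hypothetical name, and the sketch does not attribute stdout to individual chunks or add any template code the way a real execution system would.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_session(chunks):
    """Concatenate a session's chunks, save them to one file, run it as a script."""
    source = "\n".join(chunks) + "\n"
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "file.py"
        script.write_text(source, encoding="utf-8")
        result = subprocess.run([sys.executable, str(script)],
                                capture_output=True, text=True)
    return result.stdout, result.stderr


chunks = [
    "var = 'Hello from Python!'\nvar += ' $2^8 = {}$'.format(2**8)",
    "var  # a bare expression prints nothing when run as a script",
    "print(var)",
]
stdout, stderr = run_session(chunks)
print(stdout, end="")
```

The second chunk shows the difference from an interactive session: the bare `var` produces no output, so only the explicit `print` appears in stdout.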
An option for interactive-style code execution with the built-in system is planned for a future release. In the meantime, many interactive-style features are available between the `.cb-expr` command and Jupyter kernels.
When code is executed with a Jupyter kernel, the default output will be equivalent to executing it in a Jupyter notebook. Rich output such as plots, images, and LaTeX math will be displayed automatically by default. This can be customized by using the `show` and `hide` options.
All classes for making code executable are listed below. These all have the form `.cb-<command>`. Classes with the form `.cb.<command>` (period rather than hyphen) are supported for Pandoc Markdown (`--from markdown`), but not for `commonmark_x` since it has a more restricted class syntax. The forms shown below (`.cb-<command>`) should be preferred for compatibility across Markdown variants supported by Pandoc.
- `.cb-code`: Insert code verbatim, but do not run it. This is primarily useful when combined with other features like naming and then copying code chunks.
- `.cb-expr`: Evaluate an expression and interpret the result as Markdown. Only works with inline code. This is not currently compatible with Jupyter kernels.
- `.cb-nb`: Execute code in notebook mode. For inline code, this is equivalent to `.cb-expr` with verbatim output unless a Jupyter kernel is used, in which case rich output like plots or LaTeX will be displayed. For code blocks, this inserts the code verbatim, followed by any printed output (stdout) verbatim. If stderr exists, it is also inserted verbatim. When a Jupyter kernel is used, rich output like plots or LaTeX is also displayed.
- `.cb-paste`: Insert code and/or output copied from one or more named code chunks. The `copy` keyword is used to specify chunks to be copied. This does not execute any code. Unless `show` is specified, display options are inherited from the first copied code chunk.

  If content is copied from multiple code chunks that are executed, all code chunks must be in the same session and must be in sequential order without any omitted chunks. This ensures that what is displayed is always consistent with what was executed.

  If content is copied from another `cb-paste` code chunk, only a single code chunk can be copied. This reduces the indirection that is possible when displaying the output of code that has been executed. This restriction may be removed in the future.
- `.cb-run`: Run code and interpret any printed content (stdout) as Markdown. Also insert stderr verbatim if it exists. When a Jupyter kernel is used, rich output like plots or LaTeX is also displayed.
Pandoc code attribute syntax allows keyword arguments of the form `key=value`, with spaces (not commas) separating subsequent keys. `value` can be unquoted if it contains only letters and some symbols; otherwise, double quotation marks `"value"` are required. For example,
`{.python key1=value1 key2=value2}`
Codebraid adds support for additional keyword arguments. In some cases, multiple keywords can be used for the same option. This is primarily for Pandoc compatibility.
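As a rough illustration of the attribute syntax, the sketch below splits an attribute string into classes and `key=value` pairs. It is an approximation, not Pandoc's parser: `parse_attributes` is a hypothetical helper, it uses `shlex.split` (whose quoting and escaping rules differ from Pandoc's in the details), and it ignores identifiers such as `#id`.

```python
import shlex


def parse_attributes(attr):
    """Split a Pandoc-style attribute string into classes and keyword pairs."""
    inner = attr.strip()
    if inner.startswith("{") and inner.endswith("}"):
        inner = inner[1:-1]
    classes, keywords = [], {}
    # shlex splits on spaces and strips double quotes around values.
    for token in shlex.split(inner):
        if token.startswith("."):
            classes.append(token[1:])
        elif "=" in token:
            key, value = token.split("=", 1)
            keywords[key] = value
    return classes, keywords


print(parse_attributes('{.python .cb-run session=demo hide="stdout+stderr"}'))
# (['python', 'cb-run'], {'session': 'demo', 'hide': 'stdout+stderr'})
```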
These are only permitted for the first code chunk in a session (or the first chunk for a language, if a session is not specified and thus the default session is in use).
- `executable`={string}: Executable to use for running or compiling code, instead of the default. This only applies to Codebraid's built-in code execution system.
- `executable_opts`={string}: Command-line options passed to `executable`. This only applies to Codebraid's built-in code execution system.
- `args`={string}: Command-line arguments passed to code during execution. For example, this could be used to add values to `sys.argv` for Python. This only applies to Codebraid's built-in code execution system.
- `jupyter_kernel`={string}: Jupyter kernel to use for executing code instead of Codebraid's built-in code execution system. Multiple Jupyter kernels can be used within a single document, and multiple sessions are possible per kernel. Except when otherwise specified, Jupyter kernels should be usable just like the built-in code execution system.
- `jupyter_timeout`={int}: Jupyter kernel timeout per code chunk in seconds. The default is 60.
- `live_output`={`true`, `false`}: Show code output (stdout and stderr) live in the terminal during code execution. For Jupyter kernels, also show errors and a summary of rich output. Output still appears in the document as normal. Showing output can also be enabled via the command-line option `--live-output`.

  When `live_output=false` is set for a session, this setting takes precedence over the command-line option `--live-output`, and output will not be shown for that session.

  All output is written to stderr, so stdout only contains the document when `--output` is not specified. Output is interspersed with delimiters marking the start of each session and the start of each code chunk. The delimiters for the start of each code chunk include source names and line numbers.

  With Codebraid's built-in code execution system, the output for a code chunk may be delayed until all code in the chunk has finished executing, unless code output is line buffered or code manually flushes stdout and stderr. For example, with Python you may want to use print functions like `print("text", flush=True)`. Another option is to use Python in line-buffered mode by setting `executable_opts="-u"` in the first code chunk of a session.

  With Jupyter kernels, the output for a code chunk will be delayed until all code in the chunk has finished executing.
- `complete`={`true`, `false`}: By default, code chunks must contain complete units of code (function definitions, loops, expressions, and so forth). With `complete=false`, this is not required. Any stdout from code chunks with `complete=false` is accumulated until the next code chunk with `complete=true` (the default value), or until the end of the session, whichever comes first.

  Setting `complete` is incompatible with `outside_main=true`, since the `complete` status of code chunks with `outside_main=true` is inferred automatically.
- `outside_main`={`true`, `false`}: This allows code chunks to overwrite the Codebraid template code when code is executed with Codebraid's built-in code execution system. It is primarily useful for languages like Rust, in which code is inserted by default into a `main()` template. In that case, if a session starts with one or more code chunks with `outside_main=true`, these are used instead of the beginning of the `main()` template. Similarly, if a session ends with one or more code chunks with `outside_main=true`, these are used instead of the end of the `main()` template. If there are any code chunks in between that lack `outside_main` (that is, default `outside_main=false`), then these will have their stdout collected on a per-chunk basis like normal. Having code chunks that lack `outside_main` is not required; if there are none, the total accumulated stdout for a session belongs to the last code chunk in the session.

  `outside_main=true` is incompatible with explicitly setting `complete`. The `complete` status of code chunks with `outside_main=true` is inferred automatically.
- `session`={identifier-style string}: By default, all code for a given language is executed in a single, shared session so that data and variables persist between code chunks. This option allows code to be separated into multiple independent sessions. Session names must be Python-style identifiers.
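The accumulation rule for `complete=false` can be modeled with a short function. This is a toy model, not Codebraid's implementation: `attribute_stdout` is a hypothetical name, and each chunk is represented as a pair of the stdout it contributes and its `complete` flag.

```python
def attribute_stdout(chunks):
    """Assign stdout to chunks, following the accumulation rule for complete=false.

    `chunks` is a list of (stdout, complete) pairs in document order.
    Returns the stdout that would be attached to each chunk.
    """
    assigned = []
    pending = ""
    for index, (stdout, complete) in enumerate(chunks):
        pending += stdout
        is_last = index == len(chunks) - 1
        if complete or is_last:
            # Release everything accumulated so far at the next complete
            # chunk, or at the end of the session.
            assigned.append(pending)
            pending = ""
        else:
            assigned.append("")
    return assigned


print(attribute_stdout([("a", False), ("b", False), ("c", True), ("d", True)]))
# ['', '', 'abc', 'd']
print(attribute_stdout([("a", True), ("b", False)]))
# ['a', 'b']
```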
- `first_number`/`startFrom`/`start-from`/`start_from`={integer or `next`}: Specify the first line number for code when line numbers are displayed. `next` means continue from the last code in the current session.
- `hide`={`markup`, `copied_markup`, `code`, `stdout`, `stderr`, `expr`, `rich_output`, `all`}: Hide some or all of the elements that are displayed by default. Elements can be combined. For example, `hide=stdout+stderr`. Note that `expr` only applies to `.cb-expr` or `.cb-nb` with inline code using Codebraid's built-in code execution system, since only these evaluate an expression. `rich_output` is currently only relevant for Jupyter kernels.
- `hide_markup_keys`={key(s)}: Hide the specified code chunk attribute key(s) in the Markdown source displayed via `markup` or `copied_markup`. Multiple keys can be specified via `hide_markup_keys=key1+key2`.

  `hide_markup_keys` only applies to the code chunk in which it is used, to determine the `markup` for that code chunk. Thus, it only affects `copied_markup` indirectly.
- `line_numbers`/`numberLines`/`number-lines`/`number_lines`={`true`, `false`}: Number code lines in code blocks.
- `show`={`markup`, `copied_markup`, `code`, `stdout`, `stderr`, `expr`, `rich_output`, `none`}: Override the elements that are displayed by default. `expr` only applies to `.cb-expr` and to `.cb-nb` with inline code using Codebraid's built-in code execution system, since only these evaluate an expression. Elements can be combined. For example, `show=code+stdout`.

  Each element except `rich_output` can optionally specify a format from `raw`, `verbatim`, or `verbatim_or_empty`. For example, `show=code:verbatim+stdout:raw`.

  - `raw` means interpreted as Markdown.
  - `verbatim` produces inline code or a code block, depending on context. Nothing is produced if there is no content (for example, nothing in stdout).
  - `verbatim_or_empty` produces inline code containing a single non-breaking space or a code block containing a single empty line in the event that there is no content. It is useful when a placeholder is desired, or a visual confirmation that there is indeed no output.

  For `rich_output`, the format is specified as one or more abbreviations for the mime types of the output to be displayed. For example, `rich_output:plain` will display `text/plain` output if it exists, and otherwise nothing. `rich_output:png|plain` will display a PNG image if it exists, or otherwise will fall back to plain text if available. The following formats are currently supported:

  - `latex` (corresponds to `text/latex`)
  - `html` (`text/html`)
  - `markdown` (`text/markdown`)
  - `plain` (`text/plain`)
  - `png` (`image/png`)
  - `jpg` and `jpeg` (`image/jpeg`)
  - `svg` (`image/svg+xml`)
  - `pdf` (`application/pdf`)

  For `rich_output` formats with a `text/*` mime type (`latex`, `html`, `markdown`, `plain`), it is possible to specify whether they are displayed `raw`, `verbatim`, or `verbatim_or_empty`. For example, `show=rich_output:latex:raw` and `show=rich_output:latex:verbatim`. `raw` treats `latex` and `html` as raw content with those formats embedded within Markdown. `raw` treats `markdown` and `plain` as Markdown. When a display style is not specified, all `rich_output` formats with a `text/*` mime type are displayed `raw` by default, except for `plain` which is displayed `verbatim`.

  `markup` displays the Markdown source for the inline code or code block. Because the Markdown source is not available in the Pandoc AST but rather must be recreated from it, the Markdown source displayed with `markup` may use a different number of backticks, quote attribute values slightly differently, or contain other insignificant differences from the original document.

  `copied_markup` displays the Markdown source for code chunks copied via `copy`.

  `expr` defaults to `raw` if a format is not specified. `rich_output` defaults to `latex|markdown|png|jpg|svg|plain`. All others default to `verbatim`.
- `example`={bool}: Insert a code block containing the Markdown source of the code chunk, followed by the rest of the output as normal. This is only valid for inline code if the code is in a paragraph by itself. This option is currently not compatible with `--only-code-output` and Codebraid Preview. This option is intended primarily for documentation about Codebraid.
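The `show` syntax described above (elements joined with `+`, an optional `:format` per element, and `|`-separated fallbacks for `rich_output`) can be read with a small parser. This is a sketch for illustration: `parse_show` and `pick_rich_output` are hypothetical names, and the defaults are copied from the text rather than from Codebraid's source.

```python
# Defaults stated in the option description; everything else is `verbatim`.
DEFAULT_FORMATS = {
    "expr": "raw",
    "rich_output": "latex|markdown|png|jpg|svg|plain",
}


def parse_show(value):
    """Parse a `show` value such as 'code:verbatim+stdout:raw' into a dict."""
    if value == "none":
        return {}
    result = {}
    for item in value.split("+"):
        element, _, fmt = item.partition(":")
        result[element] = fmt or DEFAULT_FORMATS.get(element, "verbatim")
    return result


def pick_rich_output(fmt, available):
    """Return the first abbreviation in a 'png|plain' style list that is available."""
    for candidate in fmt.split("|"):
        name = candidate.split(":")[0]  # drop an optional display style
        if name in available:
            return name
    return None


print(parse_show("code:verbatim+stdout:raw"))
# {'code': 'verbatim', 'stdout': 'raw'}
print(parse_show("code+expr+rich_output"))
print(pick_rich_output("png|plain", {"plain"}))
# plain
```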
- `copy`={chunk name(s)}: Copy one or more named code chunks. When `copy` is used with a command like `.cb-run` that executes code, only the code is copied, and it is executed as if it had been entered directly. When `copy` is used with `.cb-code`, only the code is copied and nothing is executed. When `copy` is used with `.cb-paste`, both code and output are copied, and nothing is executed. Multiple code chunks may be copied; for example, `copy=name1+name2`. In that case, the code from all chunks is concatenated, as is any output that is copied. Because `copy` brings in code from other code chunks, the actual content of a code block or inline code using `copy` is discarded. As a result, this must be empty, or a space or underscore can be used as a placeholder.
- `name`={identifier-style string}: Name a code chunk so that it can later be copied by name. Names must be Python-style identifiers.
- `include_file`={path}: Include the specified file. A leading `~/` or `~<user>/` is expanded to the user's home directory under all operating systems, including under Windows with both slashes and backslashes.

  When `include_file` is used with a command like `.cb-run` that executes code, the file is included and executed as part of the current session just as if the file contents had been entered directly. When `include_file` is used with `.cb-code`, the file is included and displayed just as if it had been entered directly. Because `include_file` brings in code from another file, the actual content of a code block or inline code using `include_file` is discarded. As a result, this must be empty, or a space or underscore can be used as a placeholder.
- `include_encoding`={encoding}: Encoding for included file. The default encoding is UTF-8.
- `include_lines`={lines/line ranges}: Include the specified lines or line ranges. For example, `1-3,5,7-9,11-`. Line numbers are one-indexed. Line ranges are inclusive, so `1-3` is `1` up to and including `3`. If a range ends with a hyphen, like `11-`, then everything is included from the line through the end of the file.

  Cannot be combined with other `include` options that specify what is to be included.
- `include_regex`={regex}: Include the first segment of the file that matches the provided regular expression.

  Keep in mind that Pandoc's key-value attributes evaluate backslash escapes in values whether or not the values are quoted with double quotation marks, so two levels of backslash-escaping are always necessary (one for Pandoc's strings, one for the regex itself; there are no raw strings). Regular expressions use multiline mode, so `^`/`$` match the start/end of a line, and `\A`/`\Z` can be used to match the start/end of the file. Regular expressions use dotall mode, so `.` matches anything including the newline `\n`; use `[^\n]` when this is not desired.

  Cannot be combined with other `include` options that specify what is to be included.
- `include_start_string`={string}: Include everything from the first occurrence of this string onward.

  Can only be combined with other `include` options that specify the end of what is to be included.
- `include_start_regex`={regex}: Include everything from the first match of this regex onward.

  Can only be combined with other `include` options that specify the end of what is to be included. See `include_regex` for notes on regex usage.
- `include_after_string`={string}: Include everything after the first occurrence of this string.

  Can only be combined with other `include` options that specify the end of what is to be included.
- `include_after_regex`={regex}: Include everything after the first match of this regex.

  Can only be combined with other `include` options that specify the end of what is to be included. See `include_regex` for notes on regex usage.
- `include_before_string`={string}: Include everything before the first occurrence of this string.

  Can only be combined with other `include` options that specify the start of what is to be included. If the start is specified, then the first occurrence after this point is used, rather than the first occurrence in the overall file.
- `include_before_regex`={regex}: Include everything before the first match of this regex.

  Can only be combined with other `include` options that specify the start of what is to be included. If the start is specified, then the first match after this point is used, rather than the first match in the overall file. See `include_regex` for notes on regex usage.
- `include_end_string`={string}: Include everything through the first occurrence of this string.

  Can only be combined with other `include` options that specify the start of what is to be included. If the start is specified, then the first occurrence after this point is used, rather than the first occurrence in the overall file.
- `include_end_regex`={regex}: Include everything through the first match of this regex.

  Can only be combined with other `include` options that specify the start of what is to be included. If the start is specified, then the first match after this point is used, rather than the first match in the overall file. See `include_regex` for notes on regex usage.
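The selection semantics of `include_lines` and `include_regex` can be approximated with Python's standard library. This is a sketch under assumptions: `select_lines` and `first_regex_match` are hypothetical helpers, and Python's `re` module is assumed as the regex engine, with the multiline and dotall flags named in the text.

```python
import re


def select_lines(text, spec):
    """Return the lines of `text` named by a spec such as '1-3,5,7-9,11-'."""
    lines = text.splitlines(keepends=True)
    selected = []
    for part in spec.split(","):
        start, sep, end = part.strip().partition("-")
        first = int(start)
        if not sep:
            last = first          # single line, e.g. '5'
        elif end:
            last = int(end)       # inclusive range, e.g. '7-9'
        else:
            last = len(lines)     # open-ended range, e.g. '11-'
        # Line numbers are one-indexed and ranges are inclusive.
        selected.extend(lines[first - 1:last])
    return "".join(selected)


def first_regex_match(text, pattern):
    """Return the first segment matching `pattern` in multiline + dotall mode."""
    match = re.search(pattern, text, re.MULTILINE | re.DOTALL)
    return match.group(0) if match else None


text = "".join(f"line {n}\n" for n in range(1, 13))
print(select_lines(text, "1-2,5,11-"), end="")
print(first_regex_match(text, r"^line 3$.*?^line 4$"))
```

The pattern here is written as a Python raw string; inside a Pandoc attribute value each backslash would need to be doubled, as noted for `include_regex`.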