From 748cbd6b084fa3b92d187bb7c7d8685193d7e26e Mon Sep 17 00:00:00 2001 From: Mathieu Boespflug Date: Wed, 4 Nov 2020 09:34:49 +0100 Subject: [PATCH] [RFC 0072] Switch to CommonMark for documentation (#72) * Switch to CommonMark for documentation This RFC proposes to convert the documentation for the Nix, Nixpkgs and NixOS projects from Docbook to CommonMark Markdown. It proposes requirements for web facing documentation. It does not mandate a choice of toolchain to generate a website with documentation. The choice is left to the implementers and maintainers. * Update rfcs/0000-commonmark-docs.md Co-authored-by: Profpatsch * Update rfcs/0000-commonmark-docs.md Co-authored-by: Frederik Rietdijk * Update rfcs/0000-commonmark-docs.md Co-authored-by: davidak * Update rfcs/0000-commonmark-docs.md Co-authored-by: davidak * Update rfcs/0000-commonmark-docs.md Co-authored-by: Ryan Mulligan * Add shepherd metadata * Add shepherd metadata * Allow CommonMark plus small number of extensions The exact set of extensions used is left to the documentation team. * Final adjustments - Nix now already uses markdown - Fix grammar mistake - Remove requirement of extensions having to be supported by at least 3 toolchains - Add requirement for an extension for references - Rename the file to its designated name Co-authored-by: Profpatsch Co-authored-by: Frederik Rietdijk Co-authored-by: davidak Co-authored-by: zimbatm Co-authored-by: Ryan Mulligan Co-authored-by: Linus Heckemann Co-authored-by: Silvan Mosberger --- rfcs/0072-commonmark-docs.md | 334 +++++++++++++++++++++++++++++++++++ 1 file changed, 334 insertions(+) create mode 100644 rfcs/0072-commonmark-docs.md diff --git a/rfcs/0072-commonmark-docs.md b/rfcs/0072-commonmark-docs.md new file mode 100644 index 000000000..087e32bb1 --- /dev/null +++ b/rfcs/0072-commonmark-docs.md @@ -0,0 +1,334 @@ +--- +feature: commonmark-docs +start-date: 2020-07-05 +author: mboes +shepherd-team: garbas, zimbatm, Kloenk +shepherd-leader: Infinisil +--- + +# Summary +[summary]: #summary + +Nixpkgs and NixOS documentation is currently in Docbook format. +We propose to migrate all existing content to CommonMark, a flavour of +Markdown. + +# Motivation +[motivation]: #motivation + +Documentation ought to be easy to write, easy to maintain, easy to +make pretty, and its format be so boring as to seldom be the object of +future RFC's. But beyond that, the significance of the documentation +format is in its very real impact on the perception of the project. +Documentation is the storefront of any project online in addition to +its front page. Modern looking documentation that can be edited at the +click of a button sends a signal: that the Nix* projects are forward +looking, are welcoming to novice contributors and feel familiar. + +The motivation to switch away from Docbook is that it's unfamiliar. +None of the [top 100 projects on Github][gitstar-rankings] use +Docbook. The motivation to switch to CommonMark is that among these +projects, roughly 10 times more projects use Markdown (or some +variation thereof) than all other documentation formats combined, it +can be considered the default. It's worth considering alternatives, +but the burden ought to lie with the proponents of other format/tool +combinations to convince this community that following the precedent +set by the overwhelming majority of communities larger than ours won't +work well for us. + +Increasing the number of contributors to the documentation, increasing +its coverage and improving its quality are social goals, not technical +requirements. To recruit more writers and users, should we make +support for callouts, admonitions and a precise taxonomy (like Docbook +has) requirements? Should literate programming and the ability to do +transclusions also be requirements? Does reducing documentation build +times from seconds to milliseconds matter? Maybe meeting these +requirements helps us reach our social goals. Maybe they are +immaterial. What we can go by is the following piece of evidence: +nearly all of the most beautifully laid out, high-quality and +easy-to-navigate documentation in the GitHub Top 100 are in +CommonMark. The only notable exception is the [Tensorflow +documentation][tensorflow-docs], which uses Jupyter notebooks for +everything. + +Consider the following 5 examples of great Markdown documentation in +terms of breadth, presentation and richness (cross references, +definition lists, tables, callouts, integration with playgrounds, +etc): + +- [GatsbyJS documentation][gatsby-docs] (using Gatsby with [MDX][mdx]) +- Facebook's [React documentation][react-docs] (using Gatsby with + plain CommonMark) +- Microsoft's [VS Code documentation][vscode-docs] (using Jekyll) +- [Kubernetes documentation][kubernetes-docs] (using Hugo) +- The [Rust book][rust-docs] (using mdBook) + +The goal of this RFC is to change the *form* of the current +documentation for Nixpkgs and NixOS to look similar to any one of +the above 5 projects (see requirements in the next section). We submit +that the least effort route to do so is to use the same toolchain as +they do (CommonMark or MDX input and one of Gatsby, Jekyll or Hugo to +generate a static site). + +What all of the projects cited above have in common is a large +contributor base with heterogeneous skillsets and multiple +subcommunities. Just like in Nixpkgs, they cannot leverage the +hegemony of RST in Python, because not all their users are Python +programmers. They prefer not to count on writers learning Docbook or +Asciidoc, because many documentation patches are from casual +contributors. The *lingua franca* across subcommunities for both +humans and their toolchains is Markdown. It's all the more easy to +create good looking documentation with Markdown that the tools +available to process it are plentiful and flexible (JavaScript +converters with plugin support like [Remark][remark], [static site +generators][staticgen] by the dozen, [MDX][mdx] to extend CommonMark +with arbitrary React components, etc). + +[gatsby-docs]: https://www.gatsbyjs.org/docs/ +[gitstar-rankings]: https://gitstar-ranking.com/repositories +[kubernetes-docs]: https://kubernetes.io/docs/home/ +[react-docs]: https://reactjs.org/docs/getting-started.html +[rust-docs]: https://doc.rust-lang.org/book/ +[tensorflow-docs]: https://github.com/tensorflow/docs +[vscode-docs]: https://code.visualstudio.com/docs +[mdx]: https://github.com/mdx-js/mdx +[remark]: https://remark.js.org/ +[staticgen]: https://www.staticgen.com/ + +# Detailed design +[design]: #detailed-design + +## Documentation website requirements + +AsciiDoc, CommonMark and RST are all formats that support basic +markup: emphasis, bold, (nested) (un)numbered lists, headings, inline +and display code, tables (using a CommonMark extension), etc. The +requirements we set below pertain to the appearance of the +documentation available on the website. + +The key requirements we work towards are: + +1. easy-to-find "Edit on GitHub" button to increase drive-by + contributions, +1. good quality documentation search engine, +1. syntax highlighting of all code, +1. separate page per chapter, instead of the current monolithic page + for each manual, +1. table of content for each chapter available in a side bar to easily + jump through long content, +1. make warning and info boxes stand out from the rest of the content + with colour, +1. easy to customize HTML and CSS, since tired old templates we've all + seen too many times are best avoided. The objective is to make + a statement to newcomers that the community takes documentation + seriously, with both good form and good content. + +## Choice of format + +Satisfying all requirements above is possible with a number of +toolchains, including Asciidoc-specific or RST-specific toolchains +(not just Markdown). We propose CommonMark plus a small number of +extensions as the documentation format. The choice of toolchain is +left at the discretion of the implementers. We feature two demos below +(one uses [Gatsby][gatsby] and another uses [Sphinx][sphinx]). + +The choice of CommonMark extensions is also left to the implementors. +However, this RFC stipulates the following guidelines: + +* The overall number of extensions should be kept small, to facilitate + interoperability. The goal is not perfect compliance with a standard + (i.e. pure CommonMark), but it should nonetheless remain easy to + switch from one toolchain to another for generating HTML from the + CommonMark source, with minimal manual work. For example, + * enabling an extension for tables is acceptable because few tables + appear in the documentation and converting them by hand to a new + format is not labour intensive; + * YAML frontmatter for metadata is also acceptable, because nearly + all toolchains support this. +* An extension for defining references between section should be + supported, since this is widely used in the current manuals and + essential for navigating around. +* CommonMark allows HTML span and block elements. These should be + avoided in documentation source, because this complicates targeting + multiple output formats (e.g. man pages, epub, etc). + +## Transition to CommonMark + +The one-time conversion from Docbook to CommonMark is lossy, because +CommonMark has far less expressive markup. It is done using +[Pandoc][pandoc], which has a reader available for Docbook and +a high-quality writer available for CommonMark. The Pandoc Docbook +reader isn't perfect, but in the process of putting together the quick +demo below, it was easy to fix 5 bugs already. + +We propose to convert man pages to CommonMark as well. However, this +transition need not happen concurrently to the format transition for +the rest of the documentation. They are self-contained documents, +whose form is constrained by convention and the limits of the man page +format. When the time comes to convert the man pages as well, we can +turn here again to prior art. The Kubernetes project uses +[md2man][md2man] to generate man pages from CommonMark. This is +a small Go command with a 4.1MB closure size (including 2MB for +`tzdata`). + +[gatsby]: https://gatsbyjs.org +[sphinx]: https://www.sphinx-doc.org +[pandoc]: https://pandoc.org/ +[md2man]: https://github.com/cpuguy83/go-md2man + +# Examples and Interactions +[examples-and-interactions]: #examples-and-interactions + +This RFC is *not* about vetting a single implementation choice. The +five large projects cited above use one of Hugo, Gatsby, Jekyll, or +mdBook. We can add Sphinx to the set of available implementation +choices, which likewise has excellent support for Markdown. In this +section we show how two of these choices could work in the context of +the Nix* projects. *Selecting a toolchain is left to the discretion of +the implementors. Discussing implementation choices that are +transparent to the user is out of scope for an RFC process*. + +The following demo shows how to satisfy all requirements above using +CommonMark as the input format and a Gatsby-based [starter kit from +Hasura][hasura-docs-starter] to generate a ready-to-deploy static +website: + +https://nixos-docs-mockup.netlify.app + +It features the following sections of the Nix manual: +* introduction +* quickstart and installation +* the Nix expressions chapter +* advanced topics + +The content is the unedited output of running the Pandoc conversion, +which still has bugs (in particular the handling of cross-references). + +This other demo shows how to satisfy many of the same requirements +using Sphinx, again with CommonMark as the input: + +https://nixpkgs-manual-sphinx-markedown-example.netlify.app + +The look and feel is customizable, and indeed could be made the same +in both cases. Both are examples meant to demonstrate that choosing +CommonMark as the input format doesn't force unreasonable compromises. + +[hasura-docs-starter]: https://github.com/hasura/gatsby-gitbook-starter + +# Drawbacks +[drawbacks]: #drawbacks + +The current DocBook format is semantically richer. There are specific +tags for definitions, environment variables, user accounts, various +types of callouts, etc. CommonMark's data model isn't nearly as rich, +so in converting to CommonMark, even with extensions, some information +is lost. However, experience in other very large ecosystems with many +users tells us that authors are happy to make do with an inexpressive +but familiar format, which in any case *can* be extended with the wise +use of ``-like HTML tags and custom tags that expand to HTML. +That appears to be seldom necessary in practice (see e.g. the five +documentation examples in [Motivation][motivation]). + +# Alternatives +[alternatives]: #alternatives + +Other formats share the desirable properties of the various flavours +of Markdown (of which CommonMark is but one): + +- textual format that is easy to diff, +- reuses familiar conventions from decades of plain text emails, +- terseness of the markup and consistent levels of indentation. + +The main two other contenders are AsciiDoc and RST. There are +technical reasons to prefer either of them to CommonMark, despite +social factors in favour of CommonMark. + +## AsciiDoc + +AsciiDoc has the same data model as Docbook - just a different syntax. +This makes the two formats interchangeable in principle. The markup +language is arguably better designed than CommonMark and is more +expressive. However, AsciiDoc is not as precisely specified as +CommonMark. It has only two extant toolchains ([Asciidoc][asciidoc] +and [Asciidoctor][asciidoctor]) for transforming into HTML, one of +which is infrequently maintained (3 releases in 7 years). In the top +100 projects on GitHub, only one project uses Asciidoc for their +documentation format. + +Asciidoc is also effectively a format transformer dead-end: +round-tripping via Docbook using `asciidoc` and `docbookrx` doesn't +work and Pandoc does not support Asciidoc as an input. + +[asciidoc]: https://asciidoc.org/ +[asciidoctor]: https://asciidoctor.org/ + +## reStructuredText + +Pros: + +- reStructuredText (RST) has wider adoption than Asciidoc. +- Most Python projects use it. The Python ecosystem is deep and wide. +- Good and mature toolchains like Sphinx exist to process RST files. + +Cons: + +- the syntax for simple things like links or inline code is + non-standard. Inline code requires double backticks, but single + backticks is legal syntax and used for links, so it's easy to get + things wrong that don't lead to build errors. +- RST constructs do not compose nearly as well as they should. What is + trivial in CommonMark is impossible in RST. For example, links with + inline code in the title are not expressible. Links with italics in + the title are not expressible either. In general, inline markup does + not concatenate well. Backslashes are required in common constructs + like the following: + ``` + Python ``list``\s use square bracket syntax. + This is a long\ *ish* paragraph + ``` +- nesting block elements suffers from gotchas (like [this + one][rst-nested-lists]). + +[rst-nested-lists]: https://stackoverflow.com/questions/44557957/unexpected-indentation-with-restructuredtext-list + +## Are they popular? + +The extra expressiveness of Asciidoc or RST over CommonMark was not +deemed to be a crucial requirement by many other large projects. In +the GitHub Top 100 (projects ranked by number of stars), only one +project (Spring Boot) uses Asciidoc, and only two projects (Ansible +and Linux) use RST other than the predominantly Python projects. Swift +uses a mix of RST and Markdown, what with being in the middle of +a transition to full Markdown (grep for `[Gardening] De-RST` in [the +history][swift-docs-history]). + +[swift-docs-history]: https://github.com/apple/swift/tree/master/docs + +# Unresolved questions +[unresolved]: #unresolved-questions + +We propose to take CommonMark as a baseline. This is sufficient to +support all the markup we need. It's also a stepping stone towards +[MDX][mdx], which enables embedding custom widgets if so desired. For +example, Gatsby uses MDX to include newsletter signup forms, embed +videos from third-party training websites, or to generate +documentation sitemaps within a section (e.g. the bottom of [this +page][gatsby-graphql]). It's unclear that custom widgets are worth the +trouble at this stage. + +The choice of toolchain to publish the documentation on the website +and as man pages is left open. + +[gatsby-graphql]: https://www.gatsbyjs.org/docs/graphql/ + +# Future work +[future]: #future-work + +- Build out the documentation demo above into a full replacement for + each of the Nix, Nixpks and NixOS user manuals online. +- Fix all Pandoc bugs encountered along the way, in particular to get + working cross-references. +- Customize the CSS and layout of the demo to be more in line with Nix + branding. +- Integrate the result into the Nix and Nixpkgs repositories and hook + into their respective CI/CD pipelines.