Towards a unified doctree #184

MichaelHatherly · 2016-08-08T14:15:55Z

Definitely not in a mergeable/usable state at the moment, and we'll need to
wait until the dust has settled after the HTML work is merged and tagged
before actually considering this refactoring.

What it does:

* A single "doctree" node type, `Node`. Mostly immutable apart from links to parent and child nodes. HTML tag attributes and additional metadata are stored in `ImmutableDict`s, which suit our use case much better than `Dict`.
* Since `Node` is immutable, a `Documenter.Renderers.attributes` function is provided to set the correct autolink for `at-ref` links. It can be overloaded to map the correct URLs during the rendering stage based on data collected during earlier stages.
* The current `Documents.Document` type will be stored instead as `.metadata` in the root `Node` of the doctree. Every `Node` will have access to it by traversing its `.parent` nodes. This doesn't appear to be much of a bottleneck, so adding a direct `.root` `Node` doesn't seem worth it.
* This uses the `Selectors` module to avoid dynamic dispatch when rendering doctrees, rather than the current approach where we're looping over `Vector{Any}`s. Everything important appears to be type-stable as far as I can tell so far.
* Using a single unified doctree structure *should* make it possible to add new outputs by simply adding a new `Renderers/<format>` module that matches the layout of the others. Adding a new input format should just need a new function like `mdconvert` that builds a tree of `Node`s from files. (Not that there are any other parsers yet...)
* Throughout the rest of the internal stages, i.e. not the input or output stages, we should just be able to work with `Node` (and `Tag`) objects. Extra formatting can just be stripped by `Renderers` that can't handle it. This should mean we can remove all the markdown-specific bits that are used internally and replace them with just the parts in `HTMLWriter` that are currently being worked on.
* "navtrees" and next/prev pages should be pretty easy to construct directly from the doctree without needing too much work, though I've not yet looked too carefully into that.
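
To make the shape of this a bit more concrete, here is a rough sketch of the ideas above. It is written in Python purely for illustration and is not the Julia code in this PR: the `Node` class, `root_metadata`, the `"#ROOT#"` and `"#PAGE#"` tag names, and the `ref_urls`/`ref` metadata keys are all made up for the example. `MappingProxyType` stands in for `ImmutableDict`, and `attributes` mimics the role described for `Documenter.Renderers.attributes`.

```python
from types import MappingProxyType


class Node:
    """Hypothetical stand-in for the PR's `Node`; not Documenter's actual type."""

    def __init__(self, tag, attrs=None, metadata=None):
        self.tag = tag
        # Read-only views play the role of ImmutableDict here.
        self.attrs = MappingProxyType(dict(attrs or {}))
        self.metadata = MappingProxyType(dict(metadata or {}))
        # Only the tree links are mutable.
        self.parent = None
        self.children = []

    def push(self, child):
        """Attach `child` and return it so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child


def root_metadata(node):
    """Walk the .parent links up to the root and return its metadata."""
    while node.parent is not None:
        node = node.parent
    return node.metadata


def attributes(node):
    """Render-time hook: return the attributes to emit for `node`.

    The node itself is never modified; a resolved copy is returned instead.
    The "ref_urls" and "ref" keys are invented for this example.
    """
    attrs = dict(node.attrs)
    if node.tag == "a" and attrs.get("href") == "@ref":
        urls = root_metadata(node).get("ref_urls", {})
        target = node.metadata.get("ref")
        if target in urls:
            attrs["href"] = urls[target]
    return attrs


# The URL table would be filled in by earlier stages of the build.
root = Node("#ROOT#", metadata={"ref_urls": {"Foo.bar": "lib/foo.html#Foo.bar"}})
page = root.push(Node("#PAGE#"))
link = page.push(Node("a", attrs={"href": "@ref"}, metadata={"ref": "Foo.bar"}))

print(attributes(link))    # resolved copy used by the renderer
print(link.attrs["href"])  # the stored attribute is still "@ref"
```

The point of the sketch is only that the link target is resolved at render time from data held in the root, so the node's stored attributes never need to change.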

@mortenpi, if you have any questions/suggestions/concerns then you're very welcome to raise them here.

Ref #168.

codecov-io · 2016-08-08T14:21:33Z

Current coverage is 61.75% (diff: 0.00%)

Merging #184 into master will decrease coverage by 20.76%

@@             master       #184   diff @@
==========================================
  Files            19         25     +6   
  Lines          1213       1561   +348   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits           1001        964    -37   
- Misses          212        597   +385   
  Partials          0          0

Powered by Codecov. Last update 25a6f2f...6b1f198

mortenpi · 2016-08-17T13:32:02Z

I didn't see this in the code, but the current Pages would also become Nodes? So we'd have the root node, which would have a bunch of Node(:PageTag) (or something) as children?

Secondly, my impression is that currently we'd convert the document into basically the HTML DOM structure we have, including doing many of the rendering decisions? I am wondering if it would make more sense to keep the nodes more semantic and let the renderer decide more? I.e. in that sense I wouldn't deviate that much from what we have now (a DocsNode etc.), just implement our own tags so that we'd have more control, could add metadata, parents etc. (as opposed to mixing Base.Markdown objects with Documenter.Documents objects etc.). Not sure what it would look like exactly, but basically I wouldn't add much additional formatting information (e.g. how exactly DocsNode divides up into a header div and a body div). Also not sure how well that would work with a unified Node type.

I am very much in favor of having access to the whole tree and the ability to store metadata in the tree (mostly in the root node, as you said). Would be so much nicer if you wouldn't have to pass stuff along all the time.

Just these quick questions for now, I should give it some more thought at some point.

MichaelHatherly · 2016-08-17T13:57:44Z

> I didn't see this in the code, but the current Pages would also become Nodes? So we'd have the root node, which would have a bunch of Node(:PageTag) (or something) as children?

Yes, they'd just be `Node`s, named similarly to the `#TEXT#` and `#NULL#` tags, probably just `#PAGE#`. We'd also need a `#SECTION#` tag that contains several `#PAGE#`s to handle the navmenu, I'd think.
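
As a rough illustration of how a navmenu and prev/next links could fall out of such a tree, here is a small Python sketch (not Documenter's Julia code). The dict-based node layout, the `"#ROOT#"` tag, the `title` key, and the helper names are all assumptions made up for the example; only `#PAGE#` and `#SECTION#` come from the discussion above.

```python
def node(tag, title=None, children=()):
    """Build a minimal tree node; this layout is invented for the example."""
    return {"tag": tag, "title": title, "children": list(children)}


def pages(n):
    """Yield the titles of all #PAGE# nodes in document order."""
    if n["tag"] == "#PAGE#":
        yield n["title"]
    for child in n["children"]:
        yield from pages(child)


def prev_next(root):
    """Map each page title to its (previous, next) neighbours, or None at the ends."""
    order = list(pages(root))
    return {
        title: (
            order[i - 1] if i > 0 else None,
            order[i + 1] if i + 1 < len(order) else None,
        )
        for i, title in enumerate(order)
    }


doctree = node("#ROOT#", children=[
    node("#PAGE#", "Home"),
    node("#SECTION#", "Manual", [
        node("#PAGE#", "Guide"),
        node("#PAGE#", "Syntax"),
    ]),
])

print(list(pages(doctree)))
print(prev_next(doctree)["Guide"])
```

A depth-first walk gives the page order directly, so the navigation data needs no structure beyond the tree itself.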

> Secondly, my impression is that currently we'd convert the document into basically the HTML DOM structure we have, including doing many of the rendering decisions?

Yes, pretty much that. Convert MD (and its friends) to `Node` objects as early as possible. Manipulate the resulting tree, i.e. expand all `@` nodes, do links, add styling, and other things like that. Then pass the tree to the renderer and let it output as much of the tree as possible. For example: the markdown renderer would just ignore things like `div` tags. As far as I can tell we'd probably not lose any of the current markdown output.
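
The three stages could look roughly like the Python sketch below. This is only an illustration under invented assumptions, not the PR's Julia code: nodes are plain `(tag, attrs, children)` tuples, text leaves are strings, and `convert`, `expand`, `render_html`, the `@docs` line syntax, and the handler table are all hypothetical names.

```python
from html import escape


def convert(lines):
    """Input stage: turn source lines into a tree as early as possible."""
    children = []
    for line in lines:
        if line.startswith("@docs "):
            # Leave a placeholder node to be expanded by a later pass.
            children.append(("@docs", {}, (line[len("@docs "):],)))
        else:
            children.append(("p", {}, (line,)))
    return ("article", {}, tuple(children))


def expand(n, handlers):
    """Middle stage: return a new tree with every '@' node replaced."""
    tag, attrs, children = n
    if tag.startswith("@"):
        return expand(handlers[tag](n), handlers)
    return (tag, attrs, tuple(
        expand(c, handlers) if isinstance(c, tuple) else c for c in children
    ))


def render_html(n):
    """Output stage: serialise the finished tree."""
    if isinstance(n, str):
        return escape(n)
    tag, attrs, children = n
    attr_text = "".join(f' {k}="{escape(v)}"' for k, v in attrs.items())
    inner = "".join(render_html(c) for c in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"


# Hypothetical handler: wrap the documented name in a styled container.
handlers = {
    "@docs": lambda n: ("div", {"class": "docstring"}, (("code", {}, n[2]),)),
}

tree = expand(convert(["Intro.", "@docs Foo.bar"]), handlers)
print(render_html(tree))
```

Each pass takes a tree and returns a new one, so the renderer only ever sees a fully expanded tree and can decide how much of it to output.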

> Not sure what it would look like exactly, but basically I wouldn't add much additional formatting information (e.g. how exactly DocsNode divides up into a header div and a body div). Also not sure how well that would work with a unified Node type.

I think it'll probably work fine with a single Node type, since something like the markdown parser just ignores things like classes and ids, and just prints out a best approximation to what the HTML one does.
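
A minimal sketch of what "best approximation" could mean for a markdown renderer, again in Python and purely illustrative: the `(tag, attrs, children)` tuple layout and the `to_markdown` function are assumptions for the example, not anything in Documenter. Wrapper tags such as `div` are dropped, and classes and ids are never consulted.

```python
def to_markdown(n):
    """Render a (tag, attrs, children) tree as markdown, ignoring what it can't express."""
    if isinstance(n, str):
        return n
    tag, _attrs, children = n  # attributes such as class and id are not used
    inner = "".join(to_markdown(c) for c in children)
    if tag == "h1":
        return f"# {inner}\n\n"
    if tag == "p":
        return f"{inner}\n\n"
    if tag == "code":
        return f"`{inner}`"
    if tag == "em":
        return f"*{inner}*"
    # div, span, and anything unknown: keep the content, drop the wrapper.
    return inner


tree = ("div", {"class": "docstring", "id": "Foo.bar"}, (
    ("h1", {}, ("Foo.bar",)),
    ("p", {"class": "lead"}, ("Returns ", ("code", {}, ("42",)), ".")),
))

print(to_markdown(tree))
```

The same tree could be handed to an HTML renderer that does use the `div`, class, and id information, so nothing about the tree itself has to be markdown-specific.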

> Would be so much nicer if you wouldn't have to pass stuff along all the time.

Definitely agree on that one.

MichaelHatherly added Priority: Low Status: In Progress Type: Enhancement Type: Decision Type: Feature labels Aug 8, 2016

MichaelHatherly added this to the 0.4 milestone Aug 8, 2016

mortenpi mentioned this pull request Aug 15, 2016

Implement HTML output #171

Merged

12 tasks

mortenpi mentioned this pull request Aug 18, 2016

Overwriting Julia's internal docstring objects #208

Closed

MichaelHatherly mentioned this pull request Oct 15, 2016

Including doc pages from other packages #319

Open

mortenpi modified the milestones: 1.0, 0.6 Oct 17, 2016

mortenpi mentioned this pull request Oct 31, 2018

Formalize the syntax that a writer is guaranteed to support #874

Open

mortenpi removed the Priority: Low label Jul 20, 2020

mortenpi mentioned this pull request Sep 23, 2022

Use MarkdownAST throughout Documenter #1948

Merged

mortenpi closed this in #1948 Sep 27, 2022
