Towards a unified doctree #184

Closed
wants to merge 1 commit into from

MichaelHatherly
Member

Ref #168.

Definitely not in a mergeable/usable state at the moment, and we'll need to
wait until the dust has settled after the HTML work is merged and tagged
before actually considering this refactoring.

What it does:

  • A single "doctree" node type, `Node`. Mostly immutable apart from the links
    to parent and child nodes. HTML tag attributes and additional metadata
    are stored in `ImmutableDict`s, which suit our use case much better than `Dict`.
  • Since `Node` is immutable, a `Documenter.Renderers.attributes` function is
    provided to set the correct autolinks for `at-ref` links. It can be overloaded
    to map the correct URLs during the rendering stage based on data collected
    during earlier stages.
  • The current `Documents.Document` type will instead be stored as `.metadata` in
    the root `Node` of the doctree. Every `Node` can reach it by following its
    `.parent` links. This doesn't appear to be much of a bottleneck, so adding
    a direct `.root` field to each `Node` doesn't seem worth it.
  • This uses the `Selectors` module to avoid dynamic dispatch when rendering doctrees,
    rather than the current approach of looping over `Vector{Any}`s. Everything
    important appears to be type-stable as far as I can tell so far.
  • Using a single unified doctree structure *should* make it possible to add new outputs
    by simply adding a new `Renderers/<format>` module that matches the layout of the
    others. Adding a new input format should just need a new function like `mdconvert`
    that builds a tree of `Node`s from files. (Not that there are any other parsers yet...)
  • Throughout the rest of the internal stages, i.e. everything except the input and output
    stages, we should be able to work with just `Node` (and `Tag`) objects. Extra formatting
    can simply be stripped by `Renderers` that can't handle it. This should mean we can remove
    all the markdown-specific bits that are used internally and replace them with just the
    parts in `HTMLWriter` that are currently being worked on.
  • "navtrees" and next/prev pages should be pretty easy to construct directly from the
    doctree without much work, though I've not yet looked too carefully into that.
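A minimal Julia sketch of the node design described in the list above, assuming `Base.ImmutableDict` for the attributes and a plain `mutable struct` in which only `parent` and `children` are ever changed (`tag` and `attributes` are simply never reassigned). Everything named here, including `mkattrs`, `addchild!`, `root`, `rootmetadata`, `HTMLOutput` and the two-argument `attributes(renderer, node)` form, is hypothetical and not the PR's actual code; the `Dict` passed as `metadata` stands in for a `Documents.Document`.

```julia
# Hypothetical sketch of a single doctree node type; not the code in this PR.
const Attrs = Base.ImmutableDict{Symbol,Any}

mutable struct Node
    tag::Symbol                   # e.g. :p, :a, Symbol("#TEXT#"); never reassigned
    attributes::Attrs             # HTML attributes plus extra metadata; never reassigned
    metadata::Any                 # only used on the root (stand-in for Documents.Document)
    parent::Union{Node,Nothing}   # mutable link, set when the node is attached
    children::Vector{Node}        # mutable links to child nodes
end

Node(tag::Symbol, a::Attrs = Attrs(); metadata = nothing) =
    Node(tag, a, metadata, nothing, Node[])

# Layer key => value pairs onto an empty ImmutableDict (later pairs shadow earlier ones).
mkattrs(pairs::Pair{Symbol}...) = foldl(Base.ImmutableDict, pairs; init = Attrs())

function addchild!(parent::Node, child::Node)
    child.parent = parent
    push!(parent.children, child)
    return child
end

# No direct `.root` field: walk the `.parent` links up to the root instead.
function root(node::Node)
    while node.parent !== nothing
        node = node.parent
    end
    return node
end
rootmetadata(node::Node) = root(node).metadata

# Render-time hook (assumed signature): by default a renderer sees the stored
# attributes unchanged; a renderer can add a method that computes different ones.
abstract type AbstractRenderer end
struct HTMLOutput <: AbstractRenderer end

attributes(::AbstractRenderer, node::Node) = node.attributes

# An `at-ref` link keeps href = "@ref" in the tree; at render time the HTML output
# looks up the real URL in a table that an earlier stage stored in the root metadata.
function attributes(::HTMLOutput, node::Node)
    a = node.attributes
    if node.tag === :a && get(a, :href, nothing) == "@ref"
        url = rootmetadata(node)[:urls][a[:target]]
        return Base.ImmutableDict(a, :href => url)   # new dict; the node is untouched
    end
    return a
end

doc  = Node(:root; metadata = Dict(:sitename => "Example",
                                   :urls => Dict("Node" => "api.html#Node")))
para = addchild!(doc, Node(:p, mkattrs(:class => "intro")))
link = addchild!(para, Node(:a, mkattrs(:href => "@ref", :target => "Node")))

rootmetadata(link)[:sitename]           # "Example"
attributes(HTMLOutput(), link)[:href]   # "api.html#Node"
link.attributes[:href]                  # still "@ref"
```

Because the stored attributes are never mutated, the `at-ref` link in the tree keeps its placeholder and only the renderer's view of it changes.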

@mortenpi, if you have any questions/suggestions/concerns then you're very welcome to raise them here.

@codecov-io

codecov-io commented Aug 8, 2016

Current coverage is 61.75% (diff: 0.00%)

Merging #184 into master will decrease coverage by 20.76%

@@             master       #184   diff @@
==========================================
  Files            19         25     +6   
  Lines          1213       1561   +348   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits           1001        964    -37   
- Misses          212        597   +385   
  Partials          0          0          

Powered by Codecov. Last update 25a6f2f...6b1f198

mortenpi mentioned this pull request Aug 15, 2016

@mortenpi
Member

I didn't see this in the code, but the current Pages would also become Nodes? So we'd have the root node, which would have a bunch of Node(:PageTag) (or something) as children?

Secondly, my impression is that currently we'd convert the document into basically the HTML DOM structure we have, including doing many of the rendering decisions? I am wondering if it would make more sense to keep the nodes more semantic and let the renderer decide more. In that sense I wouldn't deviate that much from what we have now (a DocsNode etc.), just implement our own tags so that we'd have more control and could add metadata, parents etc. (as opposed to mixing Base.Markdown objects with Documenter.Documents objects). Not sure what it would look like exactly, but basically I wouldn't add much additional formatting information (e.g. how exactly DocsNode divides up into a header div and a body div). Also not sure how well that would work with a unified Node type.

I am very much in favor of having access to the whole tree and the ability to store metadata in the tree (mostly in the root node, as you said). Would be so much nicer if you wouldn't have to pass stuff along all the time.

Just these quick questions for now; I should give it some more thought at some point.

@MichaelHatherly
Member Author

> I didn't see this in the code, but the current Pages would also become Nodes? So we'd have the root node, which would have a bunch of Node(:PageTag) (or something) as children?

Yes, they'd just be `Node`s, probably with a tag named similarly to the `#TEXT#` and `#NULL#` tags, i.e. just `#PAGE#`. I think we'd also need a `#SECTION#` tag containing several `#PAGE#`s to handle the navmenu.
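A sketch of how `#SECTION#` and `#PAGE#` nodes could yield both the navmenu order and next/prev links. It uses a hypothetical, stripped-down `NavNode` instead of the full `Node`, and assumes that page order is simply a depth-first walk of the tree; `pages` and `neighbours` are illustrative names only.

```julia
# Hypothetical, stripped-down stand-in for the doctree `Node`:
# just enough fields to build a navigation structure.
mutable struct NavNode
    tag::Symbol
    title::String
    parent::Union{NavNode,Nothing}
    children::Vector{NavNode}
end
NavNode(tag::Symbol, title::AbstractString) = NavNode(tag, String(title), nothing, NavNode[])

const SECTION = Symbol("#SECTION#")
const PAGE    = Symbol("#PAGE#")

function addchild!(parent::NavNode, child::NavNode)
    child.parent = parent
    push!(parent.children, child)
    return child
end

# Collect #PAGE# nodes depth-first, i.e. in the order they appear in the navmenu.
function pages(node::NavNode, acc::Vector{NavNode} = NavNode[])
    node.tag === PAGE && push!(acc, node)
    for child in node.children
        pages(child, acc)
    end
    return acc
end

# prev/next links are just the neighbours of a page in that flattened order.
function neighbours(docroot::NavNode, page::NavNode)
    ps = pages(docroot)
    i = findfirst(p -> p === page, ps)
    i === nothing && throw(ArgumentError("page is not in this doctree"))
    prevpage = i > 1 ? ps[i - 1] : nothing
    nextpage = i < length(ps) ? ps[i + 1] : nothing
    return (prev = prevpage, next = nextpage)
end

docroot  = NavNode(:root, "Docs")
manual   = addchild!(docroot, NavNode(SECTION, "Manual"))
guide    = addchild!(manual, NavNode(PAGE, "Guide"))
examples = addchild!(manual, NavNode(PAGE, "Examples"))
api      = addchild!(docroot, NavNode(PAGE, "API"))

nav = neighbours(docroot, examples)
(nav.prev.title, nav.next.title)   # ("Guide", "API")
```

The `#SECTION#` nodes only group pages for the navmenu; they are skipped when computing next/prev, so "Examples" links forward to "API" even though the two sit under different parents.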

> Secondly, my impression is that currently we'd convert the document into basically the HTML DOM structure we have, including doing many of the rendering decisions?

Yes, pretty much that. Convert MD (and its friends) to `Node` objects as early as possible, then manipulate the resulting tree, i.e. expand all `@` nodes, handle links, add styling, and so on. Then pass the tree to the renderer and let it output as much of the tree as it can. For example, the markdown renderer would just ignore things like `div` tags. As far as I can tell, we probably wouldn't lose any of the current markdown output.

> Not sure what it would look like exactly, but basically I wouldn't add much additional formatting information (e.g. how exactly DocsNode divides up into a header div and a body div). Also not sure how well that would work with a unified Node type.

I think it'll probably work fine with a single `Node` type, since something like the markdown renderer just ignores things like classes and ids and prints out its best approximation of what the HTML one does.
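A sketch of the "output as much of the tree as possible" idea: one tree, two renderers, where the markdown one drops tags it has no equivalent for (such as `div`) and never prints classes or ids. The `El` type, `el`, `render` and the renderer types are hypothetical; plain dispatch on `Val(tag)` stands in for the `Selectors` machinery (and, unlike what the PR aims for, it dispatches dynamically), text leaves are plain `String`s rather than `#TEXT#` nodes, and no HTML escaping is done.

```julia
# Hypothetical element type for this sketch; text leaves are plain Strings.
struct El
    tag::Symbol
    attrs::Vector{Pair{String,String}}       # e.g. "class" => "docstring"
    children::Vector{Union{El,String}}
end

el(tag::Symbol, children...; attrs...) =
    El(tag, Pair{String,String}[string(k) => string(v) for (k, v) in attrs],
       Union{El,String}[children...])

abstract type Renderer end
struct HTMLRenderer <: Renderer end
struct MarkdownRenderer <: Renderer end

render(::Renderer, s::String) = s                  # text passes through (no escaping)
render(r::Renderer, e::El) = render(r, Val(e.tag), e)
renderchildren(r::Renderer, e::El) = join(render(r, c) for c in e.children)

# HTML keeps every tag and every attribute.
function render(r::HTMLRenderer, ::Val{T}, e::El) where {T}
    attrs = join(" $k=\"$v\"" for (k, v) in e.attrs)
    return "<$T$attrs>" * renderchildren(r, e) * "</$T>"
end

# Markdown approximates what it can; any other tag (e.g. :div) is dropped
# and only its children are rendered. Attributes are never printed.
render(r::MarkdownRenderer, ::Val, e::El) = renderchildren(r, e)
render(r::MarkdownRenderer, ::Val{:h2}, e::El) = "## " * renderchildren(r, e) * "\n\n"
render(r::MarkdownRenderer, ::Val{:p}, e::El) = renderchildren(r, e) * "\n\n"
render(r::MarkdownRenderer, ::Val{:code}, e::El) = "`" * renderchildren(r, e) * "`"

tree = el(:div,
          el(:h2, "Renderers"; id = "renderers"),
          el(:p, "Call ", el(:code, "render"), " on the tree.");
          class = "docstring")

println(render(HTMLRenderer(), tree))
# <div class="docstring"><h2 id="renderers">Renderers</h2><p>Call <code>render</code> on the tree.</p></div>
print(render(MarkdownRenderer(), tree))
# ## Renderers
#
# Call `render` on the tree.
```

Supporting another tag in one output is just one more `render` method for that renderer; tags it doesn't know fall through to the generic method and still have their children rendered.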

> Would be so much nicer if you wouldn't have to pass stuff along all the time.

Definitely agree on that one.
