Add generic syntax for extensions/plugins #99

SimonSapin · 2014-08-09T22:49:29Z

Rust uses Hoedown for its documentation. We would like to add a bunch of new features (cross-references, math, …) that would need to be represented somehow in Markdown source. Rather than inventing new punctuation-based syntax for each of them, I believe (as discussed here) that we should only have one additional inline-level syntax and one block-level that each include the name of the "plugin". The built-in HTML renderer would not implement any plugin, it’s up to rustdoc and other users to do that in a custom renderer.

So, proposal. I’m not attached to the syntax or the naming, feel free to change them as you like. I just picked something similar to reStructuredText.

New inline (span) syntax for a "role": :role-name:content. Like in code spans, backticks can be repeated and whitespace is trimmed: `:foo: ` ``` (the content here is a single backtick.)

New block-level syntax for a "directive":

.. directive-name:: header
    indented content

    … that goes on until the end of the indentation, like unfenced code blocks.

Renderers would have two new additional callbacks.

struct hoedown_renderer {
    // ...
    void (*directive)(hoedown_buffer *ob,  const hoedown_buffer *name,  const hoedown_buffer *header, const hoedown_buffer *text,void *opaque);
    int (*role)(hoedown_buffer *ob, const hoedown_buffer *name, const hoedown_buffer *text, void *opaque);
}

The default behavior when the callback is NULL or returns NULL/0 would be as usual: directives are skipped and roles are printed verbatim. The built-in HTML renderer would not provide these callbacks.

Does this sound like a good idea?

The text was updated successfully, but these errors were encountered:

mildsunrise · 2014-08-09T23:14:27Z

I love how the content is specified inside codespans and codeblocks; so even if the syntax is not supported, you still get a nice codeblock as fallback:

See :ref:hoedown_document_free(hoedown_document *doc) for more info.

I'd like to suggest something: for the directive syntax, could make it use the fenced code block syntax? Something like:

~~~ directive-name:: header
Directive content goes here.
~~~

This would be trivial to parse, and again if the plugin is not supported you get a simple codeblock:

Directive content goes here.

Side note: It'd actually be a good idea to have math in its own extension, between $dollars$ for similarity to TeX. Mathematics StackExchange do it this way. Cross references are a more specific feature, though.

SimonSapin · 2014-08-09T23:28:36Z

Yeah, fenced code blocks could be used for block-level directives, but kinda feels like an abuse. And maybe not as nice as the reStructuredText syntax for directives that are not as "content-oriented" as code blocks. See for example this source documentation using autodoc: https://raw.githubusercontent.com/mitsuhiko/flask/master/docs/api.rst

Yes, math might be worth it’s own syntax. But rendering math is non-obvious, and so maybe out of scope for Hoedown?

devinus · 2014-08-09T23:43:49Z

Well frequently requested extensions (as long as they remained maintained) are never out of scope for Hoedown.

SimonSapin · 2014-08-10T09:11:44Z

To clarify the proposed directive syntax: both header and content are optional.

.. include:: README.md
.. table-of-contents::

mildsunrise · 2014-08-10T09:20:09Z

But rendering math is non-obvious, and so maybe out of scope for Hoedown?

Just like fenced code blocks. The callback can highlight the code, or leave that to the client or postprocessor. The HTML renderer does the latter, but it can be overriden.

In the case of math, the most appropiate for HTML would be to output it without changes, so that MathJax can display it.

mildsunrise · 2014-08-10T09:48:31Z

And maybe not as nice as the reStructuredText syntax for directives that are not as "content-oriented" as code blocks.

You're right. So, if we use this syntax, could we enforce a blank line between the directive header and the content? i.e. this would be invalid:

.. directive-name:: header
    indented content

And should be written as:

.. directive-name:: header

    indented content

That way, we get a nice codeblock if the extension's not present.

SimonSapin · 2014-08-10T09:51:55Z

I don’t think it is necessary to have the syntax be compatible a Markdown parser that doesn’t know about directives. The Hoedown parser would always be the same, it’s only different renderers that would do different things with directives. The built-in one could render them the same as code blocks.

mildsunrise · 2014-08-10T09:56:43Z

I think it'd make a good effect. For instance, a README.md file containing roles and directives would be displayed by Github in a neat way. IMO it's important for a generic syntax to have fallback.

Roles are already compatible, and directives only need a blank line before the content (which you always use on that file you've linked).

SimonSapin · 2014-08-10T10:00:05Z

You can always use a blank line if you want, I don’t know if it has to be required. reStructuredText separates the indented text into parameters (before the first blank line) and content (after):

.. autoclass:: Request
   :members:

   .. attribute:: form

      A :class:`~werkzeug.datastructures.MultiDict` with the parsed form data from `POST`
      or `PUT` requests.  Please keep in mind that file uploads will not
      end up here,  but instead in the :attr:`files` attribute.

But I don’t think we need that at the Hoedown parser level. Specific can always parse the content with their own syntax.

mildsunrise · 2014-08-10T10:02:10Z

So, rst requires a blank line to be present to separate the content from {directive, header, parameters}?

SimonSapin · 2014-08-10T10:03:48Z

I don’t know, the general documentation for directive syntax doesn’t mention it, so it might be specific to some directives:

http://docutils.sourceforge.net/docs/ref/rst/directives.html

Directives have the following syntax:

+-------+-------------------------------+
| ".. " | directive type "::" directive |
+-------+ block                         |
        |                               |
        +-------------------------------+

mildsunrise · 2014-08-10T10:04:41Z

Hmm, I understand...

mildsunrise · 2014-08-10T14:24:59Z

But there's something I still don't understand: how should the parser treat the content inside directives?
Should it parse it as Markdown? Should it parse it looking for nesting directives?

SimonSapin · 2014-08-10T14:27:46Z

It should pass the raw (dedented) source to the renderer callback. Additionally, the callback should have the ability to parse that content as Markdown (including nested directives), but it doesn’t have to.

mildsunrise · 2014-08-10T14:30:26Z

Additionally, the callback should have the ability to parse that content as Markdown (including nested directives), but it doesn’t have to.

Well, that ability would require exposing the nuts and guts of the parser (aka parse_block and parse_inline) since calling hoedown_document_render when a render is in progress will result in undefined behaviour.

mildsunrise · 2014-08-10T14:31:36Z

That, or the renderer should itself keep another hoedown_document to render content with.

SimonSapin · 2014-08-10T14:33:38Z

There are apparently multiple possible approaches to parsing nested things for a renderer callback, but I don’t know enough about Hoedown’s internals to have an opinion on which is preferable.

mildsunrise · 2014-08-10T14:36:12Z

?? AFAIK no nested parsing can / is currently done at the renderer.

SimonSapin · 2014-08-10T14:37:00Z

Yes, that’s what I mean. Support for this would need to be added somehow.

mildsunrise · 2014-08-10T14:38:15Z

And that would mean exposing internal parser methods publicly. I'm not sure about that.

SimonSapin · 2014-08-10T14:38:58Z

I don’t know how this would be achieved.

uranusjr · 2014-08-10T15:08:29Z

If contents inside a directive block need to be parsed, we would need a parsing function to override, not a rendering one—and then you’ll have issues like #94, and the following would need to be answered:

Should Hoedown make its parser extendable? If the answer is no, things end here.
Are the current parser internals suitable to be made public as API? If yes, we’re good. But IMO the answer is definitely no.
How much internals should be made public? parse_block and parse_inline would be enough to “parse the content as Markdown”, but many parsers expose quite a lot, including docutils, reST’s reference implementation.
How should they be exposed? Simply make everything non-static is not a good idea, but we’ll need some way to abstract out at least some the parsing part to keep internal things internal. This is not impossible, as many other parsers do that, but much work would need to be done before we get to the custom directive part.

mildsunrise · 2014-08-10T15:43:33Z

If contents inside a directive block need to be parsed, we would need a parsing function to override, not a rendering one—and then you’ll have issues like #94,

Yes, in the sense that you need access to Hoedown's internals.

Should Hoedown make its parser extendable? If the answer is no, things end here.

You're not really adding new syntaxes to the parser, you have a fixed, generic syntax and you're parsing the content inside that syntax -- be it Markdown or not. Pretty much like you'd parse a codeblock's text to highlight it.

So no, technically you don't need to extend the parser to make it parse roles, you need to override the renderer to parse your custom roles content. Not the best place to do it, but...

How much internals should be made public? parse_block and parse_inline would be enough to “parse the content as Markdown”

I think that would be enough. If closer integration is required, an extension is the right way to do it.

Are the current parser internals suitable to be made public as API? If yes, we’re good. But IMO the answer is definitely no.

They are internals. They aren't expected to be called by the user, directly. Doing that otherwise than in specific cases is error prone unless user knows how Hoedown works internally.

Yes, they are modular and IMO are ready to be exposed, but since they're internals, the user must know how to use them.

How should they be exposed? Simply make everything non-static is not a good idea, but we’ll need some way to abstract out at least some the parsing part to keep internal things internal. This is not impossible, as many other parsers do that, but much work would need to be done before we get to the custom directive part.

Really agree on that. And I'm afraid we'd be killing the simplicty of Hoedown's API in the process.

mildsunrise · 2014-08-10T16:09:31Z

Specifically, calling a Hoedown internal is only acceptable if:

We're in the context of a render, that is, inside a renderer callback.
The data that we'll be passing to the internal also comes from Hoedown, i.e. from the callback parameters.

If these conditions apply, calling parse_block or parse_inline is safe and straighforward.

Calling a hoedown internal in any other case will result in undefined behaviour. (wrong output in the best case, crashing in the worst case)

mildsunrise · 2014-09-05T08:01:32Z

I've found a problem while trying to implement this.
parse_block mutates the data as it parses it.

Therefore things like that:

void rndr_directive(hoedown_buffer *ob,  const hoedown_buffer *name,  const hoedown_buffer *header, const hoedown_buffer *text, void *opaque) {
    BUFPUTSL(ob, "<div class=\"directive-");
    hoedown_buffer_put(ob, name->data, name->size);
    BUFPUTSL(ob, "\">");

    parse_block(ob, document, text->data, text->size);

    BUFPUTSL(ob, "</div>");
}

Would fail since you cannot modify text, it's marked const. You'd have to create and maintain a temporal buffer and copy the data to it.

uranusjr · 2014-09-05T10:36:09Z

Why does the parser need to mutate the input data? If there’s no good reason for it, I propose just change all parsing functions to take const uint8_t * instead. I just skimmed through document.c, and most usages of data that isn’t compatible with this are for setting the work buffer to be used in the renderer like this:

hoedown_buffer work = { NULL, 0, 0, 0, NULL, NULL, NULL };
work.data = data + some_offset;
work.size = some_size;
doc->md.some_rendering_callback(ob, &work, doc->md.opaque);

which can be easily refactored into

hoedown_buffer *work = hoedown_buffer_new(64);
hoedown_buffer_put(work, data + some_offset, some_size);
doc->md.some_rendering_callback(ob, work, doc->md.opaque);
hoedown_buffer_free(work);

And I think this makes the code more readable. I would work on this this weekend if there’re no issues. Shouldn’t be too much trouble.

mildsunrise · 2014-09-05T10:49:50Z

Why does the parser need to mutate the input data? If there’s no good reason for it, I propose just change all parsing functions to take const uint8_t * instead.

I completely agree.

Just spent an hour changing all parsers, triggers and prefix checkers to accept const uint8_t * and luckily the only point where the data is actually mutated is in parse_blockquote, here.

It seems this was an optimization for performance (instead of copying, just move the data in place) but I think it's unacceptable. This will be changed.

Everything else is fine.

most usages of data that isn’t compatible with this are for setting the work buffer

This would be fixed with a cast. We know the callback won't modify the data, since it's passed a const hoedown_buffer *, so it should be safe.

Doing hoedown_buffer_new and putting the data there would decrease performance (allocation and copying takes time) and we don't really need it since the data isn't being modified.

uranusjr · 2014-09-05T12:28:12Z

But wouldn’t you still be able to modify contents of uint8_t *data, even if it’s contained in a const hoedown_buffer *? For example:

void (*hrule)(hoedown_buffer *ob, void *opaque)
{
    ob->data[5] = '\0';
    // ...
}

I mean, obviously the bundled HTML renderer doesn’t do this, and custom renderers shouldn’t do it. But the point is that const hoedown_buffer * doesn’t make the data pointer inside it safe.

Another way to ensure data safety is to specifically declare an immutable buffer type:

struct hoedown_buffer {
    const uint8_t *data;
    size_t size;
    size_t asize;
    size_t unit;

    hoedown_realloc_callback data_realloc;
    hoedown_free_callback data_free;
    hoedown_free_callback buffer_free;
};

and use it instead of hoedown_buffer at the right places. But maybe I’m just overthinking this because it should not be modified, and the user should take the consequences if he/she does something like this.

mildsunrise · 2014-09-05T13:36:43Z

Yes, I know you can modify the data pointed to by the buffer, but if you use any of hoedown_buffer_ functions on a const hoedown_buffer *, the compiler will warn about it.

Even if you don't pay attention to the warnings (or don't enable them), the assertion for buf->unit > 0 will fail and the program will crash. A buffer with unit = 0 is read-only.

SimonSapin · 2014-09-16T18:41:23Z

This might be relevant: http://talk.commonmark.org/t/generic-directives-plugins-syntax/444/39

SimonSapin · 2014-09-16T18:42:34Z

The syntax is the origin message here was only a proposal. If the CommonMark folks settle on some syntax, it’d be preferable to align with that.

mildsunrise · 2014-09-16T19:10:01Z

Hmm... I think we can leave this aside for a bit, while we center on finishing v3 and adopting CommonMark for v4.

mildsunrise mentioned this issue Aug 10, 2014

Support math within dollars #100

Closed

mildsunrise added the enhancement label Aug 12, 2014

mildsunrise mentioned this issue Sep 14, 2014

Renderer API #124

Merged

uranusjr mentioned this issue Jan 3, 2015

Render custom markdown tags MacDownApp/macdown#240

Closed

mitaa mentioned this issue Mar 10, 2016

Broken links in String documentation from Deref impl rust-lang/rust#32129

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add generic syntax for extensions/plugins #99

Add generic syntax for extensions/plugins #99

SimonSapin commented Aug 9, 2014

mildsunrise commented Aug 9, 2014

SimonSapin commented Aug 9, 2014

devinus commented Aug 9, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

uranusjr commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

mildsunrise commented Sep 5, 2014

uranusjr commented Sep 5, 2014

mildsunrise commented Sep 5, 2014

uranusjr commented Sep 5, 2014

mildsunrise commented Sep 5, 2014

SimonSapin commented Sep 16, 2014

SimonSapin commented Sep 16, 2014

mildsunrise commented Sep 16, 2014

Add generic syntax for extensions/plugins #99

Add generic syntax for extensions/plugins #99

Comments

SimonSapin commented Aug 9, 2014

mildsunrise commented Aug 9, 2014

SimonSapin commented Aug 9, 2014

devinus commented Aug 9, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

SimonSapin commented Aug 10, 2014

uranusjr commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

mildsunrise commented Aug 10, 2014

mildsunrise commented Sep 5, 2014

uranusjr commented Sep 5, 2014

mildsunrise commented Sep 5, 2014

uranusjr commented Sep 5, 2014

mildsunrise commented Sep 5, 2014

SimonSapin commented Sep 16, 2014

SimonSapin commented Sep 16, 2014

mildsunrise commented Sep 16, 2014