Allow Markdoc to render HTML tags #613
Replies: 3 comments 4 replies
-
Thanks for this, Alex! This feels like a welcome addition to make Markdown -> Markdoc migration smoother. Bikeshed: I'm curious if
-
Regarding the concerns around JavaScript/XSS, maybe I'm just overthinking it, because Astro currently doesn't do anything to prevent this in the existing Markdown/MDX rendering (you can happily throw in a So if XSS needed to be solved, I would think that should probably be a separate roadmap discussion spanning Markdown, Markdoc and MDX equally. A proper XSS solution, if ever implemented, is not trivial (see the OWASP recommendations), and given how Astro's rendering pipeline works in conjunction with Markdoc, there's no clear way to apply the OWASP-recommended pattern, which is to run a sanitization function (via the DOMPurify library) over a completely rendered chunk of HTML.
-
Closing since this shipped in the latest release of Markdoc! See the
-
Body
Summary
By default, Markdoc does not render HTML in markup; instead, HTML is treated as raw strings that are escaped and then rendered as text.
It would be very useful to have an opt-in ability to enable HTML in markup.
After all, Markdoc purports to be a superset of Markdown, which does support HTML in markup.
So converting into Markdoc either a large corpus of existing Markdown that may contain HTML markup, or a corpus of markup from another HTML-based, CMS-like system, is problematic if that HTML markup cannot be rendered.
Background & Motivation
By default, Markdoc tokenizes markup using markdown-it with HTML processing disabled.
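With HTML processing disabled, a tag such as `<span>` stays ordinary text, and ordinary text is entity-escaped when rendered, so the tag shows up literally rather than producing an element. The sketch below illustrates that effect; the `escapeHtml` helper is hypothetical and is not Markdoc's or Astro's actual rendering code.

```typescript
// Hypothetical helper illustrating how raw text is entity-escaped on output.
// It is NOT Markdoc's internal implementation, just the same general idea.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Authored markup containing an HTML element
const authored = '<span class="note">Heads up</span>';

// Treated as text, the markup is escaped and displayed literally
console.log(escapeHtml(authored));
// prints: &lt;span class=&quot;note&quot;&gt;Heads up&lt;/span&gt;
```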
The Markdoc GitHub repo previously hosted a long-running issue and PR about baking HTML support directly into the markdoc library. However, that discussion ended with a solution that delivers the desired result (full HTML markup integration) without requiring any changes to the markdoc library (see the issue and PR solution example), and this approach is viewed as the ideal way to deal with the problem.
The solution is to apply a pure functional transform to either the tokens or the AST (doing it at token time is easier). The transform parses the HTML in the Markdoc markup and replaces the raw HTML string tokens with Markdoc Tag tokens that describe the parsed HTML elements, interleaving the resulting HTML tree with any other Markdoc tags/nodes in the markup.
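A minimal TypeScript sketch of such a token-time transform, under stated assumptions: this is not the code from the linked repo, and the `Token` shape, the `transformHtmlTokens` name and the regex-based tag parsing are all hypothetical. It assumes a flat list of inline tokens in which markdown-it (with HTML enabled) emits one `html_inline` token per tag; `html_block` tokens, single-quoted or unquoted attribute values, and a real HTML parser are out of scope.

```typescript
// Simplified token shape, loosely modeled on markdown-it's Token (assumption:
// real Markdoc tag tokens carry more metadata than this).
interface Token {
  type: string;                        // "text", "html_inline", "tag_open", "tag_close", ...
  content?: string;                    // raw source for text/html tokens
  tag?: string;                        // element or Markdoc tag name
  attributes?: Record<string, string>; // parsed attributes for tag_open
}

// Simplified patterns: one tag per token, double-quoted attribute values only.
const OPEN_TAG = /^<([a-zA-Z][\w-]*)((?:\s+[^\s=\/>]+(?:="[^"]*")?)*)\s*(\/?)>$/;
const CLOSE_TAG = /^<\/([a-zA-Z][\w-]*)\s*>$/;
// HTML void elements never have a closing tag.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Parse name="value" pairs; bare attributes such as `hidden` get "".
function parseAttributes(source: string): Record<string, string> {
  const attributes: Record<string, string> = {};
  const pattern = /([^\s=\/>]+)(?:="([^"]*)")?/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    attributes[match[1].toLowerCase()] = match[2] ?? "";
  }
  return attributes;
}

// Pure transform: builds a new array and never mutates its input. Each
// html_inline token becomes tag_open/tag_close tokens; every other token
// (text, existing Markdoc tags) passes through unchanged, which is what lets
// HTML elements interleave with Markdoc tags.
function transformHtmlTokens(tokens: readonly Token[]): Token[] {
  const output: Token[] = [];
  const openElements: string[] = []; // stack checking that HTML tags balance

  for (const token of tokens) {
    if (token.type !== "html_inline" || token.content === undefined) {
      output.push(token);
      continue;
    }
    const html = token.content.trim();
    const close = CLOSE_TAG.exec(html);
    if (close) {
      const name = close[1].toLowerCase();
      if (openElements.pop() !== name) {
        throw new Error(`Unbalanced closing tag </${name}>`);
      }
      output.push({ type: "tag_close", tag: name });
      continue;
    }
    const open = OPEN_TAG.exec(html);
    if (!open) {
      // Not a simple tag (e.g. an HTML comment): keep it as plain text.
      output.push({ type: "text", content: token.content });
      continue;
    }
    const name = open[1].toLowerCase();
    output.push({ type: "tag_open", tag: name, attributes: parseAttributes(open[2]) });
    if (open[3] === "/" || VOID_ELEMENTS.has(name)) {
      output.push({ type: "tag_close", tag: name }); // e.g. <br> or <img ... />
    } else {
      openElements.push(name);
    }
  }
  if (openElements.length > 0) {
    throw new Error(`Unclosed tags: ${openElements.join(", ")}`);
  }
  return output;
}
```

Because text and existing Markdoc tag tokens pass through untouched, an HTML element can wrap a Markdoc tag and the output keeps that interleaving; the stack only verifies that the HTML open/close tags balance among themselves.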
Goals
Example
The pure functional token transform process is basically:
A complete working example, with unit test coverage, can be found in this repo: https://github.com/alex-sherwin/astro-markdoc-html/tree/main
Overall, this solution requires:
Concerns
There are potential security concerns with enabling this, since you're taking HTML markup and rendering it.
However, in theory, this shouldn't be materially different from rendering HTML markup in Markdown/MDX.
I think the test cases should ensure that a few things are not possible to render via this feature, such as <script> tags or inline JavaScript in HTML attribute event handlers.
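Along the lines of those test cases, a hypothetical check over the transformed tag tokens could flag `<script>` elements and `on*` event-handler attributes. The `TagToken` shape and the `findUnsafeHtml` name are assumptions for illustration; a blocklist like this covers only the two cases named here and is not a substitute for the full-output sanitization OWASP recommends (for example with DOMPurify).

```typescript
// Minimal tag-token shape (assumption: mirrors the simplified tokens a
// token-time HTML transform might emit, not Markdoc's real Token type).
interface TagToken {
  type: string;                        // "tag_open", "tag_close", "text", ...
  tag?: string;                        // element name
  attributes?: Record<string, string>; // attribute name -> value
}

// Elements that should never be rendered from authored content.
const BLOCKED_ELEMENTS = new Set(["script"]);

// Returns a description of each disallowed construct found. An empty list
// only means these two checks passed; it is NOT a complete XSS sanitizer.
function findUnsafeHtml(tokens: readonly TagToken[]): string[] {
  const problems: string[] = [];
  for (const token of tokens) {
    if (token.type !== "tag_open" || token.tag === undefined) continue;
    const tag = token.tag.toLowerCase();
    if (BLOCKED_ELEMENTS.has(tag)) {
      problems.push(`<${tag}> element`);
    }
    for (const name of Object.keys(token.attributes ?? {})) {
      // Inline event handlers such as onclick, onerror, onload
      if (name.toLowerCase().startsWith("on")) {
        problems.push(`${name} attribute on <${tag}>`);
      }
    }
  }
  return problems;
}
```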