New base markdown parser #324

Closed
20 of 31 tasks
pngwn opened this issue Oct 16, 2021 · 5 comments

@pngwn
Owner

pngwn commented Oct 16, 2021

As per this discussion, mdsvex v1 will have a custom flavour of markdown that aims to be simpler and come with a few additional features that are pretty much expected at this point. This is a tracking issue for that work.

This does not include the svelte parser; the aim is to keep the svelte and markdown parsers separate and compose them in another step. This may prove too challenging because svelte syntax can appear almost anywhere in an mdsvex file, but I'll see how that pans out.

The parsing strategy listed in the markdown spec is not really appropriate in this instance because HTML (and Svelte) syntax will be parsed into a full AST and does not match all markdown semantics.


Broadly speaking there are a few different 'contexts' that you can be in when parsing a markdown file.

  • document, can contain leaf_block or container_block.
  • leaf_block cannot contain other blocks but can contain inline.
  • container_block can contain leaf_block or further container_block nodes; container blocks are recursive.
  • inline cannot contain blocks but some inline nodes can nest other inline nodes.
 → Document →
   → Leaf Block ↑
     → Inline ⟳
   → Container Block ⟳
     → Leaf Block ↑
     → Container Block ⟳
       → ...
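
As a rough illustration of the containment rules above, the four contexts could be modelled as TypeScript types. This is only a sketch: the node names (`paragraph`, `blockquote`, `emphasis`, `text`), the type names and the `containerDepth` helper are hypothetical and are not taken from the mdsvex AST, which is not specified in this issue.

```typescript
// Hypothetical node shapes, chosen only to mirror the four contexts.

// Inline cannot contain blocks, but some inline nodes nest other inline nodes.
type InlineNode =
  | { type: "text"; value: string }
  | { type: "emphasis"; children: InlineNode[] };

// A leaf block cannot contain other blocks, only inline content.
type LeafBlock = { type: "paragraph"; children: InlineNode[] };

// A container block holds leaf blocks or further container blocks (recursive).
type ContainerBlock = {
  type: "blockquote";
  children: Array<LeafBlock | ContainerBlock>;
};

// The document holds leaf blocks and container blocks.
type DocumentNode = {
  type: "document";
  children: Array<LeafBlock | ContainerBlock>;
};

// Returns the deepest level of container nesting below (and including) a node.
function containerDepth(node: DocumentNode | LeafBlock | ContainerBlock): number {
  if (node.type === "paragraph") return 0; // leaf blocks end the block recursion
  const inner = Math.max(0, ...node.children.map(containerDepth));
  return node.type === "blockquote" ? inner + 1 : inner;
}

const doc: DocumentNode = {
  type: "document",
  children: [
    {
      type: "blockquote",
      children: [
        { type: "paragraph", children: [{ type: "text", value: "outer" }] },
        {
          type: "blockquote",
          children: [
            { type: "paragraph", children: [{ type: "text", value: "inner" }] },
          ],
        },
      ],
    },
  ],
};

console.log(containerDepth(doc)); // 2
```

With types like these, putting a block inside a leaf block or an inline node is a compile-time error, which is the same constraint the list above describes.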

Specific nodes have additional rules on top of these very basic ones.


Probably stuff I've missed here, will be updated as we go.

Leaf blocks

Inline

Container Blocks

Bugs requiring tests to be closed by this:

@pngwn pngwn moved this to In Progress in mdsvex Oct 16, 2021
@pngwn pngwn added this to the 1.0 milestone Oct 16, 2021
@boehs

boehs commented Oct 19, 2021

What is the plan for the actual parser layout? I think the dream for me is that, instead of hard-coded parsing like most parsers, each inline, leaf and container is somehow "registered", like plugins. This makes it amazingly easy to add new ones.

I don't know how this would look but basically an abundance of modularity

@pngwn
Owner Author

pngwn commented Oct 19, 2021

The parser design is more in the opposite direction: custom syntax extensions won't be allowed. This is partly for performance and partly for simplicity. Markdown is bad enough on its own (although this flavour is simplified); add html + svelte into the mix and it gets even more complex, and this is before there are any syntax extensions. While it would be easy to register them, every time you add more syntax you increase the chance of collisions.

If there is anything compelling then it could be added to the core as a new node, but I think most cases can be handled via the nodes that are already there using custom transforms + compilers. This is just the parser and will output an AST (will be specced and versioned). The process will broadly speaking be:

  • parse(src) -> AST
  • transform(ast, plugins) -> AST
  • compile(ast, handlers) -> SvelteComponent/ HTML/ whatever the custom handlers render
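
A minimal sketch of how those three stages could compose, assuming a very small AST. Everything here is hypothetical: the `AstNode` shape, the `TransformPlugin` and `Handlers` types, and the toy `parse` that only splits paragraphs on blank lines. None of it is the mdsvex implementation or its specced AST.

```typescript
// Hypothetical AST node; the real AST is to be specced and versioned separately.
interface AstNode {
  type: string;
  value?: string;
  children?: AstNode[];
}

// Assumed shapes for plugins and handlers, for illustration only.
type TransformPlugin = (ast: AstNode) => AstNode;
type Handlers = Record<string, (node: AstNode, children: string) => string>;

// Toy stand-in for parse(src) -> AST: one paragraph per blank-line-separated chunk.
function parse(src: string): AstNode {
  const paragraphs = src
    .split(/\n{2,}/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0)
    .map((chunk): AstNode => ({
      type: "paragraph",
      children: [{ type: "text", value: chunk }],
    }));
  return { type: "document", children: paragraphs };
}

// transform(ast, plugins) -> AST: apply each plugin in order.
function transform(ast: AstNode, plugins: TransformPlugin[]): AstNode {
  return plugins.reduce((tree, plugin) => plugin(tree), ast);
}

// compile(ast, handlers) -> output: render children first, then the node itself.
function compile(ast: AstNode, handlers: Handlers): string {
  const children = (ast.children ?? [])
    .map((child) => compile(child, handlers))
    .join("");
  const handler = handlers[ast.type];
  if (!handler) throw new Error(`No handler for node type "${ast.type}"`);
  return handler(ast, children);
}

// Example plugin: upper-case every text value without touching the syntax.
const shout: TransformPlugin = (node) => ({
  ...node,
  value: node.value?.toUpperCase(),
  children: node.children?.map((child) => shout(child)),
});

// Example handlers that render HTML strings.
const html: Handlers = {
  document: (_node, children) => children,
  paragraph: (_node, children) => `<p>${children}</p>`,
  text: (node) => node.value ?? "",
};

const out = compile(transform(parse("hello\n\nworld"), [shout]), html);
console.log(out); // <p>HELLO</p><p>WORLD</p>
```

The point of the split is that a plugin like `shout` changes behaviour purely by rewriting nodes that already exist, so no new syntax has to be registered with the parser.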

Most mdsvex features will be handled via plugins (there may be one or two exceptions).

Is there any specific functionality you had in mind when thinking about modularity?

This is just for the markdown parser. There will be 3 parsers: markdown, svelte, mdsvex (svelte + markdown).

@boehs

boehs commented Oct 19, 2021

my problem with markdown parsers is
[image]

ideally each syntax piece has its own file

@pngwn
Owner Author

pngwn commented Oct 19, 2021

Why does that matter? As long as the maintainers are comfortable maintaining it then it isn't an issue.

I also have a preference for large files over many files. It has no bearing on the API and users won't know the difference. This parser will be no different: most things will be inlined for performance reasons.

@pngwn pngwn moved this from In Progress to Todo in mdsvex Jan 19, 2022
@Chaostheorie

Chaostheorie commented Jun 2, 2022

Just my 2cts but what stands in the way of a generated parser (like lezer) for generating the AST from the markdown/svelte source code?
While the grammar may end up being complex it should be a good middle ground between a modular and a full self-written approach. It may also allow you (given the generator is popular) to use an existing markdown grammar and just extend the svelte parts as seen fit.

Edit: It seems I underestimated markdown's parsability. Though the lezer implementation may still be a good starting point.

@pngwn pngwn added this to mdsvex Feb 22, 2024
@pngwn pngwn changed the title New markdown parser New base markdown parser Feb 22, 2024
@pngwn pngwn mentioned this issue Feb 23, 2024
27 tasks
@pngwn pngwn moved this to Todo in mdsvex Feb 23, 2024
@pngwn pngwn closed this as completed Aug 17, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in mdsvex Aug 17, 2024