Rewrite how MDX is parsed #1039

Merged
merged 14 commits into from
May 20, 2020

@wooorm
Member

wooorm commented Apr 29, 2020

This completely rewrites the parser. It was previously regex-based and error-prone; it now follows a well-defined state machine.

TL;DR:

  • a. I <3 Markdown and JSX -> I &lt;3 Markdown and JSX
  • b. <!--I'm a comment--> -> {/*I'm a comment*/}
  • c. <h2>`code`</h2> -> ## `code`
  • d. See new features below

Too long

Where MDXjs is different than Markdown

HTML

(mostly didn’t work, doesn’t work now)

Incorrect:

# Hello, <span style=color:red>world</span>!
<!--To do: add message-->
<img>

Correct:

# Hello, <span style='color:red'>world</span>!
<img />

Indented code

(worked sometimes but mixed weirdly with indented elements; doesn’t work now)

Incorrect:

    console.log(1)

Correct:

```js
console.log(1)
```

Autolinks

(worked; doesn’t work now)

Incorrect:

See <https://example.com> for more information

Correct:

See [example.com](https://example.com) for more information.
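
For existing documents, the rewrite from autolink to explicit link can be automated. The following is only a rough migration sketch: `autolinksToLinks` is a hypothetical helper, it handles `http` and `https` URLs only, and it is regex-based, so it does not know about code spans or fenced code and its output should be reviewed by hand.

```javascript
// Hypothetical migration helper (not part of MDX): rewrite autolinks such as
// <https://example.com> into explicit Markdown links.
function autolinksToLinks(source) {
  return source.replace(/<(https?:\/\/[^\s<>]+)>/g, (match, url) => {
    // Use the URL without its protocol and trailing slash as the link text.
    const label = url.replace(/^https?:\/\//, '').replace(/\/$/, '')
    return `[${label}](${url})`
  })
}

console.log(autolinksToLinks('See <https://example.com> for more information'))
// See [example.com](https://example.com) for more information
```

JSX tags such as `<Hello />` are left alone because they do not start with a protocol.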

Where MDXjs is different than JSX

Comments

(didn’t work, won’t work)

Incorrect:

<hi/*comment!*//>
<hello// comment!
/>

Correct:

<hi/>
<hello
/>

Elements / fragments as attribute values

(didn’t work, doesn’t work)

Incorrect:

<welcome name=<>Venus</> />
<welcome name=<span>Pluto</span> />

Correct:

<welcome name='Mars' />
<welcome name={<>Jupiter</>} />

> and } in text

(worked, still works)

Correct:

<>This is 3>1 fine, and you can have a closing brace } too</>

Brace counting in expressions

(worked, doesn’t work anymore)

Incorrect:

<punctuation
  data={{
    '{': false // Left curly brace
  }}
/>

Correct:

<punctuation
  data={{
    '{': false, // Left curly brace
    '}': false // Right curly brace
  }}
/>
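
To see why the first example fails, here is a minimal sketch of naive brace counting. It assumes the expression end is found by balancing `{` and `}` without understanding JavaScript strings or comments; `findExpressionEnd` is a hypothetical name and the real parser may differ in detail.

```javascript
// Naive brace counter: every `{` opens and every `}` closes, even inside
// strings and comments. Returns the index of the closing brace that matches
// the brace at `start`, or -1 if the input ends first.
function findExpressionEnd(source, start) {
  let depth = 0
  for (let index = start; index < source.length; index++) {
    const char = source[index]
    if (char === '{') {
      depth++
    } else if (char === '}') {
      depth--
      if (depth === 0) return index
    }
  }
  return -1
}

const unbalanced = "{{\n  '{': false // Left curly brace\n}}"
const balanced = "{{\n  '{': false, // Left curly brace\n  '}': false // Right curly brace\n}}"

console.log(findExpressionEnd(unbalanced, 0)) // -1
console.log(findExpressionEnd(balanced, 0) === balanced.length - 1) // true
```

In the unbalanced input the `{` inside the string counts as an opener that is never closed, so the end of the expression is never found.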

Most likely breaking changes from MDXjs 1

You can’t have random < or { in text anymore

Incorrect:

I <3 Markdown and JSX

Correct:

I &lt;3 Markdown and JSX
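
For prose that is known to contain no intentional JSX or expressions, the escaping can be scripted. This is a hedged sketch: `escapeMdxText` is a hypothetical helper, it only skips single-backtick code spans, and it would also escape real tags, so it is not safe to run over whole MDX files.

```javascript
// Hypothetical helper: escape `<` and `{` as character references in plain
// prose, leaving single-backtick code spans as they are.
function escapeMdxText(text) {
  return text
    .split(/(`[^`]*`)/) // odd-numbered parts are the captured code spans
    .map((part, index) =>
      index % 2 === 1
        ? part
        : part.replace(/</g, '&lt;').replace(/\{/g, '&#123;')
    )
    .join('')
}

console.log(escapeMdxText('I <3 Markdown and JSX'))
// I &lt;3 Markdown and JSX
console.log(escapeMdxText('Use `a < b` or {'))
// Use `a < b` or &#123;
```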

You can’t have HTML comments anymore

Previously, HTML comments were allowed. We are no longer lax about this and are going with JSX all the way.

Incorrect:

<!--I'm a comment-->

Correct:

{/*I'm a comment*/}
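
Existing HTML comments can be converted mechanically. A rough sketch follows, assuming a regex pass is good enough for a one-off migration; `htmlCommentsToJsx` is a hypothetical helper and it does not skip comments that appear inside code blocks.

```javascript
// Hypothetical migration helper: turn <!--...--> into {/*...*/}.
function htmlCommentsToJsx(source) {
  return source.replace(/<!--([\s\S]*?)-->/g, (match, body) => {
    // A literal `*/` in the body would end the JavaScript comment early,
    // so break it up with a space.
    const safe = body.replace(/\*\//g, '* /')
    return `{/*${safe}*/}`
  })
}

console.log(htmlCommentsToJsx("<!--I'm a comment-->"))
// {/*I'm a comment*/}
```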

Blocks will be blocks

We wanted to make MDX easier to reason about. We’re now more chill about mixing Markdown with JSX. For example, this is perfectly fine: <Wrapper>## Heading</Wrapper>. Because Wrapper is a block, its contents are also blocks, so the heading works. By extension, <Wrapper>Paragraph</Wrapper> is a paragraph in a wrapper, which means that <h2>Paragraph</h2> is a paragraph in a heading!

Incorrect:

<h2>`code`</h2>

Correct:

## `code`

Or:

<h2>{<code>code</code>}</h2>

New features

Indent

You can indent your tags and expressions the way you want (within reason):

<section className="message-container">

    <div className="thanks">

        <h2>Hi</h2>
        <p>Thanks so much!</p>

    </div>

</section>

Interleaving

These are all fine now:

<div>
# heading
</div>
<div># heading</div>
<div>
  # heading
</div>

Blocks

Blocks can have blank lines now:

<Playground reference={`
public class Whatever() {
    // The blank line below previously caused parsing to fail.

}
`} />

Expressions everywhere

These all work:

# Hello, <>{props.name}</>
# Hello, {props.name}
{<h2>{props.name}</h2>}

Related-to: GH-195. (fixes everything except for imports/exports)
Closes GH-556.
Closes GH-611.
Closes GH-628.
Closes GH-716.
Closes GH-755.
Closes GH-757.
Closes GH-767.
Closes GH-784.

@wooorm wooorm requested a review from johno April 29, 2020 15:40
@vercel

vercel bot commented Apr 29, 2020

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/mdx/mdx/e0mp24bx8
✅ Preview: https://mdx-git-next-parsing.mdx.now.sh

@wooorm
Member Author

wooorm commented Apr 30, 2020

(Although it should be fatal, so an error. https://github.com/vfile/vfile-to-eslint covers how to do that)

@JounQin
Member

JounQin commented Apr 30, 2020

> (Although it should be fatal, so an error. https://github.com/vfile/vfile-to-eslint covers how to do that)

There is only a single mdx/remark rule for all remark plugins. I don't know whether it's possible to use separate error levels within a single ESLint rule; I'll do some research on it.
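
As a point of reference for the severity question, here is a hand-written sketch of mapping a vfile message to an ESLint-style result message. It is not the vfile-to-eslint API; `toEslintMessage` and the sample messages are hypothetical. It only relies on vfile messages carrying a `fatal` flag and on ESLint using severity 1 for warnings and 2 for errors.

```javascript
// Hypothetical mapping: `fatal: true` becomes ESLint severity 2 (error),
// anything else becomes severity 1 (warning).
function toEslintMessage(message) {
  return {
    ruleId: message.ruleId || null,
    severity: message.fatal === true ? 2 : 1,
    message: message.reason,
    line: message.line,
    column: message.column
  }
}

// Made-up sample messages, shaped like vfile messages.
const parseError = {reason: 'Example parse error', fatal: true, line: 1, column: 7}
const lintWarning = {reason: 'Example style warning', fatal: false, ruleId: 'example-rule', line: 3, column: 1}

console.log(toEslintMessage(parseError).severity) // 2
console.log(toEslintMessage(lintWarning).severity) // 1
```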

@vercel vercel bot temporarily deployed to Preview April 30, 2020 15:31 Inactive
@vercel vercel bot temporarily deployed to Preview April 30, 2020 17:19 Inactive
@johno johno changed the base branch from master to next April 30, 2020 17:20
@vercel vercel bot temporarily deployed to Preview April 30, 2020 17:20 Inactive
@laurieontech

Noticed the { or < in text won't work anymore. Is that still the case if they're in backticks? I suspect <Hello/> works because it has a closing tag. But <Hello would not?

@wooorm
Member Author

wooorm commented May 5, 2020

Thanks for trying it out!

I indeed decided to throw parse errors on those, but there is an exception for code (so in backticks), and also for JavaScript (so in braces). In other cases, character references can be used (see “You can’t have random < or { in text anymore” above for examples!).

And for your example: that is correct!

Do you think this makes sense, or should it be different?

@laurieontech

That seems reasonable. I think this comes down to docs to make it clear that in-line "text" inside of backticks is treated/parsed as code. So if you want an errant { you can do so by including it in backticks, but beyond that there isn't a way to escape a character.

@johno johno marked this pull request as ready for review May 20, 2020 17:06
@vercel vercel bot temporarily deployed to Preview May 20, 2020 17:39 Inactive
@vercel vercel bot temporarily deployed to Preview May 20, 2020 17:42 Inactive
@johno
Member

johno left a comment

💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟💟

@vercel vercel bot temporarily deployed to Preview May 20, 2020 18:05 Inactive
@johno johno merged commit 9d30680 into next May 20, 2020
@johno johno deleted the next-parsing branch May 20, 2020 18:15