[Question] Extra lex results when dealing with text within a list #2684

Closed
Bistard opened this issue Dec 14, 2022 · 3 comments

@Bistard

Bistard commented Dec 14, 2022

Marked version: 4.0

Describe the bug
Given the following plain text:

This is a paragraph token
* This is a text token

The following data is the lexing result that I copied from the marked demo website:

[
  {type:"paragraph", raw:"This is a paragraph token\n", text:"This is a paragraph token", tokens:[
    {type:"text", raw:"This is a paragraph token", text:"This is a paragraph token"}
  ]},
  {type:"list", raw:"* This is a text token", ordered:false, start:"", loose:false, items:[
    {type:"list_item", raw:"* This is a text token", task:false, checked:undefined, loose:false, text:"This is a text token", tokens:[
      {type:"text", raw:"This is a text token", text:"This is a text token", tokens:[
        {type:"text", raw:"This is a text token", text:"This is a text token"}
      ]}
    ]}
  ]}
]

I am not sure what the expected behavior is for the list part of the lexer output:

  1. If the outer text token is expected, then its child token seems entirely redundant.
  2. If the outer text token is not expected, then perhaps the correct token there is a paragraph token?

P.S. I also tried the same plain text on the CommonMark demo website. Its result is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">

<document xmlns="http://commonmark.org/xml/1.0">
  <paragraph>
    <text>This is a paragraph token</text>
  </paragraph>
  <list type="bullet" tight="true">
    <item>
      <paragraph>
        <text>This is a text token</text>
      </paragraph>
    </item>
  </list>
</document>

To Reproduce
Steps to reproduce the behavior: Copy the given plain text into the marked demo website.

@Bistard
Author

Bistard commented Dec 14, 2022

I have a follow-up question:

I found that in marked.d.ts, the interface for the text token is:

interface Text {
  type: 'text';
  raw: string;
  text: string;
  tokens?: Token[] | undefined;
}

In what situation will a text token have a list of child tokens? In my understanding, a text token is more like an inline token. If it is supposed to have child tokens, shouldn't it be a paragraph token, which is a real block token?

I am not really familiar with markdown parsing and lexing. If anything I stated is terribly wrong, please point it out 😃 .

@Bistard
Author

Bistard commented Dec 14, 2022

P.P.S. This question might be similar to #2670.

@UziTech
Member

UziTech commented Dec 14, 2022

This is working as intended. In marked, plain text can become a block text token, an inline text token, or a block paragraph token, depending on the context. Block paragraph tokens are wrapped in <p> tags. Block and inline text tokens are not wrapped in anything. Block text tokens can have other inline tokens inside of them. I think the only time we actually have block text tokens is in lists, since we don't want them wrapped in <p> tags unless the list is loose. I am simplifying here because markdown rules can become strange when dealing with edge cases. But it is intentional to have block text tokens that contain inline text tokens in lists.
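
To make the distinction concrete, here is a minimal TypeScript sketch of how a renderer could treat the two block-level shapes inside a list item. The type names (InlineText, BlockText, Paragraph, ListItem) and the render functions are hypothetical and only modeled on the lexer output quoted in this issue; this is not marked's actual implementation, and it skips HTML escaping and every other token type.

```typescript
// Hypothetical, simplified token shapes modeled on the lexer output in this issue.
// These are assumptions for illustration, not marked's real type definitions.
type InlineText = { type: "text"; raw: string; text: string };
type BlockText = { type: "text"; raw: string; text: string; tokens: InlineText[] };
type Paragraph = { type: "paragraph"; raw: string; text: string; tokens: InlineText[] };
type ListItem = {
  type: "list_item";
  raw: string;
  text: string;
  tokens: (BlockText | Paragraph)[];
};

// Inline text tokens are leaves: they contribute their text and nothing else.
function renderInline(tokens: InlineText[]): string {
  return tokens.map((t) => t.text).join("");
}

// Both block shapes carry inline children; only the paragraph adds a <p> wrapper.
function renderBlock(token: BlockText | Paragraph): string {
  const inner = renderInline(token.tokens);
  return token.type === "paragraph" ? `<p>${inner}</p>` : inner;
}

function renderListItem(item: ListItem): string {
  return `<li>${item.tokens.map(renderBlock).join("")}</li>`;
}

const inline: InlineText = {
  type: "text",
  raw: "This is a text token",
  text: "This is a text token",
};

// Tight list item: a block text token wrapping an inline text token,
// which is the nesting shown in the lexer output above.
const tightItem: ListItem = {
  type: "list_item",
  raw: "* This is a text token",
  text: "This is a text token",
  tokens: [{ type: "text", raw: inline.raw, text: inline.text, tokens: [inline] }],
};

// Loose list item (assumed shape): the same content held by a paragraph token.
const looseItem: ListItem = {
  type: "list_item",
  raw: "* This is a text token\n\n",
  text: "This is a text token",
  tokens: [{ type: "paragraph", raw: inline.raw, text: inline.text, tokens: [inline] }],
};

const tightHtml = renderListItem(tightItem);
const looseHtml = renderListItem(looseItem);

console.log(tightHtml); // <li>This is a text token</li>
console.log(looseHtml); // <li><p>This is a text token</p></li>
```

Under these assumptions, the outer text token is not redundant: it is the block-level container that tells the renderer to emit its inline children without a <p> wrapper.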
