[Question] Consistent Behavior of End-of-Line Characters Across Block-Level Tokens #3506

Open
Bistard opened this issue Oct 27, 2024 · 3 comments

Bistard commented Oct 27, 2024

Marked version: 14.1.2

Background

This is not a bug report but a question about behavior I find confusing. Consider the following input and its tokenization result:

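import { Lexer } from 'marked';
const lexer = new Lexer(); // assuming `lexer` is a marked Lexer instance
const tokens = lexer.lex('paragraph1\n');
// tokenization result (the paragraph token in the returned array)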
{type:"paragraph", raw:"paragraph1\n", text:"paragraph1", tokens:[
  {type:"text", raw:"paragraph1", text:"paragraph1"}
]}

I noticed that the end-of-line character \n appears only in token.raw; it is not present in token.text or in any of the child tokens. This was also confirmed in a previous issue I opened.
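To make the observation above checkable, here is a minimal sketch that walks a token's children and collects any whose raw ends with '\n'. The helper name descendantsEndingWithNewline is hypothetical (it is not part of marked), and the sample object is copied from the result shown above rather than produced by calling the lexer.

```javascript
// Hypothetical helper, not part of marked: collect every descendant token
// whose `raw` string ends with '\n'.
function descendantsEndingWithNewline(token) {
  const hits = [];
  for (const child of token.tokens || []) {
    if (typeof child.raw === 'string' && child.raw.endsWith('\n')) {
      hits.push(child);
    }
    // Recurse so nested inline or block children are inspected too.
    hits.push(...descendantsEndingWithNewline(child));
  }
  return hits;
}

// Token copied from the tokenization result in this issue.
const paragraphToken = {
  type: 'paragraph',
  raw: 'paragraph1\n',
  text: 'paragraph1',
  tokens: [{ type: 'text', raw: 'paragraph1', text: 'paragraph1' }],
};

console.log(paragraphToken.raw.endsWith('\n')); // true
console.log(descendantsEndingWithNewline(paragraphToken).length); // 0
```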

Expected behavior

My question is: does this behavior hold for EVERY block-level token? That is, when a block ends with a '\n' character, is that character always accessible and detectable only through the token.raw property?
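One way to phrase the question as a check is sketched below. classifyTrailingNewline is a hypothetical helper (not a marked API) that only compares token.raw with token.text; it does not inspect child tokens. The sample objects are the token shapes reported in this issue, written out by hand.

```javascript
// Hypothetical helper, not part of marked: report where a token's trailing
// '\n' is visible. Returns 'no-text' when the token has no `text` property.
function classifyTrailingNewline(token) {
  if (typeof token.text !== 'string') return 'no-text';
  const inRaw = token.raw.endsWith('\n');
  const inText = token.text.endsWith('\n');
  if (inRaw && inText) return 'raw-and-text';
  if (inRaw) return 'raw-only';
  if (inText) return 'text-only';
  return 'none';
}

// Token shapes as reported in this issue (child tokens omitted for brevity).
const samples = [
  { type: 'paragraph', raw: 'paragraph1\n', text: 'paragraph1' },
  { type: 'heading', raw: '# heading\n', depth: 1, text: 'heading' },
  { type: 'html', block: true, raw: '<div>hi</div>\n', pre: false, text: '<div>hi</div>\n' },
  { type: 'hr', raw: '---' },
];

for (const token of samples) {
  console.log(token.type, classifyTrailingNewline(token));
}
// paragraph raw-only
// heading raw-only
// html raw-and-text
// hr no-text
```

If the behavior were consistent in the sense asked here, every token that has a text property would be classified as 'raw-only'.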

Example

I tested list, paragraph, heading, codeBlock, and blockQuote on the official demo website. They all seem to follow my expectation.

For example, the tokenization results for the heading, blockQuote, and codeBlock tokens in my case are the following:

// '# heading\n'
{type:"heading", raw:"# heading\n", depth:1, text:"heading", tokens:[
  {type:"text", raw:"heading", text:"heading"}
]}
// '> paragraph1\n'
{type:"paragraph", raw:"'> paragraph1\n", text:"'> paragraph1", tokens:[
  {type:"text", raw:"'> paragraph1", text:"'> paragraph1"}
]}
// '```ts\nconsole.log(1)\n```\n'
[
{type:"code", raw:"```ts\nconsole.log(1)\n```\n", lang:"ts", text:"console.log(1)"}
]

But the html token seems to be an exception; its token.text keeps the trailing '\n':

// '<div>hi</div>\n'
[
{type:"html", block:true, raw:"<div>hi</div>\n", pre:false, text:"<div>hi</div>\n"}
]
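Until the behavior is made consistent, a consumer could normalize on its own side. The sketch below is only a workaround under that assumption: textWithoutTrailingNewline is a hypothetical helper, not something marked provides, and it does not modify the token.

```javascript
// Hypothetical consumer-side workaround, not part of marked: return the
// token's text with any trailing '\n' characters removed. Tokens without a
// string `text` property (such as hr) yield undefined.
function textWithoutTrailingNewline(token) {
  if (typeof token.text !== 'string') return undefined;
  return token.text.replace(/\n+$/, '');
}

// Token shapes as reported in this issue.
const htmlToken = {
  type: 'html', block: true, raw: '<div>hi</div>\n', pre: false, text: '<div>hi</div>\n',
};
const plainParagraph = { type: 'paragraph', raw: 'paragraph1\n', text: 'paragraph1' };

console.log(JSON.stringify(textWithoutTrailingNewline(htmlToken)));      // "<div>hi</div>"
console.log(JSON.stringify(textWithoutTrailingNewline(plainParagraph))); // "paragraph1"
```

The original token.raw stays untouched, so the exact source text can still be recovered from it.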

Additional notes

The hr token has only a token.raw property and no token.text property, so this block-level token is outside the scope of my question:

// '---\n'
{type:"hr", raw:"---"}
Bistard changed the title from "[Question] Is every block-level token ignoring the end of the line" to "[Question] Consistent Behavior of End-of-Line Characters Across Block-Level Tokens" on Oct 27, 2024
Member

UziTech commented Oct 30, 2024

I don't think it is consistent. If you would like to create a PR to make it consistent we could get it in the next major version. 😁👍

Author

Bistard commented Oct 30, 2024

OK. In the next few days or weeks, I will look into the source code and try to make it consistent through a PR.

markedjs deleted a comment on Oct 30, 2024