Hierarchical syntax tree #19

Closed
b3nj5m1n opened this issue Jan 18, 2021 · 7 comments
Labels
question Further information is requested


@b3nj5m1n

I would like to know why this parser doesn't parse the markdown file into a proper hierarchy based on sections, so for example, why the following markdown:

# Test Heading Level 1

## Test Heading Level 2

### Test Heading Level 3

Results in this tree:

document [0, 0] - [4, 24])
  atx_heading [0, 0] - [0, 22])
    atx_heading_marker [0, 0] - [0, 1])
    heading_content [0, 1] - [0, 22])
      text [0, 1] - [0, 22])
  atx_heading [2, 0] - [2, 23])
    atx_heading_marker [2, 0] - [2, 2])
    heading_content [2, 2] - [2, 23])
      text [2, 2] - [2, 23])
  atx_heading [4, 0] - [4, 24])
    atx_heading_marker [4, 0] - [4, 3])
    heading_content [4, 3] - [4, 24])
      text [4, 3] - [4, 24])

Instead of something like this:

document [0, 0] - [4, 24])
  atx_heading [0, 0] - [0, 22])
    atx_heading_marker [0, 0] - [0, 1])
    heading_content [0, 1] - [0, 22])
      text [0, 1] - [0, 22])
    atx_heading [2, 0] - [2, 23])
      atx_heading_marker [2, 0] - [2, 2])
      heading_content [2, 2] - [2, 23])
        text [2, 2] - [2, 23])
      atx_heading [4, 0] - [4, 24])
        atx_heading_marker [4, 0] - [4, 3])
        heading_content [4, 3] - [4, 24])
          text [4, 3] - [4, 24])

The current format seems to make certain things very hard, if not impossible, like for example proper folding.

Is there a proper way to achieve something like folding with the current tree, and if not, why was the design chosen to be the way it is, and would changing it be an option?

@ikatyang
Owner

> why this parser doesn't parse the markdown file into a proper hierarchy based on sections

Basically, that follows the CommonMark spec: ATX headings are leaf blocks, not container blocks, so a heading cannot contain sub-blocks. As stated in the spec, the only container blocks are block quotes, list items, and lists.

> The current format seems to make certain things very hard, if not impossible, like for example proper folding.

I just checked how VSCode handles folding in Markdown, and it seems to apply an additional post-processing step to the parsed tree to get the desired hierarchy.
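A minimal sketch of what such a post-processing step might look like, assuming the flat tree has already been reduced to a list of `(line, level)` pairs for the headings. The function name `heading_fold_ranges` and the input shape are hypothetical and not taken from VSCode or from this parser:

```python
from typing import List, Tuple


def heading_fold_ranges(
    headings: List[Tuple[int, int]], last_line: int
) -> List[Tuple[int, int]]:
    """Derive fold ranges from a flat, ordered list of headings.

    headings:  (line, level) pairs in document order, level in 1..6.
    last_line: index of the last line of the document.
    Returns inclusive (start_line, end_line) ranges, sorted by start line.
    """
    ranges: List[Tuple[int, int]] = []
    open_headings: List[Tuple[int, int]] = []  # stack of (line, level)

    def close(start: int, end: int) -> None:
        # Skip sections that span a single line; there is nothing to fold.
        if end > start:
            ranges.append((start, end))

    for line, level in headings:
        # A new heading ends every open section of the same or a deeper level.
        while open_headings and open_headings[-1][1] >= level:
            start, _ = open_headings.pop()
            close(start, line - 1)
        open_headings.append((line, level))

    # Sections still open at the end run to the last line of the document.
    while open_headings:
        start, _ = open_headings.pop()
        close(start, last_line)

    return sorted(ranges)


# The three headings from the example at the top of this issue.
print(heading_fold_ranges([(0, 1), (2, 2), (4, 3)], last_line=4))
```

The stack keeps the flat parse tree untouched and computes the hierarchy on the side, so the parser can stay aligned with the spec while an editor still gets nested sections.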

> Is there a proper way to achieve something like folding with the current tree

It seems there is a feature for grouping sibling nodes that could be used to query the "hierarchical headings", i.e., to group all the nodes between two ATX headings of the same level. I'll have to expose the literal aliases first so that you can give it a try; I've opened #20 to track the literal aliases issue.

@b3nj5m1n
Author

@ikatyang Thanks for the quick response. Are there any plans to include the folds.scm, highlights.scm files, etc. in this repo to make it easier for other users?

@ikatyang
Owner

ikatyang commented Jan 24, 2021

I've added tokens that could be used for distinguishing heading level tokens in #22.


> are there any plans to include the folds.scm, highlights.scm files

folds.scm seems to be an nvim-treesitter-specific feature, and I'm not sure whether it works with non-hierarchical nodes. If it does, here are the queries for heading contents:

(
  [
    (atx_heading (atx_h1_marker))
    (setext_heading (setext_h1_underline))
  ]
  ([
    (atx_heading (atx_h2_marker))
    (setext_heading (setext_h2_underline))
    (atx_heading (atx_h3_marker))
    (atx_heading (atx_h4_marker))
    (atx_heading (atx_h5_marker))
    (atx_heading (atx_h6_marker))
    (paragraph)
    (thematic_break)
    (link_reference_definition)
    (indented_code_block)
    (fenced_code_block)
    (html_block)
    (block_quote)
    (tight_list)
    (loose_list)
    (table)
  ])+ @h1_content
)
(
  [
    (atx_heading (atx_h2_marker))
    (setext_heading (setext_h2_underline))
  ]
  ([
    (atx_heading (atx_h3_marker))
    (atx_heading (atx_h4_marker))
    (atx_heading (atx_h5_marker))
    (atx_heading (atx_h6_marker))
    (paragraph)
    (thematic_break)
    (link_reference_definition)
    (indented_code_block)
    (fenced_code_block)
    (html_block)
    (block_quote)
    (tight_list)
    (loose_list)
    (table)
  ])+ @h2_content
)
(
  [
    (atx_heading (atx_h3_marker))
  ]
  ([
    (atx_heading (atx_h4_marker))
    (atx_heading (atx_h5_marker))
    (atx_heading (atx_h6_marker))
    (paragraph)
    (thematic_break)
    (link_reference_definition)
    (indented_code_block)
    (fenced_code_block)
    (html_block)
    (block_quote)
    (tight_list)
    (loose_list)
    (table)
  ])+ @h3_content
)
(
  [
    (atx_heading (atx_h4_marker))
  ]
  ([
    (atx_heading (atx_h5_marker))
    (atx_heading (atx_h6_marker))
    (paragraph)
    (thematic_break)
    (link_reference_definition)
    (indented_code_block)
    (fenced_code_block)
    (html_block)
    (block_quote)
    (tight_list)
    (loose_list)
    (table)
  ])+ @h4_content
)
(
  [
    (atx_heading (atx_h5_marker))
  ]
  ([
    (atx_heading (atx_h6_marker))
    (paragraph)
    (thematic_break)
    (link_reference_definition)
    (indented_code_block)
    (fenced_code_block)
    (html_block)
    (block_quote)
    (tight_list)
    (loose_list)
    (table)
  ])+ @h5_content
)
(
  [
    (atx_heading (atx_h6_marker))
  ]
  ([
    (paragraph)
    (thematic_break)
    (link_reference_definition)
    (indented_code_block)
    (fenced_code_block)
    (html_block)
    (block_quote)
    (tight_list)
    (loose_list)
    (table)
  ])+ @h6_content
)
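The six queries above differ only in which heading levels may follow the opening heading, so they could also be generated rather than written out by hand. The following is a sketch under that assumption; the helper names (`heading_patterns`, `content_query`, `all_content_queries`) are hypothetical, while the node names are the ones used in the queries above:

```python
from typing import List

# Non-heading block nodes that can be part of a section's content.
BLOCK_NODES = [
    "paragraph",
    "thematic_break",
    "link_reference_definition",
    "indented_code_block",
    "fenced_code_block",
    "html_block",
    "block_quote",
    "tight_list",
    "loose_list",
    "table",
]


def heading_patterns(level: int) -> List[str]:
    """Patterns matching a heading of the given level (setext only has h1/h2)."""
    patterns = [f"(atx_heading (atx_h{level}_marker))"]
    if level <= 2:
        patterns.append(f"(setext_heading (setext_h{level}_underline))")
    return patterns


def content_query(level: int) -> str:
    """Build the query capturing the content that follows a level-N heading."""
    # Content is any deeper heading or any non-heading block.
    followers = [p for n in range(level + 1, 7) for p in heading_patterns(n)]
    followers += [f"({name})" for name in BLOCK_NODES]

    lines = ["(", "  ["]
    lines += [f"    {p}" for p in heading_patterns(level)]
    lines += ["  ]", "  (["]
    lines += [f"    {p}" for p in followers]
    lines += [f"  ])+ @h{level}_content", ")"]
    return "\n".join(lines)


def all_content_queries() -> List[str]:
    """One query string per heading level, kept separate from each other."""
    return [content_query(level) for level in range(1, 7)]


print(content_query(6))
```

Returning one string per level keeps each query independent, which makes it straightforward to compile and run them one at a time.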

Including highlights.scm in this repo sounds good to me, but the question is that I'm not sure what should be highlighted and how to name those queries. Is there a standard way to write these queries?

@b3nj5m1n
Author

Thanks, this also made it much easier to colour headings of different levels in different colours. I tried the query you provided, and it didn't work. I'll have a look at it later and see if I can find the error.

> Including highlights.scm in this repo sounds good to me, but the question is that I'm not sure what should be highlighted and how to name those queries. Is there a standard way to write these queries?

There are a few things that should definitely be highlighted; the most obvious one that comes to mind would be headings. It would also be helpful for styled text, like bold, italic, etc., and instead of making the highlight a colour, it would be the text in bold, italic, etc.

I'm not sure if there's a standard way to write these. One could maybe look at the highlights.scm from some other repos.

Since my original issue is solved, or at least addressed, I'll close this issue now. Maybe we should open a new one to track the inclusion of highlights.scm and possibly others. I guess if I can't figure out how to do folds, an issue on the nvim-treesitter repo would probably be better. Thanks again for your help.

@ikatyang
Owner

> I tried the query you provided, and it didn't work

Sorry, I forgot to mention that those queries only work if they are run separately, since tree-sitter apparently does not allow overlapping ranges within a single query.

> instead of making the highlight a colour, it would be the text in bold, italic, etc.

Queries (i.e., highlights.scm) only control what can be highlighted; the style of highlighting is up to whoever consumes those queries, which is probably nvim-treesitter in your case.

> maybe we should open a new one to track the inclusion of highlights.scm and possibly others.

I've opened #23 to track the highlights.scm issue. I'm not sure if we should also include others but I guess we can figure it out later.

ikatyang added the "question" label (Further information is requested) and removed the "awaiting response" label on Jan 24, 2021
@b3nj5m1n
Author

> Queries (i.e., highlights.scm) only control what can be highlighted; the style of highlighting is up to whoever consumes those queries, which is probably nvim-treesitter in your case.

Correct, but generally there are pre-defined ones for things such as bold text. In the case of nvim-treesitter specifically, there are @text.emphasis, @text.uri, etc., but I'm not sure if this is universal to tree-sitter or just the nvim-treesitter plugin.

@ikatyang
Owner

ikatyang commented Jan 24, 2021

I guess those are nvim-treesitter-specific, since tree-sitter itself cannot define how to style anything: styles are environment-specific (e.g., ANSI in a terminal, CSS on the web). I'm not sure whether there is a convention for the names, though.
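To illustrate the split between capture names and styling, here is a rough sketch of two consumers that style the same capture differently. The capture names `text.emphasis` and `text.uri` are the ones mentioned above; the style choices and the function names `render_ansi` and `render_html` are assumptions made up for this example, not part of tree-sitter or nvim-treesitter:

```python
import html

# Assumed style tables: each consumer decides on its own what a capture looks like.
ANSI_STYLES = {
    "text.emphasis": "\x1b[3m",  # italic
    "text.uri": "\x1b[4m",       # underline
}
ANSI_RESET = "\x1b[0m"

CSS_STYLES = {
    "text.emphasis": "font-style: italic",
    "text.uri": "text-decoration: underline",
}


def render_ansi(text: str, capture: str) -> str:
    """Style captured text for a terminal; unknown captures stay unstyled."""
    code = ANSI_STYLES.get(capture)
    return f"{code}{text}{ANSI_RESET}" if code else text


def render_html(text: str, capture: str) -> str:
    """Style captured text for the web; unknown captures stay unstyled."""
    escaped = html.escape(text)
    style = CSS_STYLES.get(capture)
    return f'<span style="{style}">{escaped}</span>' if style else escaped


print(render_html("emphasised", "text.emphasis"))
```

The query only needs to agree with its consumers on the capture names; everything about appearance lives in tables like these on the consumer side.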
