Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: annotated code blocks #65

Merged
merged 1 commit into from
Nov 21, 2022
Merged

feat: annotated code blocks #65

merged 1 commit into from
Nov 21, 2022

Conversation

clason
Copy link
Member

@clason clason commented Nov 5, 2022

This is a Lua block: >lua
    local foo = 'bar'

This is a Vimscript block: >vim
    au FileType lua setl sw=2

This is a C block: >c
    int *p = get_local_errno(); *p = EINVAL
<
This is a standard block: >
    $ nvim --clean +'q'
<

image

Note that the alternative syntax lua> is much trickier, since the rule for the language marker lua conflicts with normal text rule for the word lua. In either case, the added markers are trivial (and necessary) to add to the legacy syntax file.

Closes #2

grammar.js Outdated Show resolved Hide resolved
grammar.js Outdated Show resolved Hide resolved
@clason
Copy link
Member Author

clason commented Nov 5, 2022

@justinmk I think this is good to go. The next step is to adapt scripts/gen_help_html.lua to

  1. eat the language node
  2. only show the code node wrapped in a <code class='language-<language>'></code> block

and inject highlight.js to the neovim.io/doc/user pages. But for that the ball is in your corner :)

token.immediate('\n'),
repeat1(alias($.line_code, $.line)),
alias(repeat1(alias($.line_code, $.line)), $.code),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this for semantics or does it have a functional purpose? Maybe $.codeline fits better with the existing naming patterns.

Copy link
Member Author

@clason clason Nov 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's necessary (as far as I can tell) so the whole block gets punted to the injected parser; you can't parse, e.g.,

for k,v, in pairs(foo) do
  print(k,
    v,
  )
end

purely line by line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this aliases the whole set of repeated lines as a single node, not each single line.

Copy link
Member

@justinmk justinmk Nov 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we do that with (codeblock) itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that includes the markers, which are not part of the injected code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(But I'm not married to the name, of course; I just need a single node that I can capture as @contents; see the injections.scm query.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that includes the markers, which are not part of the injected code.

I want to believe that's solvable because they're anonymous :) Any hope?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not unless you give me some ;) Honestly, it would help me if you could explain the nature of the objection to doing it the way I was able to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just elegance/aesthetics (unnecessary nesting)

@justinmk
Copy link
Member

I found some cases in the help docs that we should test for:

runtime/bugreport.vim|20 col 20| :  !echo "uname -a" >bugreport.txt
runtime/doc/eval.txt|1051 col 2| >0 / 0  =  0x7fffffff	(like positive infinity)
runtime/doc/eval.txt|1057 col 2| >0 / 0  =  0x7fffffffffffffff	(like positive infinity)
runtime/doc/luaref.txt|460 col 24| while (  step  >0 and  var  <=  limit  )
runtime/doc/usr_02.txt|548 col 7| :help >cont
runtime/doc/usr_10.txt|707 col 17| sort <input.txt >output.txt

@clason
Copy link
Member Author

clason commented Nov 12, 2022

runtime/bugreport.vim|20 col 20| : !echo "uname -a" >bugreport.txt

Not a help file.

runtime/doc/eval.txt|1051 col 2| >0 / 0 = 0x7fffffff (like positive infinity)
runtime/doc/eval.txt|1057 col 2| >0 / 0 = 0x7fffffffffffffff (like positive infinity)

Not at the end of the line, so doesn't match.

runtime/doc/luaref.txt|460 col 24| while ( step >0 and var <= limit )
runtime/doc/usr_02.txt|548 col 7| :help >cont
runtime/doc/usr_10.txt|707 col 17| sort <input.txt >output.txt

Inside a codeblock (and not on the end of the line or not [a-z0-9]+), so doesn't match.

@justinmk
Copy link
Member

Not at the end of the line, so doesn't match.

worth adding an explicit case to the corpus?

@clason
Copy link
Member Author

clason commented Nov 21, 2022

Oh, I see what you mean. Yes, I can see about adding a case with whitespace and not at the end of the line; maybe variants of the usr_*.txt examples.

@@ -24,13 +24,15 @@ block3:
(block
(line
(codeblock
(line))))
(code
(line)))))
Copy link
Member

@justinmk justinmk Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm admittedly going in circles, but: I guess it makes sense for (code) to be named (block). Because the block+lines here (nested in codeblock) is analogous to non-code (block (line...)).

I'm not sure what the best practice/convention is for grammars: is it better to re-use names for similar concepts, or should we use globally unique names like (code (codeline)) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it makes sense for (code) to be named (block).

But you can't, since (block) is already defined at this point (as something different) -- you'll get a conflict in the grammar you need to resolve somehow. And a different name is by far the easiest way to do that.

Copy link
Member

@justinmk justinmk Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow. It can be aliased, as done elsewhere (e.g. we already alias to (line) here). And conflicts are based on the "fully qualified node path", irrespective of the node names.

Anyways let's go with (code (codeline)) I guess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead, try it.

Copy link
Member

@justinmk justinmk Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tried this, it works fine (no conflicts):

alias(repeat1(alias($.line_code, $.line)), $.block),

But this is just academic. If you're ok with (code (codeline)) , I think I favor that too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do realize that the alias is to lines (plural) of code?

Copy link
Member

@justinmk justinmk Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But clearly I'm missing something or we're miscommunicating. Note that (codeline) should be understood as any number of children, e.g.:

(code
    (codeline)
    (codeline)
    …)

If I still misunderstood, sorry. Like I said, my comments aren't blockers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat, yes, because I don't get what the end goal is here, so it feels like stabbing in the dark, which is frustrating.

(I do admit that my last comment was sheer pedantry, born out of said feeling.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well I started this thread by literally saying "I'm admittedly going in circles". The goal was just to explore options.

Copy link
Member Author

@clason clason Nov 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tried this, it works fine (no conflicts):

alias(repeat1(alias($.line_code, $.line)), $.block),

Interesting, that's what I tried. Looks like switching from optional to choice avoids the conflict then; that's good. Feel free to rename as you wish before tagging a release so I can downstream the queries. (And when you do, may want to update the description in the README, which I missed -- sorry!)

Note that (codeline) should be understood as any number of children, e.g.:
...
But this is just academic. If you're ok with (code (codeline)) , I think I favor that too.

There seems to be indeed some miscommunication. My point is that the contents must be a single block that you pass to the injected parser. Otherwise every single line gets parsed in isolation which a) is wasteful and b) leads to errors with constructs spanning multiple lines (as in my -- admittedly academic -- example above).

grammar.js Outdated Show resolved Hide resolved
Copy link
Member

@justinmk justinmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No blockers, just some nits on naming

```
This is a Lua block: >lua
    local foo = 'bar'

This is a Vimscript block: >vim
    au FileType lua setl sw=2

This is a C block: >c
    int *p = get_local_errno(); *p = EINVAL
<
This is a standard block: >
    $ nvim --clean +'q'
<
```
@justinmk
Copy link
Member

justinmk commented Nov 21, 2022

not a blocker, but this bugs me about >lua :

  • more likely to have false positives than lua>
  • lua> is more visually noticeable (for non-concealed text) because > is at EOL as usual

This is just thinking out loud. Will merge this later this week.

@clason
Copy link
Member Author

clason commented Nov 21, 2022

Sorry, I tried to make that work; I failed.

@justinmk justinmk merged commit ce20f13 into neovim:master Nov 21, 2022
@clason clason deleted the injections branch November 22, 2022 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

embedded languages in codeblocks
3 participants