Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better built-in lexers #668

Closed
silversquirl opened this issue Feb 10, 2018 · 22 comments
Closed

Better built-in lexers #668

silversquirl opened this issue Feb 10, 2018 · 22 comments

Comments

@silversquirl
Copy link

The syntax highlighting lexers are currently missing support for various things (eg. labels in C, proper string highlighting for shell scripts, etc.)
I'm wondering whether it would be worth creating a wiki page with a list, so that people who want to contribute can look there.

@martanne
Copy link
Owner

If you feel it would help, after all it is a Wiki. But I'm not sure how many people actually read these pages. For discussion an issue (like this one) is probably more helpful? Maybe we could add a general page regarding syntax highlighting (similar to the README in lua/lexers). It could then also feature/reference a list of known issues.

There are some known problems/limitations of the current approach to syntax highlighting (e.g. #110, #603). Eventually I would like to address these, but performance and accuracy needs to be weighted against the introduced complexity.

@silversquirl
Copy link
Author

If you think an issue is better, then I'm happy with that. I'll try to get acquainted with how LPeg works when I have some time and start working on the issues I mentioned initially.

@p-e-w
Copy link
Contributor

p-e-w commented Feb 18, 2018

Have you considered tree-sitter?

While there is a lot of infrastructure around them, the parsers themselves are pure C, and designed for integration in text editors. Tree-sitter parsers are probably the highest quality syntax highlighting parsers available today, and are only going to get better because they are maintained by the creators of Atom.

I haven't seen any performance benchmarks yet, but I fully expect tree-sitter parsers to outperform all other incremental parsers of comparable quality eventually. Whether they are faster than vis' current lexers is hard to predict, but then those lexers are quite primitive and don't even come close to the accuracy that tree-sitter provides.

Having parsing of such quality in vis would also make new editing commands possible through accurate scope detection. See the examples in this Atom PR.

Thus far tree-sitter parsers exist only for the most popular languages, but integrating those seems like a potentially huge win for vis: High-quality parsers that are maintained by someone else.

@lanodan
Copy link
Contributor

lanodan commented Feb 19, 2018

Disclaimer/Warning: All of this is my opinion and is As Far As I Know. I’m also quite bad at phrasing this kind of stuff but I don’t want to ignore things and I don’t want things to escalate.

Have you considered tree-sitter?

  • The ~advertised output is DOM-style, while we need something for Terminals
  • Another language (C++, JSON, …)
  • I’m not sure efficiency is that much needed
  • There seems to be no release or tags !
  • This is apparently coming from the webapp universe which is pretty remote from where vis is now. (which I would describe as a traditional but modern Unix/Plan9 editor)

new editing commands possible through accurate scope detection.

The current keystrokes and regexes are already making it possible, I don’t really see something new here…

maintained by someone else.

LPeg lexers are coming from/shared with Scintillua.

@p-e-w
Copy link
Contributor

p-e-w commented Feb 19, 2018

A few misunderstandings there:

The ~advertised output is DOM-style, while we need something for Terminals

Tree-sitter offers a DOM-style interface, not DOM-style output. That is purely an API convention, and it is very convenient to use. The interface is specifically designed for the needs of syntax highlighting in editors and has nothing to do with the "web DOM" (or with terminals, for that matter).

Another language (C++, JSON, …)

Tree-sitter uses a compiler written in C++ that processes JSON input to generate a pure C parser without any non-C dependencies. The compiler and the JSON file are not needed for parsing, and don't have to be shipped with the editor.

There seems to be no release or tags !

Every single tree-sitter grammar is versioned, tagged and unit tested. The compiler currently doesn't have versioned releases, but language bindings that depend on the compiler again are versioned, and pin a specific commit for their tree-sitter submodule.

This is apparently coming from the webapp universe which is pretty remote from where vis is now. (which I would describe as a traditional but modern Unix/Plan9 editor)

Tree-sitter has nothing whatsoever to do with the "webapp universe". It is used by the Atom editor (which some people might consider a "webapp") as a dependency via Node.js bindings that are independent of the core parser and compiler. There are also bindings for Haskell and Ruby, and of course the native C API.

Using tree-sitter in vis doesn't bring vis any closer to the webapp universe than hosting its code on GitHub does.

The current keystrokes and regexes are already making it possible, I don’t really see something new here…

The whole reason this issue even exists is because vis/Scintillua parsers are of poor quality and do not even recognize lots of standard syntax. If structured text editing is to be reliable, quality syntax parsing is required first, and I don't ever see this happening with the kind of grammars that vis uses now.

@silversquirl
Copy link
Author

I'm skeptical of whether tree-sitter would work for vis. To use parsers written in C would either require recompilation every time you want to add a new language, or use of dynamic loading, neither of which seem appealing to me.

If you want the benefits provided by tree-sitter, perhaps writing a Lua plugin to add support for it would be better. That way, it's optional and people who don't want to use it don't have to.

@silversquirl
Copy link
Author

Finally had a chance to work on this. I've patched the Bash lexer to support variable expansion inside heredocs and double-quoted strings. Next I'll add highlighting for $() and correct the highlighting for ``, which is currently highlighted like a single-quoted string.

I'll submit a PR when I'm happy with it.

@mqudsi
Copy link

mqudsi commented Apr 26, 2018

I would strongly caution against working around n issues rather than addressing the root problem. You don't want to end up in a situation where the amount of effort invested in moving from a 20% solution to a 40% or 60% solution leads a project to stick to the former instead of moving to a 90% alternative.

The problem with C-based parsers is, however, a proper concern. One option is to offer "core" lexers powered by tree-sitter and compiled into vis, with fallback / user-defined lua-based lexers so a recompile wouldn't be necessary, but that's not ideal. What would be cool is if it were possible to use llvm to compile that parser to lua, but some googling does not reveal any attempts at making such an endeavour generally possible (and it would significantly complicate the build toolchain, unless a 3rd party project maintaining a lua-compiled version of all tree-sitter parsers were used, which would definitively address that issue).

Given that lexers aren't "core" functionality, I don't think ruling out dynamic loading without some hefty consideration is the way to go, but still, users having to compile C code each time they want to add a lexer is a burden...

@mcepl
Copy link
Contributor

mcepl commented Feb 29, 2020

  1. tree-sitter is being added to neovim ([WIP] Tree-sitter integration neovim/neovim#9219 and many other issues/PRs)

  2. I wonder whether @bfredl actually wrote Lua bindings for tree-sitter and whether it couldn’t be used by vis.

@bfredl
Copy link

bfredl commented Feb 29, 2020

That PR is pretty old, neovim/neovim#11724 is a better reference. But yes https://github.com/neovim/neovim/blob/master/src/nvim/lua/treesitter.c is relatively self-contained, except for directly reading neovim buffers in one function.

@erf erf mentioned this issue Apr 28, 2022
@mcepl
Copy link
Contributor

mcepl commented Nov 29, 2022

@ninewise, this is probably irrelevant now when we switched to the Scintillua lexers, right?

@mcepl
Copy link
Contributor

mcepl commented Aug 10, 2023

@rnpnr ??? This too should probably be closed, right?

@rnpnr
Copy link
Collaborator

rnpnr commented Aug 10, 2023

Yes I think this covered now that we use Scintillua.

@rnpnr rnpnr closed this as completed Aug 10, 2023
@erf
Copy link
Contributor

erf commented Sep 5, 2023

I know this issue was closed as we use Scintillua , but if there are limitations to our current solution (i saw a few issues mentioned in the 0.9 release), would it be interesting to open up this discussion again?

It seem performance and syntax highlighting could be improved using tree-sitter, although not sure as many languages are supported or how difficult it would be to add this support to vis.

Here is an intro video to tree-sitter and tree-sitter for dummies video

@mqudsi
Copy link

mqudsi commented Sep 5, 2023

@erf tree-sitter is (already) heavily discussed above.

@mcepl
Copy link
Contributor

mcepl commented Sep 5, 2023

I don’t think so. I have been following neovim development for years, and although treesitter originally promised to be just one C-file, it is certainly much more than that these days. Change to vis requiring this would be enormous and results are questionable (i.e., scintillua works just fine for most of the time). I also like handing over part of our code upstream.

Problems with the current merging of the latest development was that we were using a way too old version of scintillua in vis and we are now struggling with differences. If you want to help, the working code is at the devel branch of https://git.sr.ht/~mcepl/vis and we apparently need to work on themes, any patches would be more than welcome.

If the speed is really a problem, and I have never saw any evidence for that, it might be a fun project to make vis work with LuaJIT (it actually shows some signs of life again).

@erf
Copy link
Contributor

erf commented Sep 5, 2023

@mqudsi I can see it has been discussed, but i can't really see a conclusion, other than "we are currently using this other solution".

What i find most appealing with tree-sitter is that it is more intelligent about the types it gathers from the language and therefor gives better syntax hightlighting, for example if you make a c struct now, it's type is not recognized as a type like an int or bool, but tree-sitter would recognize this. Also if you have long sentences, it is difficult for regex based solutions, but tree-sitter has no problem with this and performance is good. Also tree-sitter ads new capabilities like code folding, and could also help with multiple selections.

I'm not necessarily pushing for changes on this in the near future, but thought i'll chip in on this topic.

Here's the difference in syntax highlighting between vis and helix based on the example in this video

Screenshot 2023-09-05 at 20 04 07
Screenshot 2023-09-05 at 21 03 26

@mcepl
Copy link
Contributor

mcepl commented Sep 5, 2023

What i find most appealing with tree-sitter is that it is more intelligent about the types it gathers from the language and therefor gives better syntax hightlighting, for example if you make a c struct now, it's type is not recognized as a type like an int or bool, but tree-sitter would recognize this. Also if you have long sentences, it is difficult for regex based solutions, but tree-sitter has no problem with this and performance is good. Also tree-sitter ads new capabilities like code folding, and could also help with multiple selections.

You are wasting your breath, as I said, I have been using neovim built from master for years, so I know perfectly well what it brings above regexp-based highlighting. If you feel strongly about it, make a fork.

@mcepl
Copy link
Contributor

mcepl commented Sep 5, 2023

@mqudsi I can see it has been discussed, but i can't really see a conclusion, other than "we are currently using this other solution".

That’s all what was said. What you mentioned, and also “the benefits of this change don’t balance costs of huge change to the vis codebase.” My comment about fork was not a flippant rejection of your idea, but I don’t think that such big change could be done with somebody who proves her ability to do it. To quote the classic, “Show us the code!”.

@mcepl
Copy link
Contributor

mcepl commented Sep 11, 2023

@mqudsi BTW, if you are interested in the speed of vis, it may be worthy to try to make vis work with LuaJIT (#291), I have unfinished (and not compiling ATM) and very unpolished patch for that in the luajit branch of https://git.sr.ht/~mcepl/vis/ . If you want to give it a stab, I would appreciate it a lot.

@mqudsi
Copy link

mqudsi commented Sep 13, 2023

@mcepl I've read the issue and looked at your branch: interesting work - and it looks like you've gotten very far already! I'm somewhat maxed out on my spare time right now, but I can put it on the backburner -- assuming you don't get around to it before I do.

@mcepl
Copy link
Contributor

mcepl commented Sep 13, 2023

I'm somewhat maxed out on my spare time right now, … assuming you don't get around to it before I do.

Unlikely, it is that time after holidays when you find out how much you put on the back burner and now you find out how much is there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants