Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Native Syntax Highlighting #161

Closed
bryphe opened this issue Mar 16, 2019 · 16 comments
Closed

Native Syntax Highlighting #161

bryphe opened this issue Mar 16, 2019 · 16 comments
Assignees
Labels
enhancement New feature or request I-slow User-facing performance issue

Comments

@bryphe
Copy link
Member

bryphe commented Mar 16, 2019

We currently use the vscode-textmate library to facilitate syntax highlighting and theming. However, because it's node, that necessitates we run a separate node process for this, and we have to maintain duplicate state for the buffers, which is an expensive use of memory.

Ideally, we could run this in-proc in our ReasonML front-end. If we had a version of this library that was in reasonml or ocaml instead of node, we could facilitate this easily.

The work required is to create a reasonml version of the library, like reason-textmate that has a similar API. The library itself is a wrapper around the oniguruma regex library, so we'd probably want to preserve that and use the C FFI to talk to it.

@bryphe bryphe added the E-help-wanted Call for participation: Help is requested to fix this issue. label Mar 16, 2019
@mnowotnik
Copy link

mnowotnik commented Mar 20, 2019

What about TreeSitter? It's more robust than TextMate tokenization and has already bindings to several languages. All that's required is creating bindings to Reason through FFI and porting existing functionality to the new interfaces.
http://tree-sitter.github.io/tree-sitter/
https://github.com/tree-sitter/tree-sitter

Also, this:
microsoft/vscode#50140

@akinsho
Copy link
Member

akinsho commented Mar 20, 2019

@Mike-now I'm mega excited about tree sitter and have been thinking about writing reason bindings for it for a while now, just gotta get some free time to do it. Hopefully if we have a permissive enough syntax highlighting api we can integrate it as separate option for users, I'd personally be much more keen on tree sitter highlighting if I had the option

@CrossR
Copy link
Member

CrossR commented Mar 20, 2019

I think for VSCode extension compatibility, we are always going to have textmate highlighting around, so a ReasonML version is going to be useful regardless, to help those extensions where tree-sitter doesn't cover yet etc.

That isn't to say that for a lot of languages Tree-Sitter couldn't be much better, more that a ReasonML textmate highlighter would still be useful alongside it. I know personally there are a few weird languages I use that I doubt will ever get a tree-sitter syntax (unless I write one I guess) but they already have TextMate ones.

@mnowotnik
Copy link

mnowotnik commented Mar 21, 2019

Sure. My reasoning was it could be much easier and future-proof endeavour to integrate TreeSitter than to rewrite vscode-textmate extension given that preparing oni2 beta release would take considerable time.

@bryphe
Copy link
Member Author

bryphe commented May 20, 2019

@Apostolique mentioned this project: https://github.com/georgewfraser/vscode-tree-sitter - looks interesting!

@bryphe
Copy link
Member Author

bryphe commented May 20, 2019

My reasoning was it could be much easier and future-proof endeavour to integrate TreeSitter than to rewrite vscode-textmate extension given that preparing oni2 beta release would take considerable time.

I'm open to it! What would be helpful for me is to see a PoC of a tree-sitter grammar working in Reason / OCaml, if anyone wanted to explore that direction.

EDIT: It looks like there are syntaxes for major grammars, and even one for Reason support today:
https://github.com/IwanKaramazow/tree-sitter-reason/tree/master/src !

And @IwanKaramazow even set up some initial FFI for the tree-sitter API. Would be fun to try and plug this in!

@IwanKaramazow
Copy link

The grammar is incomplete and the c-bindings segfault at random points. There's a lot of work left there.

@TheSpyder
Copy link

vscode-textmate is using a 4 year old fork of Oniguruma, updating to the latest might be a good idea but it also might cause API issues (I haven't checked the API, but the versions are 5.9.6 vs 6.9.x as I write this)
https://github.com/atom/node-oniguruma/tree/master/deps/onig

Interestingly vscode-textmate also links to a WASM fork of onig to benchmark against, it might be easier to just port that to native reason/ocaml rather than binding to and then building/distributing with the C library 🤔
https://github.com/NeekSandhu/onigasm

All of these things are MIT or BSD so licensing won't be an issue either way.

@bryphe
Copy link
Member Author

bryphe commented Jul 30, 2019

Writing a native implementation of vscode-textmate seems like pretty intense work, so what I'm leaning towards for the next release is:

  • Investigate integrating tree-sitter - I think this could potentially give us better performance and better fidelity in syntax highlights.
  • Keep our legacy node-vscode-textmate strategy for cases where tree-sitter doesn't have a grammar. There is still a lot of room for improvement there, but it won't work as well as a totally-native solution like tree-sitter.

I actually think it might be less work to integrate tree-sitter than to write a native implementation of a textmate grammar, but I'll spend a day or so prototyping to clarify once all this release craziness is over 😎

@TheSpyder
Copy link

I was tempted to help with this as an excuse to learn C FFI, but their use of a fork concerns me that we'd hit some weird difference and need to FFI the fork instead (which all seems like a giant waste of time).

Having native textmate still sounds like a good idea long term, but perhaps as a low priority :)

@bryphe bryphe added enhancement New feature or request I-slow User-facing performance issue and removed P-backlog E-help-wanted Call for participation: Help is requested to fix this issue. labels Aug 1, 2019
@bryphe bryphe changed the title ReasonML port - vscode-textmate Native Syntax Highlighting Aug 1, 2019
@Khady
Copy link

Khady commented Aug 2, 2019

Wouldn’t tree-sitter also allow for better text edition? For example, selecting or deleting or highlighting a text object would really operate in something relevant syntax-wise

@Dreomite
Copy link
Contributor

Dreomite commented Aug 2, 2019

We have at least an example of clever auto-indentation (atom/atom#18321) based on tree-sitter. Having this in oni2 would be very nice.

@necabo
Copy link
Contributor

necabo commented Sep 19, 2019

Disclaimer: I don't have any knowledge on this topic so please excuse my ignorance. I feel however like I stumbled across some relevant issues.

I was trying to integrate a Rust TextMate syntax. Looking at previous pull requests for Python/JS/Go syntaxes I figured it should be as simple as copying some files over from VSCode. During my research I also came across this issue and more specifically this comment that seems to suggest that TextMate has some limitations that make writing a proper syntax very hard for some languages.

The former maintainer of the atom-language-rust repository (which is/was(?) apparently used by VSCode) has mentioned some issues as well here.

Getting it 95% right is easy, but getting it totally right while still being performant is tedious.

seems to also hint at writing a performant syntax being quite a challenging task.

Now as mentioned before I am far from knowledgeable on this topic but it almost seems like waiting for tree-sitter support would be the best option here. Maybe that's just my dislike for sub-optimal solutions though and having a mostly working TextMate syntax is better than none.

I can't comment on which other languages might be limited by what the TextMate syntax supports, maybe someone else can shed some light on this.

@CrossR
Copy link
Member

CrossR commented Sep 19, 2019

Now as mentioned before I am far from knowledgeable on this topic but it almost seems like waiting for tree-sitter support would be the best option here.

tree-sitter has started to land now already in Oni2 (See #663), so as it moves to being more general it should hopefully be useful for stuff like Rust, which does have a good tree-sitter grammar.

I can't comment on which other languages might be limited by what the TextMate syntax supports, maybe someone else can shed some light on this.

The issues / limitations usually come down to user / non-standard additions to the language. The talk on it shows some examples where your custom data type/class is highlighted differently to a standard dataclass or is just missing highlighting all together, since the regexes just aren't capable of dealing with them. On the other hand, a tree-sitter grammar is able to recognise any datatype be it a standard or a custom one, so you can highlight them both the same.

Maybe that's just my dislike for sub-optimal solutions though and having a mostly working TextMate syntax is better than none.

Just for completeness sake, should say that its not necessarily one or the other. Having tree-sitter for all the languages it supports is great, but also having a native textmate service to fall back for the missing languages (the language list for tree-sitter is pretty short right now) is also useful, and means vscode plugins work as expected etc.

@CrossR
Copy link
Member

CrossR commented Sep 26, 2019

@bryphe, how far complete would you say this is now?

I'd be tempted to close this out and open up specific issues for the remaining tasks, since the bulk of the work (actually adding native tm and tree-sitter) is done.

@bryphe
Copy link
Member Author

bryphe commented Sep 26, 2019

@CrossR - agree 100%! Thanks!

The textmate syntax highlighting is done (and enabled in master) - only work remaining would be bugs that we find (which can be tracked separately)

Tree-sitter needs some more testing before we enable; but tracking that in a separate issue makes sense for sure 👍

@bryphe bryphe closed this as completed Sep 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request I-slow User-facing performance issue
Projects
None yet
Development

No branches or pull requests

9 participants