-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix tree sitter chunking #7417
Fix tree sitter chunking #7417
Conversation
Call as bytes before slicing, that way you can take bytes that aren't aligned to chars. Should technically also be slightly faster since you don't have to check alignment...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix for TSchunk access is correct and needed 👍 I am not sure sure about the byte alignment in the grapheme functions. Where would that be needed? It's pretty slow and I can't think of a scenario where we would want to align invalid bytes?
Currently they're necessary to stop helix from crashing in certain situations, but I think they might just be a bandaid solution and not a fix for the actual bug... |
If you have a reproduction case with a tracebacj that would be great. I can tell a lot from that. Maybe a buggy TS grammar could cause this. I actually want to remove the manual grapheme alignment we currently do for TS highlights anyway. The doc formatter can do the byte/char/grapheme alignment on the fly which is a nice perf bump for long lines. Apart from that I think all other uses with an invalid byte pis are a bug and we would just be fixing the symptom |
Reproduction case, c/cpp syntax
With the emojy copy pasted a bunch |
A thanks yeah I saw the issue now. Let ne see if I can comeuo with a quick fix. |
Ah yeah it's related to what I meant. It shouldn't be too hard to track the byte pos while rendering too and simply use byte indecies in the entire syntax highlighting infrastructure. That would fix some inefficencies jn the rendering. I have been plannjng that for a while but didnt get around to it yet. If you are interested you can take a crack at this but it would be a bit more invokved. IMO it's still a bug in the C TS grammar the the byte index is not byte aligned tough 🤔 Unaligned bytes make sense as an input (since TS doesn't know ahead if time where char boundaries are) but not as an output (a highlighting starting in the middle of a Unicode codepoint makes no sense) |
This reverts commit 8d2c6d2.
I got rid of the bandaid and left only the relevant fix. Will file a bug report with the language servers, and see if I can rework syntax highlighting to use bytes everywhere |
Call as bytes before slicing, that way you can take bytes that aren't aligned to chars. Should technically also be slightly faster since you don't have to check alignment...
Call as bytes before slicing, that way you can take bytes that aren't aligned to chars. Should technically also be slightly faster since you don't have to check alignment...
Call as bytes before slicing, that way you can take bytes that aren't aligned to chars. Should technically also be slightly faster since you don't have to check alignment...
No description provided.