Syntax highlighting engine quirks #2326
Comments
Your first observation has been discussed before in some place (probably the Packages repo). Iirc the final verdict was that it's weird but cannot be changed, because so many current syntax definitions rely on or work around this behavior that a change would most likely be backwards-incompatible. I would like to see this get some consistency, because I stumble upon this issue quite frequently when working on a syntax definition. The second issue is probably closely related. I don't think I've run into any of the other issues you describe, but they should be addressed imo. |
Having heard some perspectives about Given this, factoring out |
Currently it should be possible to pop out of an embed, if I recall correctly. |
Could you provide examples of these? I'm not understanding index vs position. In my mind earlier position == lower index, but I must be misunderstanding. |
I mean that it would be helpful for a single rule to have both an An example is HTML.sublime-syntax at line 363 (the In addition, there is a weird bug. The following is parsed wrong: <script></script >
const x = 1; The I can also imagine situations in which
This can occur with captures in lookaheads:

    - match: (?=foo(bar))(foo)bar
      captures:
        1: region.redish
        2: region.bluish

Given the text

It can also occur with capture groups within quantifiers:

    - match: (?:(x)|(y))+
      captures:
        1: region.redish
        2: region.bluish

In a run of

Admittedly, these are weird situations. (I only ran into them when writing a reference implementation of the sublime-syntax parsing algorithm.) Fixing the capture behavior in the general case would basically require a sort, which could be slow. If a fix is impractical, then merely documenting the current behavior is probably fine. |
What's the argument for this specific order as opposed to the opposite? I also didn't know capture groups work within look-aheads, tbh. |
At one point I tried to remove that functionality from sregex as an optimization to prevent the need for capturing in look-aheads, but I believe it was being used in the default packages somewhere. |
If an embed is conceptually a special kind of push, then logically embed-pop should be a no-op, whereas pop-embed should be like a set. |
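For anyone following along, this is roughly what a plain `embed` rule looks like, which is the thing being compared to a `push` here. It is only a sketch: the syntax name, the scope names, and the escape pattern are made up for illustration and are not taken from HTML.sublime-syntax.

```yaml
%YAML 1.2
---
# Hypothetical demo syntax; every name below is a placeholder.
name: Embed Demo
scope: text.embed-demo
contexts:
  main:
    - match: '<script>'
      scope: meta.tag.script.begin.embed-demo
      # Like a push: the embedded syntax takes over after this match.
      embed: scope:source.js
      embed_scope: source.js.embedded.embed-demo
      # The escape pattern ends the embed and returns to `main`,
      # which is the part that already behaves like a pop.
      escape: '(?=</script)'
```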
for reference, capture groups within lookarounds were also discussed before at #1796 (comment) |
Fixing the first issue causes 882 assertions to fail in the default syntaxes. Fixing the second issue causes 193 assertions to fail in the default syntaxes. Based on this, I think most, if not all, of these changes will have to be gated behind a version flag. Even though we can control and fix the default syntaxes, there are most certainly third party syntaxes relying on these. |
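A version gate like this could be as small as one top-level key in the syntax file. The sketch below assumes the key is named `version` and that the value `2` opts in to the corrected behavior, which is how later Sublime Text 4 builds expose it. Files without the key would keep the legacy behavior, so third party syntaxes are not affected. The syntax name and scopes are placeholders.

```yaml
%YAML 1.2
---
# Hypothetical syntax that opts in to the corrected engine behavior.
name: Versioned Demo
scope: source.versioned-demo
# Assumption: omitting this key keeps the old, backwards-compatible behavior.
version: 2
contexts:
  main:
    - match: '\b(?:if|else)\b'
      scope: keyword.control.versioned-demo
```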
Agree. This is definitely an issue with a probably wide range of effects with regard to backwards compatibility. I'd guess most of the failing test cases might be What I can say about those "issues" is that whether they are considered "quirks", intended behavior, or even irrelevant depends heavily on the strategy used to design a syntax. They have less impact when using multi-push or multi-set statements. One can work with this behavior in one situation, while it seems to complicate things in others. But in the end it would probably be most consistent to have |
In regards to this general area: |
Honestly I am uncertain how to tackle these kinds of issue best, but my feeling is the same as yours in that case. As we are using lookaheads to avoid overlapping No more lookahead required to avoid overlapping metas.
|
Note also that not applying a scope can be easily worked around by re-specifying the scope in the pattern's |
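A minimal sketch of that workaround, assuming it refers to the rule's own `scope` key: the context's `meta_content_scope` does not cover the text matched by the `pop` rule, so the rule names the same scope again. The syntax name and all scope names are placeholders.

```yaml
%YAML 1.2
---
# Hypothetical demo syntax; every name below is a placeholder.
name: Workaround Demo
scope: source.workaround-demo
contexts:
  main:
    - match: '"'
      scope: string.quoted.double.workaround-demo punctuation.definition.string.begin.workaround-demo
      push: double-quoted-string

  double-quoted-string:
    - meta_content_scope: string.quoted.double.workaround-demo
    - match: '"'
      # The content scope is not applied to the text matched by this pop rule,
      # so the rule re-specifies it in its own `scope`.
      scope: string.quoted.double.workaround-demo punctuation.definition.string.end.workaround-demo
      pop: true
```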
As of build 4075: parts 1, 2, 3 and 5 now act as expected when a .sublime-syntax contains The request for While It is also possible to combine |
I've run into some really obscure issues with the syntax highlighting engine.

When you `pop` a context with a `meta_content_scope`, that `meta_content_scope` is not applied to the characters matched by the rule. When you `set`, the `meta_content_scope` is applied to those characters. I would expect `set` to work like `pop` in this case.

When you `push` a context with `clear_scopes`, scopes are cleared for the matched characters. When you `set`, scopes are not cleared for the matched characters. I would expect `set` to work like `push` in this case.

In general, it would be nice if this were refactored a bit so that `pop` could coexist with `push`. Then, `set` could be implemented as a `push` and `pop`. This might also make it easy to allow an `embed` rule with a `pop`, a common request.

When a rule `push`es multiple scopes containing both `clear_scopes` and `meta_scope`, when scoping the characters matched by the rule, meta scopes are applied in the wrong order. First, all of the `clear_scopes` values are applied, then all of the `meta_scope` values. After the characters are matched, the `meta_scope` values are removed, then the `clear_scopes` values are undone. Then the contexts are properly pushed onto the stack and the rules applied in the expected order.

Instead, I would expect that for each context being pushed, first `clear_scopes` should be applied, and then `meta_scope` should be applied, and then the next context should be handled. After the characters are matched, this should be undone in reverse order.

When a rule has captures, but some of those captures do not apply any scopes, then the tokenization behavior varies. If the rule uses `push`, `pop`, or `set`, then tokens are always broken on capture group boundaries, even when there is no corresponding scope for the capture rule. If the rule does not use any of those, then capture groups that do not apply a scope are ignored for tokenization. (I have not tested this with `embed`, so I don't know the behavior there.)

I would expect that capture groups that don't apply a scope either always cause token breaks, or never, regardless of the type of rule. (Never is probably preferable.)

When a rule has multiple captures, and those captures overlap in various ways, the engine will produce surprising results:

These are odd situations to begin with, and any cleaner solution might have performance implications, but it might be helpful to document the behavior.

If a context is pushed or set from any rule in the `prototype`, that context will implicitly have `meta_include_prototype`, as will any context pushed or set from that context, and so on. This applies even if such a context is used elsewhere. This is potentially very surprising, as the behavior of a context varies depending on whether it is transitively included by `prototype`.
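
To make the `prototype` point from the end of the report concrete, here is a small sketch of a definition where one context is reachable both from `prototype` and from `main`. I am reading the report as saying that such a context behaves as if it had `meta_include_prototype: false` on every path. The syntax name, patterns, and scope names are all hypothetical.

```yaml
%YAML 1.2
---
# Hypothetical demo syntax; every name below is a placeholder.
name: Prototype Demo
scope: source.prototype-demo
contexts:
  prototype:
    # Rules here are included in every context that does not opt out.
    - match: '\{\{'
      push: template-tag

  main:
    # The same context is also pushed from an ordinary rule.
    - match: '<%'
      push: template-tag

  template-tag:
    # Per the report, this context is reachable from `prototype`, so the
    # prototype rules are not included here, even when it is entered via
    # the '<%' rule in `main`.
    - meta_scope: meta.template-tag.prototype-demo
    - match: '\}\}|%>'
      pop: true
```

If the intent is for `template-tag` to include the prototype when entered from `main`, one option under this reading is to keep a separate copy of the context that `prototype` never references.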