Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misc Conversation #185

Closed
jeff-hykin opened this issue May 21, 2019 · 98 comments
Closed

Misc Conversation #185

jeff-hykin opened this issue May 21, 2019 · 98 comments

Comments

@jeff-hykin
Copy link
Owner

jeff-hykin commented May 21, 2019

This issue is just a place for chat that doesn't necessarily relate to an issue. I'm going to close it immediately so that it doesn't add to the normal issue count, but it can still be used for general conversation.

@jeff-hykin
Copy link
Owner Author

I'll be working on the grammar today. @matter123 did you seen the new tree sitter extension?
microsoft/vscode#50140 (comment)

@matter123
Copy link
Collaborator

I haven't had a chance to actually use it yet, but it looks pretty nice. I did notice that the syntax highlighting is rough in a few areas.for example parameter highlighting. Do you know if that's a tree sitter issue, a issue with the extension, or a bad theme choice?

@georgewfraser
Copy link

@matter123 I'm pretty sure that's just a minor issue with the extension---it's only 3 days old! We need a C++ expert to come tweak the function that traverses the tree-sitter tree and applies colors:

https://github.com/georgewfraser/vscode-tree-sitter/blob/3b2d4cd2e5433af8620fe32cb35bab4479e15a60/src/extension.ts#L82

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented May 21, 2019

@matter123 So I looked into the Atom one a bit ago, because they're missing a lot of scopes. From what I understand the Tree sitter parses the whole thing and basically has its own tree-scopes, then there is an additional file that maps those scopes to the traditional TextMate names.

So for example, in Atom the lambda is correctly parsed by the tree sitter, but Atom incorrectly marks the -> it as member-access.

I'm not sure if the C++ tree sitter implementation is currently marking the difference between things like parameter-variables and just normal variable usage.

@jeff-hykin
Copy link
Owner Author

@matter123
tommorrow I'll look at lots of example files and log all of the scope changes before publishing. I plan on taking a look at the trigraph/digraph support and creating the @std_space. Then I'll start work on the dont_backtrack? feature for the textmate tools and look at the add-tag issue.

@jeff-hykin jeff-hykin pinned this issue May 22, 2019
@matter123
Copy link
Collaborator

@jeff-hykin unless your planning on doing some more big changes today. I'm going to work on fixing the tests.

@jeff-hykin
Copy link
Owner Author

No big changes, I was just going to completely regenerate the tests though since so much has changed.

@matter123
Copy link
Collaborator

Yeah, it's taking me about 7 minutes per fixture so at 267 fixtures that comes out to 31 hours. regeneration is probably needed.

@jeff-hykin
Copy link
Owner Author

I added you to the experimental tree sitter repo @matter123 just so you'd have access if you wanted it. https://github.com/jeff-hykin/experimental-tree-sitter/invitations

@jeff-hykin
Copy link
Owner Author

@matter123 is there a way to automate the tagging releases?

@matter123
Copy link
Collaborator

matter123 commented May 23, 2019

npm version creates a tag with the new version, but that is an un-annotated tag (without a list of fixes and additions)

Edit: githubs topic on release automation https://github.com/topics/release-automation

@matter123
Copy link
Collaborator

git config push.followTags true will make the tags produced by npm end up in the releases page. From there they should be able to be promoted to an annotated release.

@matter123
Copy link
Collaborator

Not sure what happened but your tagged version was from 184 commits ago

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented May 23, 2019

sigh haha. How you you make the tags? I just used the "new draft" button on the tags page

@jeff-hykin
Copy link
Owner Author

Although I did it after running npm version and git config push.followTags true so that might have something to do with it

@matter123
Copy link
Collaborator

Pressing new draft is all I do. Was the branch you were on happen to be behind?

@matter123
Copy link
Collaborator

Oh I know what happened. You release 1.9 with the new enum pattern then reverted.

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented May 28, 2019

@matter123 I generated the spec files for Objective C/C++ but for some reason all of them are missing the source tag. Any ideas why that is?

@matter123
Copy link
Collaborator

source is always the bottom of every single token, so the spec generator/tester just assumes that it's there and the spec files don't need it. Is something breaking?

@jeff-hykin
Copy link
Owner Author

The tests just fail because of the source scope
Screen Shot 2019-05-28 at 6 50 08 PM

@matter123
Copy link
Collaborator

huh, let me take a look.

@matter123
Copy link
Collaborator

matter123 commented May 28, 2019

So the issue is that there are more scopes popped off in scopeEnd than are pushed on in scopeBegin
lines like:

  • attempted to pop meta.interface-or-protocol off scope stack, top of stack is storage.type
  • attempted to pop storage.type off scope stack, top of stack is source
  • attempted to pop meta.interface-or-protocol off scope stack, top of stack is undefined

show the problem.

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented May 29, 2019

source is added in by the textmate parser https://github.com/microsoft/vscode-textmate/blob/master/src/grammar.ts#L442 this shouldn't be needed and produces duplicate source.cpp

The reason I added a manual one is so that it exists even inside of other languages, like markdown.
(I'll explain my thinking more by the end of the day)

@matter123
Copy link
Collaborator

Since your going to investigate the performance regression, I pushed to Add/perf-inspect a very WIP perf inspector. run npm run perf -- /path/to/file
Screenshot from 2019-05-29 21-49-14

@matter123
Copy link
Collaborator

@jeff-hykin What is the purpose of the hidden portion of scope resolution. And why does it rematch the visible portion of the scope resolution?

@jeff-hykin
Copy link
Owner Author

Somehow they are slightly different and to be honest I've forgotten how exactly. I tried to combine them when I fixed the dot-access operator tag

@jeff-hykin jeff-hykin reopened this May 31, 2019
@matter123
Copy link
Collaborator

The most recent push to Misc/update/vscode-textmate restores the ability to run npm run perf, I decided against using onigasm as their embedded version of onigurua is 6.8.2 while vscode-onigurua's embedded version is 6.9.5. Instead I utilized the ace that that OnigScanner with a single pattern acts very similarly to OnigRegExp.

The new version is however noticeably slower (~18.3s for previous, ~23s for new), I am unsure what the cause is.

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented Jun 20, 2020

Quickly looked at the Multi/tokens branch, Looks Good. I left a few comments, but I'll give a more thorough code review when the PR is made.

Sounds good, the whole branch is definitely a big WIP. You might want to look at the latest cpp/generate.rb to see the usage of the new selector system and if there's any design concerns. I ran into some small design problems, but overall I think its really helping cleanup the code. Outside of that, I'll probably have --Jeff in the comments on anything that is notable.

As a side note: when there are grammar-usage bugs, some of them can be really hard to debug, they fail far away from their source with an unrelated error. So I might work on adding more checks, logging, and descriptive error messages before release. Examples:

  • Adding any dont_match: to one_scope_resolution causes a "can't use numbered backref error" error in the linter that crashes the program. Commenting the check out then causes another error with placeholders not being replaced, no idea why haha.
  • In the experimental better-c-syntax, there's a weird infinite recursion bug, related to ExportedGrammar's and their non-exported patterns I think. It happens when:

./patterns/comments.rb

grammar.exports = [
    :comment, # NOTE: no "s" AKA not :comments
    :inline_comment,
]

generate.rb

grammar.import("./patterns/comments.rb")
# NOTE: there is an "s"
grammar[:comments] # returns a placeholder even though it shouldn't 
grammar[:comments] = grammar[:comments] 

@matter123
Copy link
Collaborator

should the grammar complain if an imported grammar's exports end up not being used?

as far as infinite recursion in placeholders, do you think it is worthwhile to implement some sort of cycle detection algorithm?

I'll see if I can figure out what's going on with dont_match.

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented Jun 20, 2020

I'll see if I can figure out what's going on with dont_match.

I did make partial progress on this. /(?<blah_group>)()\g<1>/ throws a "numbered backref/call is not allowed. (use name)" and the dont_match uses named capture groups. So in the start_match_empty.rb linter when it goes from a string to regex Regexp.new(pattern.start_pattern.evaluate.gsub("\\G", '\uFFFF')) it throws the error

Idk if Onigurua also will throw an error with numbered backref's and named capture groups, but it probably will since its Ruby based. Idk why that wouldn't be allowed in Ruby either.

@jeff-hykin
Copy link
Owner Author

The most recent push to Misc/update/vscode-textmate restores the ability to run npm run perf, I decided against using onigasm as their embedded version of onigurua is 6.8.2 while vscode-onigurua's embedded version is 6.9.5. Instead I utilized the ace that that OnigScanner with a single pattern acts very similarly to OnigRegExp.

Thanks 👍 I'll probably test it out in the next few days to see exactly what you mean with the versions

@jeff-hykin
Copy link
Owner Author

as far as infinite recursion in placeholders, do you think it is worthwhile to implement some sort of cycle detection algorithm?

Yeah, we can make a placeholder set {} that is empty at the top level. Then when a placeholder is being resolved it adds the PlaceholderPattern object to the set, and if its already on the set, then its an infinite recursion.

@matter123
Copy link
Collaborator

matter123 commented Jun 20, 2020

slightly simpler might be to borrow from dfs mapping and mark each placeholder node as ("not resolved", "resolved", or "resolving") and attempting to resolve a "resolving" placeholder is a loop. Otherwise I belive you would have to clear the set between resolves to prevent false positives on already resolved patterns.

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented Jun 20, 2020

as far as infinite recursion in placeholders, do you think it is worthwhile to implement some sort of cycle detection algorithm?

There might also be a way to do it before any loop/recursion by changing the resolve strategy. Every pattern in the repository, like grammar[:thing] has to point to a real pattern by the end. If grammar.repository[:thing_being_resolved] isn't a real pattern by the time the grammar is being compiled, an error could be thrown then.

Something like

# Grammar#compile_to_json
for k,v in grammar.repository
       raise if v.isPlaceholder
end

@matter123
Copy link
Collaborator

matter123 commented Jun 20, 2020

So looking at the new resolve_placeholder transformation, I am not sure how it is recursive.

if repository[arguments[:placeholder]] is itself a placeholder, wouldn't it be a placeholder by the end?

Imagine the following grammar:

g[:a] = Patterb.new(/a/).then(g[:b])
g[:b] = Pattern.new(/b/).then(g[:c])
g[:c] = Pattern.new(/c/)

Prior to calling save the grammar has the following

{
a: Pattern.new(/a/).then(PlaceholderPattern(:b))
b: Pattern.new(/b/).then(PlaceholderPattern(:c))
c: Pattern.new(/c/)
}

Save_to is then called resolving in the resolve_placeholder plugin running.
It encounters :a and rewrites :a to Pattern.new(/a/).then(Pattern.new(/b/).then(PlaceholderPattern(:c)))
When it encounters :b the placeholder for :c is resolved but that isn't propagated back up to :a

Edit: does #map! end up traveling through the new :match and resolve them?

@jeff-hykin
Copy link
Owner Author

if repository[arguments[:placeholder]] is itself a placeholder, wouldn't it be a placeholder by the end?

I'm not 100% sure

repository[arguments[:placeholder]] = PlaceholderPattern.new(*args)

unless repository[arguments[:placeholder]].is_a? PatternBase
    raise ":#{arguments[:placeholder]} is not a pattern and cannot be substituted"
end
#
# infinite __deep_clone__ recursion I think
#
each_pattern_like.arguments[:match] = repository[arguments[:placeholder]].__deep_clone__

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented Jun 20, 2020

Edit: does #map! end up traveling through the new :match and resolve them?

This I'm not sure of either, which is partly why I worked on creating .recursive_pattern_chain so that a list could be generated (and basically frozen) first before starting to resolve/transform things

@matter123
Copy link
Collaborator

matter123 commented Jun 20, 2020

I don't think it can be frozen, several transformations alter the chain fairly significantly.

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented Jun 20, 2020

I don't think it can be frozen, several transformations alter the chain fairly significantly.

Sorry I meant just top-level. Basically no elements are added/removed/re-ordered on the list, but each can be mutated.
Is that what you meant or are you saying things do get added dynamically?

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented Jun 20, 2020

I pushed the recursive bug on master on the experimental better-c-syntax repo if you want to see the stack overflow bug. The grammar only has 1 pattern in it at the moment. I'm still not sure how its being caused by repository[arguments[:placeholder]].__deep_clone__ so I imagine it might have to do more with #map!

@matter123
Copy link
Collaborator

Freezing at the top level should cover any existing plugins but ideally, bailout would be a part of the grammar and that is going to add more names to the grammar. There could be two stages, in the first stage, the pattern objects are immutable but the repository is mutable and in the second stage its the opposite.

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented Jun 20, 2020

Freezing at the top level should cover any existing plugins but ideally, bailout would be a part of the grammar and that is going to add more names to the grammar. There could be two stages, in the first stage, the pattern objects are immutable but the repository is mutable and in the second stage its the opposite.

Yeah that might work since the bailout should be independent of any placeholders or other resolutions

@matter123
Copy link
Collaborator

I pushed the recursive bug on master on the experimental better-c-syntax repo if you want to see the stack overflow bug. The grammar only has 1 pattern in it at the moment. I'm still not sure how its being caused by repository[arguments[:placeholder]].deep_clone so I imagine it might have to do more with #map!

is grammar[:comments] = grammar[:comments] a simplification of a more complex recusion issue, or is semantically meningful?

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented Jun 20, 2020

Well a few things happened. (I pushed 1 more change (actually a revert))

  1. The code was more removed than that, so it could look more like comment = grammar[:comments], followed later by grammar[:comments] = comment
  2. It was originally using imports grammar[:comments] = Grammar.import("./patterns/comments.rb")[:comments]
  3. There is a misspelling, it was supposed to be grammar[:comments] = Grammar.import("./comments.rb")[:comment]

The misspelling is important cause it tried to access a non-exported pattern

@matter123
Copy link
Collaborator

I pushed a change that elminates the issue with the recusion in deep clone. With the change this is the new error for better-c-syntax:

Traceback (most recent call last):
	13: from main/generator.rb:75:in `<main>'
	12: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:495:in `save_to'
	11: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:329:in `generate'
	10: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:260:in `run_pre_transform_stage'
	 9: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:260:in `each'
	 8: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:261:in `block in run_pre_transform_stage'
	 7: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:261:in `transform_values'
	 6: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/grammar.rb:278:in `block (2 levels) in run_pre_transform_stage'
	 5: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/transforms/resolve_placeholders.rb:10:in `pre_transform'
	 4: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/pattern_variations/base_pattern.rb:162:in `map!'
	 3: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/transforms/resolve_placeholders.rb:21:in `block in pre_transform'
	 2: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/pattern_variations/base_pattern.rb:645:in `=='
	 1: from /Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/pattern_variations/base_pattern.rb:640:in `eql?'
/Users/fosdick/cpp-textmate-grammar/gem/lib/textmate_grammar/pattern_extensions/placeholder.rb:41:in `to_tag': Attempting to create a tag from an unresolved placeholder `:comments' (RuntimeError)

@jeff-hykin
Copy link
Owner Author

Based on that doc you sent sent earlier. Should we be using \k for backreferences instead of \g?

@matter123
Copy link
Collaborator

backreferences are \N where N is a number. \g is used for subexpressions.

@jeff-hykin
Copy link
Owner Author

Does the cpp code have subexpressions?

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented Jun 20, 2020

I thought \N was problematic for values > 9 so \g<N> had to be used instead

@matter123
Copy link
Collaborator

just the template angle brackets, but that's planning on being removed?

@matter123
Copy link
Collaborator

matter123 commented Jun 20, 2020

I thought \N was problematic for values > 9 so \g had to be used instead

the docs say n is valid when (n >= 1)

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented Jun 20, 2020

Oh yeah I forgot I used it there, maybe that's where I'm seeing the \g from.

As a side note, a lot later on, we could probably use subexpressions to compress and maybe optimize the regex for single patterns based on how they reuse patterns inside themself

@jeff-hykin
Copy link
Owner Author

jeff-hykin commented Jun 20, 2020

I thought \N was problematic for values > 9 so \g had to be used instead

the docs say n is valid when (n >= 1)

Hmm I wonder how you're supposed to match a backreference followed by a digit literal. Non capture groups?

\11
vs
\1(?:1)

That could create some edge case bugs, depending on how we replace the backreferences

$backref(1)1
Converted to
\11
Instead of
\1(?:1)

@matter123
Copy link
Collaborator

Backreferences actually become (?:\1)1

@RedCMD
Copy link

RedCMD commented May 23, 2022

I guess just for future reference:
\N, \k<N> is a backreference; with N >= 1. Will consume all digits. Cannot have leading 0's \\011 is the char point 9 (tab), not capture group 11
https://github.com/kkos/oniguruma/blob/d484916f3ca2b6bbc81accba63054625dfe26b6b/doc/RE#L411

\N is octal; but only 1-3 digits, must only be [0-7] and must be atleast 1 higher than the amount of capture groups available. It does not consume any literal digits after it
https://github.com/kkos/oniguruma/blob/d484916f3ca2b6bbc81accba63054625dfe26b6b/doc/RE#L24

Has there been any development in the performance loss of applying a pattern inside a capture item in the C highlighter?
microsoft/vscode-textmate#167

@jeff-hykin
Copy link
Owner Author

\k<N> is a backreference

@RedCMD
Yeah I probably should've known. I went ahead and updated the grammar builder library with that fix.

Has there been any development in the performance loss of applying a pattern inside a capture item in the C highlighter?

Sadly no

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants