Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search for matching while should be top-down not bottom-up #110

Closed
matter123 opened this issue Aug 22, 2019 · 9 comments
Closed

Search for matching while should be top-down not bottom-up #110

matter123 opened this issue Aug 22, 2019 · 9 comments
Assignees

Comments

@matter123
Copy link

matter123 commented Aug 22, 2019

Given the following grammar.

"patterns": [
	{
		"match": "//.*",
		"name": "comment"
	},
	{
		"name": "meta.outer-pattern",
		"begin": "if",
		"end": "^\\s*+endif",
		"patterns": [
			{
				"name": "meta.inner-pattern",
				"begin": "\\G",
				"while": "^\\s*+(?!endif)",
				"patterns": [
					{
						"name": "keyword.control.test",
						"match": "test"
					},
					{
						"include": "$self"
					}
				]
			}
		]
	}
]

And the following test file (comments are of expected highlighting)

// outside of if, no highlighting
test
if
// inside of if, highlighting
test
endif

// scopes close properly, no highlighting
test

if
if
// inside of nested if, highlighting
test
endif
// inside of outer if, highlighting
test
endif

// scopes close properly, no highlighting
test

The 5th and 6th test are highlighted incorrectly as compared to TextMate.

The 5th test is not highlighted because vscode-textmate searches from the bottom up and closes both the outer and inner begin/while rule.

The 6th test is highlighted because the unconsumed final endif matches /if/.

Textmate has the 5th test highlighted and the 6th not.
Screen Shot 2019-08-21 at 20 19 19

@msftrncs
Copy link
Contributor

Analyzing this I am pretty sure the correct way to check the rules is from bottom to top.

While rules are specifically for constructs such as MarkDown Quote Block, which becomes nested many times.

nest 1

nest 2 (while rule tests # 1 then # 2 on each line.)

nest 1

Now, with that said, I think TextMate might be detecting that the same rule is nested, and it only pulls the stack back to the point of the top most instance. This would also work for MarkDown just fine. However, this is more complicated than it sounds. If there are any other while rules between the highest instance of the tested rule and the furthest down one, those have to be tested too.

So, flip the stack of rules to a new pile (like currently done), starting with the bottom most while rule (now on the top of the while stack), test if it matches, if not, search the remaining flipped stack for the furthest down instance of the same rule, and if found, pull the main stack back to this point and keep only the top of the flipped stack above the furthest instance, then resume testing the flipped stack, else (if no further instance found) pull the stack back to this point and return.

I think I have a working demonstration of this concept now.

msftrncs added a commit to msftrncs/vscode-textmate that referenced this issue Aug 22, 2019
Find the highest instance of the rejected while rule, and return the rule
stack to that condition.  Otherwise nested while rules simply wipe
themselves out starting with the lowest rule.

Fix microsoft#110
msftrncs added a commit to msftrncs/vscode-textmate that referenced this issue Aug 22, 2019
Find the highest instance of the rejected while rule, and return the rule
stack to that condition.  Otherwise nested while rules simply wipe
themselves out starting with the lowest rule.

Fix microsoft#110
msftrncs added a commit to msftrncs/vscode-textmate that referenced this issue Aug 22, 2019
Find the highest instance of the rejected while rule, and return the rule
stack to that condition.  Otherwise nested while rules simply wipe
themselves out starting with the lowest rule.

Fix microsoft#110
msftrncs added a commit to msftrncs/vscode-textmate that referenced this issue Aug 25, 2019
Find the highest instance of the rejected while rule, and return the rule
stack to that condition.  Otherwise nested while rules simply wipe
themselves out starting with the lowest rule.

Fix microsoft#110
@msftrncs
Copy link
Contributor

For more history on the topic, see previous issue #25.

@mjbvz, do you have any input on this issue or the PR #111?

referring back to #25:

  1. On a new line, check all while conditions from bottom of the stack to the top.
  2. If any of them fails, then pop everything after that while rule off the stack, along with the failed rule.

By nothing more than analytical thinking, I am suspecting its actually:

  1. On a new line, check all while conditions from bottom of the stack to the top.
  2. For each while condition that fails:
    1. Search stack from top to bottom for another instance of the same while condition.
    2. Pop everything after and including the first (highest) instance found.
    3. Continue checking remaining while conditions.

msftrncs added a commit to msftrncs/vscode-textmate that referenced this issue Aug 25, 2019
Find the highest instance of the rejected while rule, and return the rule
stack to that condition.  Otherwise nested while rules simply wipe
themselves out starting with the lowest rule.

Fix microsoft#110
msftrncs added a commit to msftrncs/vscode-textmate that referenced this issue Aug 25, 2019
Find the highest instance of the rejected while rule, and return the rule
stack to that condition.  Otherwise nested while rules simply wipe
themselves out starting with the lowest rule.

Fix microsoft#110
@msftrncs
Copy link
Contributor

I've had a revelation of thought while trying to understand why VS Code displays Markdown quotes in preview differently than in the editor.

Example:

> one nest.
>> second nest, new paragraph.
> continues paragraph of second nest.
also continues paragraph of second nest.

Actually (via GitHub):

one nest.

second nest, new paragraph.
continues paragraph of second nest.
also continues paragraph of second nest.

VS Code Editor highlight:
image

From VS Code Preview:
image

I believe VS Code's Markdown grammar is wrong. The while rule is expecting to find additional lines starting with > but that is not how the previewer works. Using the flawed Markdown grammar as a basis for the correct processing order of while conditions copies the flaw to that processing. I believe the correct test is, at a minimum, a non-blank line. but possibly also tests for other kinds of markup, or precisely the same as the while clause of list_paragraph. This is the correct while condition for blockquote: "while": "(^|\\G)([ ]{0,3}(>) ?|(?!\\s*$|#|[ ]{0,3}([-*_>][ ]{2,}){3,}[ \\t]*$\\n?|[ ]{0,3}[*+->]|[ ]{0,3}[0-9]+\\.))". With this change, I get the correct scoping, and it would appear to me that I would get the correct scoping if the while conditions were tested in the other order.

I think there is a chance that the while conditions should be tested top down, not bottom up.

I have closed PR #111 for the time being, at least until I can put more thought/testing in to whether it is correct.

@mjbvz
Copy link
Contributor

mjbvz commented Aug 26, 2019

I don't know specifically how TextMate implements while rules but it would be worth exploring to see if running top-down works for our existing grammars. The key parts are:

  • We always want to test all while rules when starting a line
  • If there are rule failures, we want to make sure we cut off the "while stack" at above that failure.

Here more context on why we use bottom-up testing of while rules:

One heavy user of while rules is for markdown fenced code blocks. We use while rules here because we want to make sure that—no matter what the embedded grammar does—a closing ``` still closes the fenced code block.

You cannot implement this behavior with a simple begin/end rule. Consider:

```js
(
```

The js grammar will consume the initial ( and wants to then start consuming the ```. If we used begin/`end` to implement fenced code blocks, until the js grammar stops matching, the markdown grammar's `end` rule for fenced codes block would never be checked. This often results in the entire rest of the file after the fenced code block being highlighted as the embedded language.

So as a first step, here's some pseudo code for how a while rule could behave:

if we found starting ```:
    while we have not found the closing ```:
        run the js grammar on the line

But that pseudo code is misleading because the js grammar itself could contains a while rule too. In that case, we do not want:

if we found starting ```:
    while we have not found the closing ```:
        while some condition in js grammar
            run the sub section of js grammar on the line

because that'd mean that our js grammar would still be able to leak outside of the code block (the js grammar's inner while rule would just keep running without ever yielding to the markdown grammar)

Instead, whenever we are on a new line we first want to check to see if our fenced code block has ended. That means that our pseudo code would really be more like:

if we found starting ```:
    while true:
        if found closing ```:
            break

        if not some condition in js grammar:
           continue
         else:
           run the sub section of js grammar on the line

And that is what testing while rules bottom-up does.

We also use the bottom-up nature of while rules to implement markdown inside jsdoc comments:

/**
 * ```js
 * 1 + 1
 * ```
 */

In the above example, an outer js-doc while rule runs first to consume the leading *. That means that the markdown grammar never has to know about the leading stars at all; as far as it is concerned, the file content is just:

```js
1 + 1
```

Again, I'm not sure if we can actually change from bottom-up to top-down. I'd like to understand if it makes sense and what impact of the change would be. Would our existing approach to fenced-code blocks and jsdoc work for example? Would we need to update our grammars? Could we still prevent leaks?

@matter123
Copy link
Author

matter123 commented Aug 26, 2019

@mjbvz From what I understand the entire while stack is checked for failure each line, so a failure of a lower while rule would not be ignored.

The actual check: https://github.com/microsoft/vscode-textmate/blob/master/src/grammar.ts#L703
The call to _checkWhileConditions: https://github.com/microsoft/vscode-textmate/blob/master/src/grammar.ts#L757

I do agree that while rules consuming leading characters, necessitates testing the while rules needs to happen bottom-up. Textmate seems to do different less useful behavior:

Example Grammar:
{	patterns = (
		{	name = 'meta.while';
			begin = 'A';
			while = '(?:^|\G)\s*a';
			patterns = ( { include = '$self'; } );
		},
		{	name = 'meta.while2';
			begin = 'B';
			while = '(?:^|\G)\s*b';
			patterns = ( { include = '$self'; } );
		},
		{
			name = 'string.unquoted.a.b';
			match = 'a b';
		},
		{
			name = 'keyword.control.b';
			match = 'b';
		}
	);
}

Image:
Screen Shot 2019-08-26 at 15 12 34

With the above grammar, the first block works as expected.
However in the second block, both while rules have ended at line 9, and no combination of a and b causes either while rule to still be in effect at line 9.

@msftrncs
Copy link
Contributor

Now having read some of the spec for Markdown, and having altered the blockquote rule to better behave as the spec reads, I find that Markdown does not need bottom up testing. Actually, it works better with top down, as it fixes one last detail:

> first nest
    first nest paragraph continued

according to GitHub (which does paragraphs wrong)

first nest
first nest paragraph continued

In VS Code with an improved but not perfect blockquote:
image

But if you set the testing of the rules to be top down, suddenly the issue shown above goes away. (per the inspect script anyway)

Yes, the entire while rule stack needs to be tested when using top down.

To be clear, the current markdown grammar may fail using the top down approach, but it technically wasn't working right anyway. I'll have to test a few other patterns listed above, including I would like to visit the sample in #25.

@msftrncs
Copy link
Contributor

Consider the below examples of markdown quoted fenced code blocks. Top down while rule will fail in the second example where the blockquoting has escaped the fenced code block, but NOT because the fence block bleeds, but because the > was consumed, scoped by a higher while rule when a lower while rule fails because it cannot find another > (which is required) and threw away the entire stack. In this case, the characters that were consumed need to be returned, so they can be retested with new rules. To be exact, the previous while tests that passed need to be rolled back (if they captured anything), and then the failing while rule probably needs to be tested again. I am not sure this makes much difference, top-down or bottom-up.

Back to the #endif. Its not just a matter of top-down. There also needs to be a mechanism that can handle nested BeginWhile rules. Otherwise, if you have two identical non-consuming/negative look-ahead while conditions in a row, they will both pass or fail because they are making the same test at the same cursor position. The fence on a fenced code block is the exact same condition. Even though its possible to fence in markdown in markdown, its not possible to fence in another block inside the fenced in markdown because both blocks will be ended with the same end fence. (there are two possible fence characters though)

Examples

> > ~~~PowerShell
> > $a = (
> > ~~~
$a = (
> > ~~~PowerShell
> $a = (
> > ~~~

$a = (

@msftrncs
Copy link
Contributor

Summary: I was able to fix everything in Markdown that I thought might have only been possible if the while conditions were tested top down. Of course, this doesn't mean any of my changes would work correctly on TextMate, Sublime or Atom.

@mjbvz
Copy link
Contributor

mjbvz commented Nov 20, 2019

I believe the current behavior is the expected design. Since you can workaround the original problem , let me know if you still think we should revisit this

@mjbvz mjbvz closed this as completed Nov 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants