Search for matching while should be top-down not bottom-up #110

matter123 · 2019-08-22T03:18:31Z

Given the following grammar.

"patterns": [
	{
		"match": "//.*",
		"name": "comment"
	},
	{
		"name": "meta.outer-pattern",
		"begin": "if",
		"end": "^\\s*+endif",
		"patterns": [
			{
				"name": "meta.inner-pattern",
				"begin": "\\G",
				"while": "^\\s*+(?!endif)",
				"patterns": [
					{
						"name": "keyword.control.test",
						"match": "test"
					},
					{
						"include": "$self"
					}
				]
			}
		]
	}
]

And the following test file (comments are of expected highlighting)

// outside of if, no highlighting
test
if
// inside of if, highlighting
test
endif

// scopes close properly, no highlighting
test

if
if
// inside of nested if, highlighting
test
endif
// inside of outer if, highlighting
test
endif

// scopes close properly, no highlighting
test

The 5th and 6th test are highlighted incorrectly as compared to TextMate.

The 5th test is not highlighted because vscode-textmate searches from the bottom up and closes both the outer and inner begin/while rule.

The 6th test is highlighted because the unconsumed final endif matches /if/.

Textmate has the 5th test highlighted and the 6th not.

The text was updated successfully, but these errors were encountered:

msftrncs · 2019-08-22T07:18:20Z

Analyzing this I am pretty sure the correct way to check the rules is from bottom to top.

While rules are specifically for constructs such as MarkDown Quote Block, which becomes nested many times.

nest 1

nest 2 (while rule tests # 1 then # 2 on each line.)

nest 1

Now, with that said, I think TextMate might be detecting that the same rule is nested, and it only pulls the stack back to the point of the top most instance. This would also work for MarkDown just fine. However, this is more complicated than it sounds. If there are any other while rules between the highest instance of the tested rule and the furthest down one, those have to be tested too.

So, flip the stack of rules to a new pile (like currently done), starting with the bottom most while rule (now on the top of the while stack), test if it matches, if not, search the remaining flipped stack for the furthest down instance of the same rule, and if found, pull the main stack back to this point and keep only the top of the flipped stack above the furthest instance, then resume testing the flipped stack, else (if no further instance found) pull the stack back to this point and return.

I think I have a working demonstration of this concept now.

Find the highest instance of the rejected while rule, and return the rule stack to that condition. Otherwise nested while rules simply wipe themselves out starting with the lowest rule. Fix microsoft#110

msftrncs · 2019-08-25T06:08:57Z

For more history on the topic, see previous issue #25.

@mjbvz, do you have any input on this issue or the PR #111?

referring back to #25:

On a new line, check all while conditions from bottom of the stack to the top.

If any of them fails, then pop everything after that while rule off the stack, along with the failed rule.

By nothing more than analytical thinking, I am suspecting its actually:

On a new line, check all while conditions from bottom of the stack to the top.
For each while condition that fails:
1. Search stack from top to bottom for another instance of the same while condition.
2. Pop everything after and including the first (highest) instance found.
3. Continue checking remaining while conditions.

Find the highest instance of the rejected while rule, and return the rule stack to that condition. Otherwise nested while rules simply wipe themselves out starting with the lowest rule. Fix microsoft#110

msftrncs · 2019-08-26T07:49:25Z

I've had a revelation of thought while trying to understand why VS Code displays Markdown quotes in preview differently than in the editor.

Example:

> one nest.
>> second nest, new paragraph.
> continues paragraph of second nest.
also continues paragraph of second nest.

Actually (via GitHub):

one nest.

second nest, new paragraph.
continues paragraph of second nest.
also continues paragraph of second nest.

VS Code Editor highlight:

From VS Code Preview:

I believe VS Code's Markdown grammar is wrong. The while rule is expecting to find additional lines starting with > but that is not how the previewer works. Using the flawed Markdown grammar as a basis for the correct processing order of while conditions copies the flaw to that processing. I believe the correct test is, at a minimum, a non-blank line. but possibly also tests for other kinds of markup, or precisely the same as the while clause of list_paragraph. This is the correct while condition for blockquote: "while": "(^|\\G)([ ]{0,3}(>) ?|(?!\\s*$|#|[ ]{0,3}([-*_>][ ]{2,}){3,}[ \\t]*$\\n?|[ ]{0,3}[*+->]|[ ]{0,3}[0-9]+\\.))". With this change, I get the correct scoping, and it would appear to me that I would get the correct scoping if the while conditions were tested in the other order.

I think there is a chance that the while conditions should be tested top down, not bottom up.

I have closed PR #111 for the time being, at least until I can put more thought/testing in to whether it is correct.

mjbvz · 2019-08-26T21:23:29Z

I don't know specifically how TextMate implements while rules but it would be worth exploring to see if running top-down works for our existing grammars. The key parts are:

We always want to test all while rules when starting a line
If there are rule failures, we want to make sure we cut off the "while stack" at above that failure.

Here more context on why we use bottom-up testing of while rules:

One heavy user of while rules is for markdown fenced code blocks. We use while rules here because we want to make sure that—no matter what the embedded grammar does—a closing ``` still closes the fenced code block.

You cannot implement this behavior with a simple begin/end rule. Consider:

```js
(
```

The js grammar will consume the initial ( and wants to then start consuming the ```. If we used begin/`end` to implement fenced code blocks, until the js grammar stops matching, the markdown grammar's `end` rule for fenced codes block would never be checked. This often results in the entire rest of the file after the fenced code block being highlighted as the embedded language.

So as a first step, here's some pseudo code for how a while rule could behave:

if we found starting ```:
    while we have not found the closing ```:
        run the js grammar on the line

But that pseudo code is misleading because the js grammar itself could contains a while rule too. In that case, we do not want:

if we found starting ```:
    while we have not found the closing ```:
        while some condition in js grammar
            run the sub section of js grammar on the line

because that'd mean that our js grammar would still be able to leak outside of the code block (the js grammar's inner while rule would just keep running without ever yielding to the markdown grammar)

Instead, whenever we are on a new line we first want to check to see if our fenced code block has ended. That means that our pseudo code would really be more like:

if we found starting ```:
    while true:
        if found closing ```:
            break

        if not some condition in js grammar:
           continue
         else:
           run the sub section of js grammar on the line

And that is what testing while rules bottom-up does.

We also use the bottom-up nature of while rules to implement markdown inside jsdoc comments:

/**
 * ```js
 * 1 + 1
 * ```
 */

In the above example, an outer js-doc while rule runs first to consume the leading *. That means that the markdown grammar never has to know about the leading stars at all; as far as it is concerned, the file content is just:

```js
1 + 1
```

Again, I'm not sure if we can actually change from bottom-up to top-down. I'd like to understand if it makes sense and what impact of the change would be. Would our existing approach to fenced-code blocks and jsdoc work for example? Would we need to update our grammars? Could we still prevent leaks?

matter123 · 2019-08-26T22:16:02Z

@mjbvz From what I understand the entire while stack is checked for failure each line, so a failure of a lower while rule would not be ignored.

The actual check: https://github.com/microsoft/vscode-textmate/blob/master/src/grammar.ts#L703
The call to _checkWhileConditions: https://github.com/microsoft/vscode-textmate/blob/master/src/grammar.ts#L757

I do agree that while rules consuming leading characters, necessitates testing the while rules needs to happen bottom-up. Textmate seems to do different less useful behavior:

Example Grammar:

{	patterns = (
		{	name = 'meta.while';
			begin = 'A';
			while = '(?:^|\G)\s*a';
			patterns = ( { include = '$self'; } );
		},
		{	name = 'meta.while2';
			begin = 'B';
			while = '(?:^|\G)\s*b';
			patterns = ( { include = '$self'; } );
		},
		{
			name = 'string.unquoted.a.b';
			match = 'a b';
		},
		{
			name = 'keyword.control.b';
			match = 'b';
		}
	);
}

Image:

With the above grammar, the first block works as expected.
However in the second block, both while rules have ended at line 9, and no combination of a and b causes either while rule to still be in effect at line 9.

msftrncs · 2019-08-27T04:30:27Z

Now having read some of the spec for Markdown, and having altered the blockquote rule to better behave as the spec reads, I find that Markdown does not need bottom up testing. Actually, it works better with top down, as it fixes one last detail:

> first nest
    first nest paragraph continued

according to GitHub (which does paragraphs wrong)

first nest
first nest paragraph continued

In VS Code with an improved but not perfect blockquote:

But if you set the testing of the rules to be top down, suddenly the issue shown above goes away. (per the inspect script anyway)

Yes, the entire while rule stack needs to be tested when using top down.

To be clear, the current markdown grammar may fail using the top down approach, but it technically wasn't working right anyway. I'll have to test a few other patterns listed above, including I would like to visit the sample in #25.

msftrncs · 2019-08-27T07:02:51Z

Consider the below examples of markdown quoted fenced code blocks. Top down while rule will fail in the second example where the blockquoting has escaped the fenced code block, but NOT because the fence block bleeds, but because the > was consumed, scoped by a higher while rule when a lower while rule fails because it cannot find another > (which is required) and threw away the entire stack. In this case, the characters that were consumed need to be returned, so they can be retested with new rules. To be exact, the previous while tests that passed need to be rolled back (if they captured anything), and then the failing while rule probably needs to be tested again. I am not sure this makes much difference, top-down or bottom-up.

Back to the #endif. Its not just a matter of top-down. There also needs to be a mechanism that can handle nested BeginWhile rules. Otherwise, if you have two identical non-consuming/negative look-ahead while conditions in a row, they will both pass or fail because they are making the same test at the same cursor position. The fence on a fenced code block is the exact same condition. Even though its possible to fence in markdown in markdown, its not possible to fence in another block inside the fenced in markdown because both blocks will be ended with the same end fence. (there are two possible fence characters though)

Examples

> > ~~~PowerShell
> > $a = (
> > ~~~

$a = (

> > ~~~PowerShell
> $a = (
> > ~~~

$a = (

msftrncs · 2019-08-27T07:37:45Z

Summary: I was able to fix everything in Markdown that I thought might have only been possible if the while conditions were tested top down. Of course, this doesn't mean any of my changes would work correctly on TextMate, Sublime or Atom.

mjbvz · 2019-11-20T19:51:56Z

I believe the current behavior is the expected design. Since you can workaround the original problem , let me know if you still think we should revisit this

matter123 mentioned this issue Aug 22, 2019

#endif incorrectly closes comments. jeff-hykin/better-cpp-syntax#359

Open

msftrncs mentioned this issue Aug 22, 2019

Pop only highest instance of While Rule #111

Closed

msftrncs mentioned this issue Aug 26, 2019

BlockQuote while condition is incorrect, scoping doesn't match preview or GitHub microsoft/vscode-markdown-tm-grammar#56

Open

alexdima assigned mjbvz Nov 20, 2019

mjbvz closed this as completed Nov 20, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Search for matching while should be top-down not bottom-up #110

Search for matching while should be top-down not bottom-up #110

matter123 commented Aug 22, 2019 •

edited

Loading

msftrncs commented Aug 22, 2019

msftrncs commented Aug 25, 2019

msftrncs commented Aug 26, 2019

mjbvz commented Aug 26, 2019 •

edited

Loading

matter123 commented Aug 26, 2019 •

edited

Loading

msftrncs commented Aug 27, 2019

msftrncs commented Aug 27, 2019

msftrncs commented Aug 27, 2019

mjbvz commented Nov 20, 2019

Search for matching while should be top-down not bottom-up #110

Search for matching while should be top-down not bottom-up #110

Comments

matter123 commented Aug 22, 2019 • edited Loading

msftrncs commented Aug 22, 2019

msftrncs commented Aug 25, 2019

msftrncs commented Aug 26, 2019

mjbvz commented Aug 26, 2019 • edited Loading

matter123 commented Aug 26, 2019 • edited Loading

msftrncs commented Aug 27, 2019

msftrncs commented Aug 27, 2019

msftrncs commented Aug 27, 2019

mjbvz commented Nov 20, 2019

matter123 commented Aug 22, 2019 •

edited

Loading

mjbvz commented Aug 26, 2019 •

edited

Loading

matter123 commented Aug 26, 2019 •

edited

Loading