Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Don't accept spaces after heredoc beginning identifier. (was: Make Heredoc syntax highlighting work) #77

Closed
wants to merge 5 commits into from

Conversation

Talon1024
Copy link

I've modified some of the regular expressions so that Heredoc syntax highlighting now works.

Also, I've disabled the "internal heredocs" because, as far as I know, there is no such thing in PHP.

@Ingramz
Copy link
Contributor

Ingramz commented May 25, 2015

heredoc_interior rule is supposed to provide a semantic way of embedding different sources. While not specifically a language feature, it makes an useful convention to follow and the highlighting is there to help with that.

This works with linguist and textmate's PHP bundle in its current form.

<?php

$a = <<<GITHUB
This is a plain string.
SELECT * FROM github WHERE octocat = 'awesome' and ID = 1;
<strong>rainbows</strong>

if(awesome) {
    doSomething(10, function(x){
      console.log(x*x);
    });
}
GITHUB;

$b = <<<SQL
SELECT * FROM github WHERE octocat = 'awesome' and ID = 1;
SQL;

$c = <<<HTML
<strong>rainbows</strong>
HTML;

$d = <<<JAVASCRIPT
if(awesome) {
    doSomething(10, function(x){
      console.log(x*x);
    });
}
JAVASCRIPT;

$e = <<<JSON
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
JSON;

?>

As for the actual issue, I think it has something to do with the lookahead + (?!\\G) plainly not working in this case (possibly an issue with first-mate and not with the grammar itself), but right now I can't really comment on that.

@Talon1024
Copy link
Author

I think the issue was largely to do with the lookahead/lookbehind being used incorrectly in the first place. If you take a close look at the original regex, you'll see that it does a lookahead for the entire here/nowdoc, without matching the here/nowdoc itself. On a related note, lookaround doesn't work well with whitespace, at least, in my experience.

Which brings me to the point: why was lookaround being used in the first place?

@Ingramz
Copy link
Contributor

Ingramz commented May 28, 2015

The main reason is because textmate grammar had it - why change something that has been proven to work, right?

The actual reason however is because if you match something without lookaround, you cannot match it again. Lookaheads in this case allow to match a generic version of the heredoc/nowdow beginning and then more specific versions can be matched instead inside the block, a nifty way of enabling the functionality for heredoc_interior here. As far as I know the whitespace is no problem.

What confuses me a little the usage of (?!\\G) pattern in combination with the lookahead.

Unfortunately I cannot look into this issue before Saturday.

@Talon1024
Copy link
Author

Now that I think of it, the point of doing a lookahead for the beginning of the heredoc is to match the position before the start of the heredoc.

I guess the "ending" token should then match the position after the ending identifier.

On the other hand, Atom uses the Oniguruma regex engine, so the \\G will match the "start position", which I'm assuming is the start of the line which is the position where the match began. Read more about Oniguruma's special regexs here.

@Ingramz
Copy link
Contributor

Ingramz commented May 30, 2015

Either way for some odd reason right now only the first character from <<<SOMETHING is in the heredoc scope and not all of the block.

Pinging @kevinsawicki / @nathansobo for some help with demystifying first-mate behavior in this case.

@Talon1024
Copy link
Author

I think the pattern (?!\\G) will match any position/character that isn't proceded by the start position of the match. The beginning and end regular expressions are, of course, evaluated in that order, and the ending regular expression can only be evaluated once. Since the first < sign isn't proceded by the start of the match (it appears after the start of the match), it is the only character in the heredoc scope.

I'll commit some changes to my repo that (partially) fix the highlighting, as well as preserve the lookaround.

@Ingramz
Copy link
Contributor

Ingramz commented May 30, 2015

Hmm... after sleeping on it, you are right.

@Ingramz
Copy link
Contributor

Ingramz commented May 30, 2015

Still I do not believe that changing the grammar is the solution here. We are talking of at least 3 2 implementations (as Github website uses PCRE) where this exact regex works and now it just doesn't.

@nathansobo
Copy link
Contributor

I probably broke this with some changes to the parser intended to save memory. I will look into it Monday at the latest.

@Ingramz
Copy link
Contributor

Ingramz commented May 30, 2015

@nathansobo it's possible that heredoc strings have never worked in Atom.

@Talon1024
Copy link
Author

@Ingramz
Copy link
Contributor

Ingramz commented May 30, 2015

@Talon1024 Thanks~~, my bad.~~

@Ingramz
Copy link
Contributor

Ingramz commented May 30, 2015

@Talon1024 0.174 did not break it, so the search for a working version continues.

Since I only had a Windows PC handy, I tried all the Windows releases where first-mate version was changed, v0.109.0-current all behave the same way.

@Talon1024
Copy link
Author

I just tried Atom 0.45, and heredoc syntax highlighting didn't work in that version either. On the other hand, maybe this problem could be fixed in first-mate without changing the original regular expressions by making it match as much as possible in between the beginning and end tokens instead of as little as possible... Just a thought.

@Talon1024
Copy link
Author

So, I've compiled my own build of TextMate using the source code from their GitHub repo, and I've played around with the TextMate PHP bundle using the built-in bundle editor.

It works properly if no changes are made to the bundle.

If I remove the #heredoc_interior include from the TextMate grammar, the highlighting breaks, and the code ends up being highlighted in the same way it is in Atom now (only the first < character is in the heredoc scope)

It is partially fixed, with no syntax highlighting for SQL, HTML, JS, or JSON if I make the equivalent changes that I made to the TextMate bundle, as well as remove the #heredoc_interior include.

These findings have led me to believe that "includes" in TextMate work differently than they do in Atom. I think the way it works in TextMate is that, if a match for the included grammar element is found at a position within the parent grammar element, it will highlight from that point using the included grammar element regexes, regardless of whether the included grammar element is completely within the boundaries of the parent grammar element matches or not.

I can provide HTML exports from my experimentation if wanted.

However, The grammar for this package has been changed in the past, and it's ultimately up to whoever is in charge of Atom and this bundle if they want 1:1 TextMate bundle compatibility.

@winstliu
Copy link
Contributor

This no longer merges cleanly; @Talon1024 would you mind rebasing?

@Talon1024
Copy link
Author

OK, I've changed my repo to be up to date with this repo, and I redid my changes from scratch.

EDIT: Here's a screenshot of the effects of my changes (using adventurous-syntax theme):
atomphpheredoc

@winstliu
Copy link
Contributor

I'll try to review this tonight.

@Ingramz
Copy link
Contributor

Ingramz commented Sep 16, 2015

Not that I'd like to see this to be merged, I believe that this workaround won't motivate to solve the underlying cause in first-mate or underlying regular expression engine. I hope that merging this right now won't do that.

Also someone should open an issue in first-mate for that (I can do it later tonight).

@Talon1024 Talon1024 closed this Sep 16, 2015
@Talon1024 Talon1024 reopened this Sep 16, 2015
@Talon1024
Copy link
Author

Oh my goodness. I was able to fix this without changing any of the expressions, but simply moving the heredoc_interior rule out of a nested repository.

Take a look at the structure of the JSON generated from php.cson:
phpgrammarjsonstructure

Notice the heredoc_interior rule is in a nested repository? If I move it out of the nested repository, heredocs will highlight correctly.
phpgrammarhackfixed

So, the problem is really first-mate ignoring nested repositories.

@Ingramz
Copy link
Contributor

Ingramz commented Sep 16, 2015

This seems awfully familiar when I tried to fix it. I ended up posting this: atom/first-mate#38 (comment)

So it seems that there's more than one property that doesn't work when nested.

…ng identifier, as that will cause a PHP syntax error.
@Talon1024
Copy link
Author

In this commit, I've restored the original structure of php.cson, but I've modified the expressions so that they won't accept spaces after the heredoc beginning identifier, since that will cause a PHP syntax error.

@Talon1024 Talon1024 changed the title Make Heredoc syntax highlighting work. Don't accept spaces after heredoc beginning identifier. (was: Make Heredoc syntax highlighting work) Sep 17, 2015
@Ingramz
Copy link
Contributor

Ingramz commented Sep 17, 2015

LGTM 👍

One thing to note though, while syntactically not correct in PHP, the spaces in the end of the line could be highlighted differently. Eg. match ([ \t]*) using invalid.illegal.whitespace-in-end-of-line.php this would highlight the erroneous spot without directly breaking the highlighting.

@winstliu
Copy link
Contributor

Please add tests to this 😄.

@Ingramz
Copy link
Contributor

Ingramz commented Sep 18, 2015

Also note that we don't want to highlight newline characters incorrectly, which is what capture group (\s*) does, although it's fine for the case when you aren't highlighting those characters. That is why I suggested ([ \t]*) instead. Atom doesn't have highlighting support for end-of-lines right now (although it still matches them), but in order to keep the grammar accurate, I suggest you updating all the corresponding capturing groups to match space and tab only.

Talon1024 and others added 2 commits September 18, 2015 15:55
Use [ \t] instead of \s to be more accurate, as \s matches newlines.
…h string at the end of the line does not get tokenized as invalid/illegal.
@Talon1024
Copy link
Author

I just added some breaking tests and modified the grammar so that a zero-length string at the end of the line will not get tokenized as invalid/illegal.

Note that the Travis CI builds will fail until the issue in first-mate, as I noted above, is solved.

@GuilhemN
Copy link

@Talon1024 did you stop working on this ?
It would be amazing to see this working and merged 😄

@Kroc Kroc mentioned this pull request May 22, 2016
@Aldlevine
Copy link

Any progress on this? It would be an awesome feature to have! 👍

@Wardevour
Copy link

I have a block like:

$tst = <<<HTML
<script type="text/javascript">
try {
    // code goes here...
}
catch(e) {
    alert(e.message);
}
</script>
HTML;

JS in a heredoc is probably frowned upon, but the syntax highlighting stops working on line 6 because it's not a correct PHP catch block (missing Exception class type). Will this PR fix my issue or am I having a different problem?

@UziTech
Copy link
Contributor

UziTech commented Feb 19, 2017

Notice the heredoc_interior rule is in a nested repository? If I move it out of the nested repository, heredocs will highlight correctly.

If there was a fix for this over a year ago why hasn't this been fixed yet? I realize it would only be a temporary but it sure would've beaten no fix while we wait on an issue that some other package isn't even working on.

Can we please just get a temporary fix done while we wait on this pull request?

@winstliu
Copy link
Contributor

I would much rather the underlying issue be fixed than piling on "temporary" workarounds, because from experience it then becomes "oh, there's a workaround, so we can deprioritize fixing the actual issue" and the workaround becomes not-so-temporary.

At any rate, I can't merge this even if I wanted to because it has conflicts.

@UziTech
Copy link
Contributor

UziTech commented Feb 19, 2017

If it is between no fix for a year+ (and god knows how much longer) or a temporary fix, I would rather have the temporary fix. Especially since this obviously affects a lot of people

If the underlying issue was a priority, I would agree with you. But it is obviously not.

I created a new pull request #184

expect(tokens[1][6]).toEqual value: 'HEREDOC', scopes: ['text.html.php', 'meta.embedded.block.php', 'source.php', 'string.unquoted.heredoc.php', 'keyword.operator.heredoc.php']
expect(tokens[2][0]).toEqual value: 'I am a heredoc', scopes ['text.html.php', 'meta.embedded.block.php', 'source.php', 'string.unquoted.heredoc.php']
expect(tokens[3][0]).toEqual value: 'HEREDOC', scopes ['text.html.php', 'meta.embedded.block.php', 'source.php', 'string.unquoted.heredoc.php', 'keyword.operator.heredoc.php']
expect(tokens[3][1]).toEqual value: ';', scopes ['text.html.php', 'meta.embedded.block.php', 'source.php', 'punctuation.terminator.expression.php']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lines 454 - 456

There should be a colon (:) after scopes

@winstliu
Copy link
Contributor

Superseded by #184.

@winstliu winstliu closed this Mar 12, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants