Don't accept spaces after heredoc beginning identifier. (was: Make Heredoc syntax highlighting work) #77
The heredoc_interior rule is supposed to provide a semantic way of embedding different sources. While not specifically a language feature, it makes a useful convention to follow, and the highlighting is there to help with that. This works with linguist and TextMate's PHP bundle in its current form.
<?php
$a = <<<GITHUB
This is a plain string.
SELECT * FROM github WHERE octocat = 'awesome' and ID = 1;
<strong>rainbows</strong>
if(awesome) {
doSomething(10, function(x){
console.log(x*x);
});
}
GITHUB;
$b = <<<SQL
SELECT * FROM github WHERE octocat = 'awesome' and ID = 1;
SQL;
$c = <<<HTML
<strong>rainbows</strong>
HTML;
$d = <<<JAVASCRIPT
if(awesome) {
doSomething(10, function(x){
console.log(x*x);
});
}
JAVASCRIPT;
$e = <<<JSON
{"menu": {
"id": "file",
"value": "File",
"popup": {
"menuitem": [
{"value": "New", "onclick": "CreateNewDoc()"},
{"value": "Open", "onclick": "OpenDoc()"},
{"value": "Close", "onclick": "CloseDoc()"}
]
}
}}
JSON;
?>
As for the actual issue, I think it has something to do with the lookahead + |
I think the issue was largely to do with the lookahead/lookbehind being used incorrectly in the first place. If you take a close look at the original regex, you'll see that it does a lookahead for the entire here/nowdoc, without matching the here/nowdoc itself. On a related note, lookaround doesn't work well with whitespace, at least, in my experience. Which brings me to the point: why was lookaround being used in the first place? |
The main reason is that the TextMate grammar had it - why change something that has been proven to work, right? The actual reason, however, is that if you match something without lookaround, you cannot match it again. Lookaheads in this case allow matching a generic version of the heredoc/nowdoc beginning, and then more specific versions can be matched instead inside the block, a nifty way of enabling the functionality for heredoc_interior here. As far as I know the whitespace is no problem. What confuses me a little is the usage of Unfortunately I cannot look into this issue before Saturday. |
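The point about not being able to match something twice can be sketched outside the grammar. The snippet below is only an illustration using Python's `re` module, not Oniguruma or first-mate, and the three patterns are simplified, hypothetical stand-ins for the real rules in php.cson. It shows that a consuming match moves the cursor past the heredoc opener, while a zero-width lookahead leaves the cursor in front of it so that a more specific rule can still match the same text.

```python
import re

line = "$b = <<<SQL"

# Generic rule written two ways (both simplified for illustration).
consuming = re.compile(r"<<<(\w+)$")       # consumes the opener
lookahead = re.compile(r"(?=<<<(\w+)$)")   # zero-width: consumes nothing

# Hypothetical, more specific rule that only recognises SQL heredocs.
specific = re.compile(r"<<<(SQL)$")

m_consuming = consuming.search(line)
m_lookahead = lookahead.search(line)

# After the consuming match the cursor sits past the opener, so the
# specific rule has nothing left to match.
after_consuming = specific.match(line, m_consuming.end())

# After the lookahead the cursor is still in front of "<<<", so the
# specific rule can match the same text.
after_lookahead = specific.match(line, m_lookahead.end())

print(after_consuming)           # None
print(after_lookahead.group(1))  # SQL
```

Whether first-mate applies nested patterns in the same way is exactly what the rest of this thread is trying to establish, so treat this as a model of the intent rather than of the tokenizer.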
Now that I think of it, the point of doing a lookahead for the beginning of the heredoc is to match the position before the start of the heredoc. I guess the "ending" token should then match the position after the ending identifier. On the other hand, Atom uses the Oniguruma regex engine, so the |
Either way for some odd reason right now only the first character from Pinging @kevinsawicki / @nathansobo for some help with demystifying first-mate behavior in this case. |
I think the pattern I'll commit some changes to my repo that (partially) fix the highlighting, as well as preserve the lookaround. |
Hmm... after sleeping on it, you are right. |
Still I do not believe that changing the grammar is the solution here. We are talking of at least |
I probably broke this with some changes to the parser intended to save memory. I will look into it Monday at the latest. |
@nathansobo it's possible that heredoc strings have never worked in Atom. |
@Talon1024 Thanks~~, my bad.~~ |
@Talon1024 0.174 did not break it, so the search for a working version continues. Since I only had a Windows PC handy, I tried all the Windows releases where the first-mate version was changed; v0.109.0 through current all behave the same way. |
I just tried Atom 0.45, and heredoc syntax highlighting didn't work in that version either. On the other hand, maybe this problem could be fixed in first-mate without changing the original regular expressions by making it match as much as possible in between the beginning and end tokens instead of as little as possible... Just a thought. |
So, I've compiled my own build of TextMate using the source code from their GitHub repo, and I've played around with the TextMate PHP bundle using the built-in bundle editor. It works properly if no changes are made to the bundle. If I remove the It is partially fixed, with no syntax highlighting for SQL, HTML, JS, or JSON if I make the equivalent changes that I made to the TextMate bundle, as well as remove the These findings have led me to believe that "includes" in TextMate work differently than they do in Atom. I think the way it works in TextMate is that, if a match for the included grammar element is found at a position within the parent grammar element, it will highlight from that point using the included grammar element's regexes, regardless of whether the included grammar element is completely within the boundaries of the parent grammar element's matches or not. I can provide HTML exports from my experimentation if wanted. However, the grammar for this package has been changed in the past, and it's ultimately up to whoever is in charge of Atom and this bundle whether they want 1:1 TextMate bundle compatibility. |
This no longer merges cleanly; @Talon1024 would you mind rebasing? |
I'll try to review this tonight. |
Not that I'd like to see this merged: I'm afraid this workaround won't motivate anyone to solve the underlying cause in first-mate or the underlying regular expression engine. I hope that merging this right now won't have that effect. Also, someone should open an issue in first-mate for that (I can do it later tonight). |
This seems awfully familiar from when I tried to fix it. I ended up posting this: atom/first-mate#38 (comment) So it seems that there's more than one property that doesn't work when nested. |
…ng identifier, as that will cause a PHP syntax error.
In this commit, I've restored the original structure of php.cson, but I've modified the expressions so that they won't accept spaces after the heredoc beginning identifier, since that will cause a PHP syntax error. |
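As a rough illustration of the rule this commit describes, the sketch below rejects an opener that has trailing spaces. It is written with Python's `re` module rather than the Oniguruma expressions in php.cson, and `HEREDOC_OPEN` and `opens_heredoc` are hypothetical names with a deliberately simplified pattern (no nowdoc quotes, one line at a time, no trailing newline in the input).

```python
import re

# Hypothetical, simplified opener: "<<<", an optionally double-quoted
# identifier, then end of line with no trailing whitespace allowed.
HEREDOC_OPEN = re.compile(r'<<<("?)([A-Za-z_][A-Za-z0-9_]*)\1$')


def opens_heredoc(line: str) -> bool:
    """Return True if the line ends with a heredoc opener and nothing after it."""
    return HEREDOC_OPEN.search(line) is not None


print(opens_heredoc('$b = <<<SQL'))    # True
print(opens_heredoc('$b = <<<"SQL"'))  # True
print(opens_heredoc('$b = <<<SQL '))   # False: trailing space
```

The actual grammar change may differ in detail; the only point here is that anchoring the identifier directly to the end of the line is what excludes the trailing-space case.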
LGTM 👍 One thing to note, though: while not syntactically correct in PHP, the spaces at the end of the line could be highlighted differently. E.g. match |
Please add tests to this 😄. |
Also note that we don't want to highlight newline characters incorrectly, which is what capture group |
Use [ \t] instead of \s to be more accurate, as \s matches newlines.
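The difference between the two character classes is easy to check in isolation. This is a small sketch with Python's `re` module, assuming that `\s` and `[ \t]` behave the same way here as in the grammar's regex engine with respect to newlines; the sample text is made up for the demonstration.

```python
import re

# An opener followed by two trailing spaces, a newline, and the body.
text = "<<<HTML  \n<strong>rainbows</strong>"

# \s* also swallows the newline that ends the opener line.
with_s = re.match(r"<<<HTML(\s*)", text).group(1)

# [ \t]* stops at the newline and captures only the horizontal whitespace.
with_class = re.match(r"<<<HTML([ \t]*)", text).group(1)

print(repr(with_s))      # '  \n'
print(repr(with_class))  # '  '
```

With `\s`, any scope attached to the capture group would also cover the newline character, which is the incorrect highlighting the review comment above warns about.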
…h string at the end of the line does not get tokenized as invalid/illegal.
I just added some breaking tests and modified the grammar so that a zero-length string at the end of the line will not get tokenized as invalid/illegal. Note that the Travis CI builds will fail until the issue in first-mate, as I noted above, is solved. |
@Talon1024 did you stop working on this? |
Any progress on this? It would be an awesome feature to have! 👍 |
I have a block like:
$tst = <<<HTML
<script type="text/javascript">
try {
// code goes here...
}
catch(e) {
alert(e.message);
}
</script>
HTML;
JS in a heredoc is probably frowned upon, but the syntax highlighting stops working on line 6 because it's not a correct PHP catch block (missing Exception class type). Will this PR fix my issue, or am I having a different problem? |
If there was a fix for this over a year ago, why hasn't this been fixed yet? I realize it would only be temporary, but it sure would've beaten having no fix while we wait on an issue that the other package isn't even working on. Can we please just get a temporary fix done while we wait on this pull request? |
I would much rather the underlying issue be fixed than piling on "temporary" workarounds, because from experience it then becomes "oh, there's a workaround, so we can deprioritize fixing the actual issue" and the workaround becomes not-so-temporary. At any rate, I can't merge this even if I wanted to because it has conflicts. |
If it is between no fix for a year+ (and god knows how much longer) or a temporary fix, I would rather have the temporary fix, especially since this obviously affects a lot of people. If the underlying issue was a priority, I would agree with you. But it is obviously not. I created a new pull request #184 |
expect(tokens[1][6]).toEqual value: 'HEREDOC', scopes: ['text.html.php', 'meta.embedded.block.php', 'source.php', 'string.unquoted.heredoc.php', 'keyword.operator.heredoc.php']
expect(tokens[2][0]).toEqual value: 'I am a heredoc', scopes ['text.html.php', 'meta.embedded.block.php', 'source.php', 'string.unquoted.heredoc.php']
expect(tokens[3][0]).toEqual value: 'HEREDOC', scopes ['text.html.php', 'meta.embedded.block.php', 'source.php', 'string.unquoted.heredoc.php', 'keyword.operator.heredoc.php']
expect(tokens[3][1]).toEqual value: ';', scopes ['text.html.php', 'meta.embedded.block.php', 'source.php', 'punctuation.terminator.expression.php'] |
lines 454 - 456
There should be a colon (:) after scopes
Superseded by #184. |
I've modified some of the regular expressions so that Heredoc syntax highlighting now works.
Also, I've disabled the "internal heredocs" because, as far as I know, there is no such thing in PHP.