Unicode code point escapes #4248
Comments
A temporary fix is to open lib/lexer.js and remove the (u(?![\da-fA-F]{4}).{0,4})) section (the last part of the line) from the INVALID_ESCAPE variable. FROM: |
A better thing to do is to fix that regex instead. Here's my suggestion: INVALID_ESCAPE = ///
( (?:^|[^\\]) (?:\\\\)* ) # make sure the escape isn’t escaped
\\ (
?: (0[0-7]|[1-7]) # octal escape
| (x(?![\da-fA-F]{2}).{0,2}) # hex escape
| (u\{(?![\da-fA-F]{1,6}\}).{0,7}) # unicode code point escape
| (u(?!\{|[\da-fA-F]{4}).{0,4}) # unicode escape
)
/// |
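To see what the suggested pattern accepts and rejects, here is a minimal sketch that transcribes the heregex above into a plain JavaScript regex literal (whitespace and comments stripped). The helper name `hasInvalidEscape` and the sample inputs are made up for illustration; this is not the code in lib/lexer.js.

```javascript
// JavaScript transcription of the suggested INVALID_ESCAPE heregex.
// Assumption: heregex whitespace and comments are simply dropped.
const INVALID_ESCAPE =
  /((?:^|[^\\])(?:\\\\)*)\\(?:(0[0-7]|[1-7])|(x(?![\da-fA-F]{2}).{0,2})|(u\{(?![\da-fA-F]{1,6}\}).{0,7})|(u(?!\{|[\da-fA-F]{4}).{0,4}))/;

// Hypothetical helper: true if the raw source text contains an escape
// sequence that the pattern flags as invalid.
function hasInvalidEscape(source) {
  return INVALID_ESCAPE.test(source);
}

// Each sample is the raw text as it would appear between the quotes in a
// CoffeeScript file, so backslashes are doubled in these JS string literals.
const samples = [
  "\\u{1F4A9}", // code point escape
  "\\u{}",      // empty braces
  "\\u{1F4A9",  // missing closing brace
  "\\u00e9",    // classic four-digit escape
  "\\u12",      // too few digits
  "\\x41",      // hex escape
  "\\x4",       // too few digits
  "\\01",       // octal escape
  "\\\\u12",    // escaped backslash followed by plain text "u12"
];

for (const sample of samples) {
  console.log(sample, hasInvalidEscape(sample) ? "invalid" : "ok");
}
```

Note that the pattern only checks the shape of a code point escape (one to six hex digits inside braces); it does not check that the value stays at or below 10FFFF.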
Celebrating the first anniversary of this (rather trivial, IMHO) bug. Come to think of it, it is not immediately clear to me why CS has to parse the escape sequence at all. Couldn't the compiler just pass everything through to JS and let the runtime worry about it? The same goes for RegExp flags, where |
@helixbass this looks very similar to #4489. Care to take a look? |
Well, yes, in a way. The thing is that, at least for specific and non-trivial sub-syntaxes such as (single-slashed) RegExps, CS should, I think, just keep out and do only the bare minimum (e.g. find the closing slash and hop over the trailing flag letters); it can then pass the entire construct through to the JS source it generates. Likewise, string literals are instances of a (comparatively minimal) embedded syntax (this point was made very clear by Larry Wall speaking about Perl 5 and 6 a few years ago, and I think his point of view is a justified and valuable one in this case). Arguably, CS should do nothing to those parts of the source beyond cherry-picking whatever enhancements it implements (such as string interpolation). Because, hey, it's "just JavaScript", right? This does have the drawback that a given source may compile without error and then fail with a nasty SyntaxError only when it is loaded by whatever JS runtime consumes it. On the bright side, it also means one less point of failure for the parser; what's more, users can then just use new features they know are supported by their targeted engines and do not have to wait for CS to catch up. The occasional late syntax error that may creep into the process might make some devs averse to this proposal; OTOH, we already have that feature / problem in the guise of backtick-quoted JS literals, where you can put anything at all and eschew any CS checks. |
Update: Coming to think of it, while it is noble and notable that CS supports arbitrary CS within variable interpolations (e.g.
coffee> "hello #{ "foo" } world"
'hello foo world'
coffee> "hello #{ "foo #{ 42 ** 3 } wat" } world"
'hello foo 74088 wat world'
coffee> "hello #{ "foo #{ 42 ** "-#{ 1 + 1 + 1 }" } wat" } world"
'hello foo 0.000013497462477054314 wat world'
all work), it is also questionable whether this amount of syntactical recursivity is ever needed in ecologically responsible and cat-friendly source. I have a much more urgent need to nest my block comments (which CS does not support) than to deeply nest my string interpolations. Granted, the line between 'sensible, if rare' and 'mad hatter meets march hare' is hard to draw. |
Oh? Do tell. |
@GeoffreyBooth I think @lydell has the right idea above, submitted a pull request based on it |
@jashkenas you often want to temporarily comment out entire swathes of source code, to mark them for imminent removal or just to make sure they don't interfere with whatever you're experimenting with. One way to do that is to mark the affected lines and hit a shortcut key to make all of them line comments; another is to put them inside block comments. The first method always works, but it has the disadvantage of changing a lot of single lines, something you have to undo later; you also need a suitable editor to do that. The second method fails if there already are block comments in the portions to be hidden. In CS, there seems to be no easy fix for that, since both ends of block comments use the same markup. In other languages / syntaxes, block comments also seem to be regularly non-nesting only. In JS, nested
Of course, one might argue that if block comments could be nested in a given language, they would still only work if the commented-out parts did not contain stray end-of-comment marks, which reduces the utility of nested block comments. OTOH, MIME boundaries in emails, Perl here-docs and PostgreSQL dollar-quoted string constants are constructs that do allow you to quote / comment anything
Sorry for the lengthy text. Just wanting to say that CoffeeScript's syntax is awesome because it allows me to have matryoshka code with nested interpolations inside nested interpolations. It is also too complex at this particular point (although ideally it could be a by-product of a general recursive rule, and thus not burden the compiler) unless it can be demonstrated that nested interpolations have some useful property (which to me they have not; I even replace simple interpolations with explicit concatenation where I think it is clearer). |
Update: Is this useful or wat?
"hello #{ "foo #{ 42 ** "-#{ 1 + 1 + 1 }" } wat" } world"
"""hello #{ "foo #{ 42 ** "-#{
  sum = 0
  for x in [ 1 .. 10 ]
    console.log "oops ##{x}"
    sum += x
  x
}" } wat" } world"""
compiles to:
var sum, x;
"hello " + ("foo " + (Math.pow(42, "-" + (1 + 1 + 1))) + " wat") + " world";
"hello " + ("foo " + (Math.pow(42, "-" + ((function() {
  var i;
  sum = 0;
  for (x = i = 1; i <= 10; x = ++i) {
    console.log("oops #" + x);
    sum += x;
  }
  return x;
})()))) + " wat") + " world"; |
@loveencounterflow One option is to use triple backticks and
|
Even with different delimiters for the start and end of block comments, your approach #2 still fails, as you note, because the end delimiter of the interior block comment closes the outer comment prematurely. Highlighting the lines and hitting the hot key to line-comment all of them is the right way to do this. |
@GeoffreyBooth neat, didn't think about that. @jashkenas you're probably right and this is how I do it all the time. |
* Fix #4248: Unicode code point escapes
* rewrite unicode code point escapes as unicode escapes
* smarter defaults
* and resimplify
* correct surrogate pairs
* fixes from code review
* handle adjacent code point escapes
* smarter regex
* fix from code review
* refactor toJS() to shared test helper
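The commit list mentions rewriting code point escapes as unicode escapes, surrogate pairs, and adjacent escapes. As a rough illustration of that idea (a sketch only, not the code from the pull request; the function names `toUnicodeEscape` and `rewriteCodePointEscapes` are hypothetical), the rewrite could look like this:

```javascript
// Format a single UTF-16 code unit as a classic \uXXXX escape.
function toUnicodeEscape(codeUnit) {
  return "\\u" + codeUnit.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical helper: rewrite every \u{...} escape in raw source text as
// one \uXXXX escape, or as a surrogate pair for code points above U+FFFF.
function rewriteCodePointEscapes(source) {
  // Consume escapes one at a time: a backslash followed either by a code
  // point escape or by any single character. Scanning this way keeps an
  // escaped backslash ("\\\\") from being mistaken for the start of an
  // escape, and it handles adjacent escapes without any lookbehind.
  return source.replace(/\\(?:u\{([\da-fA-F]{1,6})\}|[\s\S])/g, (match, hex) => {
    if (hex === undefined) return match; // some other escape: leave as is
    const codePoint = parseInt(hex, 16);
    if (codePoint > 0x10ffff) {
      throw new RangeError("code point out of range: " + match);
    }
    if (codePoint <= 0xffff) return toUnicodeEscape(codePoint);
    // Split into a UTF-16 surrogate pair.
    const offset = codePoint - 0x10000;
    const high = 0xd800 + (offset >> 10);
    const low = 0xdc00 + (offset & 0x3ff);
    return toUnicodeEscape(high) + toUnicodeEscape(low);
  });
}

console.log(rewriteCodePointEscapes("\\u{1F4A9}"));      // \uD83D\uDCA9
console.log(rewriteCodePointEscapes("\\u{41}\\u{42}")); // \u0041\u0042
```

Consuming one escape per match is a design choice for this sketch: a pattern that anchors on the character before the backslash would swallow the closing brace of the previous escape and miss the next one when two escapes sit side by side.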
New ECMAScript 6 unicode code point escapes ("\u{1F4A9}") are invalid in v1.10, but work in Node.
They should be passed from CS to JS as is, I believe.
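For reference, a small sketch of what an ES6-capable runtime such as Node does with the escape from this report; the variable names are made up for illustration:

```javascript
// The code point escape and the equivalent surrogate pair denote the
// same JavaScript string.
const viaCodePoint = "\u{1F4A9}";
const viaSurrogates = "\uD83D\uDCA9";

console.log(viaCodePoint === viaSurrogates);          // true
console.log(viaCodePoint.length);                     // 2 (UTF-16 code units)
console.log(viaCodePoint.codePointAt(0).toString(16)); // 1f4a9
```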