Unicode code point escapes #4248

ukoloff · 2016-04-11T17:45:18Z

New ECMAScript 6 Unicode code point escapes ("\u{1F4A9}") are rejected as invalid by CoffeeScript 1.10, but work in Node.

They should be passed from CS to JS as is, I believe.

gfung · 2016-06-28T21:32:57Z

A temporary fix is to edit lib/lexer.js and remove the |(u(?![\da-fA-F]{4}).{0,4}) alternative (the last part of the line) from the INVALID_ESCAPE regex. Note that this also stops the lexer from flagging malformed \u escapes such as "\u12".

FROM:
INVALID_ESCAPE = /((?:^|[^\\])(?:\\\\)*)\\(?:(0[0-7]|[1-7])|(x(?![\da-fA-F]{2}).{0,2})|(u(?![\da-fA-F]{4}).{0,4}))/;
TO:
INVALID_ESCAPE = /((?:^|[^\\])(?:\\\\)*)\\(?:(0[0-7]|[1-7])|(x(?![\da-fA-F]{2}).{0,2}))/;
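To see why the v1.10 pattern rejects "\u{1F4A9}", and what this workaround gives up, here is a small JavaScript sketch that runs both regexes above against raw string bodies (the text between the quotes, so every source backslash is doubled in the JS literals). The helper firstInvalidEscape is hypothetical; it is not how lib/lexer.js consumes the match.

```javascript
// The two regexes from the comment above: the v1.10 lexer pattern and the workaround.
const ORIGINAL = /((?:^|[^\\])(?:\\\\)*)\\(?:(0[0-7]|[1-7])|(x(?![\da-fA-F]{2}).{0,2})|(u(?![\da-fA-F]{4}).{0,4}))/;
const WORKAROUND = /((?:^|[^\\])(?:\\\\)*)\\(?:(0[0-7]|[1-7])|(x(?![\da-fA-F]{2}).{0,2}))/;

// Hypothetical helper: returns the offending escape (without its backslash), or null.
function firstInvalidEscape(regex, rawBody) {
  const m = regex.exec(rawBody);
  if (m === null) return null;
  // Group 1 is the unescaped prefix; the first defined group after it is the escape.
  const escape = m.slice(2).find((g) => g !== undefined);
  return escape === undefined ? null : escape;
}

// Raw string bodies: each backslash in the CoffeeScript source is doubled here.
for (const body of ["\\u{1F4A9}", "\\u12", "\\x4", "\\u00e9"]) {
  console.log(body, firstInvalidEscape(ORIGINAL, body), firstInvalidEscape(WORKAROUND, body));
}
```

The original pattern reports "\u{1F4A9}" as the invalid escape u{1F4, because its \u branch only accepts exactly four hex digits. The workaround accepts it, but it also accepts the malformed "\u12", which would then only fail when a JS engine parses the compiled output.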

lydell · 2016-06-29T06:00:02Z

A better approach is to fix that regex instead. Here's my suggestion:

INVALID_ESCAPE      = ///
  ( (?:^|[^\\]) (?:\\\\)* )        # make sure the escape isn’t escaped
  \\ (
     ?: (0[0-7]|[1-7])             # octal escape
      | (x(?![\da-fA-F]{2}).{0,2}) # hex escape
      | (u\{(?![\da-fA-F]{1,6}\}).{0,7}) # unicode code point escape
      | (u(?!\{|[\da-fA-F]{4}).{0,4}) # unicode escape
  )
///
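For experimenting outside the lexer, here is the heregex above transcribed into a plain JavaScript regex literal (heregex whitespace and comments dropped, nothing else changed), plus a hypothetical invalidEscape helper that returns the flagged escape or null. This is a sketch for trying inputs, not the code that was eventually merged.

```javascript
// The heregex above as a plain JS regex. Groups: 1 = unescaped prefix,
// 2 = octal, 3 = hex, 4 = code point escape, 5 = \uXXXX escape.
const INVALID_ESCAPE =
  /((?:^|[^\\])(?:\\\\)*)\\(?:(0[0-7]|[1-7])|(x(?![\da-fA-F]{2}).{0,2})|(u\{(?![\da-fA-F]{1,6}\}).{0,7})|(u(?!\{|[\da-fA-F]{4}).{0,4}))/;

// Hypothetical helper: the offending escape text (without its backslash), or null.
function invalidEscape(rawBody) {
  const m = INVALID_ESCAPE.exec(rawBody);
  if (m === null) return null;
  const escape = m.slice(2).find((g) => g !== undefined);
  return escape === undefined ? null : escape;
}

// Raw string bodies, so each source backslash is doubled in these JS literals.
const bodies = ["\\u{1F4A9}", "\\u{1F4A9", "\\u{}", "\\u00e9", "\\u12", "\\\\u{zz}"];
for (const body of bodies) {
  console.log(JSON.stringify(body), "->", invalidEscape(body));
}
```

One thing the pattern does not check is the value: "\u{110000}" has six hex digits, so it is not flagged here, although JS engines reject it as above U+10FFFF.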

loveencounterflow · 2017-04-11T23:48:14Z

Celebrating the first anniversary of this (rather trivial, IMHO) bug.

Come to think of it, it is not immediately clear to me why CS has to parse the escape sequence at all. Couldn't the compiler just pass everything through to JS and let the runtime worry about it? The same goes for RegExp flags, where /x/y surfaced another long-standing bug in CS, if memory serves.

GeoffreyBooth · 2017-04-12T08:03:42Z

@helixbass this looks very similar to #4489. Care to take a look?

loveencounterflow · 2017-04-12T10:57:34Z

Well, yes, in a way. The thing is that, at least for specific and non-trivial sub-syntaxes such as (single-slashed) RegExps, CS should, I think, just keep out and do only the bare minimum (e.g. find the closing slash and hop over the trailing flag letters); it can then pass the entire construct through to the JS source it generates.
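A minimal JavaScript sketch of that "bare minimum", using a hypothetical scanRegexLiteral helper (not CoffeeScript's lexer): it finds the closing slash while skipping backslash escapes and slashes inside character classes, collects the trailing flag letters without validating them, and returns the body untouched so it could be emitted verbatim. It assumes the caller already knows a regex starts here; deciding between regex and division is a separate problem a real lexer must also solve.

```javascript
// Hypothetical helper: scan a regex literal starting at src[start] and return its
// untouched body, its flags, and the index just past it (or null if malformed).
function scanRegexLiteral(src, start) {
  if (src[start] !== "/") return null;
  let i = start + 1;
  let inClass = false; // inside [...], an unescaped "/" does not end the literal
  while (i < src.length) {
    const ch = src[i];
    if (ch === "\n") return null; // regex literals cannot span lines
    if (ch === "\\") {
      i += 2; // skip the backslash and whatever character it escapes
      continue;
    }
    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;
    else if (ch === "/" && !inClass) break;
    i++;
  }
  if (i >= src.length) return null; // no closing slash: unterminated literal
  let j = i + 1;
  while (j < src.length && /[A-Za-z]/.test(src[j])) j++; // hop over flag letters
  return { body: src.slice(start + 1, i), flags: src.slice(i + 1, j), end: j };
}

console.log(scanRegexLiteral("/x/y + 1", 0)); // { body: 'x', flags: 'y', end: 4 }
```

Because the flags are only collected, a newer flag such as y passes straight through and is left for the JS engine to accept or reject.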

Likewise, string literals are instances of a (comparatively minimal) embedded syntax (this point was made very clear by Larry Wall speaking about Perl 5 and 6 a few years ago, and I think his point of view is a justified and valuable one in this case). Arguably, CS should do nothing to those parts of the source beyond cherry-picking whatever enhancements it implements (such as string interpolation). Because, hey, it's "just JavaScript", right?

This does have the drawback that a given source may compile without error, and then fail with a nasty SyntaxError only upon getting loaded by whatever JS runtime it is to be consumed by. On the bright side, it also means one less point of failure for the parser; what's more, users can then just use new features they know are supported by their targeted engines and do not have to wait for CS to catch up.
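The trade-off described above can be seen directly in JavaScript. In this sketch the Function constructor stands in for "loading the generated code", since it parses its body immediately; parsesOk is a hypothetical helper, and the two strings play the role of compiler output that passed a string body through unchecked.

```javascript
// Hypothetical helper: does this (pretend) compiler output parse as JS?
// new Function parses its body right away and throws SyntaxError on bad source.
function parsesOk(generatedJs) {
  try {
    new Function(generatedJs);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected, so rethrow it
  }
}

console.log(parsesOk('return "\\u{1F4A9}";')); // true
console.log(parsesOk('return "\\u{110000}";')); // false: above U+10FFFF
```

Nothing complains about the second string until the engine parses it, which is exactly the "late SyntaxError" drawback; the upside is that the first one works as soon as the engine supports it.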

The occasional late syntax error that may creep into the process might make some devs averse to this proposal; OTOH, we already have that feature / problem in the guise of backtick-quoted JS literals, where you can put anything at all and bypass any CS checks.

loveencounterflow · 2017-04-12T11:04:22Z

Update: Come to think of it, while it is noble and notable that CS supports arbitrary CS within string interpolations (e.g.

coffee> "hello #{ "foo" } world"
'hello foo world'
coffee> "hello #{ "foo #{ 42 ** 3 } wat" } world"
'hello foo 74088 wat world'
coffee> "hello #{ "foo #{ 42 ** "-#{ 1 + 1 + 1 }" } wat" } world"
'hello foo 0.000013497462477054314 wat world'

all work), it is also questionable whether this degree of syntactic recursion is ever needed in ecologically responsible and cat-friendly source. I have much more of an urgent need to nest my block comments (which CS does not support) than to deeply nest my string interpolations. Granted, the line between 'sensible, if rare' and 'mad hatter meets march hare' is hard to draw.

jashkenas · 2017-04-12T17:14:11Z

> I have much more of an urgent need to nest my block comments (which CS does not support) than to deeply nest my string interpolations.

Oh? Do tell.

helixbass · 2017-04-13T01:04:35Z

@GeoffreyBooth I think @lydell has the right idea above; I've submitted a pull request based on it.

loveencounterflow · 2017-04-13T10:33:25Z

@jashkenas you often want to temporarily comment out entire swathes of source code, either to mark it for imminent removal or just to make sure it doesn't interfere with whatever you're experimenting with. One way to do that is to select the affected lines and hit a shortcut key to turn all of them into line comments; another is to put them inside a block comment.

The first method always works, but it has the disadvantage of changing a lot of individual lines, which you have to undo later; you also need a suitable editor to do it.

The second method fails when there already are block comments in the portions to be hidden. In CS there seems to be no easy fix for that, since both ends of a block comment use the same markup (###). In other languages and syntaxes, block comments also seem to be non-nesting as a rule. In JS, the first */ ends the comment, so the rest of a nested /*.../*...*/...*/ comment is parsed as code and, in NodeJS at least, tends to be reported as a malformed regex. In HTML, with its insane comment definition inherited from SGML, this is especially annoying because HTML has no line comments, so there's often no way to disable part of a page other than deleting it.
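The JS behavior is easy to check in Node (other modern engines should behave the same). This sketch uses the Function constructor only to parse snippets at runtime. In the first snippet the inner */ closes the whole comment, so " + 1;" is live code; in the second, the leftover "tail */" is parsed as code, and after the * operator the / is read as the start of a regex literal, so parsing fails with a SyntaxError.

```javascript
// JS block comments do not nest: the first "*/" closes the comment, whatever opened it.
const result = new Function("return 1 /* outer /* inner */ + 1;")();
console.log(result); // 2: the " + 1;" after the inner "*/" is live code

let error = null;
try {
  // "tail *" is followed by "/", which starts an unterminated regex literal.
  new Function("/* outer /* inner */ tail */");
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true
```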

Of course, one might argue that even if block comments could be nested in a given language, they would still only work as long as the commented-out parts did not contain stray end-of-comment marks, which reduces the utility of nested block comments. OTOH, MIME boundaries in emails, Perl here docs, and PostgreSQL dollar-quoted string constants are constructs that do allow you to quote / comment anything ~~regardless of content~~ provided you can come up with a suitably unique string of characters, which you always can.
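The "you always can" part can be sketched in a few lines of JavaScript. The dollarQuote helper below is hypothetical and only loosely modeled on PostgreSQL's $tag$...$tag$ syntax: it tries an empty tag, then q1, q2, and so on, until the first occurrence of the delimiter in content + delimiter is the closing one it appends. Checking it that way also catches content that ends in a partial delimiter such as a trailing "$".

```javascript
// Hypothetical helper: wrap content in $tag$...$tag$ with a tag that cannot close early.
function dollarQuote(content) {
  for (let n = 0; ; n++) {
    const tag = n === 0 ? "" : "q" + n;
    const delim = "$" + tag + "$";
    // The first occurrence of delim must be the closing delimiter we append,
    // not one inside the content or one formed by overlapping its last characters.
    if ((content + delim).indexOf(delim) === content.length) {
      return delim + content + delim;
    }
  }
}

console.log(dollarQuote("hello")); // $$hello$$
console.log(dollarQuote("costs $$5")); // $q1$costs $$5$q1$
```

The loop always terminates: once the delimiter is longer than the content, it can neither occur inside the content nor overlap with it in a way that closes early.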

Sorry for the lengthy text. I just wanted to say that CoffeeScript's syntax is awesome in that it lets me write matryoshka code with nested interpolations inside nested interpolations. But it is also too complex at this particular point (although ideally nesting could be a by-product of a general recursive rule, and thus not burden the compiler), unless it can be demonstrated that nested interpolations have some useful property (which, to me, they do not; I even replace simple interpolations with explicit concatenation where I think that is clearer).

loveencounterflow · 2017-04-13T10:42:17Z

Update: Is this useful or wat?

"hello #{ "foo #{ 42 ** "-#{ 1 + 1 + 1 }" } wat" } world"

"""hello #{ "foo #{ 42 ** "-#{
sum = 0
for x in [ 1 .. 10 ]
  console.log "oops ##{x}"
  sum += x
x
}" } wat" } world"""

Compiled JS:

var sum, x;

"hello " + ("foo " + (Math.pow(42, "-" + (1 + 1 + 1))) + " wat") + " world";

"hello " + ("foo " + (Math.pow(42, "-" + ((function() {
  var i;
  sum = 0;
  for (x = i = 1; i <= 10; x = ++i) {
    console.log("oops #" + x);
    sum += x;
  }
  return x;
})()))) + " wat") + " world";

GeoffreyBooth · 2017-04-13T16:20:17Z

@loveencounterflow One option is to use triple backticks and /* … */. Assuming you don’t have any /* … */ style comments or triple backticks within the block you’re trying to comment out, you could comment out huge swaths this way, including CoffeeScript block comments:

foo()
``` /*
###
  A CoffeeScript block comment
###
bar()
*/ ```
baz()

jashkenas · 2017-04-13T17:05:42Z

@loveencounterflow

Even with different delimiters for the start and end of block comments — your approach #2 still fails, as you note, because the end delimiter of the interior block comment closes out the outer comment prematurely.

Highlighting the lines and hitting the hot key to line-comment all of them is the right way to do this.

loveencounterflow · 2017-04-13T19:28:13Z

@GeoffreyBooth neat, didn't think about that.

@jashkenas you're probably right and this is how I do it all the time.

* Fix #4248: Unicode code point escapes
* rewrite unicode code point escapes as unicode escapes
* smarter defaults
* and resimplify
* correct surrogate pairs
* fixes from code review
* handle adjacent code point escapes
* smarter regex
* fix from code review
* refactor toJS() to shared test helper
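The commit list mentions rewriting code point escapes as plain \uXXXX escapes, with surrogate pairs and adjacent escapes handled. A JavaScript sketch of that idea follows; it is not the code merged in #4498, and the helper names toUnicodeEscape and rewriteCodePointEscapes are hypothetical. It works on raw string bodies and matches every backslash pair left to right, so an escaped backslash is never mistaken for the start of an escape and back-to-back escapes are all found.

```javascript
// Hypothetical helper: one code point -> "\uXXXX", or a UTF-16 surrogate pair above U+FFFF.
function toUnicodeEscape(codePoint) {
  if (codePoint > 0x10ffff) {
    throw new RangeError("code point out of range: " + codePoint.toString(16));
  }
  const hex4 = (n) => "\\u" + n.toString(16).padStart(4, "0");
  if (codePoint <= 0xffff) return hex4(codePoint);
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10); // high (lead) surrogate
  const low = 0xdc00 + (offset & 0x3ff); // low (trail) surrogate
  return hex4(high) + hex4(low);
}

// Every backslash is consumed together with the character after it, so an escaped
// backslash followed by "u{41}" is left alone, and adjacent escapes are all rewritten.
function rewriteCodePointEscapes(rawBody) {
  return rawBody.replace(/\\(?:u\{([\da-fA-F]+)\}|[\s\S])/g, (match, hex) =>
    hex === undefined ? match : toUnicodeEscape(parseInt(hex, 16))
  );
}

console.log(rewriteCodePointEscapes("\\u{1F4A9}")); // \ud83d\udca9
console.log(rewriteCodePointEscapes("\\u{41}\\u{42}")); // \u0041\u0042
```

A naive pattern that consumes the character before each backslash (as INVALID_ESCAPE does) would skip the second of two adjacent escapes, which is one way to read the "handle adjacent code point escapes" item.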

helixbass added a commit to helixbass/copheescript that referenced this issue Apr 13, 2017

Fix jashkenas#4248: Unicode code point escapes

3170352

helixbass added a commit to helixbass/copheescript that referenced this issue Apr 13, 2017

Fix jashkenas#4248: Unicode code point escapes

5e16979

helixbass mentioned this issue Apr 13, 2017

Fix #4248: Unicode code point escapes #4498

Merged

lydell closed this as completed in #4498 Apr 20, 2017

lydell pushed a commit that referenced this issue Apr 20, 2017

Fix #4248: Unicode code point escapes (#4498)

96b6c5f

helixbass mentioned this issue Apr 20, 2017

[CS2] Unicode code point escapes on 2, fixes #4248 #4520

Merged
