You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In the example, I'm using replacement characters for \a, \b, \v, \f because they don't make sense as a display character. The replacements may show up as question marks, depending on font support.
And that's the point really. Using these control characters (and there are more of them) in literals is pretty useless and unreadable. We have escape sequences and those should be used. Even if they're not as short as the ones described here and require \u escapes.
For string literals, some might still be useful, like \n for multiline literals. IMO they could be entirely replaced by heredocs, but apparently, they're quite commonly used 🤷
Even the formatter does not transform literal control characters to escape sequences.
I think that should be the first step: Have the formatter replace control characters (or maybe other non-printables as well) in literals by their respective escape sequences. At some point later, we can have the parser reject literal control characters.
For Char literals, I think all control characters should be transformed and eventually become invalid. Probably also some or even all non-printables (i.e. white space characters).
For String literals, there could be a bigger margin. \n, \r\n and \t are probably fine. I'm not entirely convinced about the other control or non-printable characters. But having the formatter transform is probably a good idea for most.
The text was updated successfully, but these errors were encountered:
The original implementation of #11520 showed problems with grapheme clusters after being merged and was subsequently reverted (#11603).
Grapheme clusters are glued together by non-printable characters like zero-width-joiner (U+200D). Escaping them would tear the graphemes apart. The semantics stay the same, but the source code would no longer show a single grapheme. Instead, you get a deconstructed series of characters, with non-printables escaped.
-"🏳️🌈"+🏳️\u200D🌈- "ܐܒܓܕ "+"ܐ\u070Fܒܓܕ"
That's obviously not good. When used as such in graphemes, non-printable characters can actually have a visible effect and escaping them would destroy that. We shouldn't do that.
I think this is relatively easy to improve by not escaping non-printables when they are part of a multi-codepoint grapheme clusters. The new String::Grapheme API allows to do that. As a result, only single non-printables would be escaped.
And probably also graphemes such consiting of only non-printables, such as \r\n which has no visual difference from \n.
There is a slight complication with a text renderer's grapheme support. A text renderer that does not recognize a grapheme cluster just shows the individual characters. The first example would be indistinguishable from a string without the zero-width joiner between the emojis ("🏳️🌈"). This can't be fixed by this change or any other way, really. But that's okay and it's the text renderer's fault. However, there's little practical relevance because modern text renderers should have decent support for most use cases of graphemes. Escaping non-printables outside of grapheme clusters is definitely an improvement.
The following
Char
literals are currently valid:In the example, I'm using replacement characters for
\a
,\b
,\v
,\f
because they don't make sense as a display character. The replacements may show up as question marks, depending on font support.And that's the point really. Using these control characters (and there are more of them) in literals is pretty useless and unreadable. We have escape sequences and those should be used. Even if they're not as short as the ones described here and require
\u
escapes.For string literals, some might still be useful, like
\n
for multiline literals. IMO they could be entirely replaced by heredocs, but apparently, they're quite commonly used 🤷Even the formatter does not transform literal control characters to escape sequences.
I think that should be the first step: Have the formatter replace control characters (or maybe other non-printables as well) in literals by their respective escape sequences. At some point later, we can have the parser reject literal control characters.
For Char literals, I think all control characters should be transformed and eventually become invalid. Probably also some or even all non-printables (i.e. white space characters).
For String literals, there could be a bigger margin.
\n
,\r\n
and\t
are probably fine. I'm not entirely convinced about the other control or non-printable characters. But having the formatter transform is probably a good idea for most.The text was updated successfully, but these errors were encountered: