From 752dc444f31a7f457820bd18103e6e5afd428ac1 Mon Sep 17 00:00:00 2001
From: Richard Gibson
Date: Mon, 13 Jun 2022 13:01:11 -0400
Subject: [PATCH] Clarify the absolute nature of "any code point" (#282)

Adds explicit mention of cases that are often overlooked.
---
 spec/message.ebnf |  2 +-
 spec/syntax.md    | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/spec/message.ebnf b/spec/message.ebnf
index d70b4caad..4e4e3857b 100644
--- a/spec/message.ebnf
+++ b/spec/message.ebnf
@@ -33,7 +33,7 @@ PlainEnd ::= PlainChar - WhiteSpace
 /* Text */
 Text ::= (TextChar | TextEscape)+
 TextChar ::= AnyChar - ('[' | ']' | '{' | '}' | Esc)
-AnyChar ::= .
+AnyChar ::= [#x0-#x10FFFF]
 
 /* Names */
 Variable ::= '$' Name /* ws: explicit */
diff --git a/spec/syntax.md b/spec/syntax.md
index c4bda5862..0706a1f53 100644
--- a/spec/syntax.md
+++ b/spec/syntax.md
@@ -406,7 +406,7 @@ and `\` (which starts an escape sequence).
 ```ebnf
 Text ::= (TextChar | TextEscape)+ /* ws: explicit */
 TextChar ::= AnyChar - ('[' | ']' | '{' | '}' | Esc)
-AnyChar ::= .
+AnyChar ::= [#x0-#x10FFFF]
 ```
 
 ### Names
@@ -446,6 +446,15 @@ Any Unicode code point is allowed in literals,
 with the exception of its delimiters `(` and `)`,
 and `\` (which starts an escape sequence).
 
+This includes line-breaking characters (such as U+000A LINE FEED and U+000D CARRIAGE RETURN),
+other control characters (such as U+0000 NULL and U+0009 TAB),
+permanently reserved noncharacters (U+FDD0 through U+FDEF and U+nFFFE and U+nFFFF where n is 0x0 through 0x10),
+surrogate code points (U+D800 through U+DFFF),
+private-use code points (U+E000 through U+F8FF, U+F0000 through U+FFFFD, and U+100000 through U+10FFFD),
+and unassigned code points.
+
+All code points of a literal are preserved.
+
 ```ebnf
 Literal ::= '(' (LiteralChar | LiteralEscape)* ')' /* ws: explicit */
 LiteralChar ::= AnyChar - ('(' | ')' | Esc)
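
Reviewer note: the ranges this patch spells out can be checked mechanically. The following is a minimal Python sketch, not part of the patch or of any MessageFormat implementation. All function names are hypothetical, and `Esc` is assumed to be the single character `\`, as the surrounding spec text suggests.

```python
def is_any_char(cp: int) -> bool:
    """Mirror of `AnyChar ::= [#x0-#x10FFFF]`: every Unicode code point."""
    return 0x0 <= cp <= 0x10FFFF


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF for each plane n (0x0..0x10)."""
    return 0xFDD0 <= cp <= 0xFDEF or (is_any_char(cp) and (cp & 0xFFFE) == 0xFFFE)


def is_surrogate(cp: int) -> bool:
    """High and low surrogate code points together: U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_private_use(cp: int) -> bool:
    """BMP private-use area plus the two supplementary private-use planes."""
    return (
        0xE000 <= cp <= 0xF8FF
        or 0xF0000 <= cp <= 0xFFFFD
        or 0x100000 <= cp <= 0x10FFFD
    )


def is_literal_char(cp: int) -> bool:
    """Sketch of `LiteralChar ::= AnyChar - ('(' | ')' | Esc)`.

    Assumption: Esc is the backslash. Nothing else is excluded, so control
    characters, noncharacters, surrogates, and private-use code points pass.
    """
    return is_any_char(cp) and chr(cp) not in "()\\"
```

Under these assumptions, `is_literal_char` accepts U+0000, U+000A, U+D800, and U+FFFE alike, and rejects only `(`, `)`, and `\`, which is the "absolute" reading the patch makes explicit.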