diff --git a/spec/appendices.md b/spec/appendices.md
index e94544596..b65036c6c 100644
--- a/spec/appendices.md
+++ b/spec/appendices.md
@@ -14,12 +14,10 @@ host environments, their serializations and resource formats,
that might be sufficient to prevent most problems.
However, MessageFormat itself does not supply such a restriction.
-MessageFormat _messages_ permit nearly all Unicode code points,
-with the exception of surrogates,
+MessageFormat _messages_ permit nearly all Unicode code points
to appear in _literals_, including the text portions of a _pattern_.
This means that it can be possible for a _message_ to contain invisible characters
-(such as bidirectional controls,
-ASCII control characters in the range U+0000 to U+001F,
+(such as bidirectional controls, ASCII control characters in the range U+0000 to U+001F,
or characters that might be interpreted as escapes or syntax in the host format)
that abnormally affect the display of the _message_
when viewed as source code, or in resource formats or translation tools,
diff --git a/spec/message.abnf b/spec/message.abnf
index 8ab7b5b23..a9293040c 100644
--- a/spec/message.abnf
+++ b/spec/message.abnf
@@ -76,8 +76,7 @@ content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
/ %x41-5B ; omit \ (%x5C)
/ %x5D-7A ; omit { | } (%x7B-7D)
/ %x7E-2FFF ; omit IDEOGRAPHIC SPACE (%x3000)
- / %x3001-D7FF ; omit surrogates
- / %xE000-10FFFF
+ / %x3001-10FFFF ; allowing surrogates is intentional
; Character escapes
escaped-char = backslash ( backslash / "{" / "|" / "}" )
diff --git a/spec/syntax.md b/spec/syntax.md
index a31c3f921..38725a053 100644
--- a/spec/syntax.md
+++ b/spec/syntax.md
@@ -60,7 +60,8 @@ The syntax specification takes into account the following design restrictions:
control characters such as U+0000 NULL and U+0009 TAB, permanently reserved noncharacters
(U+FDD0 through U+FDEF and U+nFFFE and U+nFFFF where n is 0x0 through 0x10),
private-use code points (U+E000 through U+F8FF, U+F0000 through U+FFFFD, and
- U+100000 through U+10FFFD), unassigned code points, and other potentially confusing content.
+ U+100000 through U+10FFFD), unassigned code points, unpaired surrogates (U+D800 through U+DFFF),
+ and other potentially confusing content.
## Messages and their Syntax
@@ -113,6 +114,22 @@ A **_local variable_** is a _variable_ created as the result of a _lo
> In particular, it avoids using quote characters common to many file formats and formal languages
> so that these do not need to be escaped in the body of a _message_.
+> [!NOTE]
+> _Text_ and _quoted literals_ allow unpaired surrogate code points
+> (`U+D800` to `U+DFFF`).
+> This is for compatibility with formats or data structures
+> that use the UTF-16 encoding
+> and do not check for unpaired surrogates.
+> (Strings in Java or JavaScript are examples of this.)
+> These code points SHOULD NOT be used in a _message_.
+> Unpaired surrogate code points likely indicate an error
+> in the creation, serialization, or processing of the _message_.
+> Many processes will convert them to
+> � U+FFFD REPLACEMENT CHARACTER
+> during processing or display.
+> Implementations not based on UTF-16 might not be able to represent
+> a _message_ containing such code points.
+
> [!NOTE]
> In general (and except where required by the syntax), whitespace carries no meaning in the structure
> of a _message_. While many of the examples in this spec are written on multiple lines, the formatting
@@ -274,8 +291,8 @@ A _quoted pattern_ MAY be empty.
### Text
**_text_** is the translatable content of a _pattern_.
-Any Unicode code point is allowed, except for U+0000 NULL
-and the surrogate code points U+D800 through U+DFFF inclusive.
+Any Unicode code point is allowed, except for U+0000 NULL.
+
The characters U+005C REVERSE SOLIDUS `\`,
U+007B LEFT CURLY BRACKET `{`, and U+007D RIGHT CURLY BRACKET `}`
MUST be escaped as `\\`, `\{`, and `\}` respectively.
@@ -301,10 +318,14 @@ content-char = %x01-08 ; omit NULL (%x00), HTAB (%x09) and LF (%x0A)
/ %x41-5B ; omit \ (%x5C)
/ %x5D-7A ; omit { | } (%x7B-7D)
/ %x7E-2FFF ; omit IDEOGRAPHIC SPACE (%x3000)
- / %x3001-D7FF ; omit surrogates
- / %xE000-10FFFF
+ / %x3001-10FFFF ; allowing surrogates is intentional
```
+> [!NOTE]
+> Unpaired surrogate code points (`U+D800` through `U+DFFF` inclusive)
+> are allowed for compatibility with UTF-16 based implementations
+> that do not check for this encoding error.
+
When a _pattern_ is quoted by embedding the _pattern_ in curly brackets, the
resulting _message_ can be embedded into
various formats regardless of the container's whitespace trimming rules.
@@ -691,8 +712,7 @@ A _literal_ can appear
as a _key_ value,
as the _operand_ of a _literal-expression_,
or in the value of an _option_.
-A _literal_ MAY include any Unicode code point
-except for U+0000 NULL or the surrogate code points U+D800 through U+DFFF.
+A _literal_ MAY include any Unicode code point except for U+0000 NULL.
All code points are preserved.
@@ -714,6 +734,11 @@ A **_quoted literal_** begins and ends with U+007C VERTICAL LINE `|`.
The characters `\` and `|` within a _quoted literal_ MUST be
escaped as `\\` and `\|`.
+> [!NOTE]
+> Unpaired surrogate code points (`U+D800` through `U+DFFF` inclusive)
+> are allowed in _quoted literals_ for compatibility with UTF-16 based
+> implementations that do not check for this encoding error.
+
An **_unquoted literal_** is a _literal_ that does not require the `|`
quotes around it to be distinct from the rest of the _message_ syntax.
An _unquoted literal_ MAY be used when the content of the _literal_