This repository has been archived by the owner on Mar 20, 2023. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 17
/
spec.html
209 lines (184 loc) · 12.4 KB
/
spec.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
<!DOCTYPE html>
<meta charset="utf-8">
<pre class="metadata">
title: Unicode property escapes in regular expressions
status: proposal
stage: 4
location: https://tc39.github.io/proposal-regexp-unicode-property-escapes/
copyright: false
contributors: Mathias Bynens
</pre>
<script src="ecmarkup.js" defer></script>
<link rel="stylesheet" href="ecmarkup.css">
<style>
ins.block {
font-weight: bold;
font-style: italic;
}
hr {
height: 0.25em;
background: #ccc;
border: 0;
margin: 2em 0;
}
.unicode-property-table {
table-layout: fixed;
width: 100%;
font-size: 80%;
}
.unicode-property-table ul {
padding-left: 0;
list-style: none;
}
</style>
<p><ins class="block">The syntax listed in <a href="https://tc39.github.io/ecma262/#sec-patterns">21.2.1 Patterns</a> is modified as follows.</ins></p>
<emu-grammar>
ControlLetter :: one of
`a` `b` `c` `d` `e` `f` `g` `h` `i` `j` `k` `l` `m` `n` `o` `p` `q` `r` `s` `t` `u` `v` `w` `x` `y` `z`
`A` `B` `C` `D` `E` `F` `G` `H` `I` `J` `K` `L` `M` `N` `O` `P` `Q` `R` `S` `T` `U` `V` `W` `X` `Y` `Z`
CharacterClassEscape[U] ::
`d`
`D`
`s`
`S`
`w`
`W`
<ins>[+U] `p{` UnicodePropertyValueExpression `}`
[+U] `P{` UnicodePropertyValueExpression `}`</ins>
<ins>UnicodePropertyValueExpression ::
UnicodePropertyName `=` UnicodePropertyValue
LoneUnicodePropertyNameOrValue</ins>
<ins>UnicodePropertyNameCharacter ::
ControlLetter
`_`</ins>
<ins>UnicodePropertyNameCharacters ::
UnicodePropertyNameCharacter UnicodePropertyNameCharacters?</ins>
<ins>UnicodePropertyName ::
UnicodePropertyNameCharacters</ins>
<ins>UnicodePropertyValueCharacter ::
UnicodePropertyNameCharacter
`0`
`1`
`2`
`3`
`4`
`5`
`6`
`7`
`8`
`9`</ins>
<ins>UnicodePropertyValueCharacters ::
UnicodePropertyValueCharacter UnicodePropertyValueCharacters?</ins>
<ins>UnicodePropertyValue ::
UnicodePropertyValueCharacters</ins>
<ins>LoneUnicodePropertyNameOrValue ::
UnicodePropertyValueCharacters</ins>
</emu-grammar>
<emu-clause id="sec-static-semantics-sourcetext">
<h1>Static Semantics: SourceText</h1>
<emu-grammar>UnicodePropertyNameCharacters :: UnicodePropertyNameCharacter UnicodePropertyNameCharacters?</emu-grammar>
<emu-grammar>UnicodePropertyValueCharacters :: UnicodePropertyValueCharacter UnicodePropertyValueCharacters?</emu-grammar>
<emu-alg>
1. Return the List, in source text order, of Unicode code points in the source text matched by this production.
</emu-alg>
</emu-clause>
<hr>
<p><ins class="block">The following items are appended to <a href="https://tc39.github.io/ecma262/#sec-patterns-static-semantics-early-errors">21.2.1.1 Static Semantics: Early Errors</a>.</ins></p>
<emu-grammar>UnicodePropertyValueExpression :: UnicodePropertyName `=` UnicodePropertyValue</emu-grammar>
<ul>
<li>
It is a Syntax Error if the List of Unicode code points that is SourceText of <emu-nt>UnicodePropertyName</emu-nt> is not identical to a List of Unicode code points that is a Unicode property name or property alias listed in the “Property name and aliases” column of <emu-xref href="#table-nonbinary-unicode-properties"></emu-xref>.
</li>
<li>
It is a Syntax Error if the List of Unicode code points that is SourceText of <emu-nt>UnicodePropertyValue</emu-nt> is not identical to a List of Unicode code points that is a value or value alias for the Unicode property or property alias given by SourceText of <emu-nt>UnicodePropertyName</emu-nt> listed in the “Property value and aliases” column of the corresponding tables <emu-xref href="#table-unicode-general-category-values"></emu-xref> or <emu-xref href="#table-unicode-script-values"></emu-xref>.
</li>
</ul>
<emu-grammar>UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue</emu-grammar>
<ul>
<li>
It is a Syntax Error if the List of Unicode code points that is SourceText of <emu-nt>LoneUnicodePropertyNameOrValue</emu-nt> is not identical to a List of Unicode code points that is a Unicode general category or general category alias listed in the “Property value and aliases” column of <emu-xref href="#table-unicode-general-category-values"></emu-xref>, nor a binary property or binary property alias listed in the “Property name and aliases” column of <emu-xref href="#table-binary-unicode-properties"></emu-xref>.
</li>
</ul>
<hr>
<p><ins class="block">The following two abstract operations are appended to <a href="https://tc39.github.io/ecma262/#sec-atom">21.2.2.8 Atom</a>.</ins></p>
<emu-clause id="sec-runtime-semantics-unicodematchproperty-p" aoid="UnicodeMatchProperty">
<h1>Runtime Semantics: UnicodeMatchProperty ( _p_ )</h1>
<p>The algorithm uses values from the following tables, which associate supported Unicode property names and property aliases and their canonical property names.</p>
<p>Implementations must support the following non-binary Unicode properties and their property aliases:</p>
<emu-import href="table-nonbinary-unicode-properties.html"></emu-import>
<p>Additionally, implementations must support the following binary Unicode properties and their property aliases:</p>
<emu-import href="table-binary-unicode-properties.html"></emu-import>
<p>The abstract operation UnicodeMatchProperty takes a parameter _p_ that is a List of Unicode code points and performs the following steps:</p>
<emu-alg>
1. Assert: _p_ is a List of Unicode code points that is identical to a List of Unicode code points that is a Unicode property name or property alias listed in the “Property name and aliases” column of <emu-xref href="#table-nonbinary-unicode-properties"></emu-xref> or <emu-xref href="#table-binary-unicode-properties"></emu-xref>.
1. Let _p_ be the canonical property name of _p_ as given in the “Canonical property name” column of the corresponding row.
1. Return the List of Unicode code points of _p_.
</emu-alg>
<p>To ensure interoperability, implementations must not extend Unicode property support to the remaining properties.</p>
<p>Implementations must only recognize the property aliases listed in <emu-xref href="#table-nonbinary-unicode-properties"></emu-xref> and <emu-xref href="#table-binary-unicode-properties"></emu-xref>.</p>
<p>Implementations must only recognize the property value aliases and canonical property value names listed in <emu-xref href="#table-unicode-general-category-values"></emu-xref> and <emu-xref href="#table-unicode-script-values"></emu-xref>.</p>
<emu-note>
<p>For example, `Script_Extensions` (property name) and `scx` (property alias) are valid, but `script_extensions` or `Scx` aren’t.</p>
</emu-note>
<emu-note>
<p>The listed properties form a superset of what <a href="https://unicode.org/reports/tr18/#RL1.2">UTS18 RL1.2</a> requires.</p>
</emu-note>
</emu-clause>
<emu-clause id="sec-runtime-semantics-unicodematchpropertyvalue-p-v" aoid="UnicodeMatchPropertyValue">
<h1>Runtime Semantics: UnicodeMatchPropertyValue ( _p_, _v_ )</h1>
<p>The algorithm uses values from the following tables, which associate canonical Unicode property names and their supported values and value aliases:</p>
<emu-import href="table-unicode-general-category-values.html"></emu-import>
<emu-import href="table-unicode-script-values.html"></emu-import>
<p>The abstract operation UnicodeMatchPropertyValue takes two parameters _p_ and _v_, each of which is a List of Unicode code points, and performs the following steps:</p>
<emu-alg>
1. Assert: _p_ is a List of Unicode code points that is identical to a List of Unicode code points that is a canonical, unaliased Unicode property name listed in the “Canonical property name” column of <emu-xref href="#table-nonbinary-unicode-properties"></emu-xref>.
1. Assert: _v_ is a List of Unicode code points that is identical to a List of Unicode code points that is a property value or property value alias for Unicode property _p_ listed in the “Property value and aliases” column of <emu-xref href="#table-unicode-general-category-values"></emu-xref> or <emu-xref href="#table-unicode-script-values"></emu-xref>.
1. Let _value_ be the canonical property value of _v_ as given in the “Canonical property value” column of the corresponding row.
1. Return the List of Unicode code points of _value_.
</emu-alg>
<p>Only the canonical property values and property value aliases listed in <emu-xref href="#table-unicode-general-category-values"></emu-xref> and <emu-xref href="#table-unicode-script-values"></emu-xref> must be recognized.</p>
<emu-note>
<p>For example, `Xpeo` and `Old_Persian` are valid `Script_Extension` values, but `xpeo` and `Old Persian` aren’t.</p>
</emu-note>
<emu-note>
<p>This algorithm differs from <a href="https://unicode.org/reports/tr44/#Matching_Symbolic">the matching rules for symbolic values listed in UAX44</a>: case, <emu-xref href="#sec-white-space">white space</emu-xref>, U+002D (HYPHEN-MINUS), and U+005F (LOW LINE) are not ignored, and the `Is` prefix is not supported.</p>
</emu-note>
</emu-clause>
<hr>
<p><ins class="block">The following is appended to the list of productions in <a href="https://tc39.github.io/ecma262/#sec-characterclassescape">21.2.2.12 CharacterClassEscape</a>.</ins></p>
<p>The production <emu-grammar>CharacterClassEscape :: `\p{` UnicodePropertyValueExpression `}`</emu-grammar> evaluates by returning the CharSet containing all Unicode code points included in the CharSet returned by <emu-nt>UnicodePropertyValueExpression</emu-nt>.</p>
<p>The production <emu-grammar>CharacterClassEscape :: `\P{` UnicodePropertyValueExpression `}`</emu-grammar> evaluates by returning the CharSet containing all Unicode code points not included in the CharSet returned by <emu-nt>UnicodePropertyValueExpression</emu-nt>.</p>
<p>The production <emu-grammar>UnicodePropertyValueExpression :: UnicodePropertyName `=` UnicodePropertyValue</emu-grammar> evaluates as follows:</p>
<emu-alg>
1. Let _p_ be ! UnicodeMatchProperty(_UnicodePropertyName_).
1. Assert: _p_ is a Unicode property name or property alias listed in the “Property name and aliases” column of <emu-xref href="#table-nonbinary-unicode-properties"></emu-xref>.
1. Let _v_ be ! UnicodeMatchPropertyValue(_p_, _UnicodePropertyValue_).
1. Return the CharSet containing all Unicode code points whose character database definition includes the property _p_ with value _v_.
</emu-alg>
<p>The production <emu-grammar>UnicodePropertyValueExpression :: LoneUnicodePropertyNameOrValue</emu-grammar> evaluates as follows:</p>
<emu-alg>
1. If ! UnicodeMatchPropertyValue(`"General_Category"`, _LoneUnicodePropertyNameOrValue_) is identical to a List of Unicode code points that is the name of a Unicode general category or general category alias listed in the “Property value and aliases” column of <emu-xref href="#table-unicode-general-category-values"></emu-xref>, then
1. Return the CharSet containing all Unicode code points whose character database definition includes the property `General_Category` with value _LoneUnicodePropertyNameOrValue_.
1. Let _p_ be ! UnicodeMatchProperty(_LoneUnicodePropertyNameOrValue_).
1. Assert: _p_ is a binary Unicode property or binary property alias listed in the “Property name and aliases” column of <emu-xref href="#table-binary-unicode-properties"></emu-xref>.
1. Return the CharSet containing all Unicode code points whose character database definition includes the property _p_ with value |True|.
</emu-alg>
<p><ins class="block">The following is appended to <a href="https://tc39.github.io/ecma262/#sec-bibliography">the bibliography</a>.</ins></p>
<!-- es6num="F" -->
<emu-annex id="sec-bibliography">
<h1>Bibliography</h1>
<ol>
<li>
<i>Unicode Standard Annex #18: Unicode Regular Expressions</i>, available at <<a href="https://unicode.org/reports/tr18/">https://unicode.org/reports/tr18/</a>>
</li>
<li>
<i>Unicode Standard Annex #24: Unicode `Script` Property</i>, available at <<a href="https://unicode.org/reports/tr24/">https://unicode.org/reports/tr24/</a>>
</li>
<li>
<i>Unicode Standard Annex #44: Unicode Character Database</i>, available at <<a href="https://unicode.org/reports/tr44/">https://unicode.org/reports/tr44/</a>>
</li>
<li>
<i>Unicode Technical Report #51: Unicode Emoji</i>, available at <<a href="https://unicode.org/reports/tr51/">https://unicode.org/reports/tr51/</a>>
</li>
</ol>
</emu-annex>