Skip to content

Commit

Permalink
Assign Sentence_Terminal and Terminal_Punctuation for 16.0 characters (
Browse files Browse the repository at this point in the history
…unicode-org#698)

* half a dozen dandas from 16.0 to Sentence_Terminal

* Regenerate UCD

* Another invariant

* sentence terminal punctuation is terminal punctuation

* Regenerate UCD

* Balinese inverted carik like their right-side-up brethren

* Regenerate UCD

* Make the new panti Sentence_Terminal like the existing two
  • Loading branch information
eggrobin authored May 6, 2024
1 parent a5c5716 commit 3b339fd
Show file tree
Hide file tree
Showing 4 changed files with 20 additions and 8 deletions.
16 changes: 11 additions & 5 deletions unicodetools/data/ucd/dev/PropList.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# PropList-16.0.0.txt
# Date: 2024-05-05, 01:21:00 GMT
# Date: 2024-05-06, 12:17:26 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -151,9 +151,10 @@ FF63 ; Quotation_Mark # Pe HALFWIDTH RIGHT CORNER BRACKET
1808..1809 ; Terminal_Punctuation # Po [2] MONGOLIAN MANCHU COMMA..MONGOLIAN MANCHU FULL STOP
1944..1945 ; Terminal_Punctuation # Po [2] LIMBU EXCLAMATION MARK..LIMBU QUESTION MARK
1AA8..1AAB ; Terminal_Punctuation # Po [4] TAI THAM SIGN KAAN..TAI THAM SIGN SATKAANKUU
1B4E..1B4F ; Terminal_Punctuation # Po [2] BALINESE INVERTED CARIK SIKI..BALINESE INVERTED CARIK PAREREN
1B5A..1B5B ; Terminal_Punctuation # Po [2] BALINESE PANTI..BALINESE PAMADA
1B5D..1B5F ; Terminal_Punctuation # Po [3] BALINESE CARIK PAMUNGKAH..BALINESE CARIK PAREREN
1B7D..1B7E ; Terminal_Punctuation # Po [2] BALINESE PANTI LANTANG..BALINESE PAMADA LANTANG
1B7D..1B7F ; Terminal_Punctuation # Po [3] BALINESE PANTI LANTANG..BALINESE PANTI BAWAK
1C3B..1C3F ; Terminal_Punctuation # Po [5] LEPCHA PUNCTUATION TA-ROL..LEPCHA PUNCTUATION TSHOOK
1C7E..1C7F ; Terminal_Punctuation # Po [2] OL CHIKI PUNCTUATION MUCAAD..OL CHIKI PUNCTUATION DOUBLE MUCAAD
2024 ; Terminal_Punctuation # Po ONE DOT LEADER
Expand Down Expand Up @@ -204,6 +205,7 @@ FF64 ; Terminal_Punctuation # Po HALFWIDTH IDEOGRAPHIC COMMA
111DE..111DF ; Terminal_Punctuation # Po [2] SHARADA SECTION MARK-1..SHARADA SECTION MARK-2
11238..1123C ; Terminal_Punctuation # Po [5] KHOJKI DANDA..KHOJKI DOUBLE SECTION MARK
112A9 ; Terminal_Punctuation # Po MULTANI SECTION MARK
113D4..113D5 ; Terminal_Punctuation # Po [2] TULU-TIGALARI DANDA..TULU-TIGALARI DOUBLE DANDA
1144B..1144D ; Terminal_Punctuation # Po [3] NEWA DANDA..NEWA COMMA
1145A..1145B ; Terminal_Punctuation # Po [2] NEWA DOUBLE COMMA..NEWA PLACEHOLDER MARK
115C2..115C5 ; Terminal_Punctuation # Po [4] SIDDHAM DANDA..SIDDHAM SEPARATOR BAR
Expand All @@ -224,11 +226,12 @@ FF64 ; Terminal_Punctuation # Po HALFWIDTH IDEOGRAPHIC COMMA
16AF5 ; Terminal_Punctuation # Po BASSA VAH FULL STOP
16B37..16B39 ; Terminal_Punctuation # Po [3] PAHAWH HMONG SIGN VOS THOM..PAHAWH HMONG SIGN CIM CHEEM
16B44 ; Terminal_Punctuation # Po PAHAWH HMONG SIGN XAUS
16D6E..16D6F ; Terminal_Punctuation # Po [2] KIRAT RAI DANDA..KIRAT RAI DOUBLE DANDA
16E97..16E98 ; Terminal_Punctuation # Po [2] MEDEFAIDRIN COMMA..MEDEFAIDRIN FULL STOP
1BC9F ; Terminal_Punctuation # Po DUPLOYAN PUNCTUATION CHINOOK FULL STOP
1DA87..1DA8A ; Terminal_Punctuation # Po [4] SIGNWRITING COMMA..SIGNWRITING COLON

# Total code points: 278
# Total code points: 285

# ================================================

Expand Down Expand Up @@ -1531,9 +1534,10 @@ FF65 ; Other_ID_Continue # Po HALFWIDTH KATAKANA MIDDLE DOT
1809 ; Sentence_Terminal # Po MONGOLIAN MANCHU FULL STOP
1944..1945 ; Sentence_Terminal # Po [2] LIMBU EXCLAMATION MARK..LIMBU QUESTION MARK
1AA8..1AAB ; Sentence_Terminal # Po [4] TAI THAM SIGN KAAN..TAI THAM SIGN SATKAANKUU
1B4E..1B4F ; Sentence_Terminal # Po [2] BALINESE INVERTED CARIK SIKI..BALINESE INVERTED CARIK PAREREN
1B5A..1B5B ; Sentence_Terminal # Po [2] BALINESE PANTI..BALINESE PAMADA
1B5E..1B5F ; Sentence_Terminal # Po [2] BALINESE CARIK SIKI..BALINESE CARIK PAREREN
1B7D..1B7E ; Sentence_Terminal # Po [2] BALINESE PANTI LANTANG..BALINESE PAMADA LANTANG
1B7D..1B7F ; Sentence_Terminal # Po [3] BALINESE PANTI LANTANG..BALINESE PANTI BAWAK
1C3B..1C3C ; Sentence_Terminal # Po [2] LEPCHA PUNCTUATION TA-ROL..LEPCHA PUNCTUATION NYET THYOOM TA-ROL
1C7E..1C7F ; Sentence_Terminal # Po [2] OL CHIKI PUNCTUATION MUCAAD..OL CHIKI PUNCTUATION DOUBLE MUCAAD
2024 ; Sentence_Terminal # Po ONE DOT LEADER
Expand Down Expand Up @@ -1572,6 +1576,7 @@ FF61 ; Sentence_Terminal # Po HALFWIDTH IDEOGRAPHIC FULL STOP
11238..11239 ; Sentence_Terminal # Po [2] KHOJKI DANDA..KHOJKI DOUBLE DANDA
1123B..1123C ; Sentence_Terminal # Po [2] KHOJKI SECTION MARK..KHOJKI DOUBLE SECTION MARK
112A9 ; Sentence_Terminal # Po MULTANI SECTION MARK
113D4..113D5 ; Sentence_Terminal # Po [2] TULU-TIGALARI DANDA..TULU-TIGALARI DOUBLE DANDA
1144B..1144C ; Sentence_Terminal # Po [2] NEWA DANDA..NEWA DOUBLE DANDA
115C2..115C3 ; Sentence_Terminal # Po [2] SIDDHAM DANDA..SIDDHAM DOUBLE DANDA
115C9..115D7 ; Sentence_Terminal # Po [15] SIDDHAM END OF TEXT MARK..SIDDHAM SECTION MARK WITH CIRCLES AND FOUR ENCLOSURES
Expand All @@ -1588,11 +1593,12 @@ FF61 ; Sentence_Terminal # Po HALFWIDTH IDEOGRAPHIC FULL STOP
16AF5 ; Sentence_Terminal # Po BASSA VAH FULL STOP
16B37..16B38 ; Sentence_Terminal # Po [2] PAHAWH HMONG SIGN VOS THOM..PAHAWH HMONG SIGN VOS TSHAB CEEB
16B44 ; Sentence_Terminal # Po PAHAWH HMONG SIGN XAUS
16D6E..16D6F ; Sentence_Terminal # Po [2] KIRAT RAI DANDA..KIRAT RAI DOUBLE DANDA
16E98 ; Sentence_Terminal # Po MEDEFAIDRIN FULL STOP
1BC9F ; Sentence_Terminal # Po DUPLOYAN PUNCTUATION CHINOOK FULL STOP
1DA88 ; Sentence_Terminal # Po SIGNWRITING FULL STOP

# Total code points: 157
# Total code points: 164

# ================================================

Expand Down
Binary file not shown.
9 changes: 6 additions & 3 deletions unicodetools/data/ucd/dev/auxiliary/SentenceBreakProperty.txt
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# SentenceBreakProperty-16.0.0.txt
# Date: 2024-04-30, 21:48:41 GMT
# Date: 2024-05-06, 12:18:03 GMT
# © 2024 Unicode®, Inc.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
Expand Down Expand Up @@ -2702,9 +2702,10 @@ FF0E ; ATerm # Po FULLWIDTH FULL STOP
1809 ; STerm # Po MONGOLIAN MANCHU FULL STOP
1944..1945 ; STerm # Po [2] LIMBU EXCLAMATION MARK..LIMBU QUESTION MARK
1AA8..1AAB ; STerm # Po [4] TAI THAM SIGN KAAN..TAI THAM SIGN SATKAANKUU
1B4E..1B4F ; STerm # Po [2] BALINESE INVERTED CARIK SIKI..BALINESE INVERTED CARIK PAREREN
1B5A..1B5B ; STerm # Po [2] BALINESE PANTI..BALINESE PAMADA
1B5E..1B5F ; STerm # Po [2] BALINESE CARIK SIKI..BALINESE CARIK PAREREN
1B7D..1B7E ; STerm # Po [2] BALINESE PANTI LANTANG..BALINESE PAMADA LANTANG
1B7D..1B7F ; STerm # Po [3] BALINESE PANTI LANTANG..BALINESE PANTI BAWAK
1C3B..1C3C ; STerm # Po [2] LEPCHA PUNCTUATION TA-ROL..LEPCHA PUNCTUATION NYET THYOOM TA-ROL
1C7E..1C7F ; STerm # Po [2] OL CHIKI PUNCTUATION MUCAAD..OL CHIKI PUNCTUATION DOUBLE MUCAAD
203C..203D ; STerm # Po [2] DOUBLE EXCLAMATION MARK..INTERROBANG
Expand Down Expand Up @@ -2740,6 +2741,7 @@ FF61 ; STerm # Po HALFWIDTH IDEOGRAPHIC FULL STOP
11238..11239 ; STerm # Po [2] KHOJKI DANDA..KHOJKI DOUBLE DANDA
1123B..1123C ; STerm # Po [2] KHOJKI SECTION MARK..KHOJKI DOUBLE SECTION MARK
112A9 ; STerm # Po MULTANI SECTION MARK
113D4..113D5 ; STerm # Po [2] TULU-TIGALARI DANDA..TULU-TIGALARI DOUBLE DANDA
1144B..1144C ; STerm # Po [2] NEWA DANDA..NEWA DOUBLE DANDA
115C2..115C3 ; STerm # Po [2] SIDDHAM DANDA..SIDDHAM DOUBLE DANDA
115C9..115D7 ; STerm # Po [15] SIDDHAM END OF TEXT MARK..SIDDHAM SECTION MARK WITH CIRCLES AND FOUR ENCLOSURES
Expand All @@ -2756,11 +2758,12 @@ FF61 ; STerm # Po HALFWIDTH IDEOGRAPHIC FULL STOP
16AF5 ; STerm # Po BASSA VAH FULL STOP
16B37..16B38 ; STerm # Po [2] PAHAWH HMONG SIGN VOS THOM..PAHAWH HMONG SIGN VOS TSHAB CEEB
16B44 ; STerm # Po PAHAWH HMONG SIGN XAUS
16D6E..16D6F ; STerm # Po [2] KIRAT RAI DANDA..KIRAT RAI DOUBLE DANDA
16E98 ; STerm # Po MEDEFAIDRIN FULL STOP
1BC9F ; STerm # Po DUPLOYAN PUNCTUATION CHINOOK FULL STOP
1DA88 ; STerm # Po SIGNWRITING FULL STOP

# Total code points: 153
# Total code points: 160

# ================================================

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -507,6 +507,9 @@ $combiningExclusions ⊇ [$firstNonStarter & \p{dt=canonical}]

\p{Terminal_Punctuation} ⊃ \p{STerm}

# We forgot about Sentence_Terminal while working on 16.0α; this would have caught us.
\p{Name=/DANDA/} ⊆ \p{Sentence_Terminal}

# Sentence terminal punctuation is exactly
# ambiguous sentence terminal punctuation and unambiguous sentence terminal punctuation.
\p{Sentence_Terminal} = [\p{sb=ATerm}\p{sb=STerm}]
Expand Down

0 comments on commit 3b339fd

Please sign in to comment.