Update the ISO-8859 encodings to match the current standards #92
// ignore: missing_whitespace_between_adjacent_strings
const _ascii = '\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e'
    '\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20'
    r"""!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcd"""
Use `'''` instead of `"""`, to (better) match the other `'`-quoted strings.
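In a Dart raw string, `'''` and `"""` delimiters behave the same way: a lone `'` or `"` inside either one needs no escaping. The suggested switch is therefore purely stylistic and leaves the value unchanged. The small check here compares both spellings of the printable run from the `_ascii` excerpt above. The excerpt is cut off after `d`, so only that part is used; the constant names are illustrative.

```dart
// Both literals hold the printable ASCII run from '!' (0x21) to 'd' (0x64).
// In raw strings `$` and `\` are literal, and a single quote character of
// either kind does not terminate a triple-quoted literal.
const doubleQuoted =
    r"""!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcd""";
const singleQuoted =
    r'''!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcd''';

void main() {
  // Build the same run from code units to confirm nothing was mistyped.
  final expected =
      String.fromCharCodes([for (var c = 0x21; c <= 0x64; c++) c]);
  print(doubleQuoted == singleQuoted); // true
  print(singleQuoted == expected); // true
}
```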
- Require Dart 3.0
- Add chunked decoding support (`startChunkedConversion`) for `CodePage`
  encodings.
- Update the ISO-8859 mappings to the latest version published by the Unicode
  consortium.
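In a one-byte encoding every byte decodes independently, so chunked decoding carries no state from one chunk to the next. Each chunk can be converted on its own and passed straight through. The sketch below illustrates that idea with `dart:convert`'s `Converter` API. `SingleByteDecoder` and its 256-character `table` are hypothetical names; this is not the package's actual `CodePage` decoder.

```dart
import 'dart:convert';

/// Hypothetical table-driven decoder for a one-byte encoding.
/// [table] holds one UTF-16 code unit per byte value 0..255.
class SingleByteDecoder extends Converter<List<int>, String> {
  final String table;

  SingleByteDecoder(this.table)
      : assert(table.length == 256, 'need one entry per byte value');

  @override
  String convert(List<int> input) {
    final buffer = StringBuffer();
    for (final byte in input) {
      // Throws for byte values outside 0..255.
      buffer.writeCharCode(table.codeUnitAt(byte));
    }
    return buffer.toString();
  }

  /// No decoder state survives between chunks, so each chunk is
  /// converted independently and forwarded to [sink].
  @override
  Sink<List<int>> startChunkedConversion(Sink<String> sink) =>
      _SingleByteSink(this, sink);
}

class _SingleByteSink implements Sink<List<int>> {
  final SingleByteDecoder _decoder;
  final Sink<String> _output;

  _SingleByteSink(this._decoder, this._output);

  @override
  void add(List<int> chunk) => _output.add(_decoder.convert(chunk));

  @override
  void close() => _output.close();
}

void main() {
  // Latin-1 maps byte N to U+00NN, which makes a convenient test table.
  final latin1Table = String.fromCharCodes(List<int>.generate(256, (i) => i));
  final decoder = SingleByteDecoder(latin1Table);

  String? result;
  final input = decoder.startChunkedConversion(
      StringConversionSink.withCallback((s) => result = s));
  input.add([0x48, 0x69]); // "Hi"
  input.add([0x20, 0xE9]); // " é"
  input.close();
  print(result); // Hi é
}
```

Because `Converter.bind` is built on `startChunkedConversion`, a decoder written this way also works as a stream transformer.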
Not just "new": this update actually switches to the official Unicode mappings, which we didn't use before.
(Mainly because I was not aware they existed, and based these tables directly on the ISO standards.)
So, suggested phrasing:

- Update the ISO-8859 mappings to use the Unicode
  consortium's recommended mappings between Unicode
  text and one-byte encodings.
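A mapping between Unicode text and a one-byte encoding works in both directions. Decoding indexes a 256-entry table by byte value. Encoding needs the inverse map, from character back to byte. The sketch below derives that inverse from a decode table. `buildEncodeMap` and `encodeSingleByte` are hypothetical helpers. Marking undefined byte positions with U+FFFD is an assumption made for illustration and may not match how the package represents them.

```dart
/// Placeholder assumed (for this sketch only) to mark unmapped byte values.
const int unmapped = 0xFFFD;

/// Inverts a 256-entry decode table into a code-unit-to-byte map.
/// If a character appeared at two positions, the later byte would win.
Map<int, int> buildEncodeMap(String table) {
  final map = <int, int>{};
  for (var byte = 0; byte < table.length; byte++) {
    final char = table.codeUnitAt(byte);
    if (char != unmapped) map[char] = byte;
  }
  return map;
}

/// Encodes [text] byte by byte, throwing on any unencodable code unit.
/// Surrogates are never in the map, so non-BMP characters are rejected too.
List<int> encodeSingleByte(String text, Map<int, int> encodeMap) {
  final bytes = <int>[];
  for (final char in text.codeUnits) {
    final byte = encodeMap[char];
    if (byte == null) {
      final hex = char.toRadixString(16).toUpperCase().padLeft(4, '0');
      throw ArgumentError.value(text, 'text', 'U+$hex is not encodable');
    }
    bytes.add(byte);
  }
  return bytes;
}

void main() {
  // Toy table: the Latin-1 layout, with byte 0xA4 left undefined.
  final codes = List<int>.generate(256, (i) => i);
  codes[0xA4] = unmapped;
  final encodeMap = buildEncodeMap(String.fromCharCodes(codes));

  print(encodeSingleByte('Aé', encodeMap)); // [65, 233]
  print(encodeMap.containsKey(0xA4)); // false
}
```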
@@ -146,17 +147,21 @@ const _top8859_16 = '\xa0ĄąŁ€„Š§š©Ș«Ź\xadźŻ°±ČłŽ”¶·žč
    'ÀÁÂĂÄĆÆÇÈÉÊËÌÍÎÏĐŃÒÓÔŐÖŚŰÙÚÛÜĘȚß'
    'àáâăäćæçèéêëìíîïđńòóôőöśűùúûüęțÿ';
const _top8859Controls = '\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c'
    '\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e'
There should be room for 16 hex escapes per line (64 chars, plus the quotes and indent, comes to 70 chars, comfortably within dart format's 80-column limit), so split in multiples of 16 for ease of reading:

const _top8859Controls =
    '\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f'
    '\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f'
    '\xa0';
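The layout rule in this suggestion is mechanical: 16 escapes per line, each `\xNN` escape 4 characters wide. That makes it easy to apply when the tables are generated rather than typed by hand. `hexEscapeLines` is a hypothetical helper that turns a run of code units into literal lines in that layout. It only handles code units below 0x100.

```dart
import 'dart:math' show min;

/// Hypothetical generator helper: formats code units (each below 0x100)
/// as Dart string-literal lines of `\xNN` escapes, [perLine] per line.
List<String> hexEscapeLines(List<int> codeUnits, {int perLine = 16}) {
  final lines = <String>[];
  for (var start = 0; start < codeUnits.length; start += perLine) {
    final end = min(start + perLine, codeUnits.length);
    final escapes = codeUnits
        .sublist(start, end)
        .map((c) => '\\x${c.toRadixString(16).padLeft(2, '0')}')
        .join();
    lines.add("'$escapes'");
  }
  return lines;
}

void main() {
  // 0x80..0xA0 inclusive: the C1 controls plus NO-BREAK SPACE.
  final controls = List<int>.generate(0x21, (i) => 0x80 + i);
  final lines = hexEscapeLines(controls);
  print('const _top8859Controls =');
  for (var i = 0; i < lines.length; i++) {
    // Four-space continuation indent; semicolon after the last line.
    print('    ${lines[i]}${i == lines.length - 1 ? ';' : ''}');
  }
}
```

Running it prints the three-line declaration from the suggestion. Each full line is 4 + 1 + 64 + 1 = 70 characters.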
CodePage._bmp('latin-3', '$_ascii$_noControls$_top8859_3');
///
/// See https://unicode.org/Public/MAPPINGS/ISO8859/8859-3.TXT
final CodePage latin3 = CodePage._bmp('latin-3', '$_ascii$_top8859_3');
/// The ISO-8859-4/Latin-4 (North European) code page.
final CodePage latin4 =
Let's do the rest too.
(I'd write a script to fetch and parse the Unicode tables, rather than doing it manually.)
... OK, so I did that.
#93
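The script used for #93 is not reproduced on this page. Purely as an illustration, here is a sketch of parsing one of the Unicode `MAPPINGS/ISO8859` files. It assumes their documented layout: a `0xXX` byte column and a `0xXXXX` code point column separated by tabs, with `#` starting a comment. `parseMapping` is a hypothetical name. Filling bytes the file does not list with U+FFFD is also an assumption and not necessarily what the PR does.

```dart
/// Parses a single-byte mapping file into a 256-entry decode table.
/// Assumes all mapped code points are in the BMP (true for ISO-8859).
String parseMapping(String fileContents) {
  // Bytes the file does not list keep the U+FFFD placeholder.
  final codes = List<int>.filled(256, 0xFFFD);
  for (final rawLine in fileContents.split('\n')) {
    // Drop everything from '#' onward, then surrounding whitespace (incl. \r).
    final hash = rawLine.indexOf('#');
    final line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
    final fields = line.split(RegExp(r'\s+'));
    // Skip blank, comment-only, or byte-only (unmapped) lines.
    if (fields.length < 2) continue;
    // Without a radix, int.parse accepts a "0x" hexadecimal prefix.
    codes[int.parse(fields[0])] = int.parse(fields[1]);
  }
  return String.fromCharCodes(codes);
}

void main() {
  // Two lines in the mapping-file layout (values from ISO-8859-3).
  const sample = '''
# Abridged, illustrative sample
0x41\t0x0041\t#\tLATIN CAPITAL LETTER A
0xA1\t0x0126\t#\tLATIN CAPITAL LETTER H WITH STROKE
''';
  final table = parseMapping(sample);
  print(table.length); // 256
  print(table[0x41]); // A
  print(table[0xA1]); // Ħ
  print(table.codeUnitAt(0xA5) == 0xFFFD); // true (not listed)
}
```

A generator would read each `8859-N.TXT` file into a table like this. The hex-escape layout suggested earlier in the review could then be used to emit it as Dart source.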
Closing, as the dart-lang/convert repository has been merged into the dart-lang/core monorepo. Please re-open this PR there!
I generated the mappings using this script:
Contribution guidelines:
- `dart format .`

Note that many Dart repos have a weekly cadence for reviewing PRs; please allow for some latency before initial review feedback.