CLDR-18065 site: typo fixes (#4164)
srl295 authored Oct 31, 2024
1 parent b5aee86 commit f1aa656
Showing 2 changed files with 10 additions and 8 deletions.
@@ -22,8 +22,8 @@ Issues with current EOR rules:
1. The ignoring rules for currency etc. should be filtered out in the CLDR context. ( Mark, John, Åke)
2. The rule for U+029F SMALL CAPITAL L is missing (typo in standard). ( Åke )
3. There are relevant comments by Kent Karlsson in ticket #[763](http://unicode.org/cldr/trac/ticket/763) (2010-10-27), with a modified proposal
- 1. --- \⃩(\⃩ = [U+20E9](http://unicode.org/cldr/utility/character.jsp?a=20E9) ( ⃩ ) COMBINING WIDE BRIDGE ABOVE) is the (currently) weightiest, at level 2, non-letter general purpose combining mark
- 2. --- \⃩ is used in the proposal to make all "variants" come after all single-accented versions of letters
+ 1. --- ⃩(⃩ = [U+20E9](http://unicode.org/cldr/utility/character.jsp?a=20E9) ( ⃩ ) COMBINING WIDE BRIDGE ABOVE) is the (currently) weightiest, at level 2, non-letter general purpose combining mark
+ 2. --- ⃩ is used in the proposal to make all "variants" come after all single-accented versions of letters
3. --- resetting to just A, B, etc. would make variant versions come before accented versions
4. ( Åke ) The current reset rules work fine with MimerSQL, but I think you must check the ICU behaviour. Kent might have a vital point here.
5. (Kent) (digraphs) ----tertiary difference in DUCET; keep it that way
14 changes: 8 additions & 6 deletions docs/site/index/cldr-spec/transliteration-guidelines.md
@@ -13,6 +13,7 @@ Transliteration is the general process of converting characters from one script
Transliteration is *not* translation. Rather, transliteration is the conversion of letters from one script to another without translating the underlying words. The following shows a sample of transliteration systems:

Sample Transliteration Systems

| Source | Translation | Transliteration | System |
|:---:|:---:|:---:|:---:|
| Αλφαβητικός | Alphabetic | Alphabētikós | Classic |
@@ -32,6 +33,7 @@ While an English speaker may not recognize that the Japanese word kyanpasu is eq
- When a service engineer is sent a program dump that is filled with characters from foreign scripts, it is much easier to diagnose the problem when the text is transliterated and the service engineer can recognize the characters.

Sample Transliterations

| Source | Transliteration |
|---|---|
| 김, 국삼 | Gim, Gugsam |
@@ -322,7 +324,7 @@ If you are interested in providing transliterations for one or more scripts, fil

For submission to CLDR, the data needs to supplied in the correct XML format or in the ICU format, and should follow an accepted standard (like UNGEGN, BGN, or others).

- - The format for rules is specified in [Transform\_Rules](http://www.unicode.org/reports/tr35/#Transform_Rules). It is best if the results are tested using the [ICU Transform Demo](https://icu4c-demos.unicode.org/icu-bin/translit) first, since if the data doesn't validate it would not be accepted into CLDR.
+ - The format for rules is specified in [Transform\_Rules](https://www.unicode.org/reports/tr35/#Transform_Rules). It is best if the results are tested using the [ICU Transform Demo](https://icu4c-demos.unicode.org/icu-bin/translit) first, since if the data doesn't validate it would not be accepted into CLDR.
- As mentioned above, even if a transliteration is only used in certain countries or contexts CLDR can provide for them with different variant tags.
- For comparison, you can see what is currently in CLDR in the [transforms]() folder online. For example, see [Hebrew\-Latin.xml]().
- Script transliterators should cover every character in the exemplar sets for the CLDR locales using that script.
@@ -331,10 +333,10 @@ For submission to CLDR, the data needs to supplied in the correct XML format or

| Shavian | Relation | Latin | Comments |
|:---:|:---:|:---:|---|
- | \𐑐 || p | Map all uppercase to lowercase first |
- | \𐑚 || b | |
- | \𐑑 || t | |
- | \𐑒\𐑕 || x | fallback |
+ | 𐑐 || p | Map all uppercase to lowercase first |
+ | 𐑚 || b | |
+ | 𐑑 || t | |
+ | 𐑒𐑕 || x | fallback |
| ... | | | |

## More Information
@@ -349,5 +351,5 @@ For more information, see:
- [ISO\-15915 (Gujarati)](http://transliteration.eki.ee/pdf/Gujarati.pdf)
- [ISO\-15915 (Kannada)](http://transliteration.eki.ee/pdf/Kannada.pdf)
- [ISCII\-91](http://www.cdacindia.com/html/gist/down/iscii_d.asp)
- - [UTS \#35: Locale Data Markup Language (LDML)](http://www.unicode.org/reports/tr35/)
+ - [UTS \#35: Locale Data Markup Language (LDML)](https://www.unicode.org/reports/tr35/)
