Should I use the Arabic Presentation Forms provided in Unicode?
No. These forms were included to support round-trip conversion to and from legacy encodings. You should ignore them and use the characters in the main Arabic block for your content.
Doing so means that applications can interpret your content correctly, especially when searching and performing similar operations.
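The following sketch illustrates the search problem with Python's standard `unicodedata` module. The "legacy" string is a hypothetical example of how a word might look after conversion from a legacy encoding. The NFKC step shows one possible clean-up for data you receive, not a reason to author with presentation forms; note that NFKC also folds many other compatibility characters.

```python
import unicodedata

# "salaam" written with base characters from the main Arabic block.
base = "\u0633\u0644\u0627\u0645"  # SEEN, LAM, ALEF, MEEM

# Hypothetical legacy-converted text using presentation forms instead:
# SEEN INITIAL FORM, LIGATURE LAM WITH ALEF FINAL FORM, MEEM ISOLATED FORM.
legacy = "\uFEB3\uFEFC\uFEE1"
text = "Greeting: " + legacy

# A plain substring search does not find the word.
print(base in text)  # False

# NFKC applies compatibility decompositions, mapping each presentation
# form back to the underlying Arabic letters.
normalized = unicodedata.normalize("NFKC", text)
print(base in normalized)  # True
```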
[I believe there are one or two characters in the presentation-forms blocks that are intended for normal use. They include U+FD3E and U+FD3F.]
The Unicode Standard contains two blocks of presentation forms for Arabic: Arabic Presentation Forms-A and Arabic Presentation Forms-B. The characters in these blocks are contextually determined joining forms, ligatures, letter-plus-diacritic combinations, and so on, such as U+FB75 ARABIC LETTER DYEH MEDIAL FORM and U+FC76 ARABIC LIGATURE THEH WITH REH FINAL FORM.
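To check existing content for these characters, a scan over the two block ranges (U+FB50 to U+FDFF and U+FE70 to U+FEFF) is enough. The sketch below uses a hypothetical helper, `find_presentation_forms`, and reports each hit with its name and the base characters NFKC maps it to. The blocks also contain a few characters that are not presentation forms (for example U+FEFF ZERO WIDTH NO-BREAK SPACE), so treat the results as candidates to review rather than automatic errors.

```python
import unicodedata

# Code point ranges of the two Unicode blocks named above.
PRESENTATION_BLOCKS = (
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # Arabic Presentation Forms-B
)

def find_presentation_forms(text):
    """Hypothetical helper: return (index, code point, name, NFKC mapping)
    for every character of text inside an Arabic presentation-forms block."""
    hits = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in PRESENTATION_BLOCKS):
            # Noncharacters in these ranges have no name, so supply a default.
            name = unicodedata.name(ch, "<unnamed>")
            mapped = unicodedata.normalize("NFKC", ch)
            mapping = " ".join(f"U+{ord(c):04X}" for c in mapped)
            hits.append((i, f"U+{cp:04X}", name, mapping))
    return hits

# The two example characters from the paragraph above.
for hit in find_presentation_forms("\uFB75\uFC76"):
    print(hit)
```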
If you want to prevent joining that would normally happen by default, for example in the Persian word یونیکُد, use U+200C ZERO WIDTH NON-JOINER immediately after U+06CC ARABIC LETTER FARSI YEH. Likewise, if you need to display a joining form where there would not normally be one, such as in the Persian phrase ۱۳۹۵ ه.ش., use U+200D ZERO WIDTH JOINER (in this case, immediately after U+0647 ARABIC LETTER HEH).
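The sketch below builds both strings from base characters plus the joiner controls, so no presentation forms are involved. It assumes the non-joiner goes after the second FARSI YEH, as in the common spelling یونی‌کد, and that the kaf is U+06A9 ARABIC LETTER KEHEH; the year digits are omitted from the second string. It also shows that NFKC normalization leaves ZWNJ in place.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER: suppresses joining between neighbours
ZWJ = "\u200D"   # ZERO WIDTH JOINER: requests a joining form where none occurs

# یونیکُد with a non-joiner after the second FARSI YEH (U+06CC), so that
# yeh is not joined to the following KEHEH.
unicode_fa = "\u06CC\u0648\u0646\u06CC" + ZWNJ + "\u06A9\u064F\u062F"

# ه.ش. with a joiner after HEH (U+0647), so the heh takes a joining form
# even though the next character is a full stop.
solar_hijri = "\u0647" + ZWJ + ".\u0634."

def describe(s):
    """List each character as 'U+XXXX NAME'."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]

for line in describe(solar_hijri):
    print(line)

# Normalization does not strip the joiner controls.
print(unicodedata.normalize("NFKC", unicode_fa) == unicode_fa)  # True
```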
If your font or application doesn't provide the ligatures you want, try to find another font or application rather than using the presentation forms. Presentation-form characters corrupt the meaning of the text: if you search or sort data containing them, you are unlikely to get the results you expect. They also cause problems for display and editing: presentation forms may be displayed using a different fallback font, they don't decompose in the way users might expect, they may not change to the expected forms when a different font is applied, the ligatures are not flexible for justification, and so on.