Bad letter boundary detection for complex scripts #2115
Comments
This is a case of CodeMirror's simplistic grapheme cluster algorithm not handling the language. Unfortunately, JavaScript does not provide the primitives needed to do sane cluster-boundary detection (finding character properties, etc).
Not all. Some, like Arabic, should work. |
I would like to understand it a bit more. What exact algorithm do you need to place the cursor at a logically correct position? If we want to support a lot of languages, we should leave this kind of primitive functionality to browsers. Trying to imitate such behavior will reach no where. Also, how does Chrome give different output than Firefox in this case? |
To know how to move the cursor through a text, and which ranges of codepoints to use when measuring character positions, CodeMirror needs to know where clusters start and end. The browser knows this, but doesn't expose this information to JavaScript. Telling me that what I'm doing "will reach no where" without actually understanding the problem isn't really the right tone to take here. |
I have faced cursor movement and logical cluster issues while developing the Visual Editor for Wikimedia. I wanted to understand the problem in detail so that I might be able to help. I will check later; I don't have time to find out the details now. Thanks. |
Attached patch fixes some known problems with handling of extending code points, and appears to help with #2125 (Hindi), but does not fix your example. I will need some input from someone who is familiar with this language's Unicode encoding, because the behavior of this string baffles me. Characters "ന്തോ" act as a single unit, as far as cursor movement is concerned, but only the second code point in that string is an extending character. If I read the document at http://www.unicode.org/reports/tr29/ correctly, this should count as three grapheme clusters, not one. What is going on? |
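To make the question above concrete, here is a minimal sketch that lists the code points of that substring and flags which ones are nonspacing marks. The code points in `sample` are an assumption about how the text is encoded (the thread does not spell them out), `describeCodePoints` is a hypothetical helper name, and the `\p{Mn}` check is only a rough stand-in for "extending character", not the test CodeMirror actually uses. It needs an engine that supports Unicode property escapes in regular expressions.

```javascript
// Assumed encoding of the substring discussed above (NA, VIRAMA, TA, VOWEL SIGN OO).
const sample = "\u0D28\u0D4D\u0D24\u0D4B";

// Hypothetical helper: one entry per code point, with a flag for
// general category Mn (nonspacing mark).
function describeCodePoints(text) {
  return Array.from(text, (ch) => ({
    codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
    nonspacingMark: /^\p{Mn}$/u.test(ch),
  }));
}

console.log(describeCodePoints(sample));
```

Under that assumed encoding, only the second entry (the virama) is flagged, which matches the observation that a per-code-point property check alone does not explain why the browser treats the whole sequence as one unit.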
CC'ing @pauldhunt and @miguelsousa, who have worked on some of Adobe's open-source typography efforts -- just in case they have any quick insights to share :-) |
I have removed my previous comment. This language is Malayalam. The fix for #2125 does not fix positioning for this language. |
You cannot rely on TR29 to get grapheme clusters for the purpose of counting or cursor movement. TR29 clearly explains this. You have to use tailored logic to meet your purpose. Even that is not enough, since in Indic scripts, depending on the font, multiple consonants joined by a character like VIRAMA can form a single ligature. Sometimes characters stack. Chrome and FF do not agree on how cursor movement works in Indic scripts. Chrome lets you move the cursor only along logical boundaries. FF follows the same rule, but FF does allow placing the cursor if you try to do it from a program. You have to ask the browser whether you can place a cursor at a given position or not. Iterating that question over a range of text will give you reliable cursor positions. This can be used for creating a stack of edits useful for undo, redo, etc. |
By 'ask the browser' you mean create a textarea and try to set the cursor in the textarea there? Or is there a more efficient/convenient way to do it on (non-editable) DOM nodes? Is there an easy/cheap way to determine whether a string might have stacking? |
Yes, create an editable node and keep trying to place the cursor. Of course it is inefficient and hacky.
No, that is not possible. It not only depends on the data but also the font used. |
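The "keep asking the browser" loop described here can be sketched roughly as follows. This is only an outline under stated assumptions: `collectCursorStops` is a hypothetical name, and `canPlaceCursorAt` is a caller-supplied predicate standing in for the browser probe (for example, setting the selection in an editable node and reading it back), which is deliberately not implemented here because its behavior differs between browsers and fonts.

```javascript
// Hypothetical helper: ask the predicate about every offset in the text
// and keep the offsets where a cursor is reported as placeable.
function collectCursorStops(text, canPlaceCursorAt) {
  const stops = [];
  for (let offset = 0; offset <= text.length; offset++) {
    if (canPlaceCursorAt(text, offset)) stops.push(offset);
  }
  return stops;
}
```

The loop costs one probe per offset, which is why the approach is called inefficient above; the result would also need to be recomputed whenever the font changes.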
Well, I meant a way to weed out strings that obviously don't need the expensive treatment, and simply have a cursor position between every code point. |
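One very conservative version of such a filter might look like the sketch below. It assumes, without confirmation from the thread, that text made only of printable ASCII and tabs can safely get a cursor position between every code point, and it sends everything else down the expensive path. `mightNeedProbing` is a hypothetical name. Because stacking depends on the font as well as the data, this can only rule strings out; it cannot tell whether a non-ASCII string really stacks.

```javascript
// Hypothetical pre-filter: true if the text contains anything other than
// printable ASCII or a tab, i.e. anything that might need browser probing.
function mightNeedProbing(text) {
  return /[^\t\x20-\x7E]/.test(text);
}

console.log(mightNeedProbing("var x = 1;"));
console.log(mightNeedProbing("\u0D38\u0D28\u0D4D\u0D24\u0D4B\u0D37\u0D4D"));
```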
On Firefox, it seems that |
(That is, I'm using a textarea now, because there i can play with |
@marijnh Arabic doesn't work correctly, same as Thai. |
Have you considered filing a bug for this at https://bugzilla.mozilla.org/ or https://code.google.com/p/chromium/ |
Wondering if there is any update or workaround to this bug yet? |
Nope, I still haven't found a hack that works halfway acceptably. |
I still have the same issue: if you set a custom font, like Inconsolata, the line height or cursor positioning is way off (until you start interacting, typing, or clicking in the textarea rendered into the .CodeMirror class). |
Can't make RTL for Arabic? |
This is an issue that's difficult if not impossible to solve with the fundamental approach currently taken by CodeMirror. We are working on a rewrite (CodeMirror 6) that might address this issue, and we are currently raising money for this work: See the announcement for more information about the rewrite and a demo. Note that CodeMirror 6 is by no means stable or usable in production yet. It's highly unlikely that we pick up this issue for CodeMirror 5, though. |
Paste the following text into Brackets, and see where the cursor is placed:
സന്തോഷ്
The cursor is supposed to be placed at the end of the word, but in Brackets it appears 4 or 5 character widths after it.
This happens with all non-Latin complex scripts.
It works fine in Firefox, but the issue exists in Chrome.
(duplicated from adobe/brackets#6301)