Bad letter boundary detection for complex scripts #2115

santhoshtr · 2014-01-07T05:20:28Z

Paste the following text into Brackets and see where the cursor is placed:

സന്തോഷ്

The cursor should be placed at the end of the word, but in Brackets it is off by about 4 or 5 character widths.

Happens with all non-latin complex scripts

It works fine in Firefox, but the issue exists in Chrome.

marijnh · 2014-01-07T11:18:05Z

This is a case of CodeMirror's simplistic grapheme cluster algorithm not handling the language. Unfortunately, JavaScript does not provide the primitives needed to do sane cluster-boundary detection (finding character properties, etc).

Happens with all non-latin complex scripts

Not all. Some, like Arabic, should work.

santhoshtr · 2014-01-07T11:43:52Z

This is a case of CodeMirror's simplistic grapheme cluster algorithm not handling the language. Unfortunately, JavaScript does not provide the primitives needed to do sane cluster-boundary detection (finding character properties, etc).

I would like to understand this a bit more. What exact algorithm do you need to place the cursor at a logically correct position? If we want to support a lot of languages, we should leave this kind of primitive functionality to browsers. Trying to imitate such behavior will reach no where.

Also, why does Chrome give different output than Firefox in this case?

marijnh · 2014-01-07T11:56:08Z

To know how to move the cursor through a text, and which ranges of codepoints to use when measuring character positions, CodeMirror needs to know where clusters start and end.

The browser knows this, but doesn't expose this information to JavaScript. Telling me that what I'm doing "will reach no where" without actually understanding the problem isn't really the right tone to take here.
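For readers arriving later: the sketch below uses `Intl.Segmenter`, a standard API that postdates this comment, to ask the runtime for grapheme boundaries. The helper name `graphemeBoundaries` is made up for illustration. Whether these boundaries match where a given browser and font actually let the caret stop is not something this thread establishes, so treat it as a starting point rather than a fix.

```javascript
// Hypothetical helper: returns the UTF-16 offsets at which a grapheme
// segment starts, followed by the end of the string.
function graphemeBoundaries(text, locale) {
  // Intl.Segmenter is only available in newer runtimes; `locale` may be undefined.
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  const boundaries = [];
  for (const { index } of segmenter.segment(text)) {
    boundaries.push(index);
  }
  boundaries.push(text.length);
  return boundaries;
}

// "e" followed by a combining acute accent, then "x".
console.log(graphemeBoundaries("e\u0301x")); // [0, 2, 3]
```

Results for Indic text can differ between runtime versions, so no expected output is given for the Malayalam example.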

santhoshtr · 2014-01-07T12:50:39Z

I have run into cursor movement and logical cluster issues while developing the Visual Editor for Wikimedia. I wanted to understand the problem in detail so that I might be able to help. I will check later; I don't have time to look into the details now. Thanks.

marijnh · 2014-01-09T22:54:14Z

Attached patch fixes some known problems with handling of extending code points, and appears to help with #2125 (Hindi), but does not fix your example.

I will need some input from someone who is familiar with this language's Unicode encoding, because the behavior of this string baffles me. Characters "ന്തോ" act as a single unit, as far as cursor movement is concerned, but only the second code point in that string is an extending character. If I read the document at http://www.unicode.org/reports/tr29/ correctly, this should count as three grapheme clusters, not one. What is going on?
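To make the question concrete, here is a small sketch that lists the four code points of that string and groups them the naive way. It approximates "extending character" as General_Category Mn (nonspacing mark), which is an assumption and narrower than TR29's full Extend property; `naiveClusters` is a made-up helper, not CodeMirror's code, and the `\p{...}` regex syntax is newer than this discussion.

```javascript
// The four code points of the string in question, written as escapes so the
// example does not depend on file encoding.
const text = "\u0D28\u0D4D\u0D24\u0D4B";

// Hypothetical helper: treat only nonspacing marks (Mn) as extending and
// attach each one to the code point before it.
function naiveClusters(str) {
  const clusters = [];
  for (const ch of str) { // for...of iterates by code point
    if (/^\p{Mn}$/u.test(ch) && clusters.length > 0) {
      clusters[clusters.length - 1] += ch;
    } else {
      clusters.push(ch);
    }
  }
  return clusters;
}

const codePoints = Array.from(
  text,
  ch => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
const clusters = naiveClusters(text);

console.log(codePoints);      // ["U+0D28", "U+0D4D", "U+0D24", "U+0D4B"]
console.log(clusters.length); // 3
```

Only U+0D4D (the virama) is a nonspacing mark here, so this grouping yields three units, while the browser moves the cursor across all four code points as one.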

peterflynn · 2014-01-09T23:05:08Z

CC'ing @pauldhunt and @miguelsousa, who have worked on some of Adobe's open-source typography efforts -- just in case they have any quick insights to share :-)

Jaygiri · 2014-01-09T23:57:54Z

I have removed my previous comment.

This language is Malayalam. The fix for #2125 does not fix positioning for this language.

santhoshtr · 2014-01-10T11:01:48Z

Characters "ന്തോ" act as a single unit, as far as cursor movement is concerned, but only the second code point in that string is an extending character. If I read the document at http://www.unicode.org/reports/tr29/ correctly, this should count as three grapheme clusters, not one. What is going on?

You cannot rely on TR29 to get grapheme clusters for the purpose of counting or cursor movement; TR29 clearly explains this. You have to use tailored logic to meet your purpose. Even that is not enough, because in Indic scripts, depending on the font, multiple consonants joined by a character like VIRAMA can form a single ligature, and sometimes characters are stacked. Chrome and Firefox do not agree on how cursor movement works in Indic scripts. Chrome lets you move the cursor only along logical boundaries. Firefox follows the same rule, but it allows placing the cursor at other positions if you do it programmatically. You have to ask the browser whether a cursor can be placed at a given position. Iterating that question over a range of text will give you a reliable set of cursor positions. This can also be used to build the stack of edits needed for undo, redo, and so on.
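The iteration described above could be sketched roughly as follows. The probe is passed in as a callback because the thread does not settle on a reliable way to ask the browser; `collectCursorStops` and `notInsideSurrogatePair` are made-up names, and the stand-in probe only rejects offsets inside a surrogate pair so that the loop can be run anywhere.

```javascript
// Hypothetical helper: ask `canPlaceCursorAt` about every offset in the
// text and keep the ones it accepts.
function collectCursorStops(text, canPlaceCursorAt) {
  const stops = [];
  for (let offset = 0; offset <= text.length; offset++) {
    if (canPlaceCursorAt(text, offset)) stops.push(offset);
  }
  return stops;
}

// Stand-in probe (an assumption, not a browser query): refuse only offsets
// that would split a UTF-16 surrogate pair.
function notInsideSurrogatePair(text, offset) {
  if (offset <= 0 || offset >= text.length) return true;
  const prev = text.charCodeAt(offset - 1);
  const next = text.charCodeAt(offset);
  const splitsPair =
    prev >= 0xd800 && prev <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
  return !splitsPair;
}

// "a", one astral character (two code units), "b".
console.log(collectCursorStops("a\uD83D\uDE00b", notInsideSurrogatePair));
// [0, 1, 3, 4]
```

In a real editor the callback would be replaced by whatever browser query turns out to work, and the result would need to be cached, since probing every offset is expensive.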

marijnh · 2014-01-10T11:48:29Z

By 'ask the browser' you mean create a textarea and try to set the cursor in the textarea there? Or is there a more efficient/convenient way to do it on (non-editable) DOM nodes?

Is there an easy/cheap way to determine whether a string might have stacking?

santhoshtr · 2014-01-12T05:49:41Z

By 'ask the browser' you mean create a textarea and try to set the cursor in the textarea there? Or is there a more efficient/convenient way to do it on (non-editable) DOM nodes?

Yes, create an editable node and keep trying to place the cursor. Of course it is inefficient and hacky.

Is there an easy/cheap way to determine whether a string might have stacking?

No, that is not possible. It not only depends on the data but also the font used.

marijnh · 2014-01-16T12:39:47Z

Is there an easy/cheap way to determine whether a string might have stacking?

No, that is not possible. It not only depends on the data but also the font used.

Well, I meant a way to weed out strings that obviously don't need the expensive treatment and simply have a cursor position between every code point. Testing against /[^\x00-\x7f]/ would identify pure-ASCII strings (the ones that don't match), but maybe we can do better and enumerate the ranges of the languages in which this occurs (using broad ranges to keep the size of the pattern under control; false positives aren't bad).
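A pre-filter along those lines might look like the sketch below. The ranges are illustrative guesses at "broad ranges" (combining diacritics, Hebrew through Myanmar, Khmer, the zero-width joiners, and surrogates) and are neither complete nor taken from CodeMirror; `mightNeedClusterProbe` is a made-up name.

```javascript
// Deliberately broad ranges (an assumption): false positives only cost
// time, while false negatives would skip the expensive treatment wrongly.
const MAY_NEED_CLUSTERING = new RegExp(
  "[" +
    "\\u0300-\\u036F" + // combining diacritical marks
    "\\u0590-\\u109F" + // Hebrew, Arabic, Indic scripts, Thai, Lao, Tibetan, Myanmar
    "\\u1780-\\u17FF" + // Khmer
    "\\u200C\\u200D" +  // zero-width non-joiner and joiner
    "\\uD800-\\uDFFF" + // surrogates, i.e. any astral code point
  "]"
);

// Hypothetical helper: true when the string contains anything from the
// ranges above and so cannot be assumed to have a cursor stop between
// every code unit.
function mightNeedClusterProbe(text) {
  return MAY_NEED_CLUSTERING.test(text);
}

console.log(mightNeedClusterProbe("hello"));              // false
console.log(mightNeedClusterProbe("\u0D38\u0D28\u0D4D")); // true
```

Strings that fail the test can keep the cheap path; everything else falls through to whatever slower boundary detection is available.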

marijnh · 2014-01-27T16:52:24Z

@santhoshtr

Yes, create an editable node and keep trying to place the cursor. Of course it is inefficient and hacky.

On Firefox, it seems that selectionEnd can be set to any value, even one that's not a valid cursor position. Do you have any example of this technique actually being applied?

marijnh · 2014-01-27T16:57:13Z

(That is, I'm using a textarea now, because there I can play with selectionEnd without actually breaking the existing selection in the document. Using getSelection().addRange() is just too horribly disruptive: it will cause tons of side effects on mobile, and also cause spurious deselects/reselects on desktop.)

ghost · 2014-02-23T17:17:42Z

@marijnh Arabic doesn't work correctly, and neither does Thai.

peterkroon · 2014-03-04T13:08:01Z

@marijnh
#2115 (comment)

The browser knows this, but doesn't expose this information to JavaScript.

Have you considered filing a bug for this at https://bugzilla.mozilla.org/ or https://code.google.com/p/chromium/ ?

alicoding · 2014-04-26T23:33:01Z

Wondering if there is any update or workaround to this bug yet?

marijnh · 2014-04-28T08:56:38Z

Nope, I still haven't found a hack that works halfway acceptably.

niftylettuce · 2016-01-14T06:17:32Z

I still have the same issue. If you set a custom font, like Inconsolata, the line height or cursor positioning is way off (until you start interacting, typing, or clicking in the textarea rendered into the .CodeMirror class).

niftylettuce · 2016-01-14T06:24:24Z

sadig41 · 2018-01-09T08:07:55Z

Can't make RTL for arabic?

adrianheine · 2018-09-06T06:47:24Z

This is an issue that's difficult, if not impossible, to solve with the fundamental approach currently taken by CodeMirror.

We are working on a rewrite (CodeMirror 6) that might address this issue, and we are currently raising money for this work. See the announcement for more information about the rewrite and a demo.

Note that CodeMirror 6 is by no means stable or usable in production yet. It's highly unlikely that we will pick up this issue for CodeMirror 5, though.

HTGAzureX1212 · 2020-10-30T06:33:14Z

Same issue here, the cursor seems to be completely mispositioned... I have used codeMirror.getDoc().setValue() though.

Windows 10 1909
Chrome 86.0.4240.111

santhoshtr mentioned this issue Jan 7, 2014

Wrong cursor positioning with complex scrips adobe/brackets#6301

Open

marijnh added a commit that referenced this issue Jan 9, 2014

Fix measuring of extending characters, include all extending code points

eacf20f

Issue #2125 Issue #2115

Jaygiri mentioned this issue Jan 9, 2014

Incorrect cursor positioning for Hindi #2125

Closed

marijnh mentioned this issue Jan 27, 2014

Cursor mispositioning when using a custom font with unicode text #1813

Closed

anaran pushed a commit to anaran/CodeMirror that referenced this issue Feb 22, 2014

Fix measuring of extending characters, include all extending code points

f2afabe

Issue codemirror#2125 Issue codemirror#2115

ErisDS mentioned this issue Feb 23, 2014

Wrong cursor position in Markdown editor for Unicode TryGhost/Ghost#2245

Closed

marijnh mentioned this issue Mar 23, 2016

CodeMirror's default CSS should disable font ligatures #3899

Closed

adrianheine added this to the Fixed in rewrite (hopefully) milestone Sep 6, 2018

bmurr mentioned this issue Sep 10, 2020

Backspace behaviour on characters created using combining diacritical marks differs to that of a normal browser textarea. #6408

Closed

core-ai-bot mentioned this issue Aug 31, 2021

Wrong cursor positioning with complex scrips brackets-archive/bracketsIssues#12472

Open
