Word segmenter with generic locale #136
Comments
What you need is line breaking. Some languages use word boundaries to determine line breaks, but others do not. For example, Chinese and Japanese do not need to break lines at word boundaries, but English, Thai, Arabic, and Khmer do. Note that JavaScript's Intl.Segmenter has no support for line breaking, only word breaking. |
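To make the point about granularities concrete, here is a minimal sketch that probes which granularities a runtime's `Intl.Segmenter` accepts. The helper names `supportsGranularity` and `countSegments` are made up for illustration. It assumes an engine that implements `Intl.Segmenter` as currently specified, where only `"grapheme"`, `"word"`, and `"sentence"` are valid and any other value is rejected with a `RangeError`.

```javascript
// Hypothetical helper: returns true if the runtime accepts the given
// granularity, false if the constructor rejects it with a RangeError.
function supportsGranularity(granularity) {
  try {
    new Intl.Segmenter("en", { granularity });
    return true;
  } catch (err) {
    if (err instanceof RangeError) return false;
    throw err; // anything else is unexpected, so surface it
  }
}

// Hypothetical helper: counts the segments produced for a granularity.
function countSegments(text, granularity) {
  const segmenter = new Intl.Segmenter("en", { granularity });
  return [...segmenter.segment(text)].length;
}

for (const g of ["grapheme", "word", "sentence", "line"]) {
  console.log(g, supportsGranularity(g));
}
```

So a caller cannot ask the segmenter for line break opportunities directly; only the three listed granularities are available.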
Yes, but my understanding was that it would take years for anyone to agree on an API for line breaking. I assumed the compromise was that the word segmenting API was designed to be rich enough to allow programmers to construct a rudimentary line segmenter themselves. The no/generic locale thing is important to multilingual users who are forced to type Chinese/Arabic/etc. into English web apps. Or English expats typing English into local Japanese websites. etc. Not everything can be localized properly for all languages. |
No. That is exactly why I think it is a bad idea not to support line break. The compromise came about because Apple engineers believe that if we support line break, people will build line breaking in ECMAScript instead of using HTML and CSS for it. They claim that if we support line break, people will NOT use CSS + HTML to do line breaking, and that if we do not support it, no one will use word break for that purpose. I am afraid that if we do not support line break, people will use word break incorrectly to implement line breaking. And your reasoning totally fulfills my prediction. Word break is NOT a subset of line break, nor is line break a subset of word break. They follow two different systems. For SOME languages, line breaks may depend on word breaks, but only for those languages. It would be nice if you could describe your use cases and explain WHY you need to break lines yourself rather than relying on CSS to do it for you. Apple engineers believe that everyone who needs line breaking COULD use the CSS line-breaking facility and should not do it themselves in JavaScript, in particular because line breaking also needs glyph boundary information, which is not accessible from JavaScript. |
That is not the reason. The reason is that deciding where to break lines requires two things together
|
Well, that’s a little silly then. There’s a lot of people trying to standardize custom font rendering that need this. HTML/CSS are too high level for modern HTML5 web apps. Anyone doing HTML5 canvas games, WebGL, WebVR, WebXR, svg charts and graphs, visualization, or anything graphical will have their own low-level text rendering. I, myself, have written my own vector graphics tool in HTML5, and I had to use low-level libraries Typr.js and Harfbuzz to do my text rendering. And I need a low-level internationalization library to do my word breaking and line breaking, but the full ICU with all its dictionaries and stuff is too big to include on a web page. Since all web browsers already include the ICU in some way, I always thought the whole purpose of this standardization effort was to provide an API to let programmers access it instead of forcing them to download it or to use some server solution.
Yeah, I assumed the hold-up was that you couldn’t agree on an API for this, not that you couldn’t agree on whether line breaking was actually needed. |
Actually, THAT IS the disagreement. I believe it is needed, but OTHERS believe line break is not needed as a JavaScript API. It is very easy for me, as a V8 implementer, to add line break support, but we have a hard time convincing Apple to agree with us. They believe the problem should be solved only by HTML and CSS, not at the JavaScript level, and that if JavaScript provides such support it will be misused and damage the web. I believe that if we do not support line break, people who need it will misuse word break to implement it anyway and do even worse damage. If you strongly believe adding line break support is essential, please file another bug here requesting a line break granularity, and state the use case and motivation clearly. I will then try to reopen the issue in the ECMA402 committee and TC39 and ask for reconsideration. |
Also, I would suggest filing bugs asking for line break support in V8 (https://bugs.chromium.org/p/v8/issues/ - assign to me ( [email protected] ), Components: Internationalization), Mozilla, Microsoft Edge, and JSC. If all browser vendors receive more feature requests and agree with you, it may pressure them to accept the feature in ECMA402. |
I don't think I have the permissions to assign bugs to users or components in v8, so if I were to create a v8 bug, it would probably get lost in triage. It might be better if you were to create it then. |
You can create one and send me the link. I will assign it to myself then.
…On Mon, 3 May 2021 at 13:44, Ming Iu ***@***.***> wrote:
I don't think I have the permissions to assign bugs to users or components
in v8, so if I were to create a v8 bug, it would probably get lost in
triage. It might be better if you were to create it then.
--
Frank Yung-Fong Tang
譚永鋒 / 🌭🍊
Sr. Software Engineer
|
I think I already ended up filing it incorrectly, but I’ll let you fix it up: |
I have a bunch of strings of text of unknown language/locale, and I want to find the word breaks so that I can lay them out in paragraphs in svg. Is there some way to create a word segmenter using some sort of generic or default locale? Would, say, the English segmenter still properly handle Chinese text? Or do I need some sort of language detection to figure out the locale of the segmenter I need to instantiate?
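For reference, a minimal sketch of the scenario in the question: building a word segmenter without naming a specific language and running it over mixed-script text. `wordsOf` is a hypothetical helper name. Passing `undefined` as the locale makes the runtime fall back to its default locale, which is not the same thing as a truly generic locale. How the Chinese run is split depends on the engine's segmentation data, so no output is asserted for it here.

```javascript
// Hypothetical helper: returns the word-like segments of `text` together
// with their start offsets, using whatever locale is passed in
// (undefined means "use the runtime's default locale").
function wordsOf(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const words = [];
  for (const { segment, index, isWordLike } of segmenter.segment(text)) {
    // isWordLike is false for spaces and punctuation, so skip those.
    if (isWordLike) words.push({ text: segment, index });
  }
  return words;
}

const mixed = "Hello world, 你好世界";

// Compare the default-locale segmenter with an explicitly English one.
console.log(wordsOf(mixed, undefined).map((w) => w.text));
console.log(wordsOf(mixed, "en").map((w) => w.text));
```

Comparing the two logged arrays on the target browsers is a quick way to check whether the chosen locale changes the word boundaries for a given string.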