Full Unicode (Plane 1-16) support #2538
Hello,
(1) (2) (3) It is certainly doable, but performance side-effects may hinder adoption. A compile-time toggle may facilitate it.
For 1: full Unicode support is 21-bit (0-10FFFF), but Planes 3+ are uncommon, so 18-bit would meet most needs (https://en.wikipedia.org/wiki/Unicode). Planes 1/2 are also rarely used, so I suggest using an unsigned short map. For 2: UTF-16 (which Windows uses) is also multi-byte; code points beyond Plane 0 (0-FFFF) are encoded as four bytes, i.e. two unsigned short (wchar_t) units forming a surrogate pair (https://en.wikipedia.org/wiki/UTF-16), so I think we should write a function to convert UTF-8 to UTF-16. For 3: for the performance bottleneck, I think the hotspot would be …
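The surrogate-pair encoding described for point 2 can be sketched as follows. This is a minimal illustration, not imgui code; `EncodeUtf16` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of imgui): encode one Unicode code point
// into UTF-16 units. Returns the number of units written (1 or 2).
static int EncodeUtf16(uint32_t codepoint, uint16_t out[2])
{
    if (codepoint < 0x10000)            // Plane 0 (BMP): a single 16-bit unit
    {
        out[0] = (uint16_t)codepoint;
        return 1;
    }
    codepoint -= 0x10000;               // Planes 1-16: surrogate pair
    out[0] = (uint16_t)(0xD800 + (codepoint >> 10));    // high surrogate
    out[1] = (uint16_t)(0xDC00 + (codepoint & 0x3FF));  // low surrogate
    return 2;
}
```

For example, U+20628 (𠘨) encodes to the pair 0xD841 0xDE28.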
Thanks for the details. For 2: OK. Also, since those functions are Windows-only, we may use Win32 functionality as well if we don't encounter other cases. For 3: one bottleneck for very large UI is … I think a first way would be to allow …
Just saw your edit: yes, a binary search could make sense for planes above 0. I would suggest working on the basic case first!
The …
That's correct, apologies! Only RenderText uses it, via e.g. …
I also suggest using a smaller cache (for example, a fixed 4096-slot hash table) and a sorted vector for … It may be more CPU-cache friendly than a large (64K) …
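The sorted-vector idea for higher planes could look like the sketch below. All names here (`HighPlaneGlyph`, `FindHighPlaneGlyph`) are hypothetical, not imgui API; the point is that a compact sorted array searched with `std::lower_bound` stays cache-friendly compared to a large flat map:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: dense storage would cover Plane 0, while code
// points above 0xFFFF live in a small vector kept sorted by codepoint.
struct HighPlaneGlyph { uint32_t Codepoint; int GlyphIndex; };

static int FindHighPlaneGlyph(const std::vector<HighPlaneGlyph>& sorted, uint32_t cp)
{
    // Binary search on Codepoint; the vector must be kept sorted on insert.
    auto it = std::lower_bound(sorted.begin(), sorted.end(), cp,
        [](const HighPlaneGlyph& g, uint32_t c) { return g.Codepoint < c; });
    if (it != sorted.end() && it->Codepoint == cp)
        return it->GlyphIndex;
    return -1; // missing glyph -> caller falls back to a default
}
```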
The current scheme is quasi-ideal for ASCII/Latin-based languages, meaning anything else is likely to be a trade-off for those users. I'm however happy to have those other techniques and ideas explored; it is just outside of my reach at the moment. If you engage in making a PR, I suggest doing this in two steps, as support for the simple compile-time typedef would be easier to merge fast/first; then we can look into more advanced ideas.
Ok, I will try to do it tomorrow, thanks. |
- fix build for WideCharToMultiByte
- [3181ff1e] Full Unicode Support
- [6c9e73ac] Fix ImTextCountUtf8BytesFromChar and ImTextCharToUtf8; these APIs assume the input is a Unicode code point, not UTF-16
- [ba85665b] Add AddInputCharacterUTF16 for the Windows backend to handle WM_CHAR
- [fafdcaf0] Use the Windows API to convert UTF-16 for ImFileOpen
- [dc7d5925] Use the Windows API to convert UTF-16 for the clipboard
- Make ImWchar32 unsigned.
- Fix Win32 version of ImFileOpen by including windows.h sooner.
- Make ImGuiIO::AddInputCharacterUTF16() more robust by disallowing illegal surrogate pairs.
- Allow pushing higher-plane codepoints through ImGuiIO::AddInputCharacter().
- Minor cleanup in the high-plane Unicode support.
- Fix Clang -Wunreachable-code warning.
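Conceptually, rejecting illegal surrogate pairs when consuming UTF-16 input works like the sketch below. This is a simplified illustration, not the actual AddInputCharacterUTF16() implementation; `Utf16Stream` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: feed UTF-16 units one at a time. Push() returns
// the completed code point, 0 while a surrogate pair is still pending,
// or 0xFFFD (the replacement character) for illegal input.
struct Utf16Stream
{
    uint16_t Pending = 0; // stored high surrogate awaiting its low half

    uint32_t Push(uint16_t unit)
    {
        if (Pending)
        {
            uint16_t hi = Pending;
            Pending = 0;
            if (unit >= 0xDC00 && unit <= 0xDFFF)   // valid low surrogate
                return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (unit - 0xDC00));
            return 0xFFFD;  // high surrogate not followed by a low one
        }
        if (unit >= 0xD800 && unit <= 0xDBFF) { Pending = unit; return 0; }
        if (unit >= 0xDC00 && unit <= 0xDFFF) return 0xFFFD; // stray low surrogate
        return unit;                                          // plain BMP unit
    }
};
```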
This is now merged, thank you @cloudwu @samhocevar!

Thank you @ocornut :)
`ImWchar` is defined as `unsigned short` (16-bit int) now: https://github.com/ocornut/imgui/blob/master/imgui.h#L124

And `AddInputCharactersUTF8` / `ImTextStrFromUtf8` / `ImTextCountCharsFromUtf8` discard the code points in Planes 1-16:
https://github.com/ocornut/imgui/blob/master/imgui.cpp#L1263
https://github.com/ocornut/imgui/blob/master/imgui.cpp#L1678-L1710

`ImTextCharToUtf8` cannot convert code points greater than 0x10000, for example U+20628 (𠘨):
https://github.com/ocornut/imgui/blob/master/imgui.cpp#L1713

I think full Unicode support is not complicated: all we need is to define `ImWchar` as `unsigned int` and remove the branch `(c < 0x10000)`. Is there any other reason to use u16 internally rather than u32?
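For reference, the 4-byte UTF-8 case that a `(c < 0x10000)` cutoff excludes can be sketched like this. This is a simplified stand-in, not the actual `ImTextCharToUtf8` code, and it omits checks such as rejecting surrogate values:

```cpp
#include <cassert>
#include <cstdint>

// Simplified UTF-8 encoder sketch: with a 32-bit character type, code
// points at or above 0x10000 (Planes 1-16) encode to four bytes.
static int CharToUtf8(char* buf, uint32_t c)
{
    if (c < 0x80)    { buf[0] = (char)c; return 1; }
    if (c < 0x800)   { buf[0] = (char)(0xC0 | (c >> 6));
                       buf[1] = (char)(0x80 | (c & 0x3F)); return 2; }
    if (c < 0x10000) { buf[0] = (char)(0xE0 | (c >> 12));
                       buf[1] = (char)(0x80 | ((c >> 6) & 0x3F));
                       buf[2] = (char)(0x80 | (c & 0x3F)); return 3; }
    // Planes 1-16, e.g. U+20628 (𠘨) -> F0 A0 98 A8
    buf[0] = (char)(0xF0 | (c >> 18));
    buf[1] = (char)(0x80 | ((c >> 12) & 0x3F));
    buf[2] = (char)(0x80 | ((c >> 6) & 0x3F));
    buf[3] = (char)(0x80 | (c & 0x3F));
    return 4;
}
```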