Fix journals not rendering Unicode text correctly (fixes #1403) #1405

dgelessus · 2023-06-24T00:34:39Z

For the ST::string overloads of the text rendering methods, the firstClippedChar output parameter still counted in wchar_t units, not UTF-8 bytes as needed for indexing into ST::string.
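To make the unit mismatch concrete, here is a minimal sketch in standard C++ (no string_theory or Plasma code involved). The sample text and the `RemainingText` helper are hypothetical and exist only for illustration; the point is that an offset counted in wchar_t units lands in the wrong place when it is used as a byte index into UTF-8 data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sample, not taken from the PR: "héllo" stored two ways.
// 'é' (U+00E9) is one wchar_t unit but two UTF-8 bytes (0xC3 0xA9).
static const std::wstring kWideText = L"h\u00E9llo";     // 5 wchar_t units
static const std::string kUtf8Text = "h\xC3\xA9llo";     // 6 bytes

// Hypothetical helper: returns the text that is still unrendered when the
// first `byteOffset` bytes of a UTF-8 string have already been drawn.
static std::string RemainingText(const std::string& utf8, std::size_t byteOffset)
{
    return utf8.substr(byteOffset);
}
```

If rendering clips after "hé", the wchar_t offset is 2 but the matching UTF-8 byte offset is 3. `RemainingText(kUtf8Text, 3)` gives "llo", whereas `RemainingText(kUtf8Text, 2)` starts in the middle of 'é' and yields a stray continuation byte followed by "llo".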

Hoikas · 2023-06-24T04:49:08Z

Sources/Plasma/PubUtilLib/plGImage/plDynamicTextMap.cpp

+        // Convert the firstClippedChar offset from wchar_t units to UTF-8 byte units.
+        // This is a bit inefficient, because it creates an actual UTF-8 string
+        // even though we just need the count, but string_theory has no better alternative.
+        *firstClippedChar = ST::string::from_wchar(wcharBuf.data(), firstClippedWchar).size();


Any thoughts about this, @zrax?

dgelessus · 2023-06-24

By the way, this conversion will probably go away again in the long term. I assume at some point we'll convert the rendering APIs to use ST::string all the way through, which makes it trivial to calculate firstClippedChar as a UTF-8 count.

zrax · 2023-06-24

string_theory does have a private API (_ST_PRIVATE::utf8_measure_from_utf16/32) that could do this, but I don't think we really want to rely on that. Another option, since the font rendering code currently (unfortunately) assumes that all renderable characters fit into a single wchar_t, is to write a simplified length calculation that just ignores the possibility of UTF-16 surrogates. It could look something like this (completely untested):

static uint32_t WcharToUtf8Position(const wchar_t *text, uint32_t position)
{
    const wchar_t *end = text + position;
    uint32_t u8pos = 0;
    while (text < end) {
        if (*text < 0x80)
            u8pos += 1;
        else if (*text < 0x800)
            u8pos += 2;
        else /* Assumes all input characters are 0x0000 - 0xffff */
            u8pos += 3;
        ++text;
    }
    return u8pos;
}

Then you could just call *firstClippedChar = WcharToUtf8Position(wcharBuf.data(), firstClippedWchar)

But yeah, at some point we'll need to make the whole font rendering code work better with ST types, and break some of its faulty assumptions around wchar_t.
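For anyone who wants to try the idea outside the engine, here is a self-contained variant of that untested sketch. It is only an illustration, not what the PR merged: it uses an index loop and an explicit cast to an unsigned type (my own changes), and it keeps the same assumption that every input character is in the range 0x0000 - 0xFFFF, so surrogate pairs and characters above the BMP are not handled.

```cpp
#include <cassert>
#include <cstdint>

// Counts how many UTF-8 bytes the first `position` wide characters of
// `text` would occupy once encoded.
// Assumption (same as the sketch in this thread): every character is in
// 0x0000 - 0xFFFF, so each one encodes to 1, 2, or 3 bytes.
static std::uint32_t WcharToUtf8Position(const wchar_t* text, std::uint32_t position)
{
    std::uint32_t u8pos = 0;
    for (std::uint32_t i = 0; i < position; ++i) {
        // Cast to unsigned so the comparisons behave the same whether
        // wchar_t is signed or unsigned on the target platform.
        const std::uint32_t ch = static_cast<std::uint32_t>(text[i]);
        if (ch < 0x80)
            u8pos += 1;   // ASCII
        else if (ch < 0x800)
            u8pos += 2;   // U+0080 - U+07FF
        else
            u8pos += 3;   // U+0800 - U+FFFF (assumed upper bound)
    }
    return u8pos;
}
```

With a made-up input such as L"a\u00E9\u20ACz" ('a', 'é', '€', 'z'), the byte offsets for wchar_t positions 0 through 4 come out as 0, 1, 3, 6, and 7.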

dgelessus · 2023-06-24

Yeah, I wanted to avoid writing a manual UTF-8 length calculation here. It seems easy to get wrong in some way :/ I don't think it's a big enough performance/memory issue to be worth the effort, especially if we're going to throw it out again soon.

(Actually, the previous line wrapping code did an identical conversion elsewhere, which I was able to remove now, so this change shouldn't even make a difference at all.)

Fix journals not rendering Unicode text correctly (fixes H-uru#1403)

f1cfacd

Hoikas linked an issue Jun 24, 2023 that may be closed by this pull request

Journal text rendering broken for non-English languages #1403

Closed

Hoikas reviewed Jun 24, 2023

dgelessus mentioned this pull request Jun 24, 2023

Prepare supporting additional languages #1397

Merged

Hoikas approved these changes Jun 25, 2023

dpogue approved these changes Jun 25, 2023

dpogue merged commit 3c3264e into H-uru:master Jun 25, 2023

dgelessus deleted the fix_journal_unicode_rendering branch November 29, 2023 21:22
