Feature request - Unicode UTF8 translation table #125

bobbimanners · 2024-12-18T14:31:52Z

I am enjoying using WRP to enable web browsing on my vintage machines. One feature I would like to see is some sort of table to allow translation of commonly encountered Unicode (UTF-8) byte sequences to ASCII equivalents.

For English-language web readers, I note that a lot of newspapers and web pages use Unicode versions of dashes (em dash, etc.), quotation marks, apostrophes, etc. In my own software I have previously implemented a filter to convert a few of these common cases into ASCII equivalents.
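A minimal sketch of such a filter, written in Python purely for illustration (it is not WRP code, and the names `PUNCT_TO_ASCII` and `punct_to_ascii` are hypothetical). It operates on decoded text, so raw UTF-8 bytes would first need `bytes.decode("utf-8")`. Mapping an em dash to `--` is an assumed convention, not a settled one.

```python
# Hypothetical translation table (not part of WRP): common Unicode
# punctuation mapped to ASCII replacements.
PUNCT_TO_ASCII = str.maketrans({
    "\u2014": "--",   # em dash
    "\u2013": "-",    # en dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (also used as apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
})


def punct_to_ascii(text: str) -> str:
    """Replace common typographic punctuation with ASCII equivalents."""
    return text.translate(PUNCT_TO_ASCII)


print(punct_to_ascii("It\u2019s \u201cfine\u201d \u2014 really\u2026"))
# prints: It's "fine" -- really...
```

Characters not in the table pass through unchanged, so the table can grow incrementally as new cases turn up.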

We'll never get them all (and non-European languages are obviously a hopeless case), but we could clean up English text very easily, and probably make French, Spanish, German, etc. much easier to read by simply omitting accents/diacritics. (Or, to use German as an example, o-umlaut -> "oe".)
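Diacritic stripping can be sketched the same way, again as a hypothetical Python illustration rather than anything WRP provides. Unicode NFKD normalization splits an accented letter into its base letter plus combining marks, which can then be dropped. Language-specific rules such as the German o-umlaut -> "oe" convention have to be applied before that step, since NFKD alone would turn "ö" into plain "o". The `GERMAN_OVERRIDES` table, the `fold_to_ascii` function, and the `?` fallback are all assumptions made for this example.

```python
import unicodedata

# Hypothetical language-specific overrides applied before generic folding.
# German convention: umlauts become base letter + "e", eszett becomes "ss".
GERMAN_OVERRIDES = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def fold_to_ascii(text: str, overrides=None, fallback: str = "?") -> str:
    """Approximate text in ASCII by dropping diacritics.

    Characters with no ASCII decomposition are replaced with `fallback`.
    """
    if overrides is not None:
        text = text.translate(overrides)
    # NFKD splits e.g. "é" into "e" + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent itself, keep the base letter
        out.append(ch if ord(ch) < 128 else fallback)
    return "".join(out)


print(fold_to_ascii("Café déjà vu, señor"))               # Cafe deja vu, senor
print(fold_to_ascii("Grüße aus Köln", GERMAN_OVERRIDES))  # Gruesse aus Koeln
```

Anything with no ASCII decomposition (for example CJK text, or punctuation such as the em dash) comes out as the fallback character, so typographic punctuation still needs its own explicit table entries.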

You may well consider this out-of-scope.
