Feature request - Unicode UTF8 translation table #125

Open
bobbimanners opened this issue Dec 18, 2024 · 0 comments


I am enjoying using WRP to enable web browsing on my vintage machines. One feature I would like to see is some sort of table to allow translation of commonly encountered Unicode (UTF-8) byte sequences to ASCII equivalents.

For English-language readers, I note that many newspapers and web pages use Unicode versions of dashes (em dash, etc.), quotation marks, apostrophes, and so on. In my own software I have previously implemented a filter that converts a few of these common cases into ASCII equivalents.

We'll never get them all (and non-European languages are obviously a hopeless case), but we could clean up English text very easily, and probably make French, Spanish, German, etc. much easier to read by simply omitting accents and other diacritics (or, to use German as an example, mapping o-umlaut to "oe").
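
To make the idea concrete, here is a rough sketch of the kind of filter I have in mind. It is written in Python purely for illustration and is not tied to WRP's codebase. The names `PUNCTUATION`, `GERMAN`, and `to_ascii` are made up for this example, the table entries are only a small sample, and the `?` placeholder for untranslatable characters is an assumption. It expects text that has already been decoded from UTF-8.

```python
import unicodedata

# Hypothetical sample table: common typographic characters -> ASCII stand-ins.
PUNCTUATION = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
}

# Optional German-style transliteration (o-umlaut -> "oe", and so on).
GERMAN = {
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",
    "\u00df": "ss",
}


def to_ascii(text, german=False, placeholder="?"):
    """Return an ASCII-only approximation of an already-decoded string."""
    table = dict(PUNCTUATION)
    if german:
        table.update(GERMAN)
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)              # already ASCII, keep as-is
        elif ch in table:
            out.append(table[ch])       # explicit table entry wins
        else:
            # NFKD splits e.g. e-acute into 'e' plus a combining accent;
            # dropping the non-ASCII parts leaves the bare base letter.
            decomposed = unicodedata.normalize("NFKD", ch)
            stripped = "".join(c for c in decomposed if ord(c) < 128)
            out.append(stripped or placeholder)
    return "".join(out)
```

For example, `to_ascii("sch\u00f6n")` would give `schon`, while `to_ascii("sch\u00f6n", german=True)` would give `schoen`. Checking the table before falling back to accent stripping is what lets a per-language mapping override the generic "drop the diacritic" behaviour.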

You may well consider this out-of-scope.
