Fix surrogate pair handling on Windows #1165

Merged 4 commits on Sep 16, 2019

@Osspial Osspial commented Sep 15, 2019

  • Tested on all platforms changed
  • Compilation warnings were addressed
  • cargo fmt has been run on this branch
  • Added an entry to CHANGELOG.md if knowledge of this change could be valuable to users
  • Updated documentation to reflect any user-facing changes, including notes of platform-specific behavior
  • Created or updated an example program if it would help users understand this functionality
  • Updated feature matrix, if new features were added or implemented

Should fix #1164 and alacritty/alacritty#2796.

@Osspial Osspial merged commit c03ef85 into rust-windowing:master Sep 16, 2019
@chris-morgan

This patch mangles mismatched high/low surrogates. Admittedly this is a corner case and indicates buggy software somewhere anyway, so if there’d be a concrete performance hazard to doing it that way I could live without it, but I prefer the style of the patch I wrote for gVim: on a normal character, flush any pending high surrogate, and on a low surrogate with no pending high surrogate, send the low surrogate.
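The pass-through behaviour described above can be sketched roughly as follows. This is not the gVim patch or the winit code; `Decoded` and `SurrogateDecoder` are hypothetical names invented for illustration, and flushing an older high surrogate when a second one arrives is an assumption the comment does not spell out.

```rust
/// Hypothetical output type: a decoded scalar value, or a surrogate that
/// arrived without its partner and is passed through untouched.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Decoded {
    Char(char),
    LoneSurrogate(u16),
}

/// Hypothetical decoder that remembers at most one pending high surrogate.
#[derive(Default)]
pub struct SurrogateDecoder {
    pending_high: Option<u16>,
}

impl SurrogateDecoder {
    /// Feed one UTF-16 code unit; returns zero, one, or two outputs.
    pub fn push(&mut self, unit: u16) -> Vec<Decoded> {
        let mut out = Vec::new();
        match unit {
            // High surrogate: hold it, flushing any older unmatched one
            // (assumption: the comment above does not cover this case).
            0xD800..=0xDBFF => {
                if let Some(old) = self.pending_high.replace(unit) {
                    out.push(Decoded::LoneSurrogate(old));
                }
            }
            // Low surrogate: combine with a pending high one, else send it as is.
            0xDC00..=0xDFFF => match self.pending_high.take() {
                Some(high) => {
                    let cp = 0x10000
                        + ((u32::from(high) - 0xD800) << 10)
                        + (u32::from(unit) - 0xDC00);
                    // A matched pair always maps into U+10000..=U+10FFFF.
                    out.push(Decoded::Char(char::from_u32(cp).expect("valid pair")));
                }
                None => out.push(Decoded::LoneSurrogate(unit)),
            },
            // Normal character: flush any pending high surrogate first.
            _ => {
                if let Some(old) = self.pending_high.take() {
                    out.push(Decoded::LoneSurrogate(old));
                }
                // Non-surrogate code units are always valid scalar values.
                let c = char::from_u32(u32::from(unit)).expect("not a surrogate");
                out.push(Decoded::Char(c));
            }
        }
        out
    }
}
```

Returning a separate `LoneSurrogate` variant keeps the bad input inspectable without ever storing a surrogate in a `char`.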

@Osspial

Osspial commented Sep 16, 2019

Ah yeah, that first case is a problem. I'll go ahead and address that. I've gotta ask, though - why would it be a good idea to send the mismatched low surrogate? There's nothing sensible you can do with that data, and the Unicode standard explicitly says not to encode surrogate pairs if you aren't UTF-16.

@chris-morgan

Mismatched surrogates are exceptional, when you have a buggy IME. I don’t particularly expect to encounter them in the wild, but I’m confident that some users will encounter them at some point in time, and I prefer to pass through bad input rather than swallowing it or turning into even worse input. That way you can build tools like gVim atop it, and receive the mismatched surrogates and do something sane with them—or at least inspect what they are.

I have just realised, however, that char is a Unicode scalar value rather than a Unicode code point, so we actually can’t represent mismatched surrogates in a char. So anything that transmuted a surrogate into a char was actually wrong already. Hmm, perhaps winit actually should just drop such input on the floor.
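A small sketch of that distinction, using only the standard library's checked conversion. The helper names `is_surrogate` and `to_scalar` are made up for this example.

```rust
/// Hypothetical helper: true for code points in the surrogate range,
/// which are code points but not Unicode scalar values.
pub fn is_surrogate(cp: u32) -> bool {
    (0xD800..=0xDFFF).contains(&cp)
}

/// Hypothetical helper: checked conversion from a code point to `char`.
/// `char::from_u32` returns `None` for surrogates and for values above
/// U+10FFFF, so a lone surrogate can never become a `char` this way.
pub fn to_scalar(cp: u32) -> Option<char> {
    char::from_u32(cp)
}
```

With these, `to_scalar(0xD800)` is `None` while `to_scalar(0x1F600)` is `Some('😀')`.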

@Osspial

Osspial commented Sep 16, 2019

Actually, now that you bring that up, I have no idea why we're using a transmute there. The standard library provides perfectly good char conversion methods so we really should be using those.
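For illustration, one standard-library route is `std::char::decode_utf16`, which pairs surrogates and reports unpaired ones as errors. The sketch below is not necessarily what the merged patch does; the function names are hypothetical, and dropping unpaired surrogates is just one of the policies discussed in this thread.

```rust
use std::char::decode_utf16;

/// Hypothetical helper: decode UTF-16 code units into a `String`,
/// silently dropping any unpaired surrogate.
pub fn decode_dropping_unpaired(units: &[u16]) -> String {
    decode_utf16(units.iter().copied())
        .filter_map(Result::ok)
        .collect()
}

/// Hypothetical helper: collect the unpaired surrogates instead,
/// e.g. for logging or inspection.
pub fn unpaired_surrogates(units: &[u16]) -> Vec<u16> {
    decode_utf16(units.iter().copied())
        .filter_map(|r| r.err().map(|e| e.unpaired_surrogate()))
        .collect()
}
```

Neither helper needs `unsafe` or `transmute`; every `char` produced is a valid scalar value by construction.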


Successfully merging this pull request may close these issues.

Cannot enter unicode characters like emojis on Windows