-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve utf8_to_utf16 speed for common path #892
Merged
Merged
Changes from 2 commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This transformation doesn't work on platforms where char is unsigned -- e.g., most non-Windows platforms. Clearly this is missing some test coverage if that was missed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I missed that. I fixed the code, still thinking about testability. This would probably require a separate test binary with different char defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the tests might be avoiding this with the
__GLIBCXX__
bits? (GCC defaults to char being unsigned)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__GLIBCXX__
seems only to be used as a means to determine whether the test code should compare byte-wise (using unsigned char) or via the results of conversion derived from codecvt_utf8_utf16. Where I'm not sure about the reason for this, especially since codecvt_utf8_utf16 is deprecated in C++17 (->https://en.cppreference.com/w/cpp/locale/codecvt_utf8_utf16).With regard to the test, I was considering whether there is an elegant way to make use of the effects of
-funsigned-char
and-fsigned-char
with GCC (-> http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/C-Dialect-Options.html) or /J Option in VS (-> https://msdn.microsoft.com/en-us/library/0d294k5z.aspx). Where elegant would mean that we at least avoid building the tests with their dependecies twice (once signed once unsigned).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmmm I stand corrected on non-Windows platforms. Maybe it was just an Android thing. https://twitter.com/ubsanitizer/status/1052343628201259008
A
static_assert(std::is_signed<char>::value, "Untested for unsigned
char")
is probably sufficient then.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, but this would prevent the change to work on unsigned char platforms without changes. Or is this just a marker to ensure unsigned char checks are done before next release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thought: We could reinterpret_cast the pointer retrieved from s.data() to signed char (or unsigned whatever fits better to the project) and avoid the platform dependency. From what I see, this should also be fine with regard to strict aliasing rules since we're only working on the buffer via our pointers and are not using any std::string operations.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be OK (because signed char/unsigned char are character types)