-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make is_ascii_hexdigit branchless #103024
Closed
Closed
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit, but it feels like the lack of branches still isn't clear to the passive viewer since
'0'..='9'
and'a'..='f'
are still two separate ranges with a gap in the middle.Oh, and uh, that
||
is looking suspiciously not-branchless. Since it ends up being so in the resulting assembly, perhaps it could be reworded as|
?Reading the compiled output:
Which is effectively:
It's less that there are "fewer" branches in this code, but more that going from three ranges to two triggers a threshold in LLVM's side that makes it decide that branches are no longer worth it, and it removes them.
So... tying this all together, maybe the real point is not that it's branchless by itself, but that there are fewer computations overall and LLVM is likely to optimise it without branches as a result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems reasonable, I can do one where all the optimization is done by hand to make it clear what the resulting assembly should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for pushing churn on you here, but I'm actually going to ask that you aim for a middle ground where the code is still as obvious as possible while still producing branchless assembly.
Particularly, there's no need to replace the range changes with wrapping_sub manually, as LLVM will do that quite reliably: https://rust.godbolt.org/z/rPaxbf1o7
It's normal in the library for the code to not be in the form of the expected assembly. For examples,
is_power_of_two
is phrased usingcount_ones
, not the well-known bitwise tricksrust/library/core/src/num/uint_macros.rs
Lines 2132 to 2134 in 1ca6777
as that leaves it up to LLVM to generate the best assembly sequence -- which will be the
(x != 0) && (x & (x-1)) == 0
on i386, but libcore doesn't need to know that.So please go back to the version with the
const
s you had in a previous iteration. And consider making a local for the lower-cased version of the character -- that would help localize the comment. Maybe something likeYou could also consider splitting the hex_letter part into a separate (non-
pub
) function so thatis_ascii_hexdigit
becomes justto isolate the conversion-and-range stuff to its own thing.
@rustbot author
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it looks like even the manual masking isn't necessary.
I was trying to make a repro to file an LLVM bug that it should figure out how to do that itself, and it turns out it already can. There's just something unfortunate about the
matches!
going on. Because just writing out the obvious comparisons does exactly what's needed: https://rust.godbolt.org/z/cPEoa1nT8No branches, and does the subtraction and masking tricks:
(it picks
& !32
instead of| 32
, but same thing.)I might still say to use
|
there, though, to getor i1 %1, %4
instead of theselect
, even though the x86 backend appears to use anor
anyway.Now I guess I need to figure out what's going wrong in
matches!
...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it's just something about the or pattern in
matches!
. This is also branchless: https://rust.godbolt.org/z/9xqT38rz9(And yup, it works for
char
too: https://rust.godbolt.org/z/WvfsjMG6a.)So I think, in the end, the right answer here is to just replace some short-circuiting rust operations with non-short circuiting ones (the logical or and pattern or each with a bitwise or instead) and call it a day, as that gives exactly the desired output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Loving the investigative work here. Perhaps this is an actual codegen issue, since
matches!(x, P) | matches!(x, Q)
should ideally generate code rather close tomatches!(x, P | Q)
. It feels like more of a codegen quirk than an LLVM optimizer quirk, but I could be wrong.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dug in some more, and it's actually more interesting than I expected!
It looks like with the or-pattern it's tripping a different technique in LLVM. https://rust.godbolt.org/z/zdW1T43T3
Starting with the nice obvious
SimplifyCfg and SROA get it down to the nice simple short-circuiting
and InstCombine does the "I know how to simplify range checks like that" rewrite:
But then something really interesting happens, and
SimplifyCfg
says "wait, that looks like a lookup table!", givingThen later SimplifyCfg looks at that again and say "wait, that's a weird switch, how about I do that with a shift instead?" by encoding the table into an
i38
(because'A'
through'f'
is 38 values):which is also branchless. Just not quite as good a way.