Investigation into ASCII ctype inherent methods performance with lookup table #72895
Comments
Please note (I haven't had time to read your entire comment, sorry) that in the case of the methods on char, being in libcore, it is generally pretty important that we keep an eye on the size of the generated code as well as the performance. |
Yes, that is why I'm also looking into other implementations (without a lookup table). Specifically #[inline]
pub const fn is_ascii_punctuation(&self) -> bool {
matches!(*self, b'!'..=b'/' | b':'..=b'@' | b'['..=b'`' | b'{'..=b'~')
} which results in complex branching. An alternative implementation would be: #[inline]
pub const fn is_ascii_punctuation(&self) -> bool {
matches!(*self, b'!'..=b'@' | b'['..=b'`' | b'{'..=b'~') && !matches!(*self, b'0'..=b'9')
} which seems to generate slightly more efficient code and, in my benchmark, results in a slight speedup (although using a lookup table is still twice as fast). There might exist other, even more efficient bit-twiddling implementations. However, these measurements still depend on the quality of my benchmark, of which I'm not 100% sure. That is why I posted this report: to get some feedback on the best way to measure these things. Any feedback or help is appreciated, but take the time you need. |
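Since the two range-based forms quoted above are meant to be interchangeable, it may be worth checking them against each other over the whole `u8` range before benchmarking. The following is a minimal sketch, written as free functions taking `u8` by value; the names `punct_ranges`, `punct_ranges_minus_digits` and `first_disagreement` are hypothetical and do not come from libcore.

```rust
/// Range-based form as currently implemented (quoted above), as a free function.
pub const fn punct_ranges(byte: u8) -> bool {
    matches!(byte, b'!'..=b'/' | b':'..=b'@' | b'['..=b'`' | b'{'..=b'~')
}

/// Alternative form (quoted above): one wider first range, digits excluded afterwards.
pub const fn punct_ranges_minus_digits(byte: u8) -> bool {
    matches!(byte, b'!'..=b'@' | b'['..=b'`' | b'{'..=b'~') && !matches!(byte, b'0'..=b'9')
}

/// Returns the first byte on which the two forms disagree, or `None` if they agree everywhere.
pub fn first_disagreement() -> Option<u8> {
    (0..=u8::MAX).find(|&b| punct_ranges(b) != punct_ranges_minus_digits(b))
}
```

Because there are only 256 possible inputs, an exhaustive comparison like this is cheap and removes correctness as a variable when comparing timings.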
What is the effect on the L1 cache pressure? |
@CDirkx could you add another approach of a sorted positive char list? The algo complexity of binary searching or scanning the array might not matter given that for Edit: I just realized that |
I compared some other implementations, again specifically for
pub const fn is_ascii_punctuation_branch(byte: &u8) -> bool {
matches!(*byte, b'!'..=b'/' | b':'..=b'@' | b'['..=b'`' | b'{'..=b'~')
}
pub const fn is_ascii_punctuation_branch_adapted(byte: &u8) -> bool {
// Equivalent to !byte.is_ascii_alphanumeric() && byte.is_ascii_graphic()
!matches!(*byte, b'0'..=b'9' | b'A'..=b'Z' | b'a'..=b'z') && matches!(*byte, b'!'..=b'~')
}
pub const fn is_ascii_punctuation_lookup(byte: &u8) -> bool {
const LOOKUP : [bool; 256] = ...;
LOOKUP[*byte as usize]
}
pub const fn is_ascii_punctuation_hybrid(byte: &u8) -> bool {
const LOOKUP : [bool; 128] = ...;
byte.is_ascii() && LOOKUP[*byte as usize]
}
pub fn is_ascii_punctuation_linear_search(byte: &u8) -> bool {
const LOOKUP : [u8; 32] = ...;
LOOKUP.contains(byte)
}
pub fn is_ascii_punctuation_binary_search(byte: &u8) -> bool {
const LOOKUP : [u8; 32] = ...;
LOOKUP.binary_search(byte).is_ok()
} Re: @estebank, actually
pub const fn is_ascii_punctuation_lookup_bitset(byte: &u8) -> bool {
const LOOKUP : [u8; 32] = ...;
LOOKUP[(*byte / 8) as usize] >> (*byte % 8) & 1u8 == 1
}
pub const fn is_ascii_punctuation_hybrid_bitset(byte: &u8) -> bool {
const LOOKUP : [u8; 16] = ...;
byte.is_ascii() && LOOKUP[(*byte / 8) as usize] >> (*byte % 8) & 1u8 == 1
}
Generated code for comparison: https://rust.godbolt.org/z/mwf4B4.
Benchmark
Same setup as before:
#![feature(test)]
#![feature(decl_macro)]
extern crate test;
// "Hamlet, Prince of Denmark", adapted from https://www.gutenberg.org/files/27761/27761-0.txt
const SOURCE_TEXT: &'static str = include_str!("hamlet.txt");
macro bench_impl($bench:expr, $condition:expr) {
$bench.iter(|| {
let mut total = 0;
for byte in SOURCE_TEXT.bytes() {
if $condition(&byte) { total += 1; }
}
assert_eq!(total, 10566);
});
}
#[bench]
fn is_ascii_punctuation_branch(bench: &mut test::Bencher) {
bench_impl!(bench, |byte: &u8| {
byte.is_ascii_punctuation()
});
}
#[bench]
fn is_ascii_punctuation_branch_adapted(bench: &mut test::Bencher) {
bench_impl!(bench, |byte: &u8| {
!byte.is_ascii_alphanumeric() && byte.is_ascii_graphic()
});
}
#[bench]
fn is_ascii_punctuation_lookup(bench: &mut test::Bencher) {
const LOOKUP : [bool; 256] = [false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, false, false, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, true, true, true, true, true, true, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, true, true, true, true, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false];
bench_impl!(bench, |byte: &u8| {
LOOKUP[*byte as usize]
});
}
#[bench]
fn is_ascii_punctuation_hybrid(bench: &mut test::Bencher) {
const LOOKUP : [bool; 128] = [false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, false, false, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, true, true, true, true, true, true, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, true, true, true, true, false];
bench_impl!(bench, |byte: &u8| {
byte.is_ascii() && LOOKUP[*byte as usize]
});
}
#[bench]
fn is_ascii_punctuation_linear_search(bench: &mut test::Bencher) {
const LOOKUP : [u8; 32] = [46, 95, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 47, 58, 59, 60, 61, 62, 63, 64, 91, 92, 93, 94, 96, 123, 124, 125, 126];
bench_impl!(bench, |byte: &u8| {
LOOKUP.contains(byte)
});
}
#[bench]
fn is_ascii_punctuation_binary_search(bench: &mut test::Bencher) {
const LOOKUP : [u8; 32] = [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 58, 59, 60, 61, 62, 63, 64, 91, 92, 93, 94, 95, 96, 123, 124, 125, 126];
bench_impl!(bench, |byte: &u8| {
LOOKUP.binary_search(byte).is_ok()
});
}
#[bench]
fn is_ascii_punctuation_lookup_bitset(bench: &mut test::Bencher) {
const LOOKUP : [u8; 32] = [0, 0, 0, 0, 254, 255, 0, 252, 1, 0, 0, 248, 1, 0, 0, 120, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
bench_impl!(bench, |byte: &u8| {
LOOKUP[(*byte / 8) as usize] >> (*byte % 8) & 1u8 == 1
});
}
#[bench]
fn is_ascii_punctuation_hybrid_bitset(bench: &mut test::Bencher) {
const LOOKUP : [u8; 16] = [0, 0, 0, 0, 254, 255, 0, 252, 1, 0, 0, 248, 1, 0, 0, 120];
bench_impl!(bench, |byte: &u8| {
byte.is_ascii() && LOOKUP[(*byte / 8) as usize] >> (*byte % 8) & 1u8 == 1
});
}
Results
Adapted from the output of
Analysis
Conclusion
I think these tests uncovered two valuable techniques:
Now we can test if these techniques are also effective with the other methods. As before, the results above might be influenced by the design of the benchmark, so I was planning on also doing some further investigation into usage of these methods in the wild, other source texts and other benchmarking tasks. |
There is another revision of the pub fn is_ascii_punctuation_hybrid_bitset(byte: &u8) -> bool {
const LOOKUP: u128 = 79753679825085174867510150554292584448;
LOOKUP >> *byte & 1 == 1
} I dropped |
@alex-700 You can't drop the There's also an off-by-one as Edit: Corrected lookup table for that off-by-one is |
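To illustrate why a range guard matters for the single-`u128` variant: in Rust, shifting a `u128` by 128 or more is an overflow, which panics when overflow checks are enabled and otherwise masks the shift amount, so a non-ASCII byte could alias an ASCII one. The sketch below is an assumption-laden illustration, not the code from this thread: it builds the bitset at compile time from the standard `is_ascii_punctuation` rather than hard-coding a constant, and the names `build_punct_bitset`, `PUNCT_BITS`, `punct_u128_guarded` and `punct_u128_checked` are hypothetical.

```rust
/// Builds a 128-bit set with bit `b` set for every ASCII punctuation byte `b`.
const fn build_punct_bitset() -> u128 {
    let mut set = 0u128;
    let mut b = 0u8;
    while b < 128 {
        if b.is_ascii_punctuation() {
            set |= 1u128 << (b as u32);
        }
        b += 1;
    }
    set
}

const PUNCT_BITS: u128 = build_punct_bitset();

/// Guarded variant: the ASCII check keeps the shift amount below 128.
pub const fn punct_u128_guarded(byte: u8) -> bool {
    byte.is_ascii() && (PUNCT_BITS >> (byte as u32)) & 1 == 1
}

/// Alternative guard: `checked_shr` yields `None` for shift amounts of 128 or more.
pub fn punct_u128_checked(byte: u8) -> bool {
    PUNCT_BITS
        .checked_shr(byte as u32)
        .map_or(false, |bits| bits & 1 == 1)
}
```

Deriving the constant from the reference predicate also sidesteps off-by-one mistakes in a hand-written literal, at no runtime cost.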
@LingMan thanks! Shame on me :( Adding |
Nothing to be ashamed of. It's a mistake. Everyone makes mistakes all the time. Recognize it, learn from it, do better and carry on :-) |
(Remember for core embedded stuff memory is tight. My keyboard for example only has 20kb ram.) |
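For a rough sense of the memory side of the trade-off, the static data footprint of each table shape mentioned in this thread can be computed directly. This is only a sketch under the assumption that one table per method is stored; it counts table bytes only and says nothing about generated code size. The name `TABLE_SIZES` is hypothetical.

```rust
use std::mem::size_of;

/// Static table footprint in bytes for each table shape discussed in the thread.
pub const TABLE_SIZES: [(&str, usize); 5] = [
    ("lookup: [bool; 256]", size_of::<[bool; 256]>()),
    ("hybrid: [bool; 128]", size_of::<[bool; 128]>()),
    ("lookup_bitset: [u8; 32]", size_of::<[u8; 32]>()),
    ("hybrid_bitset: [u8; 16]", size_of::<[u8; 16]>()),
    ("hybrid_bitset: u128", size_of::<u128>()),
];
```

With ten ASCII methods, the per-method figure would be multiplied accordingly unless several predicates share one table.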
Prompted by #68983 (comment), I started looking into whether the performance of methods like is_ascii_alphabetic could be improved by using a lookup table.
The full list of methods:
is_ascii_alphabetic
is_ascii_alphanumeric
is_ascii_control
is_ascii_digit
is_ascii_graphic
is_ascii_hexdigit
is_ascii_lowercase
is_ascii_punctuation
is_ascii_uppercase
is_ascii_whitespace
Implementations
Currently, all of these methods are implemented by matching on characters/ranges:
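As an illustration of this range-matching style, here is a minimal sketch for two of the methods, written as free functions. The names `hexdigit_branch` and `control_branch` are hypothetical; the patterns are meant to mirror the standard library's behaviour but should not be read as the exact libcore source.

```rust
/// Range-matching check for ASCII hexadecimal digits: 0-9, A-F, a-f.
pub const fn hexdigit_branch(byte: u8) -> bool {
    matches!(byte, b'0'..=b'9' | b'A'..=b'F' | b'a'..=b'f')
}

/// Range-matching check for ASCII control characters: 0x00..=0x1F and 0x7F (DEL).
pub const fn control_branch(byte: u8) -> bool {
    matches!(byte, b'\0'..=b'\x1F' | b'\x7F')
}
```

Each additional range in the pattern tends to add comparisons or branches to the generated code, which is why the methods with the most ranges are the most interesting candidates for a table.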
I investigated two ways of implementing these functions with a lookup table:
A lookup table for the entire u8 range:
A hybrid approach by checking if the byte is ascii first, reducing table size by half:
I will be calling these implementations branch, lookup and hybrid respectively throughout the rest of this report.
Using the features const_if_match and const_loop, the lookup tables for lookup and hybrid can be easily generated:
The instructions these different implementations compile down to can be compared on godbolt.
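A minimal sketch of how the lookup and hybrid variants, together with compile-time table generation, might look for one method. It assumes a toolchain where `while` loops and const generics are usable in `const fn` without feature gates; the names `predicate`, `build_table`, `LOOKUP_256`, `LOOKUP_128`, `hexdigit_lookup` and `hexdigit_hybrid` are hypothetical.

```rust
/// Predicate used to fill the tables; any of the ASCII methods could be substituted.
const fn predicate(byte: u8) -> bool {
    byte.is_ascii_hexdigit()
}

/// Fills an `N`-entry table by evaluating the predicate for every index.
/// Intended for N <= 256 only, since the index is narrowed to `u8`.
const fn build_table<const N: usize>() -> [bool; N] {
    let mut table = [false; N];
    let mut i = 0;
    while i < N {
        table[i] = predicate(i as u8);
        i += 1;
    }
    table
}

const LOOKUP_256: [bool; 256] = build_table::<256>();
const LOOKUP_128: [bool; 128] = build_table::<128>();

/// `lookup`: a single index into a table covering the entire `u8` range.
pub const fn hexdigit_lookup(byte: u8) -> bool {
    LOOKUP_256[byte as usize]
}

/// `hybrid`: an ASCII check first, then an index into the half-sized table.
pub const fn hexdigit_hybrid(byte: u8) -> bool {
    byte.is_ascii() && LOOKUP_128[byte as usize]
}
```

Generating the tables from the existing predicate keeps the table contents and the branch implementation in sync by construction.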
Benchmark
Note: I do not have enough experience with benchmarking to know if this approach is fully representative; let me know if there is any way the benchmark can be improved.
The task I used for benchmarking is iterating through the bytes of a source text and counting how many of them satisfy a condition, e.g. is_ascii_alphabetic:
This results in a tight loop, which due to caching might be favorable to the lookup tables. However, some quick searching for the use of the ascii methods in open source projects reveals at least a few instances of them being used in a loop or iterator filter, so the benchmark is representative of at least some real-world usage.
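The counting task described above can be sketched on stable Rust as follows. This is a stand-in for illustration only, not the benchmark harness itself, and the helper name `count_matching` is hypothetical.

```rust
/// Counts the bytes of `text` that satisfy `condition`.
pub fn count_matching(text: &str, condition: impl Fn(&u8) -> bool) -> usize {
    text.bytes().filter(|byte| condition(byte)).count()
}
```

For example, `count_matching(text, |b| b.is_ascii_alphabetic())` counts the ASCII letters in `text`; swapping in a different closure exercises a different implementation over the same input.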
As a source text I used "Hamlet, Prince of Denmark", adapted from https://www.gutenberg.org/files/27761/27761-0.txt, a primarily ascii text of 4k+ lines:
Benchmark
Files: benches.zip
Results
branch
hybrid
lookup
is_ascii_alphabetic
is_ascii_alphanumeric
is_ascii_control
is_ascii_digit
is_ascii_graphic
is_ascii_hexdigit
is_ascii_lowercase
is_ascii_punctuation
is_ascii_uppercase
is_ascii_whitespace
Adapted from the output of cargo bench, results are in ns.
Analysis
It seems the hybrid approach with the smaller lookup table was overall as fast or faster than lookup, even though it has to do an extra check (smaller table fits better in cache?).
The hybrid approach was significantly faster than the current branch implementation for:
is_ascii_control (105,160 ±10,883 vs. 154,802 ±17,761)
is_ascii_hexdigit (280,170 ±17,197 vs. 358,390 ±21,165)
is_ascii_punctuation (157,147 ±7,353 vs. 372,600 ±22,126)
Looking at the current implementation for is_ascii_hexdigit and is_ascii_punctuation and compiler output, these two methods have the most complex branches of all the ascii methods, indicating that there is indeed potential for a faster implementation using a lookup table.
Why the hybrid version of is_ascii_control is faster I can not say, maybe caching or an artifact of the way I benchmarked?
Conclusion
From this preliminary investigation it seems that at least some of the ascii methods can be implemented faster, prompting further investigation.
As further steps I propose first evaluating the merits of this benchmark, and then conducting more. There are still a number of questions: are the results of this benchmark correct, or is another factor interfering with the results? Is the benchmark representative of real-world usage? Is the source text influencing the benchmark?
Once we are sure our benchmarks are measuring what we want, we can proceed with the investigation: quantify the trade-off of speed vs. memory, and compare alternate implementations. This information will inform the discussion on whether it makes sense to change the current implementations.