Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Improve hashtable prefetching (carbon-language#4585)
I had removed most but not all of the hashtable prefetching during development because I wasn't confident in the benchmarking results. However, I never revisited this once the benchmarking infrastructure improved and there were solid and stable results. This factors the two interesting prefetch patterns I've seen for this style of hashtable into helpers that are always called, and provides macros that can be used during the build to configure exactly which prefetch strategies are enabled. Benchmarking these and gaining confidence is very frustrating -- even now with the improved infrastructure, the noise is much higher than I would like. But it seems clear that *some* prefetching is a significant win. It also seems like enabling both results in too much prefetch traffic. And the entry group prefetch appears to be significantly more effective, both for the most interesting of the microbenchmarks and maybe most importantly for our compilation benchmarks. There, AMD is helped substantially and M1 seems to be helped some (although harder to measure). AMD server benchmark numbers: ``` name old cpu/op new cpu/op delta BM_CompileAPIFileDenseDecls<Phase::Lex>/256 35.0µs ± 2% 34.2µs ± 2% -2.40% (p=0.000 n=20+19) BM_CompileAPIFileDenseDecls<Phase::Lex>/1024 156µs ± 2% 151µs ± 2% -3.18% (p=0.000 n=20+19) BM_CompileAPIFileDenseDecls<Phase::Lex>/4096 625µs ± 1% 605µs ± 1% -3.22% (p=0.000 n=19+18) BM_CompileAPIFileDenseDecls<Phase::Lex>/16384 2.79ms ± 1% 2.69ms ± 2% -3.67% (p=0.000 n=17+19) BM_CompileAPIFileDenseDecls<Phase::Lex>/65536 12.1ms ± 1% 11.6ms ± 1% -4.30% (p=0.000 n=17+18) BM_CompileAPIFileDenseDecls<Phase::Lex>/262144 56.6ms ± 1% 53.8ms ± 1% -5.00% (p=0.000 n=18+17) BM_CompileAPIFileDenseDecls<Phase::Parse>/256 61.1µs ± 2% 61.7µs ± 1% +0.87% (p=0.000 n=19+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/1024 288µs ± 1% 290µs ± 1% +0.55% (p=0.004 n=20+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/4096 1.16ms ± 1% 1.16ms ± 1% -0.54% (p=0.000 n=17+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/16384 4.98ms ± 1% 4.91ms ± 1% -1.39% (p=0.000 n=20+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/65536 20.9ms ± 1% 20.5ms ± 1% -1.86% (p=0.000 n=20+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/262144 92.1ms ± 1% 90.2ms ± 1% -2.12% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Check>/256 1.16ms ± 2% 1.16ms ± 1% ~ (p=0.931 n=19+19) BM_CompileAPIFileDenseDecls<Phase::Check>/1024 2.17ms ± 2% 2.16ms ± 1% ~ (p=0.247 n=20+19) BM_CompileAPIFileDenseDecls<Phase::Check>/4096 6.07ms ± 1% 6.04ms ± 1% -0.48% (p=0.007 n=19+19) BM_CompileAPIFileDenseDecls<Phase::Check>/16384 22.4ms ± 1% 22.2ms ± 1% -0.99% (p=0.000 n=20+19) BM_CompileAPIFileDenseDecls<Phase::Check>/65536 93.3ms ± 1% 92.2ms ± 1% -1.23% (p=0.000 n=20+18) BM_CompileAPIFileDenseDecls<Phase::Check>/262144 400ms ± 1% 391ms ± 1% -2.15% (p=0.000 n=20+18) ```
- Loading branch information