-
Notifications
You must be signed in to change notification settings - Fork 468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add LTO/IPO compile support #764
Conversation
It seems GCC builds in linux work well but Clang builds fail in linking phase (https://github.com/apache/incubator-kvrocks/runs/7631095274?check_suite_focus=true). I have not tried it manually but I guess it maybe due to ld, which cannot recognize Hence you can try to replace ld with lld while clang is used. I can help you to do that if you do not familiar with those stuff. |
I'll fix it. wait for |
It's fine. I will help you. I'm going to try your branch on my side and commit to it. So please do not force-push from now 🤣 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGMT, it looks didn't increase much compile time while comparing to before, but I didn't have a look at the memory usage.
I think we should add some descriptions for these options, why do we use, their advantage or disadvantages |
Link-time optimization (LTO) is an important optimization technology in modern compilers. As we already known, C/C++ compilers generate native object files for every translation unit (TU), which indicates compiler can only get information within a certain TU, i.e. it know nothing about another TU (except declarations). But obviously most compiler optimization approach requires the information in function definitions, like constant propagation, reachable analysis, interprocedual dataflow analysis, inlining, loop invariant analysis, pointer analysis, etc. So if compiler cannot retrieve such information, the optimization will be just discarded. Obviously, this is a huge loss. LTO postpone the optimization procedures to link-time so (almost) every definition in the program is available to the optimizer. I do not think LTO has any disadvantages other than possibly slowing down compilation. I do not know why CMake call it interprocedure optimization, since interprocedure optimization can be done without LTO (but limited since it loss information in definitions). |
Hi everyone, is there any other thought on this PR? I will merge it if no further discussion 🚀 |
oh, thanks, is there a performance comparison? |
You can check this review and this paper. |
I mean how it affects kvrocks? I don't say we will merge it only if this option must bring much obvious performance improvement, but we must know it clearly. In addition, not only for this PR changes, even if they are valid on another system, we also don't need to merge it if they don't bring earnings on kvrocks till now, since, for a long time, it will be more and more complex, and make it hard to maintain. BTW, if you merge this commit, also please describe it clearly in commit log. |
Do we have benchmark test? Or some scripts to test performence easy? We could use those stuff test LTO/before LTO difference. LTO is free lunch, just like change compiler from c++98 to c++11, c++17, more higher, more gains. |
Hi @ShooterIT, I agree with your argument that benchmark matters. I think we can wait for a benchmark before merge it or set |
from studying of these options, i think they would bring performance earnings, but till now, we still don't know how much on kvrocks |
for test, currently we can use redis-benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wanghenshui @PragmaTwice perhaps you can submit a benchmark report before and after this patch via redis-benchmark -p 6666 -q
.
Closed as no consensus. @wanghenshui I suggest you create an issue first and share the performance benchmark so that our maintainers can be sure whether this change brings good. |
before IPO redis-benchmark -q -p 6666
ERROR: failed to fetch CONFIG from 127.0.0.1:6666
WARNING: Could not fetch server CONFIG
PING_INLINE: 185528.77 requests per second, p50=0.143 msec
PING_MBULK: 191204.59 requests per second, p50=0.135 msec
SET: 132802.12 requests per second, p50=0.343 msec
GET: 205338.81 requests per second, p50=0.127 msec
INCR: 124688.28 requests per second, p50=0.359 msec
LPUSH: 95693.78 requests per second, p50=0.455 msec
RPUSH: 101936.80 requests per second, p50=0.399 msec
LPOP: 90579.71 requests per second, p50=0.463 msec
RPOP: 84033.61 requests per second, p50=0.495 msec
SADD: 132100.39 requests per second, p50=0.303 msec
HSET: 133511.34 requests per second, p50=0.311 msec
SPOP: 224719.11 requests per second, p50=0.119 msec
ZADD: 180180.17 requests per second, p50=0.231 msec
ZPOPMIN: 289855.06 requests per second, p50=0.095 msec
LPUSH (needed to benchmark LRANGE): 100200.40 requests per second, p50=0.439 msec
LRANGE_100 (first 100 elements): 118063.76 requests per second, p50=0.231 msec
LRANGE_300 (first 300 elements): 65445.03 requests per second, p50=0.415 msec
LRANGE_500 (first 500 elements): 41876.05 requests per second, p50=0.647 msec
LRANGE_600 (first 600 elements): 37735.85 requests per second, p50=0.719 msec
MSET (10 keys): 21537.80 requests per second, p50=2.223 msec after IPO redis-benchmark -q -p 6666
ERROR: failed to fetch CONFIG from 127.0.0.1:6666
WARNING: Could not fetch server CONFIG
PING_INLINE: 233100.23 requests per second, p50=0.111 msec
PING_MBULK: 241545.89 requests per second, p50=0.111 msec
SET: 137741.05 requests per second, p50=0.319 msec
GET: 204498.98 requests per second, p50=0.127 msec
INCR: 119760.48 requests per second, p50=0.375 msec
LPUSH: 99108.03 requests per second, p50=0.439 msec
RPUSH: 101626.02 requests per second, p50=0.415 msec
LPOP: 86655.11 requests per second, p50=0.463 msec
RPOP: 97656.24 requests per second, p50=0.415 msec
SADD: 149476.83 requests per second, p50=0.311 msec
HSET: 163398.70 requests per second, p50=0.255 msec
SPOP: 158478.61 requests per second, p50=0.167 msec
ZADD: 136425.66 requests per second, p50=0.327 msec
ZPOPMIN: 242718.45 requests per second, p50=0.111 msec
LPUSH (needed to benchmark LRANGE): 93720.71 requests per second, p50=0.447 msec
LRANGE_100 (first 100 elements): 141442.72 requests per second, p50=0.183 msec
LRANGE_300 (first 300 elements): 69832.40 requests per second, p50=0.391 msec
LRANGE_500 (first 500 elements): 44247.79 requests per second, p50=0.607 msec
LRANGE_600 (first 600 elements): 37721.61 requests per second, p50=0.719 msec
MSET (10 keys): 23557.13 requests per second, p50=1.911 msec after IPO optimize, redis quick bench result was worse, so don't merge it. |
Thanks for @wanghenshui input, I'd like to have a try as well. Update: I tested the ping/ping_bulk which won't be affected by the data volume, and this PR improve the performance from |
The merge conflict is resolved now. |
I think we can merge it now since the benchmark looks good. |
Yes |
Thanks all, merging... |
Thanks to @wanghenshui contribution again. |
No description provided.