[x86] Use function-relative label differences in the large code model, and consider it in other code models #62894
@llvm/issue-subscribers-backend-x86
Note that AArch64 uses distances between basic blocks to implement compressed jump tables. (Not sure how difficult it would be to implement compression on x86, given we don't really need to track the sizes of instructions at the moment...) A more exotic alternative is to embed the jump table into the text section.
We also have the option to put the data directly into the text section, which I think we did at some point in history. The only downside is that it creates more ROP gadgets and doesn't disassemble nicely. Darwin does this, although I'm not familiar with the details. I think the executable jump table you've suggested ends up having worse code size overall, so it's a fun alternative, but probably not something we'd ship. :) If we want to compress x86 jump tables, we could either estimate an upper bound on function length (65k is the obvious one), or we could invent fancy assembler directive patterns.
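For concreteness, a purely speculative sketch of the "executable jump table" idea discussed above: each entry is a `jmp` instruction padded to a fixed slot size, so the dispatch sequence indexes directly into the text section. All labels, registers, and slot sizes here are illustrative, not something any compiler currently emits.

```asm
# Speculative sketch: the jump table lives in .text and each entry is a
# jmp instruction padded to a fixed 8-byte slot, so indexing scales by 8.
        lea     .LJTI0_0(%rip), %rax     # address of the table of jmps
        lea     (%rax,%rdi,8), %rax      # slot for case index in %rdi
        jmp     *%rax                    # lands on the entry's jmp
        .p2align 3
.LJTI0_0:
        jmp     .LBB0_2                  # 5-byte jmp, padded to 8 bytes
        .p2align 3
        jmp     .LBB0_3
```

This makes the tradeoffs mentioned in the comment concrete: the padded `jmp` slots are extra code bytes and extra ROP gadget material, and a disassembler sees instructions where it might expect data.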
I think that just marks the fact that there's data embedded in a code section, so disassemblers can be aware.
You can eliminate the padding at the cost of an extra "add"... that's 5 bytes per entry, so not that terrible. (Instead of …)
https://reviews.llvm.org/D159297 uses a different approach of just making jump table entries 64 bits under the large code model. That's more resistant to blocks in functions getting moved around (e.g. BOLT-like optimizations).
I think the main benefit of this would be reducing static relocation size, which is essentially a very very low priority for the project, and we're not going to implement this idea.
With the large code model, the label difference may not fit into 32 bits. Even if we assume that any individual function is no larger than 2^32 and use a difference from the function entry to the target destination, things like BOLT can rearrange blocks (even if BOLT doesn't necessarily work with the large code model right now). `.set` directives avoid static relocations in some 32-bit entry cases, but don't worry about `.set` directives for 64-bit jump table entries (we can do that later if somebody really cares about it). check-llvm in a bootstrapped clang with the large code model passes.

Fixes #62894

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D159297
This is the code pattern LLVM currently uses for jump tables:
As shown, the address of the jump table is materialized (LEA), the entry's offset is loaded and sign-extended, added to the materialized table address, and we jump.
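A minimal sketch of that pattern (register allocation and label names are illustrative):

```asm
        lea     .LJTI0_0(%rip), %rcx     # materialize the jump table address
        movslq  (%rcx,%rdi,4), %rax      # load the sign-extended 32-bit entry
        add     %rcx, %rax               # entries are offsets from the table base
        jmp     *%rax

        .section .rodata
.LJTI0_0:
        .long   .LBB0_2-.LJTI0_0         # cross-section difference: needs a
        .long   .LBB0_3-.LJTI0_0         # static relocation per entry
```

Each `.long` is the distance from the table in `.rodata` to a block in `.text`, which is why the assembler cannot fold it and must emit a relocation.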
First, these label differences may overflow in the large code model when .rodata is far from .text. Second, this code pattern requires a static relocation for each entry because the delta is cross-section, which shows up in the objdump output as one relocation per table entry.
We could, however, change the code pattern to avoid these relocations by adding a displacement to the `movslq` instruction above. This doesn't add any additional instructions, but it does use a more complex address mode, which increases code size by 4 bytes and could result in runtime performance degradation. Normally, we never care about static object file size, but I imagine that computing smaller, non-negative offsets results in more zero bytes in the jump table, which probably compress better in the end.
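A sketch of that displacement variant, assuming a function named `main` (all names illustrative): the table entries become function-relative, so each `.long` is a same-section difference the assembler can fold at assembly time, and only the single `movslq` displacement still needs a relocation.

```asm
        lea     main(%rip), %rcx                  # materialize the function address
        movslq  .LJTI0_0-main(%rcx,%rdi,4), %rax  # 4-byte displacement reaches the table
        add     %rcx, %rax                        # entries are offsets from the function entry
        jmp     *%rax

        .section .rodata
.LJTI0_0:
        .long   .LBB0_2-main             # same-section difference: no
        .long   .LBB0_3-main             # per-entry static relocation
```

The 4-byte cost mentioned above is the `.LJTI0_0-main` displacement encoded into the `movslq`.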
We can't use the displacement encoding trick in the large code model, since x86-64 address-mode displacements are limited to 32 bits. Instead, we have to materialize a 64-bit offset and add it, but that's the large code model for you.
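One plausible shape of the large-code-model pattern with the 64-bit entries from D159297 (a sketch with illustrative labels, not necessarily the exact sequence LLVM emits):

```asm
        movabs  $.LJTI0_0, %rcx          # 64-bit absolute table address
        mov     (%rcx,%rdi,8), %rax      # load a full 64-bit entry
        add     %rcx, %rax               # entries are relative to the table base
        jmp     *%rax

        .section .rodata
.LJTI0_0:
        .quad   .LBB0_2-.LJTI0_0         # 64-bit difference: no overflow even when
        .quad   .LBB0_3-.LJTI0_0         # .rodata is far from .text
```

The entries double in size (8 bytes each), but a 64-bit difference can never overflow, which is the point of the D159297 approach.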
@aeubanks @jyknight @tkoeppe