Linux/x64: re-enable assembly routines and prevent illegal relocations #3
Here are some notes for the next poor soul who has to deal with this. I'm omitting a lot for the sake of brevity/sanity (no, really), but it should be enough to get started and running in the right direction...
It's safe to assume that I got some (if not all) things wrong. Clarifications welcome.
Context
The problem
On Linux/x64, assuming you're using any of the problematic symbols, rav1d will fail to link with the following errors:

This error message is infamously bogus: not only is it misleading, it leads you in the exact opposite direction of where you wanna go.
No, the issue is not that you forgot to compile position-independent code, the issue is that you are in fact very much compiling position-independent code, and that's a problem. Let's look at why that is.
SIMD-rav1d in a nutshell
rav1d's SIMD routines work by sampling tables of data at runtime. These tables are defined, and exposed, in tables.rs. Here's one of them:

re_rav1d/src/tables.rs, lines 1004 to 1070 in 8097727

These tables are then referenced in the SIMD routines directly, using their non-mangled symbol.
This is achieved via the cextern macro:

re_rav1d/src/ext/x86/x86inc.asm, lines 865 to 869 in 8097727

The cextern macro does a bunch of things (btw, private_prefix == dav1d) that we don't really need to concern ourselves with, but most importantly it bottoms out to extern, which does what you think it does (and in fact so, so much more...).

Example usage:

re_rav1d/src/x86/mc_sse.asm, line 9360 in 54888c3
re_rav1d/src/x86/mc_sse.asm, lines 9536 to 9539 in 54888c3
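To make the naming concrete: as far as I can tell from x86inc.asm, `cextern` glues `private_prefix` and the name together with an underscore (and, when `PREFIX` is defined, i.e. on platforms whose C symbols carry a leading underscore, prepends one more), so a `cextern resize_filter` in the assembly ends up as an `extern dav1d_resize_filter` that has to match the unmangled name of the Rust table. A small Rust sketch of that rule; the function name is made up for illustration and the rule is my reading of the macro, not a verified spec:

```rust
/// Hypothetical model of the symbol name produced by x86inc.asm's `cextern`.
/// Assumptions (my reading of the macro):
/// - the name is built as `private_prefix` + "_" + `name`;
/// - `mangle()` additionally prepends "_" when PREFIX is defined.
fn cextern_symbol(private_prefix: &str, name: &str, underscore_prefix: bool) -> String {
    // private_prefix %+ _ %+ name
    let base = format!("{}_{}", private_prefix, name);
    // mangle(...): optional leading underscore
    if underscore_prefix {
        format!("_{}", base)
    } else {
        base
    }
}
```

On Linux/x64 no underscore is prepended, so `cextern_symbol("dav1d", "resize_filter", false)` gives `dav1d_resize_filter`, the exact symbol the Rust side has to export.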
The build compiles those tables into object files, assembles these SIMD routines into yet other object files, and finally packs all of that into a neat little Rust archive (.rlib is really just a .a with some extra Rust metadata baked in).

If you check out the symbols, you'll find what you'd expect. Here's e.g. dav1d_resize_filter:

The actual table is there (first row), followed by all the yet-to-be-resolved references to it from the different SIMD routines.
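If you want to convince yourself that an .rlib really is just an `ar` archive, `ar t` will happily list its members. The same thing can be sketched in a few lines of Rust. This is a hypothetical helper that only understands the common System V/GNU layout: it skips the `/` symbol index and `//` long-name table instead of decoding them, so long member names show up as raw `/offset` references. A real .rlib would also contain a Rust metadata member (e.g. `lib.rmeta`) next to the object files.

```rust
/// Lists the member names of an in-memory `ar` archive (the format behind
/// both `.a` and `.rlib` files). Sketch only: short names, no long-name decoding.
fn ar_member_names(data: &[u8]) -> Result<Vec<String>, String> {
    const MAGIC: &[u8] = b"!<arch>\n";
    if !data.starts_with(MAGIC) {
        return Err("missing !<arch> magic".to_string());
    }
    let mut names = Vec::new();
    let mut pos = MAGIC.len();
    // Each member starts with a fixed 60-byte header:
    // name[16] mtime[12] uid[6] gid[6] mode[8] size[10] fmag[2]
    while pos + 60 <= data.len() {
        let header = &data[pos..pos + 60];
        // Every member header ends with the two bytes "`\n".
        if &header[58..60] != b"`\n" {
            return Err(format!("bad header terminator at offset {}", pos));
        }
        let raw_name = String::from_utf8_lossy(&header[0..16]);
        let size: usize = String::from_utf8_lossy(&header[48..58])
            .trim()
            .parse()
            .map_err(|e| format!("bad size field: {}", e))?;
        let name = raw_name.trim_end();
        // "/" is the symbol index and "//" the GNU long-name table: skip them.
        if name != "/" && name != "//" {
            // GNU ar terminates short names with a '/'.
            names.push(name.trim_end_matches('/').to_string());
        }
        // Member data is padded to an even offset.
        pos += 60 + size + (size % 2);
    }
    Ok(names)
}
```

Reading the bytes of an .rlib with `std::fs::read` and passing them to this helper should list its object files, with the caveat on long names above.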
Unresolved references mean relocations, so let's look at those:
Once again, looks pretty good.
An R_X86_64_PC32 relocation instructs the linker to patch the missing reference using a 32-bit offset from the instruction pointer, something that can be done either at link time or at load time. In this specific case, there is no reason to defer the resolution to load time since we already know everything we need to know at link time. Either way, looks fine.
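For the record, the x86-64 psABI defines the value patched in for an R_X86_64_PC32 relocation as S + A - P (symbol address, plus the addend stored in the relocation entry, minus the address of the 4-byte field being patched), and that value has to fit in a signed 32-bit field. A minimal Rust sketch of that computation; the function name and addresses are made up for illustration:

```rust
/// Value a linker would write for an R_X86_64_PC32 relocation (S + A - P):
///   s = address of the referenced symbol (e.g. the table),
///   a = addend from the relocation entry (typically -4 for a RIP-relative
///       operand that ends the instruction),
///   p = address of the 4-byte field being patched.
/// Returns None when the result does not fit in a signed 32-bit field.
fn pc32_value(s: u64, a: i64, p: u64) -> Option<i32> {
    // Widen to i128 so the subtraction itself can never overflow.
    let value = (s as i128) + (a as i128) - (p as i128);
    if value >= i32::MIN as i128 && value <= i32::MAX as i128 {
        Some(value as i32)
    } else {
        None
    }
}
```

With the table at 0x40_2000 and the field at 0x40_1000 (addend -4), the patched displacement is 0x1000 - 4 = 4092; with a target a few terabytes away, there is simply nothing valid to write.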
Except it doesn't link. Why?!
It looks like everything is correct on rav1d's end (and the fact that everything works flawlessly on all other platforms would tend to confirm that), and so deeper we go...

Rust, position-independent code and relocation nightmares
On Linux, ASLR has long been standard: all major C compilers are typically configured to output PIC (position-independent code, for shared libraries) and PIE (position-independent executables, for binaries) by default.
rustc is no different: it produces position-independent executables by default:

This is why the link is failing: we are linking our object files into a PIE, and for some reason that's a problem.
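One way to check that rustc produces PIEs: a PIE is emitted with the ELF type ET_DYN (the same type as a shared library), whereas a classic fixed-address executable is ET_EXEC. `file` or `readelf -h` will tell you this directly; here's a minimal Rust sketch that reads the `e_type` field from the raw header bytes, assuming a little-endian ELF as on x86-64 (the helper name is hypothetical). Note that ET_DYN alone doesn't distinguish a PIE from a .so; the presence of a PT_INTERP program header is the usual tell.

```rust
const ET_EXEC: u16 = 2; // fixed-address executable
const ET_DYN: u16 = 3; // shared object, or position-independent executable

/// Reads the `e_type` field of an ELF header.
/// Sketch only: assumes a little-endian ELF (as on x86-64).
fn elf_type(header: &[u8]) -> Option<u16> {
    // e_ident starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if header.len() < 18 || &header[0..4] != b"\x7fELF" {
        return None;
    }
    // EI_DATA (byte 5) == 1 means little-endian.
    if header[5] != 1 {
        return None;
    }
    // e_type is the 16-bit field right after the 16-byte e_ident.
    Some(u16::from_le_bytes([header[16], header[17]]))
}
```

On Linux, feeding it the first bytes of `std::env::current_exe()` for a default rustc build should report `ET_DYN`.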
Now, 15 years ago, the solution would have been to disable PIE support entirely across your whole toolchain, and call it a day.
These days that's a no-no though: everything expects PIE and you need it to work, and so we need to understand what's wrong...
Let's look at that error message again:
This doesn't make any sense: we literally are compiling a PIE, that's the reason we have a problem to begin with.
And why would a PC-relative relocation ever be a problem for ASLR?
Not to mention that all of this will be statically compiled into the final rerun executable anyway, in which case the relocations will be resolved at link time, so how could any of this matter at all?!

The thing with R_X86_64_PC32 relocations is that they are generally incompatible with position-independent code on 64-bit platforms. This makes sense: this kind of relocation can only reach a ±2 GiB range from the current instruction, but for ASLR the kernel will map the executable and the different libraries it depends on at random locations across the 64-bit virtual address space, and so a 32-bit signed offset will indeed likely not be enough.
The general wisdom at this point is to do away with direct relocations and introduce an indirection through a global offset table (GOT), using e.g. an R_X86_64_GOTPCREL relocation. Now the code only has to make a PC-relative access to a GOT entry sitting at a known, fixed offset, and then follow the full 64-bit address stored in that entry (which was put there either by the static linker or by the runtime dynamic linker). This is a slower process and it requires more complicated assembly, but it generally works in all possible cases ("no problem that cannot be solved with an extra indirection").
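The difference between the two access paths can be modeled in a few lines. This is a toy Rust sketch with hypothetical names and made-up addresses, not how rav1d or the linker represent anything: a direct PC-relative access can only land within a signed 32-bit displacement of the instruction, while a GOT access only needs the GOT slot to be nearby and then reads a full 64-bit address out of it.

```rust
use std::collections::HashMap;

/// Toy GOT: maps the address of each slot to the full 64-bit address stored
/// in it. In a real binary this is a table of pointers in the data segment.
struct Got {
    slots: HashMap<u64, u64>,
}

impl Got {
    fn new() -> Self {
        Got { slots: HashMap::new() }
    }

    /// What the (static or dynamic) linker does: write the symbol's final
    /// address into the slot.
    fn fill(&mut self, slot_addr: u64, symbol_addr: u64) {
        self.slots.insert(slot_addr, symbol_addr);
    }
}

/// Direct PC-relative addressing (what R_X86_64_PC32 encodes): the target is
/// `rip + disp`, so it can only be within about ±2 GiB of the instruction.
fn resolve_direct(rip: u64, disp: i32) -> u64 {
    // Sign-extend the displacement, then add with wraparound.
    rip.wrapping_add(disp as i64 as u64)
}

/// GOT-indirect addressing (what R_X86_64_GOTPCREL encodes): a PC-relative
/// hop to the slot, then a load of the 64-bit address stored there, which
/// can point anywhere in the address space.
fn resolve_via_got(got: &Got, rip: u64, disp_to_slot: i32) -> Option<u64> {
    let slot = resolve_direct(rip, disp_to_slot);
    got.slots.get(&slot).copied()
}
```

The price is visible in the model too: every access costs an extra load, and every reference in the assembly has to be rewritten to go through the slot.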
Ian Wienand has a good article on the matter, which I recommend reading for a better view of the entire picture.
This is the first thing I tried: patching rav1d so that everything went through a GOT indirection first. That did indeed fix the linking, but now rav1d was segfaulting at runtime, probably because I messed up one of the gazillion offsets somewhere. The rav1d assembly codebase is fairly complex, and trying to patch all these references to go through the GOT first is a non-trivial, extremely painful-to-debug task. Technically, with enough work, one could make that work. But again, why? Why do any of these purely runtime, ASLR-related issues matter at all in our case?
All we're trying to do is resolve some symbols at link-time and call it a day, this doesn't make any sense.
Alright... deeper we go.
Actual hell: runtime interposition
We finally reach the root cause of all our pains: runtime interposition, best known through its infamous LD_PRELOAD environment variable.

Runtime interposition is a feature of dynamic linking on Linux that allows anybody to intercept and override any public ELF symbol with their own, at load and/or run time.
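A simplified model of why interposition works: the dynamic linker looks a symbol up in an ordered list of loaded objects (the global lookup scope), preloaded libraries are placed ahead of the regular shared-library dependencies, and the first object that exports the symbol wins. The Rust below is a hypothetical sketch of that rule only; the real ld.so also deals with symbol versions, weak symbols and more, and the library names in the usage are made up.

```rust
/// A loaded ELF object and the symbols it exports (its public,
/// default-visibility definitions). Hidden symbols simply aren't listed.
struct Object<'a> {
    name: &'a str,
    exports: &'a [&'a str],
}

/// Resolves `symbol` against the lookup scope, in order: the first object
/// that exports it wins, which is exactly what LD_PRELOAD exploits.
fn resolve<'a>(scope: &[Object<'a>], symbol: &str) -> Option<&'a str> {
    scope
        .iter()
        .find(|obj| obj.exports.iter().any(|&exported| exported == symbol))
        .map(|obj| obj.name)
}
```

With a hypothetical `libmyalloc.so` preloaded in front of `libc.so.6`, `resolve` hands out the preloaded `malloc`, while `printf` still comes from libc. A symbol that no object exports (because it is hidden) can never be redirected this way.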
The most famous use case is probably that of overriding the global C allocator from a shell:
This should trigger a look of sheer horror on your face: that is the problem.
Because the symbols of our tables are public, the linker assumes that they could be intercepted at any point.
Because the binary we're building is position-independent, the linker also assumes that whoever intercepts that symbol may live anywhere in the 64bit virtual address space.
Therefore, an R_X86_64_PC32 relocation cannot be used: there is no guarantee that 32 bits will be enough.

Let's look at this error message one last time:
This is not telling us to recompile our code with -fPIC; it's simply telling us that our object files containing the SIMD routines need to be PIC compliant, and right now they're not. The problem is that it assumes these files were generated by one of the major C compilers, rather than simply handwritten. Passing -fPIC makes no sense: there's no compiler to pass it to. We are the compiler.

So... we just make all these symbols non-interceptable, and we're done, right?
The solution: have some privacy
For some reason, I haven't been able to find any documentation or forum post, or even get an LLM to tell me, how on earth to change the visibility of an ELF symbol in NASM assembly.
But after a lot of trial and error, and mimicking other similar syntax I found across the codebase, I ended up with this syntax:
And guess what, it works:
And not only that, it links! 🥳
As expected, the linker can now see that there's no way these symbols could ever be intercepted, and therefore they can just be linked in as-is.
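The rule at play can be restated as a small decision function. The visibility encoding comes from the ELF spec (the low two bits of a symbol's `st_other` byte, which is what readelf prints in its Vis column as DEFAULT, HIDDEN, etc.), but the decision itself is my simplified restatement of the reasoning above, not the linker's actual logic, and all names are hypothetical.

```rust
/// ELF symbol visibility, encoded in the low two bits of `st_other`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Visibility {
    Default,   // STV_DEFAULT = 0: exported, can be interposed
    Internal,  // STV_INTERNAL = 1
    Hidden,    // STV_HIDDEN = 2: not exported from the final module
    Protected, // STV_PROTECTED = 3: exported, but not preemptible
}

fn visibility(st_other: u8) -> Visibility {
    match st_other & 0x3 {
        0 => Visibility::Default,
        1 => Visibility::Internal,
        2 => Visibility::Hidden,
        _ => Visibility::Protected,
    }
}

/// Simplified rule: in position-independent output, a direct R_X86_64_PC32
/// reference is only acceptable when the target cannot be interposed at runtime.
fn direct_pc32_allowed(position_independent: bool, st_other: u8) -> bool {
    if !position_independent {
        // Fixed-address output: everything is resolved at link time.
        return true;
    }
    match visibility(st_other) {
        // Someone (e.g. an LD_PRELOAD library) could provide this symbol instead.
        Visibility::Default => false,
        // Bound to the local definition. Protected *data* has extra toolchain
        // quirks that this sketch ignores.
        Visibility::Internal | Visibility::Hidden | Visibility::Protected => true,
    }
}
```

In these terms, the fix above flips the table references from `st_other == 0` (DEFAULT, rejected) to `st_other == 2` (HIDDEN, accepted).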
All relocations are resolved at link-time, and the final binary just works ™️. Lean and clean:
Update
@Wumpf asks a very good question:
Here's what's happening there: the tables themselves are marked as hidden using compiler attributes:
https://github.com/memorysafety/rav1d/blob/c7d127e7e31bd3366ac7dc1717dda9782905c605/src/tables.h#L113-L115
where EXTERN is defined as:

https://github.com/memorysafety/rav1d/blob/c7d127e7e31bd3366ac7dc1717dda9782905c605/include/common/attributes.h#L116-L120
Because of this, all the references that follow are themselves automatically marked as HIDDEN, and all our problems disappear. This is also the reason why the tables themselves are not visible at all in dav1d.so:

Bad news: I initially tried hiding the Rust symbol directly (as opposed to the external references in the assembly code), but there doesn't seem to be any way to do that (the closest you can get is using custom linker scripts in your build.rs, and even then that's just unmaintainable).
Good news: that confirms all of the above!