Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux/x64: re-enable assembly routines and prevent illegal relocations #3

Merged
merged 4 commits into from
Oct 15, 2024

Conversation

teh-cmc
Copy link
Member

@teh-cmc teh-cmc commented Oct 15, 2024

Here's some notes for the next poor soul that has to deal with this. I'm omitting a lot for the sake of brievety/sanity (no, really), but it should be enough to get started and running in the right direction...

It's safe to assume that I got some (if not all) things wrong. Clarifications welcome.

Context

The problem

On linux/x64, assuming you're using any of the problematic symbols, rav1d will fail to link with the following errors:

mold: error: /home/cmc/dev/rerun-io/rerun/target/debug/deps/libre_rav1d-d1774017185e2712.rlib(mc_avx2.o):(.text): R_X86_64_PC32 relocation at offset 0x7562 against symbol `dav1d_mc_subpel_filters' can not be used; recompile with -fPIC
mold: error: /home/cmc/dev/rerun-io/rerun/target/debug/deps/libre_rav1d-d1774017185e2712.rlib(mc16_avx2.o):(.text): R_X86_64_PC32 relocation at offset 0xa65c against symbol `dav1d_resize_filter' can not be used; recompile with -fPIC
mold: error: /home/cmc/dev/rerun-io/rerun/target/debug/deps/libre_rav1d-d1774017185e2712.rlib(mc16_avx2.o):(.text): R_X86_64_PC32 relocation at offset 0xa687 against symbol `dav1d_resize_filter' can not be used; recompile with -fPIC
mold: error: /home/cmc/dev/rerun-io/rerun/target/debug/deps/libre_rav1d-d1774017185e2712.rlib(mc_avx2.o):(.text): R_X86_64_PC32 relocation at offset 0x7581 against symbol `dav1d_mc_subpel_filters' can not be used; recompile with -fPIC
mold: error: /home/cmc/dev/rerun-io/rerun/target/debug/deps/libre_rav1d-d1774017185e2712.rlib(mc16_avx2.o):(.text): R_X86_64_PC32 relocation at offset 0xa691 against symbol `dav1d_resize_filter' can not be used; recompile with -fPIC
# ... more of the same ...

This error message is infamously bogus: it's not only misleading, it's leading you in the exact opposite direction of where you wanna go.
No, the issue is not that you forgot to compile position-independent code, the issue is that you are in fact very much compiling position-independent code, and that's a problem. Let's look at why that is.

SIMD-rav1d in a nutshell

rav1d's SIMD routines work by sampling tables of data at runtime. These tables are defined, and exposed, in tables.rs. Here's one of them:

re_rav1d/src/tables.rs

Lines 1004 to 1070 in 8097727

#[no_mangle]
pub static dav1d_resize_filter: Align8<[[i8; 8]; 64]> = Align8([
[0, 0, 0, -128, 0, 0, 0, 0],
[0, 0, 1, -128, -2, 1, 0, 0],
[0, -1, 3, -127, -4, 2, -1, 0],
[0, -1, 4, -127, -6, 3, -1, 0],
[0, -2, 6, -126, -8, 3, -1, 0],
[0, -2, 7, -125, -11, 4, -1, 0],
[1, -2, 8, -125, -13, 5, -2, 0],
[1, -3, 9, -124, -15, 6, -2, 0],
[1, -3, 10, -123, -18, 6, -2, 1],
[1, -3, 11, -122, -20, 7, -3, 1],
[1, -4, 12, -121, -22, 8, -3, 1],
[1, -4, 13, -120, -25, 9, -3, 1],
[1, -4, 14, -118, -28, 9, -3, 1],
[1, -4, 15, -117, -30, 10, -4, 1],
[1, -5, 16, -116, -32, 11, -4, 1],
[1, -5, 16, -114, -35, 12, -4, 1],
[1, -5, 17, -112, -38, 12, -4, 1],
[1, -5, 18, -111, -40, 13, -5, 1],
[1, -5, 18, -109, -43, 14, -5, 1],
[1, -6, 19, -107, -45, 14, -5, 1],
[1, -6, 19, -105, -48, 15, -5, 1],
[1, -6, 19, -103, -51, 16, -5, 1],
[1, -6, 20, -101, -53, 16, -6, 1],
[1, -6, 20, -99, -56, 17, -6, 1],
[1, -6, 20, -97, -58, 17, -6, 1],
[1, -6, 20, -95, -61, 18, -6, 1],
[2, -7, 20, -93, -64, 18, -6, 2],
[2, -7, 20, -91, -66, 19, -6, 1],
[2, -7, 20, -88, -69, 19, -6, 1],
[2, -7, 20, -86, -71, 19, -6, 1],
[2, -7, 20, -84, -74, 20, -7, 2],
[2, -7, 20, -81, -76, 20, -7, 1],
[2, -7, 20, -79, -79, 20, -7, 2],
[1, -7, 20, -76, -81, 20, -7, 2],
[2, -7, 20, -74, -84, 20, -7, 2],
[1, -6, 19, -71, -86, 20, -7, 2],
[1, -6, 19, -69, -88, 20, -7, 2],
[1, -6, 19, -66, -91, 20, -7, 2],
[2, -6, 18, -64, -93, 20, -7, 2],
[1, -6, 18, -61, -95, 20, -6, 1],
[1, -6, 17, -58, -97, 20, -6, 1],
[1, -6, 17, -56, -99, 20, -6, 1],
[1, -6, 16, -53, -101, 20, -6, 1],
[1, -5, 16, -51, -103, 19, -6, 1],
[1, -5, 15, -48, -105, 19, -6, 1],
[1, -5, 14, -45, -107, 19, -6, 1],
[1, -5, 14, -43, -109, 18, -5, 1],
[1, -5, 13, -40, -111, 18, -5, 1],
[1, -4, 12, -38, -112, 17, -5, 1],
[1, -4, 12, -35, -114, 16, -5, 1],
[1, -4, 11, -32, -116, 16, -5, 1],
[1, -4, 10, -30, -117, 15, -4, 1],
[1, -3, 9, -28, -118, 14, -4, 1],
[1, -3, 9, -25, -120, 13, -4, 1],
[1, -3, 8, -22, -121, 12, -4, 1],
[1, -3, 7, -20, -122, 11, -3, 1],
[1, -2, 6, -18, -123, 10, -3, 1],
[0, -2, 6, -15, -124, 9, -3, 1],
[0, -2, 5, -13, -125, 8, -2, 1],
[0, -1, 4, -11, -125, 7, -2, 0],
[0, -1, 3, -8, -126, 6, -2, 0],
[0, -1, 3, -6, -127, 4, -1, 0],
[0, -1, 2, -4, -127, 3, -1, 0],
[0, 0, 1, -2, -128, 1, 0, 0],
]);

These tables are then referenced in the SIMD routines directly, using their non-mangled symbol.
This is achieved via the cextern macro:

%macro cextern 1
%xdefine %1 mangle(private_prefix %+ _ %+ %1)
CAT_XDEFINE cglobaled_, %1, 2
extern %1
%endmacro

The cextern macro does a bunch of things (btw, private_prefix == dav1d) that we don't really need to concern ourselves with, but most importantly it bottoms out to extern, which does what you think it does (and in fact so, so much more...).

Example usage:

cextern resize_filter

re_rav1d/src/x86/mc_sse.asm

Lines 9536 to 9539 in 54888c3

movq m6, [base+resize_filter+r8*8]
movq m7, [base+resize_filter+r10*8]
movhps m6, [base+resize_filter+r9*8]
movhps m7, [base+resize_filter+r11*8]

The build script compiles those tables into object files, assembles these SIMD routines into yet other object files, and finally pack all of that in a neat little Rust archive (.rlib is really just a .a with some extra Rust metadata baked in).

If you check out the symbols, you'll find what you'd expect. Here's e.g. dav1d_resize_filter:

$ readelf --wide -s target/release/libre_rav1d.rlib | rg resize_filter
  5141: 0000000000000000   512 OBJECT  GLOBAL DEFAULT 4682 dav1d_resize_filter
   676: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND dav1d_resize_filter
   572: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND dav1d_resize_filter
   474: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND dav1d_resize_filter
   459: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND dav1d_resize_filter
   544: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND dav1d_resize_filter
   482: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND dav1d_resize_filter

The actual table is there (first row), followed by all the yet-to-be-resolved references to it from the different SIMD routines.

Unresolved references means relocations, so let's look at those:

$ readelf --wide -r target/debug/libre_rav1d.rlib | rg resize_filter
00000000000081ef  000002a700000002 R_X86_64_PC32          0000000000000000 dav1d_resize_filter + 81ef
00000000000081f9  000002a700000002 R_X86_64_PC32          0000000000000000 dav1d_resize_filter + 81f9
0000000000008203  000002a700000002 R_X86_64_PC32          0000000000000000 dav1d_resize_filter + 8203
000000000000820d  000002a700000002 R_X86_64_PC32          0000000000000000 dav1d_resize_filter + 820d
0000000000009562  0000023f00000002 R_X86_64_PC32          0000000000000000 dav1d_resize_filter + 9562
000000000000956c  0000023f00000002 R_X86_64_PC32          0000000000000000 dav1d_resize_filter + 956c
0000000000009576  0000023f00000002 R_X86_64_PC32          0000000000000000 dav1d_resize_filter + 9576
# ... more of the same ...

Once again, looks pretty good.

A R_X86_64_PC32 relocation instructs the linker to patch the missing reference using a 32bit offset from the instruction pointer, something that can be done either at link- or load- time.
In this specific case, there is no reason to defer the resolution to load-time since we already know everything we need to know at link-time. Either way, looks fine.
Except it doesn't link. Why?!

It looks like everything is correct on rav1d's end (and the fact that everything works flawlessly on all other platforms would tend to confirm that), and so deeper we go...

Rust, position-independent code and relocation nightmares

On Linux, ASLR has long become standard: all major C compilers are configured to output PIC (position-independent code, for shared libraries) and PIE (position-independent code, for binaries) by default.
rustc is no different: it will ship position-independent executables by default:

$ file target/release/dav1d
target/release/dav1d: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=6ad723122b38f681731101b93d11556d5eb01adb, for GNU/Linux 4.4.0, not stripped
                                     ^^^

This is why the link is failing: because we are linking our object files into a PIE, and for some reason that's a problem.

Now, 15 years ago, the solution would have been to disable PIE support entirely across your whole toolchain, and call it a day.
These days that's a no-no though: everything expects PIE and you need it to work, and so we need to understand what's wrong...

Let's look at that error message again:

R_X86_64_PC32 relocation at offset 0xa691 against symbol `dav1d_resize_filter' can not be used; recompile with -fPIC

This doesn't make any sense: we literally are compiling a PIE, that's the reason we have a problem to begin with.
And why would a PC-relative relocation ever be a problem for ASLR?
Not to mention that all of this will be statically compiled into the final rerun executable anyway, in which case the relocations will be resolved at link-time anyhow, so how could any of this matter at all?!

The thing with R_X86_64_PC32 relocations is they are generally incompatible with position-independent code on 64bit platforms.
This makes sense: this kind of relocation can only jump into a ±2 GiB range from the current instruction, but the kernel will map the executable and the different libraries it depends on randomly across the entire 64bit virtual address space for ASLR, and so a 32bit signed offset will indeed likely not be enough.
The general wisdom at this point is to do away with direct relocations, and introduce an indirection through a global offset table, using e.g. a R_X86_64_GOTPCREL. Now the code only has to do a direct jump into a GOT that's sitting at a known, fixed offset and then another indirect jump to whatever offset is present in that GOT entry (which was put there either by the loader of the runtime dynamic linker).
This is slower a process, it requires more complicated assembly, but it generally works in all possible cases ("no problem that cannot be solved with an extra indirection").
Ian Wienand has a good article on the matter, which I recommend reading for a better view of the entire picture.

This is the first thing I've tried: patching rav1d so that everything went through a GOT indirection first.
That did indeed fix the linking, but now rav1d was segfaulting at runtime, probably because I messed up any one of the gazillon offsets somewhere.
The rav1d assembly codebase is fairly complex, and trying to patch all these references to jump through the GOT first is a non-trivial, extremely painful to debug task.

Technically, with enough work, one could make that work. But again, why? Why is it that any of these purely runtime, ASLR-related issues matter at all in our case?
All we're trying to do is resolve some symbols at link-time and call it a day, this doesn't make any sense.

Alright... deeper we go.

Actual hell: runtime interposition

We finally reach the root reason of all our pains: runtime interposition, more commonly known by its infamous LD_PRELOAD environment variable.

Runtime interposition is a feature of dynamic linking on Linux that allows anybody to intercept and override any public ELF symbol with their own, at load and/or run time.
The most famous use case is probably that of overriding the global C allocator from a shell:

LD_PRELOAD=mymalloc.so ./doom.out

This should trigger a look of sheer horror on your face: that is the problem.

Because the symbols of our tables are public, the linker assumes that they could be intercepted at any point.

   676: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND dav1d_resize_filter
                                              ^^^^^^^
                                      `DEFAULT` means public, btw

Because the binary we're building is position-independent, the linker also assumes that whoever intercepts that symbol may live anywhere in the 64bit virtual address space.
Therefore, a R_X86_64_PC32 cannot be used: there is no guarantee that 32bits will be enough.

Let's look at this error message one last time:

R_X86_64_PC32 relocation at offset 0xa691 against symbol `dav1d_resize_filter' can not be used; recompile with -fPIC

This is not telling us to recompile our code with -fPIC, it's simply telling us that our object files containing the SIMD routines needs to be PIC compliant, but right now they're not.
The problem is that it's assuming that these files were generated by one of the major C compilers, rather than simply handwritten. Passing -fPIC makes no sense: there's no compiler to pass it to. We are the compiler.

So... we just make all these symbols non-interceptable, and we're done, right?

The solution: have some privacy

For some reason, I haven't been able to find any documentation, forum post, or even managed to make an LLM tell me how on earth to change the visibility of an ELF symbol using nasm assembly.

But after a lot of trial and error, and mimicking other similar syntax I found across the codebase, I ended up with this syntax:

external var:$elf_type $elf_visibility

And guess what, it works:

$ readelf --wide -s target/release/libre_rav1d.rlib | rg resize_filter
  5141: 0000000000000000   512 OBJECT  GLOBAL DEFAULT 4682 dav1d_resize_filter
   676: 0000000000000000     0 OBJECT  GLOBAL HIDDEN   UND dav1d_resize_filter
   572: 0000000000000000     0 OBJECT  GLOBAL HIDDEN   UND dav1d_resize_filter
   474: 0000000000000000     0 OBJECT  GLOBAL HIDDEN   UND dav1d_resize_filter
   459: 0000000000000000     0 OBJECT  GLOBAL HIDDEN   UND dav1d_resize_filter
   544: 0000000000000000     0 OBJECT  GLOBAL HIDDEN   UND dav1d_resize_filter
   482: 0000000000000000     0 OBJECT  GLOBAL HIDDEN   UND dav1d_resize_filter
                                              ^^^^^^

And not only that, it links! 🥳

As expected, the linker can now see that there's no way these symbols could ever be intercepted, and therefore they can just be linked in as-is.
All relocations are resolved at link-time, and the final binary just works ™️. Lean and clean:

$ readelf --wide -s target/release/rerun | rg resize_filter
  4496: 0000000002f0f3f0     0 OBJECT  LOCAL  DEFAULT   35 dav1d_resize_filter$got
112105: 0000000000877968   512 OBJECT  LOCAL  HIDDEN    15 dav1d_resize_filter
$ readelf --wide -r target/release/rerun | rg resize_filter
# <nothing>

Update

@Wumpf asks a very good question:

so.. but how does this work in the original dav1d, I presume we don’t know?
I figure they had the exact same asm files, no?
which hints that they’re private there, just like they are now after your change. :thinking_face:

Here's what's happening there: the tables themselves are marked as hidden using compiler intrinsics:
https://github.com/memorysafety/rav1d/blob/c7d127e7e31bd3366ac7dc1717dda9782905c605/src/tables.h#L113-L115

EXTERN const int8_t dav1d_mc_subpel_filters[6][15][8];
EXTERN const int8_t dav1d_mc_warp_filter[193][8];
EXTERN const int8_t dav1d_resize_filter[64][8];

where EXTERN is defined as:
https://github.com/memorysafety/rav1d/blob/c7d127e7e31bd3366ac7dc1717dda9782905c605/include/common/attributes.h#L116-L120

#if (defined(__ELF__) || defined(__MACH__) || (defined(_WIN32) && defined(__clang__))) && __has_attribute(visibility)
#define EXTERN extern __attribute__((visibility("hidden")))
#else
#define EXTERN extern
#endif

Because of this, all the references that follow are themselves automatically marked as HIDDEN, and all our problems disappear.
This is also the reason why the tables themselves are not visible at all in dav1d.so:

readelf --wide -s /usr/lib/libdav1d.so | rg dav1d_
    39: 0000000000003284  1797 FUNC    GLOBAL DEFAULT   11 dav1d_open
    40: 000000000016a390   912 FUNC    GLOBAL DEFAULT   11 dav1d_flush
    41: 0000000000169910   610 FUNC    GLOBAL DEFAULT   11 dav1d_parse_sequence_header
    42: 0000000000003989    20 FUNC    GLOBAL DEFAULT   11 dav1d_close
    43: 0000000000169c60   113 FUNC    GLOBAL DEFAULT   11 dav1d_data_props_unref
    44: 0000000000169320   206 FUNC    GLOBAL DEFAULT   11 dav1d_data_wrap
    45: 0000000000002b81    12 FUNC    GLOBAL DEFAULT   11 dav1d_version
    46: 0000000000002d9f    96 FUNC    GLOBAL DEFAULT   11 dav1d_get_frame_delay
    47: 000000000016a060   413 FUNC    GLOBAL DEFAULT   11 dav1d_apply_grain
    48: 0000000000172f00   890 FUNC    GLOBAL DEFAULT   11 dav1d_get_picture
    49: 00000000001692a0   115 FUNC    GLOBAL DEFAULT   11 dav1d_data_create
    50: 0000000000169270    41 FUNC    GLOBAL DEFAULT   11 dav1d_get_event_flags
    51: 0000000000002020    11 FUNC    GLOBAL DEFAULT   11 dav1d_set_cpu_flags_mask
    52: 0000000000169ce0   253 FUNC    GLOBAL DEFAULT   11 dav1d_get_decode_error_data_props
    53: 0000000000002b8d    10 FUNC    GLOBAL DEFAULT   11 dav1d_version_api
    54: 0000000000169470     9 FUNC    GLOBAL DEFAULT   11 dav1d_data_unref
    55: 00000000001693f0   125 FUNC    GLOBAL DEFAULT   11 dav1d_data_wrap_user_data
    56: 0000000000172e50   164 FUNC    GLOBAL DEFAULT   11 dav1d_send_data
    57: 0000000000002b97    86 FUNC    GLOBAL DEFAULT   11 dav1d_default_settings
    58: 000000000016a720     9 FUNC    GLOBAL DEFAULT   11 dav1d_picture_unref

Bad news: I initially tried hiding the Rust symbol directly (as opposed to the external references in the assembly code), but there doesn't seem to exist any way to do that (the closest you can get to that is by using custom linker scripts in your build.rs, but even then that's just unmaintainable).

Good news: that confirms all of the above!

@teh-cmc teh-cmc force-pushed the cmc/simd_linux_x64_reloc_fiesta branch from 08a9500 to 9c35dc2 Compare October 15, 2024 08:25
@teh-cmc teh-cmc changed the title Linux/x64: enable assembly routines and prevent illegal relocations Linux/x64: re-enable assembly routines and prevent illegal relocations Oct 15, 2024
@teh-cmc teh-cmc marked this pull request as ready for review October 15, 2024 08:25
@teh-cmc teh-cmc force-pushed the cmc/simd_linux_x64_reloc_fiesta branch from 9c35dc2 to 7b51a6d Compare October 15, 2024 08:27
@teh-cmc teh-cmc added the bug Something isn't working label Oct 15, 2024
@teh-cmc
Copy link
Member Author

teh-cmc commented Oct 15, 2024

I will be back to fill the description. I have been back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant