symbolFor is unpredictable in the presence of function symbol aliases #25

Open
RyanGlScott opened this issue Aug 22, 2024 · 0 comments
Labels
bug Something isn't working


@RyanGlScott
Contributor

Consider the following C program:

// test.c
int foo(void) {
  return 0;
}

extern typeof(foo) foo_weak __attribute__((__weak__, __alias__("foo")));

int main(void) {
  return foo();
}

This defines a function foo, which is compiled to a function symbol with GLOBAL binding in the resulting ELF binary. It also defines foo_weak, which uses GCC-/Clang-specific attributes to make it a function symbol with WEAK binding. Moreover, foo_weak is declared as an alias for foo, so both function symbols are placed at the same address in the ELF binary (note the identical Value column below):

$ clang test.c -o test.exe 
$ readelf --syms test.exe 

<snip>

Symbol table '.symtab' contains 37 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
<snip>
    27: 0000000000001130     8 FUNC    GLOBAL DEFAULT   14 foo
<snip>
    30: 0000000000001130     8 FUNC    WEAK   DEFAULT   14 foo_weak
<snip>

(The technique of defining weak aliases is somewhat common in musl.)

Having multiple function symbols at the same address like this confuses macaw-loader's symbolFor function. To explain what I mean, let's use the following macaw-loader–based driver program to load the function symbol at address 0x1130:

--- Main.hs
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeApplications #-}
{-# OPTIONS_GHC -Wall #-}
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.ElfEdit as EE

import qualified Data.Macaw.BinaryLoader as DMB
import Data.Macaw.BinaryLoader.X86 ()
import qualified Data.Macaw.Memory as MM
import qualified Data.Macaw.Memory.LoadCommon as MML
import qualified Data.Macaw.X86 as MX

main :: IO ()
main = do
  bytes <- BS.readFile "test.exe"
  case EE.decodeElfHeaderInfo bytes of
    Left (_off, msg) -> fail msg
    Right (EE.SomeElf e) -> do
      EE.ELFCLASS64 <- pure $ EE.headerClass $ EE.header e
      lb <- DMB.loadBinary @MX.X86_64 options e
      sym <- DMB.symbolFor lb $ MM.absoluteAddr 0x1130
      print sym
  where
    options = MML.LoadOptions { MML.loadOffset = Just 0 }

Well, I say "the function symbol", but which one in particular? As it turns out, symbolFor is somewhat unpredictable in which symbol it picks. When the driver is run against the binary above, it happens to pick foo_weak:

$ runghc Main.hs
"foo_weak"

Let's suppose we renamed foo_weak to something else, however:

-extern typeof(foo) foo_weak __attribute__((__weak__, __alias__("foo")));
+extern typeof(foo) abc      __attribute__((__weak__, __alias__("foo")));

Now, the driver program picks foo instead of abc!

$ clang test.c -o test.exe 
$ runghc Main.hs
"foo"

The explanation for what we see above is that macaw-loader-x86 internally builds a Map from symbol addresses to symbol names. When two symbols share an address, the later insertion overwrites the earlier one, so the order in which macaw-loader-x86 happens to encounter the symbols in the binary determines which name ends up in the Map. In the example above, the user-defined symbols in the binary happen to be sorted in more-or-less alphabetical order, so the name that comes last alphabetically is the one that is ultimately stored in the Map.
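The last-one-wins behavior can be modeled in a few lines of Haskell. This is a minimal sketch, not macaw-loader-x86's actual code: it assumes the loader effectively folds the symbol table into a Data.Map in table order, and the buildSymbolMap helper and the two entry lists are hypothetical. The second list assumes, as described above, that abc is listed before foo once foo_weak is renamed.

```haskell
-- Hypothetical model of an address-to-name map built from symbol-table
-- entries. ASSUMPTION: the real loader may build its map differently; this
-- only illustrates that a later entry at the same address replaces an
-- earlier one.
module Main (main) where

import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- | Build an address-to-name map from (address, name) entries, visited in
-- symbol-table order. 'Map.fromList' keeps the last value for a repeated key.
buildSymbolMap :: [(Word64, String)] -> Map.Map Word64 String
buildSymbolMap = Map.fromList

-- Order from the readelf listing: foo (entry 27) before foo_weak (entry 30).
withFooWeak :: [(Word64, String)]
withFooWeak = [(0x1130, "foo"), (0x1130, "foo_weak")]

-- ASSUMPTION: after renaming foo_weak to abc, abc is listed before foo.
withAbc :: [(Word64, String)]
withAbc = [(0x1130, "abc"), (0x1130, "foo")]

main :: IO ()
main = do
  print (Map.lookup 0x1130 (buildSymbolMap withFooWeak)) -- Just "foo_weak"
  print (Map.lookup 0x1130 (buildSymbolMap withAbc))     -- Just "foo"
```

In both cases the earlier alias is silently lost, which matches the two driver outputs shown above.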

This is quite unfortunate, as it means you can never be quite sure which symbol you are going to get. Moreover, I have a downstream application that uses symbolFor to map function addresses to names. If a binary defines weak aliases, symbolFor may pick a weak symbol name instead of the global one, which is confusing, since most users would expect the global name.

Possible ways of addressing this issue:

  • Change the return type of symbolFor to return a non-empty list of all the symbol names associated with that address.
  • Keep the type of symbolFor the same, but also introduce another class method that can be used to query all of the symbol names associated with an address. Document that symbolFor picks one symbol name arbitrarily.
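Both proposals can be sketched in Haskell, assuming the address-to-name table is built with Map.fromListWith so that names accumulate instead of overwriting each other. The names buildSymbolsMap, symbolsFor, and oneSymbolFor, and the use of plain Word64 addresses and String names, are hypothetical illustrations rather than macaw-loader's API (the real symbolFor takes the loaded binary and a memory address, as in the driver above).

```haskell
-- Hypothetical sketch of the two proposals; none of these names are part of
-- macaw-loader.
module Main (main) where

import Data.List.NonEmpty (NonEmpty ((:|)))
import qualified Data.List.NonEmpty as NE
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- | Keep every name seen at an address. 'Map.fromListWith' calls the
-- combining function as @f new old@, so @old <> new@ preserves the order in
-- which the entries appear in the symbol table.
buildSymbolsMap :: [(Word64, String)] -> Map.Map Word64 (NonEmpty String)
buildSymbolsMap entries =
  Map.fromListWith (\new old -> old <> new)
    [ (addr, name :| []) | (addr, name) <- entries ]

-- | Proposal 1: return all names associated with an address.
symbolsFor :: Map.Map Word64 (NonEmpty String) -> Word64 -> Maybe (NonEmpty String)
symbolsFor m addr = Map.lookup addr m

-- | Proposal 2: keep a single-name query, documented as picking one name
-- arbitrarily (here, the first name in symbol-table order).
oneSymbolFor :: Map.Map Word64 (NonEmpty String) -> Word64 -> Maybe String
oneSymbolFor m addr = NE.head <$> symbolsFor m addr

main :: IO ()
main = do
  let m = buildSymbolsMap [(0x1130, "foo"), (0x1130, "foo_weak")]
  print (symbolsFor m 0x1130)   -- Just ("foo" :| ["foo_weak"])
  print (oneSymbolFor m 0x1130) -- Just "foo"
```

Keeping the names in symbol-table order makes the single-name query deterministic for a given binary, but which alias comes first is still up to the toolchain, so documenting that choice as arbitrary remains sensible.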
RyanGlScott added the bug (Something isn't working) label Aug 22, 2024