Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

macaw-symbolic: Inconsistent reporting of memory writes to read-only sections of memory #418

Open
RyanGlScott opened this issue Aug 1, 2024 · 1 comment
Labels
bug symbolic-execution Issues relating to macaw-symbolic and symbolic execution

Comments

@RyanGlScott
Copy link
Contributor

The default implementation of mkGlobalPointerValidityPred (here) is designed to check if you are attempting a memory write to a read-only section of the global address space, and if so, it will throw an assertion failure. I've discovered that this only works some of the time, however. To see what I mean, let's cook up a small test case that demonstrates writing to a read-only part of memory:

// test.c

int x = 0;

__attribute__((noinline)) int test_and_verify_set_x(void) {
  x = 1;
  return x;
}

void _start(void) {
  test_and_verify_set_x();
}

Writing to x is fine on its own, but I will use objcopy to mark the ELF section that x resides in as read-only (as though x were a constant):

$ ~/Software/powerpc-linux-musl-cross/bin/powerpc-linux-musl-gcc -nostdlib -fno-stack-protector -fcf-protection=none test.c -o test.exe && ~/Software/powerpc-linux-musl-cross/bin/powerpc-linux-musl-objcopy --writable-text --set-section-flags .sbss=readonly test.exe

(I am using a musl-based, PPC32 version of gcc in the example above, but the same principle should apply to any architecture.)

Let's test this out. The easier way I know how to test this is to make the following change to macaw-symbolic:

diff --git a/symbolic/src/Data/Macaw/Symbolic/Testing.hs b/symbolic/src/Data/Macaw/Symbolic/Testing.hs
index 6203e531..85d547aa 100644
--- a/symbolic/src/Data/Macaw/Symbolic/Testing.hs
+++ b/symbolic/src/Data/Macaw/Symbolic/Testing.hs
@@ -514,7 +514,7 @@ simulateFunction binfo bak execFeatures archInfo archVals halloc mmPreset g = do
         let mmConf = (MSM.memModelConfig bak memPtrTbl)
                        { MS.lookupFunctionHandle = lookupFunction archVals binfo
                        , MS.lookupSyscallHandle = lookupSyscall
-                       , MS.mkGlobalPointerValidityAssertion = validityCheck
+                       -- , MS.mkGlobalPointerValidityAssertion = validityCheck
                        }
         pure (initMem, mmConf)
       LazyMemModel -> do
@@ -524,7 +524,7 @@ simulateFunction binfo bak execFeatures archInfo archVals halloc mmPreset g = do
         let mmConf = (MSMLazy.memModelConfig bak memPtrTbl)
                        { MS.lookupFunctionHandle = lookupFunction archVals binfo
                        , MS.lookupSyscallHandle = lookupSyscall
-                       , MS.mkGlobalPointerValidityAssertion = validityCheck
+                       -- , MS.mkGlobalPointerValidityAssertion = validityCheck
                        }
         pure (initMem, mmConf)
   (stackBasePtr, mem1) <- CLM.doMalloc bak CLM.StackAlloc CLM.Mutable "stack_alloc" initMem stackSize CLD.noAlignment

Then compile macaw-{ppc,x86}-symbolic, and then run this test harness program (based on the macaw-{ppc,x86}-symbolic test suites):

-- Bug.hs
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ImplicitParams #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeOperators #-}
module Main (main) where

import           Control.Lens ((^.))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BS8
import qualified Data.ElfEdit as Elf
import qualified Data.Foldable as F
import qualified Data.Map as Map
import           Data.Maybe ( mapMaybe )
import qualified Data.Parameterized.Classes as PC
import qualified Data.Parameterized.Nonce as PN
import           Data.Parameterized.Some ( Some(..) )
import           Data.Proxy ( Proxy(..) )
import           GHC.TypeLits ( KnownNat )

import qualified What4.Config as WC
import qualified What4.Expr as WE
import qualified What4.Expr.Builder as WEB
import qualified What4.FloatMode as WF
import qualified What4.Interface as WI
import qualified What4.ProblemFeatures as WPF
import qualified What4.Solver as WS

import qualified Lang.Crucible.Backend as CB
import qualified Lang.Crucible.Backend.Online as CBO
import qualified Lang.Crucible.Simulator as CS
import qualified Lang.Crucible.LLVM.MemModel as LLVM

import qualified Dismantle.PPC as DP
import qualified SemMC.Architecture.PPC as SAP

import qualified Data.Macaw.CFG as MC
import qualified Data.Macaw.Discovery as M
import qualified Data.Macaw.Memory as MM
import qualified Data.Macaw.Memory.ElfLoader.PLTStubs as MMEP
import qualified Data.Macaw.Symbolic as MS
import qualified Data.Macaw.Symbolic.Testing as MST
import qualified Data.Macaw.PPC as MP
import           Data.Macaw.PPC.Symbolic ()
import qualified Data.Macaw.X86 as MX
import           Data.Macaw.X86.Symbolic ()

main :: IO ()
main = do
  -- These are pass/fail in that the assertions in the "pass" set are true (and
  -- the solver returns Unsat), while the assertions in the "fail" set are false
  -- (and the solver returns Sat).
  let passRes = MST.SimulationResult MST.Unsat
  mkSymExTest passRes MST.LazyMemModel "test.exe"

hasTestPrefix :: Some (M.DiscoveryFunInfo arch) -> Maybe (BS8.ByteString, Some (M.DiscoveryFunInfo arch))
hasTestPrefix (Some dfi) = do
  bsname <- M.discoveredFunSymbol dfi
  if BS8.pack "test_" `BS8.isPrefixOf` bsname
    then return (bsname, Some dfi)
    else Nothing

mkSymExTest :: MST.SimulationResult -> MST.MemModelPreset -> FilePath -> IO ()
mkSymExTest expected mmPreset exePath = do
  bytes <- BS.readFile exePath
  case Elf.decodeElfHeaderInfo bytes of
    Left (_, msg) -> fail ("Error parsing ELF header from file '" ++ show exePath ++ "': " ++ msg)
    Right (Elf.SomeElf ehi) -> do
      case (Elf.headerClass (Elf.header ehi), Elf.headerMachine (Elf.header ehi)) of
        (Elf.ELFCLASS64, Elf.EM_X86_64) -> do
          binariesInfo <- MST.runDiscovery ehi exePath MST.toAddrSymMap MX.x86_64_linux_info MX.x86_64PLTStubInfo
          let funInfos = Map.elems (MST.binaryDiscState (MST.mainBinaryInfo binariesInfo) ^. M.funInfo)
          let testEntryPoints = mapMaybe hasTestPrefix funInfos
          F.forM_ testEntryPoints $ \(_name, Some dfi) -> do
            Some (gen :: PN.NonceGenerator IO t) <- PN.newIONonceGenerator
            sym <- WEB.newExprBuilder WF.FloatRealRepr WE.EmptyExprBuilderState gen
            CBO.withYicesOnlineBackend sym CBO.NoUnsatFeatures WPF.noFeatures $ \bak -> do
              -- We are using the z3 backend to discharge proof obligations, so
              -- we need to add its options to the backend configuration
              let solver = WS.z3Adapter
              let backendConf = WI.getConfiguration sym
              WC.extendConfig (WS.solver_adapter_config_options solver) backendConf

              execFeatures <- MST.defaultExecFeatures (MST.SomeOnlineBackend bak)
              archVals <- case MS.archVals (Proxy @MX.X86_64) Nothing of
                            Just archVals -> pure archVals
                            Nothing -> error "mkSymExTest: impossible"
              let extract = x86ResultExtractor archVals
              let ?memOpts = LLVM.defaultMemOptions
              simRes <- MST.simulateAndVerify solver WS.defaultLogData bak execFeatures MX.x86_64_linux_info archVals binariesInfo mmPreset extract dfi
              if expected == simRes
                then putStrLn "Test passed"
                else fail $ "Expected " ++ show expected ++ ", but got " ++ show simRes
        (Elf.ELFCLASS32, Elf.EM_PPC) -> do
          binariesInfo <- MST.runDiscovery ehi exePath MST.toAddrSymMap MP.ppc32_linux_info (MMEP.noPLTStubInfo "PPC")
          let funInfos = Map.elems (MST.binaryDiscState (MST.mainBinaryInfo binariesInfo) ^. M.funInfo)
          let testEntryPoints = mapMaybe hasTestPrefix funInfos
          F.forM_ testEntryPoints $ \(_name, Some dfi) -> do
            Some (gen :: PN.NonceGenerator IO t) <- PN.newIONonceGenerator
            sym <- WEB.newExprBuilder WF.FloatRealRepr WE.EmptyExprBuilderState gen
            CBO.withYicesOnlineBackend sym CBO.NoUnsatFeatures WPF.noFeatures $ \bak -> do
              -- We are using the z3 backend to discharge proof obligations, so
              -- we need to add its options to the backend configuration
              let solver = WS.z3Adapter
              let backendConf = WI.getConfiguration sym
              WC.extendConfig (WS.solver_adapter_config_options solver) backendConf

              execFeatures <- MST.defaultExecFeatures (MST.SomeOnlineBackend bak)
              archVals <- case MS.archVals (Proxy @MP.PPC32) Nothing of
                            Just archVals -> pure archVals
                            Nothing -> error "mkSymExTest: impossible"
              let extract = ppcResultExtractor archVals
              let ?memOpts = LLVM.defaultMemOptions
              simRes <- MST.simulateAndVerify solver WS.defaultLogData bak execFeatures MP.ppc32_linux_info archVals binariesInfo mmPreset extract dfi
              if expected == simRes
                then putStrLn "Test passed"
                else fail $ "Expected " ++ show expected ++ ", but got " ++ show simRes
        _ -> error "Unsupported architecture"

x86ResultExtractor :: (CB.IsSymInterface sym) => MS.ArchVals MX.X86_64 -> MST.ResultExtractor sym MX.X86_64
x86ResultExtractor archVals = MST.ResultExtractor $ \regs _sp _mem k -> do
  let re = MS.lookupReg archVals regs MX.RAX
  k PC.knownRepr (CS.regValue re)

ppcResultExtractor :: ( arch ~ MP.AnyPPC v
                      , CB.IsSymInterface sym
                      , MP.KnownVariant v
                      , MM.MemWidth (SAP.AddrWidth v)
                      , MC.ArchConstraints arch
                      , MS.ArchInfo arch
                      , KnownNat (SAP.AddrWidth v)
                      )
                   => MS.ArchVals arch
                   -> MST.ResultExtractor sym arch
ppcResultExtractor archVals = MST.ResultExtractor $ \regs _sp _mem k -> do
  let re = MS.lookupReg archVals regs (MP.PPC_GP (DP.GPR 3))
  k PC.knownRepr (CS.regValue re)

If you do all that, you should get the following error:

$ ~/Software/powerpc-linux-musl-cross/bin/powerpc-linux-musl-gcc -nostdlib -fno-stack-protector -fcf-protection=none test.c -o test.exe && ~/Software/powerpc-linux-musl-cross/bin/powerpc-linux-musl-objcopy --writable-text --set-section-flags .sbss=readonly test.exe
$ runghc Bug.hs
Bug.hs: Failed to prove goal: test.exe: segment1+0x1e0: error: in test_and_verify_set_x
PointerWrite outside of static memory range (known BlockID 0): let -- test.exe: segment1+0x1d8
    v442 = select cglobalMemoryBytes@1:a 0x1ff6b:[32]
    -- test.exe: segment1+0x1d8
    v447 = bvConcat (select cglobalMemoryBytes@1:a 0x1ff6a:[32]) v442
    -- test.exe: segment1+0x1d8
    v452 = bvConcat (select cglobalMemoryBytes@1:a 0x1ff69:[32]) v447
 in bvConcat (select cglobalMemoryBytes@1:a 0x1ff68:[32]) v452
CallStack (from HasCallStack):
  error, called at src/Data/Macaw/Symbolic/Testing.hs:338:33 in macaw-symbolic-0.0.1-inplace:Data.Macaw.Symbolic.Testing

So far, so good.

The example above used PPC32, however. What happens if we use an x86-64 version of gcc instead?

$ gcc -nostdlib -fno-stack-protector -fcf-protection=none test.c -o test.exe && objcopy --writable-text --set-section-flags .bss=readonly test.exe
$ runghc Bug.hs
Test passed

Huh? How come the x86-64 version passes the test, but the PPC32 version fails? There is a hint in this part of mkGlobalPointerValidityPredCommon:

let ptrVal = CS.regValue ptr
let (ptrBase, ptrOff) = CL.llvmPointerView ptrVal
case WI.asNat ptrBase of
Just 0 -> do
p <- mkPred ptrOff
let msg = printf "%s outside of static memory range (known BlockID 0): %s" (show (MS.pointerUseTag puse)) (show (WI.printSymExpr ptrOff))
let loc = MS.pointerUseLocation puse
let assertion = CB.LabeledPred p (CS.SimError loc (CS.GenericSimError msg))
return (Just assertion)
Just _ -> return Nothing

Note that this will only attempt to emit an outside of static memory range assertion failure if the LLVMPtr representing the global address has a block number of zero. If it has a non-zero block number, then there is no chance of an assertion failure. And sure enough, if we add some additional debug-printing to this function:

diff --git a/symbolic/src/Data/Macaw/Symbolic/Memory/Common.hs b/symbolic/src/Data/Macaw/Symbolic/Memory/Common.hs
index d72ef96d..ac3094d6 100644
--- a/symbolic/src/Data/Macaw/Symbolic/Memory/Common.hs
+++ b/symbolic/src/Data/Macaw/Symbolic/Memory/Common.hs
@@ -167,6 +167,10 @@ mkGlobalPointerValidityPredCommon tbl = \sym puse mcond ptr -> do
 
   let ptrVal = CS.regValue ptr
   let (ptrBase, ptrOff) = CL.llvmPointerView ptrVal
+  putStrLn $ unlines
+    [ "RGS mkGlobalPointerValidityPredCommon"
+    , show $ CL.ppPtr ptrVal
+    ]
   case WI.asNat ptrBase of
     Just 0 -> do
       p <- mkPred ptrOff

And then if we re-run the x86-64 version of the program, we see:

$ runghc Bug.hs
<snip>

RGS mkGlobalPointerValidityPredCommon
(1, 0x4000:[64])

<snip>

Test passed

In the x86-64 version of the binary, 0x4000 is x's address. Here, we see that the block number is 1, so macaw-symbolic won't check if the offset is out of range.

On the other hand, if we run the PPC32 version of the program, we see:

$ runghc Bug.hs
<snip>

RGS mkGlobalPointerValidityPredCommon
let -- test.exe: segment1+0x1d8
    v442 = select cglobalMemoryBytes@1:a 0x1ff6b:[32]
    -- test.exe: segment1+0x1d8
    v447 = bvConcat (select cglobalMemoryBytes@1:a 0x1ff6a:[32]) v442
    -- test.exe: segment1+0x1d8
    v452 = bvConcat (select cglobalMemoryBytes@1:a 0x1ff69:[32]) v447
 in bvConcat (select cglobalMemoryBytes@1:a 0x1ff68:[32]) v452

<snip>

Bug.hs: Failed to prove goal: test.exe: segment1+0x1e0: error: in test_and_verify_set_x
PointerWrite outside of static memory range (known BlockID 0): let -- test.exe: segment1+0x1d8
    v442 = select cglobalMemoryBytes@1:a 0x1ff6b:[32]
    -- test.exe: segment1+0x1d8
    v447 = bvConcat (select cglobalMemoryBytes@1:a 0x1ff6a:[32]) v442
    -- test.exe: segment1+0x1d8
    v452 = bvConcat (select cglobalMemoryBytes@1:a 0x1ff69:[32]) v447
 in bvConcat (select cglobalMemoryBytes@1:a 0x1ff68:[32]) v452
CallStack (from HasCallStack):
  error, called at src/Data/Macaw/Symbolic/Testing.hs:338:33 in macaw-symbolic-0.0.1-inplace:Data.Macaw.Symbolic.Testing

This time, we see that the block number is 0 (by virtue of the fact that it's not printing a pair).

What is causing this discrepancy? Ultimately, it's because when simulating the x86-64 version of the program, macaw-symbolic attempts to write to an address that was resolved using MacawGlobalPtr. This will translate an absolute address (e.g., 0x4000 with a block number of 0) into an offset from the LLVMPtr that backs global memory (e.g., (1, 0x4000)). When simulating the PPC32 version of the program, however, macaw-symbolic attempts to write to an absolute address (0x2000) that was not resolved using MacawGlobalPtr. (I'm very unclear on why the two architectures differ in this way.)

What should we do about this? We could investigate why the two architectures differ here, although I'm unsure how deep that rabbit hole goes. In principle, mkGlobalPointerValidityPredCommon ought to be able to handle addresses that were resolved using MacawGlobalPtr, so perhaps we should teach it to also make assertions about LLVMPtrs whose block number matches that of the LLVMPtr backing global memory.

@RyanGlScott RyanGlScott added bug symbolic-execution Issues relating to macaw-symbolic and symbolic execution labels Aug 1, 2024
@RyanGlScott
Copy link
Contributor Author

See #313 for a similar problem affecting other pointer operations (e.g., PtrEq).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug symbolic-execution Issues relating to macaw-symbolic and symbolic execution
Projects
None yet
Development

No branches or pull requests

1 participant