Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

macaw-symbolic: Fine-grained tracking of machine code–specific bad behavior (à la HasLLVMAnn) #429

Open
RyanGlScott opened this issue Aug 21, 2024 · 1 comment
Labels
enhancement symbolic-execution Issues relating to macaw-symbolic and symbolic execution

Comments

@RyanGlScott
Copy link
Contributor

macaw-symbolic is built on top of the crucible-llvm memory model, which means that it has the ability to report instances of C-oriented bad behavior (assuming that the underlying machine code adheres to C's memory model conventions). These instances of bad behavior are tracked via the HasLLVMAnn constraint that is threaded throughout macaw-symbolic.

In addition to C memory model checks, macaw-symbolic also adds a variety of additional assertions that are specific to machine code. These include (but are likely not limited to):

  • Checking if reads from and writes to the global address space are within bounds:

    p <- mkPred ptrOff
    let msg = printf "%s outside of static memory range (known BlockID 0): %s" (show (MS.pointerUseTag puse)) (show (WI.printSymExpr ptrOff))
    let loc = MS.pointerUseLocation puse
    let assertion = CB.LabeledPred p (CS.SimError loc (CS.GenericSimError msg))

    p <- mkPred ptrOff
    zeroNat <- WI.natLit sym 0
    isZeroBase <- WI.natEq sym zeroNat ptrBase
    p' <- WI.itePred sym isZeroBase p (WI.truePred sym)
    let msg = printf "%s outside of static memory range (unknown BlockID): %s" (show (MS.pointerUseTag puse)) (show (WI.printSymExpr ptrOff))
    let loc = MS.pointerUseLocation puse
    let assertion = CB.LabeledPred p' (CS.SimError loc (CS.GenericSimError msg))

  • Using a default value when a conditional read's condition does not hold:

    let useDefault msg =
    do notC <- notPred sym cond
    let errMsg = "[doCondReadMem] " ++ msg
    assert bak notC (AssertFailureSimError errMsg errMsg)
    return def

  • A variety of checks when performing memory operations involving pointers:

    check :: (IsSymBackend sym bak) => bak -> Pred sym -> String -> String -> IO ()
    check bak valid name msg = assert bak valid
    $ AssertFailureSimError errMsg errMsg
    where
    errMsg = "[" ++ name ++ "] " ++ msg

  • The attn signal (in macaw-ppc-symbolic):

    MP.Attn -> do
    let reason = C.GenericSimError "ppc_attn"
    C.addFailedAssertion bak reason

  • Loss of floating-point precision (in macaw-ppc-symbolic):

    cond <- C.iFloatEq @_ @(MS.ToCrucibleFloatInfo fi') sym x'
    =<< C.iFloatCast @_ @_ @(MS.ToCrucibleFloatInfo fi)
    sym
    (MS.floatInfoToCrucible fi')
    C.RNE
    res
    C.assert bak cond $ C.GenericSimError $
    "float precision loss: coercing from "
    ++ show (MS.floatInfoToCrucible fi')
    ++ " to "
    ++ show (MS.floatInfoToCrucible fi)
    return res

  • Dividing by zero in a division-related instruction (in macaw-x86-symbolic):

    -- Check denominator is not 0
    do let bvZ = app (BVLit dw (BV.zero dw))
    denNotZero <- evalApp sym $ Not (app (BVEq dw den bvZ))
    let errMsg = "denominator not zero"
    assert bak denNotZero (C.AssertFailureSimError errMsg (errMsg ++ " in Data.Macaw.X86.Crucible.getDenominator"))

  • A quotient overflowing in a division-related instruction (in macaw-x86-symbolic):

    qNoOverflow <- evalApp sym $ BVEq nw (ValBV nw qExt) qExt'
    let errMsg = "quotient no overflow"
    assert bak qNoOverflow (C.AssertFailureSimError errMsg (errMsg ++ " in Data.Macaw.X86.Crucible.uDivRem"))

    qNoOverflow <- evalApp sym $ BVEq nw (ValBV nw qExt) qExt'
    let errMsg = "quotient no overflow"
    assert bak qNoOverflow (C.AssertFailureSimError errMsg (errMsg ++ " in Data.Macaw.X86.Crucible.sDivRem"))

Unlike the checks in the crucible-llvm memory model, however, these assertions are all performed via GenericSimError or AssertFailureSimError. As a result, it is not straightforward to catch these assertions and perform subsequent analysis on them after simulation fails.

I propose that macaw-symbolic add a constraint similar to HasLLVMAnn (perhaps HasMacawAnn, for lack of a better name) and use it to track the What4 annotations of each of the terms that give rise to these assertion failures. That way, one can consult the Map of bad macaw-symbolic behaviors afterwards and match the annotations to the corresponding terms. This would require a fair bit of API churn in order to thread the new HasMacawAnn constraint through to the relevant functions, however.

@RyanGlScott RyanGlScott added enhancement symbolic-execution Issues relating to macaw-symbolic and symbolic execution labels Aug 21, 2024
@langston-barrett
Copy link
Contributor

For reference, here is the definition of HasLLVMAnn:

https://github.com/GaloisInc/crucible/blob/93bfa7f7858eb458f83eb3e0f030bf600ef72365/crucible-llvm/src/Lang/Crucible/LLVM/MemModel/Partial.hs#L121

The parameter ?recordLLVMAnnotation is usually used in the following way:

  • A Pred sym, representing a safety condition, is annotated with What4.Interface.annotateTerm
  • An instance of BadBehavior is constructed that represents the meaning of the safety condition
  • ?recordLLVMAnnotation is called with these arguments, it generally stashes the (annotation, BadBehavior) pair into a MapF that lives in an IORef, where it can later be retrieved for analysis
  • The predicate is eventually asserted
  • During processing of goals, if an assertion fails and its predicate was annotated and that annotation appears in the map, tools can look up the BadBehavior from the map and analyze it.

I would actually suggest we consider supporting an API in Macaw that diverges slightly from this scheme, to potentially support more different use-cases. I have a particular proposal below, but just generally think we should consider alternatives.

We could create an enumeration (perhaps MacawError) of all (or at first, most) of the assertions that Macaw makes, and then provide an interface like so:

data MacawError sym where
  divByZero :: forall w. (1 <= w) => SymExpr (BVType w) -> MacawError sym
  -- more constructors here ...

-- | Given a safety predicate and a description of the error it represents,
-- return a new predicate (and possibly perform additional side-effects, such as
-- recording information about the predicate).
type MacawProcessAssertion sym
  = (?processAssert :: sym -> Pred sym -> MacawError sym -> IO (Pred sym))

With this scheme, the MacawProcessAssertion sym could still support the existing use-case of recording information about Pred syms to be used in later analysis, but it has more power:

  • It has access to the Pred sym itself, not just an annotation on it, so

    • it has the ability to not annotate the predicate if it doesn't want to
    • it can observe, e.g., if the Pred is a constant before deciding what to do next
  • It can change the predicate, e.g., to skip making certain classes of assertions by returning truePred sym

... and potentially other use-cases not yet envisioned here. A default implementation might be:

defaultMacawProcessAssertion _sym p e = pure p

Or, to support the existing use-case (not 100% sure this type-checks):

defaultMacawProcessAssertion mapRef sym p e = do
  (n, p') <- annotateTerm sym p
  modifyIORef mapRef (MapF.insert n e)
  pure p'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement symbolic-execution Issues relating to macaw-symbolic and symbolic execution
Projects
None yet
Development

No branches or pull requests

2 participants