MutateCSmithProvider (WIP) #1

Open
wants to merge 19 commits into
base: main
1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
### Testing ###
tmp/
platform.info

### VisualStudioCode ###
.vscode/
33 changes: 20 additions & 13 deletions README.md
@@ -1,28 +1,35 @@
# SemPy

A tool intended for preventing compiler regressions by comparing different
machine code compiled from the same assembly code.
A tool for testing compiler optimization by compiling the same code under
various optimization levels and comparing emulation results.

Currently, only x86 is supported.

Name subject to change.

## Requirements

```shell
pip install unicorn pwntools tqdm prettytable
poetry install
```

Please ensure that the target LLVM binaries are in PATH.

In addition, download and `make` the CSmith runtime files in your home directory
(i.e. `~/csmith/runtime`). In the future, the location of the runtime directory
will become a command-line option that defaults to the system install location
of the `libcsmith0` package.

For [IRFuzzer](https://github.com/SecurityLab-UCD/IRFuzzer)-related program
providers such as `irfuzzer` and `mutate-csmith`, please ensure that the
MutatorDriver is compiled and present in PATH.
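
The PATH requirements above can be checked before a long run. The following is a minimal sketch, not part of this change: `missing_tools` is a hypothetical helper, and `clang` is only an assumed stand-in for "the target LLVM binaries", which the README does not name.

```shell
#!/usr/bin/env bash
# Print each command from the argument list that is not found in PATH.
# Returns 1 if at least one is missing, 0 otherwise.
missing_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Assumed tool list: MutatorDriver is named in the README; clang is a guess
# at one of the LLVM binaries a given setup needs.
if ! missing=$(missing_tools clang MutatorDriver); then
    echo "Missing from PATH:" $missing >&2
fi

# The runtime location is the one the README currently expects.
[[ -d "$HOME/csmith/runtime" ]] || echo "CSmith runtime not found in ~/csmith/runtime" >&2
```

The function only reports; whether to abort on a missing tool is left to the caller.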

## Example

```shell
# assemble into ELF format
as foo.a.s -o foo.a.elf
as foo.b.s -o foo.b.elf
# copy executable code out of ELF format
objcopy -O binary foo.a.elf foo.a.bin
objcopy -O binary foo.b.elf foo.b.bin
# emulate and compare
./sem.py --arch x86 --mode 64 --count 10000 --seed 12345 foo.{a,b}.bin
# -e 0: Use all cores to compare -O0 and -O3
./sem.py -p mutate-csmith -e 0 -o ./experiment -O03
```

If you encounter an "Invalid instruction" error, chances are Unicorn Engine /
QEMU does not support the associated CPU extension yet.
To terminate, press Ctrl+C and then run `pkill -f sem.py`. Relevant program
seeds can be found in the specified output directory.
810 changes: 810 additions & 0 deletions poetry.lock

Large diffs are not rendered by default.

20 changes: 20 additions & 0 deletions pyproject.toml
@@ -0,0 +1,20 @@
[tool.poetry]
name = "sempy"
version = "0.1.0"
description = "Compiler testing through emulation."
authors = ["Zhenkai Weng <[email protected]>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.10"
unicorn = "^2.0.1.post1"
pwntools = "^4.10.0"
tqdm = "^4.66.1"
prettytable = "^3.8.0"
pycparser = "^2.21"
pyelftools = "0.29"


[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
62 changes: 62 additions & 0 deletions scripts/finditer.sh
@@ -0,0 +1,62 @@
#!/usr/bin/env bash

if [[ $1 = "-h" ]]; then
    echo "Usage: FN_INFO=<FN_INFO> $0 <SEED> <OUTPUT>"
    exit 0
fi
# Argument parsing
SEED=$1
NUMERIC_RE='^[0-9]+$'
if ! [[ $SEED =~ $NUMERIC_RE ]]; then
    echo "Please specify a valid program seed"
    exit 1
fi
OUTPUT=$2
if [[ -z $OUTPUT ]]; then
    echo "Please specify a valid output directory"
    exit 1
fi
if [[ -z $FN_INFO ]]; then
    echo "Please specify chosen function information in \$FN_INFO"
    exit 1
fi

# Prepare temp dir for sem.py output
TEMP=$(mktemp -d -p /dev/shm)
cleanup() {
    rm -rf "$TEMP"
}
trap cleanup EXIT

# Goal: Find exact iteration where differences were introduced (assuming `-p mutate-csmith`)
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
MAX_ITERS=100
DIFF_CODE=11

for iter in $(seq 1 "$MAX_ITERS"); do
    echo ">>>===== Iteration $iter =====<<<"
    NUM_MUTATE=$iter "$SCRIPT_DIR"/../sem.py -q --debug -p mutate-csmith --repro "$SEED" --keep-data -O03 -o "$TEMP"
    err=$?
    if [[ $err -eq $DIFF_CODE ]]; then
        if ! [[ -d $OUTPUT ]]; then
            mkdir -p "$OUTPUT"
        fi
        echo "Difference found at iteration $iter"

        # If difference found, save code before and after mutation
        if [[ -d $TEMP/before ]]; then
            rm -rf "$OUTPUT/before"
            cp -r "$TEMP/before" "$OUTPUT"
        fi
        rm -rf "$OUTPUT/after"
        cp -r "$TEMP/$SEED" "$OUTPUT/after"
        exit
    elif [[ $err -ne 0 ]]; then
        echo "Error occurred"
        exit 1
    else
        # Save current as "before"
        rm -rf "$TEMP/before"
        mv "$TEMP/$SEED" "$TEMP/before"
    fi
done
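
The scan in `finditer.sh` depends on three outcomes of each run: exit status 11 (difference found), 0 (no difference yet), and anything else (error). As a rough sketch of that logic in a form that can be exercised without `sem.py`, the loop can be factored into a function. `first_diff_iter` and `fake_sem` are hypothetical names introduced here for illustration; only the status 11 and the `NUM_MUTATE` variable come from the script.

```shell
#!/usr/bin/env bash
DIFF_CODE=11  # exit status finditer.sh treats as "difference found"

# Run "$@" with NUM_MUTATE=1..max_iters and print the first iteration whose
# exit status equals DIFF_CODE. Returns 0 if one is found, 1 if the command
# fails with any other nonzero status, 2 if max_iters is exhausted.
first_diff_iter() {
    local max_iters=$1 iter err
    shift
    for ((iter = 1; iter <= max_iters; iter++)); do
        err=0
        NUM_MUTATE=$iter "$@" || err=$?
        if ((err == DIFF_CODE)); then
            echo "$iter"
            return 0
        elif ((err != 0)); then
            return 1
        fi
    done
    return 2
}

# Hypothetical stand-in for sem.py: reports a difference from the 4th
# mutation onward and succeeds otherwise.
fake_sem() {
    ((NUM_MUTATE >= 4)) && return "$DIFF_CODE"
    return 0
}

first_diff_iter 100 fake_sem   # prints 4
```

Distinct return codes for "error" and "not found" let a caller tell a crashed run apart from a seed that never diverges within the iteration limit, a case the script itself falls through silently.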
27 changes: 27 additions & 0 deletions scripts/repro.sh
@@ -0,0 +1,27 @@
#!/usr/bin/env bash
if ! [[ -d $1 ]]; then
    echo "Please specify a seeds directory"
    exit 1
fi

# Strip a trailing slash so "$SEEDS_DIR-repro" stays a sibling directory
SEEDS_DIR="${1%/}"
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )

repro() {
    echo "Seed: $1"
    expected_diff="$SEEDS_DIR/$1/diff.txt"
    RED='\033[0;31m'
    GREEN='\033[0;32m'
    RESET='\033[0m'
    echo -e "Expected diff:${RED}"
    cat "$expected_diff"
    echo -e "${RESET}"
    echo -e "Actual diff:${GREEN}"
    "$SCRIPT_DIR"/../sem.py --repro "$1" --debug -O03 -o "$SEEDS_DIR-repro" 2>/dev/null
    echo -e "${RESET}\n==========\n"
}

for seed_dir in "$SEEDS_DIR"/*/; do
    seed=$(basename "$seed_dir")
    repro "$seed"
done
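
`repro.sh` prints the expected and actual diffs for comparison by eye. If a mechanical check is wanted, one possible sketch is below. `check_repro` is a hypothetical helper, and it assumes that the command's standard output has the same format as the saved `diff.txt`, which this change does not confirm for `sem.py`.

```shell
#!/usr/bin/env bash
# Compare the contents of an expected-output file against the stdout of a
# command. Prints "match" and returns 0, or prints "MISMATCH" and returns 1.
# The command's own exit status is ignored, since a run that reports a
# difference may legitimately exit nonzero.
check_repro() {
    local expected_file=$1 actual
    shift
    actual=$("$@" 2>/dev/null) || true
    if [[ "$actual" == "$(cat "$expected_file")" ]]; then
        echo "match"
    else
        echo "MISMATCH"
        return 1
    fi
}

# Assumed usage inside repro(), with the same arguments the script passes:
#   check_repro "$expected_diff" "$SCRIPT_DIR"/../sem.py --repro "$1" --debug -O03 -o "$SEEDS_DIR-repro"
```

Command substitution strips trailing newlines on both sides, so a missing final newline in `diff.txt` does not cause a false mismatch.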