-
Notifications
You must be signed in to change notification settings - Fork 538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Microblogging: Synthesizing (obfuscated) expressions #1074
Comments
Great news, I will have to check it out! |
Awesome!! 🚀 |
Good work. |
awesome |
Ok, we are now able to synthesize sub expressions of a tree. It leads to a synthesis of infinite operators with a very small oracle table (only ~700 entries). It also support constant synthesis. Below an example. Let's consider the following function. Its original expression is: int f(unsigned a, unsigned b) {
unsigned n = (a & ~(-((((b & a) * (b | a) + (b & ~a) * (~b & a)) & b) * \
(((b&a)*(b|a) + (b& ~a)*(~b&a))|b) + (((b&a)*(b| a) + (b & ~a) * \
(~b & a)) & ~b) * (~((b & a) * (b | a) + (b & ~a) * (~b & a)) & b)) - 1)) - \
(~a & (-((((b & a) * (b | a) + (b & ~a) * (~b & a)) & b) * (((b & a) * \
(b | a) + (b & ~a) * (~b & a)) | b) + (((b & a) * (b | a) + (b & ~a) * \
(~b & a)) & ~b) * (~((b & a) * (b | a) + (b & ~a) * (~b & a)) & b)) - 1)); \
return n;
} The Triton tool that attacks this function is the following: #!/usr/bin/env python
## -*- coding: utf-8 -*-
import sys
from triton import *
CODE = b"\x55\x48\x89\xE5\x89\x7D\xEC\x89\x75\xE8\x8B\x45\xE8\x23\x45\xEC"
CODE += b"\x89\xC2\x8B\x45\xE8\x0B\x45\xEC\x89\xD1\x0F\xAF\xC8\x8B\x45\xEC"
CODE += b"\xF7\xD0\x23\x45\xE8\x89\xC2\x8B\x45\xE8\xF7\xD0\x23\x45\xEC\x0F"
CODE += b"\xAF\xC2\x01\xC8\x23\x45\xE8\x89\xC2\x8B\x45\xE8\x23\x45\xEC\x89"
CODE += b"\xC1\x8B\x45\xE8\x0B\x45\xEC\x89\xCE\x0F\xAF\xF0\x8B\x45\xEC\xF7"
CODE += b"\xD0\x23\x45\xE8\x89\xC1\x8B\x45\xE8\xF7\xD0\x23\x45\xEC\x0F\xAF"
CODE += b"\xC1\x01\xF0\x0B\x45\xE8\x89\xD6\x0F\xAF\xF0\x8B\x45\xE8\x23\x45"
CODE += b"\xEC\x89\xC2\x8B\x45\xE8\x0B\x45\xEC\x89\xD1\x0F\xAF\xC8\x8B\x45"
CODE += b"\xEC\xF7\xD0\x23\x45\xE8\x89\xC2\x8B\x45\xE8\xF7\xD0\x23\x45\xEC"
CODE += b"\x0F\xAF\xC2\x8D\x14\x01\x8B\x45\xE8\xF7\xD0\x89\xD1\x21\xC1\x8B"
CODE += b"\x45\xE8\x23\x45\xEC\x89\xC2\x8B\x45\xE8\x0B\x45\xEC\x89\xD7\x0F"
CODE += b"\xAF\xF8\x8B\x45\xEC\xF7\xD0\x23\x45\xE8\x89\xC2\x8B\x45\xE8\xF7"
CODE += b"\xD0\x23\x45\xEC\x0F\xAF\xC2\x01\xF8\xF7\xD0\x23\x45\xE8\x0F\xAF"
CODE += b"\xC1\x8D\x14\x06\x8B\x45\xEC\x01\xD0\x83\xC0\x01\x89\x45\xFC\x8B"
CODE += b"\x45\xFC\x5D\xC3"
def emulate(ctx, pc):
while pc:
opcode = ctx.getConcreteMemoryAreaValue(pc, 16)
instruction = Instruction(pc, opcode)
ctx.processing(instruction)
print(instruction)
pc = ctx.getConcreteRegisterValue(ctx.registers.rip)
return
def main():
ctx = TritonContext(ARCH.X86_64)
ast = ctx.getAstContext()
ctx.setMode(MODE.CONSTANT_FOLDING, True)
ctx.setMode(MODE.ALIGNED_MEMORY, True)
ctx.setMode(MODE.AST_OPTIMIZATIONS, True)
a = ast.variable(ctx.symbolizeRegister(ctx.registers.edi, "a"))
b = ast.variable(ctx.symbolizeRegister(ctx.registers.esi, "b"))
ctx.setConcreteMemoryAreaValue(0x1000, CODE)
print('Emulate the code')
emulate(ctx, 0x1000)
eax = ctx.getRegisterAst(ctx.registers.eax)
print()
print('Synthesis IN:', ast.unroll(eax))
print()
eax = ctx.synthesize(eax, constant=False, subexpr=True)
print('Synthesis OUT:', ast.unroll(eax))
return 0
if __name__ == '__main__':
sys.exit(main()) The output is the following: $ time python test.py
Emulate the code
0x1000: push rbp
0x1001: mov rbp, rsp
0x1004: mov dword ptr [rbp - 0x14], edi
0x1007: mov dword ptr [rbp - 0x18], esi
0x100a: mov eax, dword ptr [rbp - 0x18]
0x100d: and eax, dword ptr [rbp - 0x14]
0x1010: mov edx, eax
0x1012: mov eax, dword ptr [rbp - 0x18]
[...]
0x10d4: mov eax, dword ptr [rbp - 0x14]
0x10d7: add eax, edx
0x10d9: add eax, 1
0x10dc: mov dword ptr [rbp - 4], eax
0x10df: mov eax, dword ptr [rbp - 4]
0x10e2: pop rbp
0x10e3: ret
Synthesis IN: (bvadd (bvadd a ((_ extract 31 0) (bvadd ((_ zero_extend 32) (bvmul (bvand (bvadd (bvmul (bvand (bvnot b) a) (bvand (bvnot a) b)) (bvmul (bvand b a) (bvor b a))) b) (bvor (bvadd (bvmul (bvand (bvnot b) a) (bvand (bvnot a) b)) (bvmul (bvand b a) (bvor b a))) b))) ((_ zero_extend 32) (bvmul (bvand (bvnot (bvadd (bvmul (bvand (bvnot b) a) (bvand (bvnot a) b)) (bvmul (bvand b a) (bvor b a)))) b) (bvand ((_ extract 31 0) (bvadd ((_ zero_extend 32) (bvmul (bvand b a) (bvor b a))) ((_ zero_extend 32) (bvmul (bvand (bvnot b) a) (bvand (bvnot a) b))))) (bvnot b))))))) (_ bv1 32))
Synthesis OUT: (bvadd (bvadd a (bvmul (bvmul a b) b)) (_ bv1 32))
python test.py 0.07s user 0.00s system 99% cpu 0.075 total We found : To the moon 🚀 🚀 [0] https://www.ndss-symposium.org/wp-content/uploads/2020/04/bar2020-23009.pdf |
We now support opaque constant synthesis. It relies on SMT logic more than pure synthesis, so it's not used by default. To use it we have to set the It allows to synthesis expression like this:
The input and output are the following:
By mixing SMT and pure synthesis we can do very great simplifications. |
Introduction
Software are getting more and more protected against reverse engineering. Several software protections exist [1]. In the past we already attacked some of these protections with Triton using symbolic and taint analysis [2, 3, 4, 5, 6]. Some software protections (like Mixed Boolean-Arithmetic expression: MBA [7]) transform simple operation into a more complicated one by mixing arithmetic operators (e.g:
ADD
,SUB
,MUL
) with bit-wise operators (e.g:AND
,OR
,XOR
,ROL
). For example:x + y == (x | y) + y - (~x & y)
x ^ y == (x | y) - y + (~x & y)
x ^ y == (x & ~y) | (~x & y)
x | y == (x ^ y) + y - (~x & y)
x & y == -(x | y) + y + x
In this contribution we introduce a new engine which lay the foundation of expression synthesis and aims to attack such protections. Synthesis is a well known analysis when reversing obfuscated expressions [8, 9, 10] and tool like Qsynth [10] already relies on Triton for its analysis. That's why I thought it will be nice if we can add some program synthesis concepts in Triton.
The engine and its components
We have added the engine
Synthesizer
into a new namespace:triton::engines::synthesis::
and a new method has been added into theAPI
to synthesize expressions. The engine is almost standalone, it only relies on the symbolic engine to create new symbolic variables but does not alters the symbolic state of your Triton context.The
SynthesisResult
object contains the input AST (the obfuscated one), the output AST (the synthesized one), the time in milliseconds of the analysis and a flagsuccess
which istrue
if the input AST has been synthesized.Oracle table with two variables
Program synthesis uses an oracle table to determine the behavior of obfuscated expressions. We consider the target as a black box and by seeing the result of a given input we can determine what the black box does. Our oracle table is located in
triton::engines::synthesis::oracles::binopTable
and contains several entries. All these entries are used for analyzing an expression that contains two variables and one operator. For example, these following entries are used to synthesize abvadd
node.A
BinaryEntry
represents an entry in our oracle table. For example, the entry:BinaryEntry(8, 0xab, 0xf4, 0x9f)
is structured as follows:8
is the size in bits of the expression0xab
is the input value of the first variable (e.g:x
)0xf4
is the input value of the second variable (e.g:y
)0x9f
is the result value of thebvadd
operator computation on those input valuesIt means that if we inject into our black box (the obfuscated expression)
0xab
and0xfa
into arguments and if the result of the executed black box is0x9f
, it means that the obfuscated expression might do anADD
operation. To be sure, we have to execute different entries. That's why in our oracle table we have multiple entries, with different values, on different sizes, and for all operators.Constant synthesis
The engine also supports constant synthesizing. Let's consider, for example, the following expression:
((z << 8) >> 16) << 8
, this expression contains only one variable and can be synthesized as follows:z & 0xffff00
. To do that, the synthesizer uses the SMT quantifierforall
and the constraint is expressed as follows: For all entriesz
, there exists a symbolic constantC
such that((z << 8) >> 16) << 8 == z <operator> C
. We have to request this constraint to the solver with all operators to find the good one and its constant.((z << 8) >> 16) << 8 == z & C
((z << 8) >> 16) << 8 == z ^ C
((z << 8) >> 16) << 8 == z | C
((z << 8) >> 16) << 8 == z * C
((z << 8) >> 16) << 8 == z + C
((z << 8) >> 16) << 8 == z - C
((z << 8) >> 16) << 8 == C - z
Constant synthesis is more relying on SMT reasoning than pure synthesis concepts as it uses SMT solver to solve constraint. However it seemed good to me to add this feature in the synthesis analysis because constant obfuscation is a real thing used in modern software (see examples below).
How to use the synthesizer
Ok let's talk about examples and how to use the synthesizer. Well, it's pretty simple, you just have to call
ctx.synthesize
on your obfuscated expression and if the synthesis succeed, you have a synthesized expression at return. You can work on standalone expressions as well as from an execution of obfuscated code.Working on standalone expressions
The code below deals with standalone expressions. We extracted/crafted our own obfuscated expressions and for each expression we call
ctx.synthesize
.The result is the following:
Working on obfuscated trace (emulate the code)
I use Triton a lot of time when doing reverse engineering. I mainly start doing the reverse part and then when some parts of the code need better investment, I use Triton to emulate the piece of code that I'm working on. Thus, I can follow what instructions are executed, what data they use, taint data and follow them, define symbolic variables, asking for constraints, etc. It's mainly like a scriptable Python debugger but with a set of program analysis utilities. For example, let's go back with the constant obfuscation of the
(x ^ 0x5c)
operation from a modern software. You can easily extract the opcodes of the function to work on it. Then, you can define the input of the function as a symbolic variable, emulate the function and synthesize its return.The result is the following:
As you can see, the result of the synthesis is:
(bvxor x (_ bv92 8))
(SMT syntax). If you prefer, you can set the Python syntax like below:If you do not want to work on big expression at the end of the execution, you can also define a simplification callback that will be called each time an AST is assigned to a register or a memory cell during the execution (see below). Thus, you can perform a synthesis before each assignment. In other words, you synthesize the expression on the fly.
Into IDA
You can also use IDA and its API to extract opcodes and directly print results in the IDA console. For example, an IDA script could be like this:
It provides the following result:
Conclusion
We introduced a new engine which aims to synthesize expressions. It currently do constant synthesizing and two variables with one operator synthesizing.
Synthesis is applied on root node and we do not yet go through the DAG to synthesize children branches. We let this job for the future. However, you can applied synthesis during the execution which synthesizes nodes at the construction which is pretty similar than going through the DAG when the full expression is built(edit: we do analyze children nodes now). In the future, oracle table may grow to support multiple operators and variables.Bibliography
[01] Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection (book)
[02] https://github.com/JonathanSalwan/Tigress_protection
[03] http://shell-storm.org/talks/DIMVA2018-deobfuscation-salwan-bardin-potet.pdf
[04] http://shell-storm.org/talks/SSTIC2017_Deobfuscation_of_VM_based_software_protection.pdf
[05] http://shell-storm.org/talks/csaw2016-sos-rthomas-jsalwan.pdf
[06] http://shell-storm.org/talks/sthack2016-rthomas-jsalwan.pdf
[07] https://tel.archives-ouvertes.fr/tel-01623849/document
[08] https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/blazytko
[09] https://github.com/mrphrazer/msynth
[10] https://www.ndss-symposium.org/wp-content/uploads/2020/04/bar2020-23009.pdf
The text was updated successfully, but these errors were encountered: