-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathnotes.txt
195 lines (159 loc) · 11.9 KB
/
notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
[Introduction & Motivation: 3 minutes]
-Who am I
- I’m Nathan, I work on highly concurrent C codebases and language runtimes; in grad school I worked on dynamic analysis tools to improve program performance
-What is dynamic analysis
- To cite: The Concept of Dynamic Analysis
- Two implementation techniques: build "walled simulation" around system to observe (the SNES emulator model), or run natively and inject instrumentation into the system (the early hypervisor model)
- Classic use cases:
- input for developers understanding a new codebase
- debugging crashes & races "We must remind ourselves again that history as usually written (peccavimus) is quite different from history as usually lived: the historian records the exceptional because it is interesting-because it is exceptional."
- security analysis of obfuscated binaries or malware
-Why should you care about this talk
- huge gulf between acadmic work on dynamic analysis tools and what we see in production
- many techniques shared with other areas you might find interesting, like virtualization and just-in-time compilers
- advanced techniques require hardware support, but hardware companies are watching out for what software people want...
- more powerful tools require greater performance sacrifices: knowing the lay of the land will help you better choose what techniques are right for your needs
-Static vs dynamic analysis
- why not Just Use Types? What must we do dynamically that we can't do statically?
- Static analysis: conservative approximation of runtime behaviour
- programs are only partially known (dynamic linking, user input), set of possibilities becomes huge
- it is imprecise: consider incorrect programs that a type checker would permit; pointer analysis, esp across shared objects / class files
- many language specs assume depend on single thread for defn of program correctness; race conditions can slip through the cracks here
- cite: On The LImits Of Infomation Flow, Dynamic Analysis from the Ground Up
- Dynamic analysis: can't prove correctness but can demonstrate failure
- Talk plan
- Begin with a conventional debugger, and extend the ideas there successively for increasingly-powerful & difficult forms of analysis
- Memory tracking
- Online Record & Offline Replay
- Full-system analysis
- Multicore Considerations
- Hardware Support / Wishlist
[Memory tracking: 20 minutes]
- A gentle introduction: time travelling debugging
- Problem statement: What assigned this pointer value to 7? (cite The Night's Watch)
- Time-travelling debugging: GDB7, some experimental Java runtimes [5 minutes]
- Consider a null pointer dereference crash: What does a stack trace not get you?
- Functions that have returned
- Previous / overwritten program state
- Traditional debugging approach: from crash, set increasingly early breakpoints & restart to walk backwards in time - O(n^2)
- Idea: create a data structure that allows the debugger to reconstruct the program state at a given period of time
- GDB 7: the most straightforward way to build this
- Doubly-linked list of register & memory writes (./gdb/record-full.c)
- Single-step & instruction emulation. Very slow!
- Not a criticism, serves to show that you don't need highly optimized algorithms to build a dynamic analysis tool
- Demo, maybe?
- Focus on usability rather than performance typical in the research literature.
- How better to store trace information [2.5 minutes]
- Having to replay all operations (see ReVirt) to get to a particular state is suboptimal
- Scalable Omniscient Debugging: B+ tree
- Optimized for high-speed offline recording rather than replaying - does this meet your usecase better?
- TTVM: checkpoint with redo/undo log
- contains CPU registers, memory state, virtualized disk
- Differences in checkpoints are stored via COW
- To reach a place, restore most recent checkpoint and walk forward (using redo log) or back (using undo log)
- Only stores non-deterministic events! With the state stored at checkpoints, that's enough to replay the full execution
- We will have more to say about this in the SMP world
- Speeding execution up with JITs [5 minutes]
- Data driven rather than execution-driven: Use techniques from language runtimes to build more flexible analysis tools
- Don't store record of how the program mutates itself over time, but recompile itself to add arbitrary instrumentation
- JIT: Rather than compiling bytecode to native instructions, TRANSLATE native to native (with instrumentation)
- Most instructions (arithmetic, data moves, etc) are unchanged; we just have to make sure we don't escape the JIT
- Jumps have to go to code that we generated, not back to the original code segment
- Pin: clone the code segment as it executes, weaving extra instrumentation and jump rewrites in as you go
- Walk through example w/ indirect target chaining?
- Valgrind: "disassemble and resynthesize" - decompile to an IR, weave in IR-level instrumentation, reassemble
- Have to mention Shadow Memory for AddressSanitizer, probably SMT-ReVirt too
- Memory tracking in practice [7.5 minutes]
- Problem statement: what values did a variable / piece of memory have over time? how does a value get copied through the system?
- All this sounds enticing because it can be applied to unmodified software, and we claim no false positives
- TaintBochs: observes "sensitive" data (like passwords)'s lifetimes in major applications: Perl, Mozilla, Apache
- Begin with sensitive data from "the outside world" (e.g. keyboard entry, network traffic), watch the taint spread through the system
- Any operation that uses "sensitive" data is itself tainted and considered sensitive too
- Copying data obviously taints it, but what about processor instruction with output that is a function of tainted input?
- "if any byte of any input value is tainted, then all bytes of the output are tainted.
- Problems with this approach:
- Lookup tables - output is a function of _control flow changes_ that depended on input
- entropy-seeking devices - eg. keyboard entry used to seed RNG, which can be used everywhere in the system
- Claimed to not be a problem in practice
- Walk through Mozilla example
- "Pointless Tainting" - a more cynical view. No false positives but what about false negatives?
- Focused on the limitations of this idea - easy to circumvent if trying to analyze malware
- benign actions like configuring IP addresses causes explosion of tainted memory
[Multicore considerations: 20 minutes]
- Limitations of the above [ 2 minutes]
- We assume a single thread of operation, or a single processor system!
- Either we haven't explicitly mentioned SMP or we assume single thread of execution
- in a multiprocessor environment, a non-deterministic event that we have to record is outcomes of memory races
-SMP Record & Replay [5 minutes]
- What's hard about concurrency
- News flash: your computer is a distributed system
- In a single-threaded context, non-deterministic events are still synchronous; now we consider non-determinism in timing
- Can build SMP record/replay systems by serializing threads or tracking synchronization (e.g. "who gets the lock" or
"who gets scheduled to run")
- Show Valgrind tool that does this
- But, this means data races, usually the problems we want to look for, are lost and remain ambiguous!
- Literature mostly assumes a sequentially consistent memory system
- Happens-Before (Lamport: Time, Clocks, and Ordering of Events in a Distributed System)
- happens-before relation can therefore be computed using vector clocks
- CREW protocol: everyone has read permission, nobody has write permission, OR, one has RW, no one else has anything
- SMP-ReVirt: "two processors accessing the same page w/ one being a write, a CREW event is emitted to adjust permissions"
- Implemented by hardware page protection. How do we we do this without confusing the underlying system? Hypervisor + shadow page tables
- This doesn't work for DMA, which is totally under the control of the hardware. Future work might require custom hardware...
- Generally-speaking, Happens-Before generates no false positives but hard to implement efficiently
- Race detection [10 minutes?]
- Different but very practical problem: don't need to record entire execution but only observe a failure of locking dicipline
- Examples: https://github.com/google/sanitizers/wiki/ThreadSanitizerPopularDataRaces
- Implementation: implement all read and writes, and all calls to lock() and unlock(), across all threads
- You have techniques to do this now!
- Eraser: Uses lockset rather than Happens-Before property
- Show example that H-B would miss
- Show iterative refinement algorithm: if the set of locks that protect a variable becomes empty, we've found a potential datarace
- Potential because the two threads may not have interfered with each other
- "Performance was not a goal" : 10x - 30x slowdown
- Valgrind-style shadow memory used to store locksets for each shared variable.
- ThreadSanitizer: a hybrid of happens-before and lockset; used by (among other things) Chromium and Golang
- Weaves instrumenting at compile-time, just as a JIT would do (but originally implemented in Valgrind)
- Compile time http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Instrumentation/ThreadSanitizer.cpp?view=markup
- Runtime https://github.com/llvm-mirror/compiler-rt/blob/310285e9d8cf5a817d0788ec9ba509009bf748bd/lib/tsan/rtl/tsan_rtl.cc#L745
- pseudocode? https://github.com/google/sanitizers/wiki/ThreadSanitizerAlgorithm
- Lockset appraoch doesn't entirely work for golang because syncrhonization happens by passing objects between threads via channels
- CSP cited in particular http://www.cs.columbia.edu/~junfeng/10fa-e6998/papers/hybrid.pdf
- relaxed determinism [3 minutes]
[hardware and future work: 3 minutes]
-Hardware support & The Future
- the current state: page protection, trap & emulate, ugh
- Hardware watchpoints
- Itanium's ALAT table?
- x86 Intel Processor Trace (PT)
- LBR for call stacks & branch recording
- TODO: think about hardware read/write barriers?
-Conclusion
SLIDE AESTHETIC IDEAS:
- sherlock holmes
- where in time is carmen sandiego
TO READ/CITE:
- T. Ball. The concept of dynamic analysis In ESEC / SIGSOFT FSE, pages 216–234, 1999
- Go's race detector
- Dho's Raikonnen
- How dtrace / perftools fits into the research space
- BTIO
Extra stuff
[Full-system instrumentation in practice: 5 minutes ]
-Full-system instrumentation
- We started with a userspace debugger and ended with a full-system memory tracker; what is different between userspace and full-system?
- userspace tools sit in the space between the OS and the instrumented program by using features of the OS
- What's under the OS? Either hardware emulators or a virtual machine monitor!
- We'll talk about hardware-specific extensions at the end
- Why is full-system instrumentation important?
- Use-cases like taint tracking require tracking how something propagates through the computer starting from hardware
- Limitations of the above
- really requires an insertion shim like a language runtime or a VMM, unless you're able to modify the compiler
- How far can we get for full-system dynamic analysis without virtualization?
-Linux instrumentation with BCC, BPF, and probes
- BCC: framework for dynamic full-system instrumentation of Linux & applications
- BPF: an "in-kernel virtual machine" that compiles, validates, and executes bytecode
- bytecode defined in bpf_common.h
- bpf_jit_comp.c : dojit()
- Probe system: tldr allows callbacks to be invoked when a piece of memory is accessed
TODO
- Limitations: you don't have that shim btw hardware and the OS to interpose on full-system memory access