Creating a new debugger is not an easy task and requires a lot of work to cover different target platforms and edge cases.
Luckily, we don't need all that to create a useful tool that can provide a lot of value to developers. At this stage, we focus mainly on creating a framework for smaller tools that solve specific tasks, rather than a full-fledged debugger with a command line, UI, IDE/editor integration, and so forth.
This framework will consist of fundamental building blocks that can be used to interactively probe another process or a core dump file. This does not necessarily mean that a debuggee process has to be stopped to observe its behaviour, as some debugging techniques work with running processes.
For phase 1, we limit our scope to two targets: macOS and Linux on x86_64. The purpose of having two targets rather than one is to make sure that the design choices made during development account for the modular project structure and the need to support multiple target platforms at later stages.
Phase 1 will be considered done when the following basics are available to users in the form of a library:
- Reading and writing memory of a debuggee process:
  - Primitive types (e.g., i8, u64, double).
  - Collections (arrays, slices, `std::vec::Vec`).
  - Structures.
  - Enums.
- Getting and setting values of hardware registers.
- Obtaining backtraces.
- Setting breakpoints at specified addresses (step-in, step-out, and step-over are optional).
- Getting info about threads and reading from/writing to thread-local storage.
- Reading DWARF debug info from binary executables:
  - Function addresses.
  - Variable addresses (including static and thread-local vars).
  - Structure and enum information.
- Supporting source maps (mapping memory addresses to source code).
The goals for the next stage of the project are to provide a more complete debugging experience, more integrations, and support for more platforms:
- One of the main goals is a JSON-RPC API for communicating with the debugger core. This will allow us to start building UIs for the debugger in the form of command-line, desktop, or web apps.
- We will also need to support the GDB Remote Serial Protocol as one of the targets, to integrate with other debugger cores (such as Mozilla rr for deterministic debugging).
- This is also the stage where we can begin experimenting with Rust compiler integration for an expression parser.
The project is split into several loosely coupled major components:
- Target. This is the component that interacts with the operating system. It provides an abstract, uniform interface for operations such as reading and writing memory, breakpoints (both internal and user-level), step-in, step-over, and step-out operations, hardware registers, backtraces, threads, and so forth. Targets are meant to be pluggable components, and one platform can support multiple targets (e.g., a native target, a GDB Remote Serial Protocol target, and a core dump target could all be supported on Linux).
- Symbolication. This component translates human-readable symbols into addresses recognised by the machine and vice versa. There can be multiple symbol sources: symbols can be read from DWARF debug information, supplied by the compiler, or obtained from some other source. Currently, we only aim to provide a Rust symbolication component.
- Integration/API. At this level, we can combine multiple techniques provided by the target and symbolication layers to achieve more interesting effects at a higher level. For example, an expression parser would belong here: it can parse symbols from an expression, find their addresses using the symbolication API, and read the actual values using the target API. This is also where we provide a JSON-RPC API to integrate with other tools such as code editors or IDEs.
- Command line/UI. A general-purpose user interface is out of the scope of the Headcrab project (which is a debugger library or framework, as opposed to a universal debugger tool). The UI can be provided by an existing code editor or IDE, or it can be implemented as part of another project that interfaces with Headcrab's API.
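To make the layering more concrete, the sketch below models the target and symbolication components as two small traits and builds an integration-level helper on top of them. All names here (`TargetLayer`, `SymbolLayer`, `read_bytes`, `address_of`, `read_u64_variable`, and the mock types) are hypothetical simplifications, not Headcrab's actual API; the mocks stand in for a live process and for DWARF-based symbolication.

```rust
use std::collections::HashMap;

/// Target layer (hypothetical): raw access to the debuggee's memory.
trait TargetLayer {
    /// Fills `buf` with bytes starting at `addr`; returns `false` if unmapped.
    fn read_bytes(&self, addr: usize, buf: &mut [u8]) -> bool;
}

/// Symbolication layer (hypothetical): maps symbol names to addresses.
trait SymbolLayer {
    fn address_of(&self, name: &str) -> Option<usize>;
}

/// Integration layer: resolves a symbol, then reads a `u64` at its address.
fn read_u64_variable(
    target: &dyn TargetLayer,
    symbols: &dyn SymbolLayer,
    name: &str,
) -> Option<u64> {
    let addr = symbols.address_of(name)?;
    let mut buf = [0u8; 8];
    if target.read_bytes(addr, &mut buf) {
        Some(u64::from_ne_bytes(buf))
    } else {
        None
    }
}

/// Mock target: a byte vector mapped at `base` instead of a live process.
struct MockTarget {
    base: usize,
    memory: Vec<u8>,
}

impl TargetLayer for MockTarget {
    fn read_bytes(&self, addr: usize, buf: &mut [u8]) -> bool {
        // Translate the remote address into an offset and bounds-check it.
        let src = addr
            .checked_sub(self.base)
            .and_then(|start| self.memory.get(start..start.checked_add(buf.len())?));
        match src {
            Some(src) => {
                buf.copy_from_slice(src);
                true
            }
            None => false,
        }
    }
}

/// Mock symbolication: a name-to-address map instead of DWARF data.
struct MapSymbols(HashMap<String, usize>);

impl SymbolLayer for MapSymbols {
    fn address_of(&self, name: &str) -> Option<usize> {
        self.0.get(name).copied()
    }
}

fn main() {
    let mut memory = vec![0u8; 16];
    memory[8..16].copy_from_slice(&42u64.to_ne_bytes());
    let target = MockTarget { base: 0x1000, memory };
    let symbols = MapSymbols(HashMap::from([("COUNTER".to_string(), 0x1008)]));

    println!("{:?}", read_u64_variable(&target, &symbols, "COUNTER")); // Some(42)
    println!("{:?}", read_u64_variable(&target, &symbols, "MISSING")); // None
}
```

Because `read_u64_variable` depends only on the two traits, the mocks could in principle be swapped for a native target and a DWARF-backed symbol table without touching the helper, which is the kind of loose coupling the component split aims for.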
Please keep in mind that the current state of the code base does not always reflect the design outlined here. If you find that something is missing and it's not listed on our issue tracker, please let us know!
Concrete targets should implement traits defined as part of the Target API. They are conditionally compiled and can be flexibly configured: for example, if you're building a debugger tool that only works with core dumps, you should be able to build Headcrab with only that target enabled. By default, all targets that a given platform supports should be enabled.
The feature set that we intend to cover (please keep in mind that the API is loosely defined and is prone to change):
- `Target::read(&self) -> ReadMemory` - returns a trait implementation (`ReadMemory`, see below) that can be used to read memory from a debuggee process. The trait can be implemented using different strategies: e.g., on Linux we can use `process_vm_readv(2)` in the majority of cases, but `ptrace(2)` might be required to cover some edge cases (e.g., when a memory page is read-protected). The concrete memory reading strategy (or a combination of them) should be chosen by the implementor, and a user should not be aware of it.
  - `ReadMemory::read<T>(mut self, val: &mut T, remote_address: usize) -> Self` - reads a value of type `T` from the debuggee process at the given remote address. This function can be called multiple times: it builds a sequence of operations that can be executed with a single call. For non-primitive types, `std::mem::MaybeUninit` can be used to safely work with uninitialized values. The lifetime of `ReadMemory` should not exceed the lifetime of the `&mut T` references it contains.
  - `ReadMemory::apply(self) -> Vec<Result<(), ReadError>>` - executes the read operations and returns one result per operation; an error identifies the address where the read failed (the address can be contained within the `ReadError` type). This function is not required to be atomic, and values can be read partially.
- `Target::write(&mut self) -> WriteMemory` - returns a trait implementation (`WriteMemory`, see below) that can be used to write memory to a debuggee process. The implementation should account for the possibility of page write protections. For example, on Linux `ptrace(2)` can be used to rewrite even protected memory pages, albeit inefficiently, and a concrete strategy should be chosen on a case-by-case basis without a user's knowledge.
  - `WriteMemory::write<T>(mut self, val: &T, remote_address: usize) -> Self` - writes a value of type `T` to the debuggee process at the given remote address. This function can be called multiple times: it builds a sequence of operations that can be executed with a single call. The value must be initialised, and the lifetime of `WriteMemory` should not exceed the lifetime of the `&T` references it contains.
  - `WriteMemory::apply(mut self) -> Vec<Result<(), WriteError>>` - executes the write operations and returns one result per operation; an error identifies the address where the write failed (the address can be contained within the `WriteError` type). This operation is not required to be atomic, and values can be written partially.
- `Target::next_event(&self) -> Option<DebugEvent>` - blocks and waits for the next debug event (such as `DebugEvent::BreakpointHit`) to occur.
- `Target::breakpoints(&mut self) -> &mut BreakpointsRef` - provides a view into a set of target breakpoints. It can be used to set, disable, and remove breakpoints. Internally, the `BreakpointsRef` struct can use `WriteMemory` to write the corresponding interrupt instructions, and this implementation can be shared across many target implementations.
  - `enum Breakpoint { Regular(usize), Conditional { addr: usize, cond: Box<dyn Fn(&mut dyn Target) -> bool> } }` - the function in the conditional breakpoint will be executed each time the breakpoint is hit. It should return a boolean indicating whether the condition is met.
  - `BreakpointsRef::set(&mut self, bps: &[Breakpoint]) -> Result<(), Vec<BreakpointError>>` - sets a list of breakpoints at the provided addresses. In case of an error, returns a list of the breakpoints that weren't set, along with the error descriptions.
  - `BreakpointsRef::get_all(&self) -> &[Breakpoint]` - returns a list of all set breakpoints.
  - `BreakpointsRef::find_by_addr(&self, remote_addr: usize) -> &[Breakpoint]` - returns a list of all breakpoints set at the given address.
  - `BreakpointsRef::disable(&mut self, remote_addrs: &[usize]) -> Result<(), Vec<BreakpointError>>` - temporarily disables all breakpoints at the given addresses.
  - `BreakpointsRef::enable(&mut self, remote_addrs: &[usize]) -> Result<(), Vec<BreakpointError>>` - re-enables previously disabled breakpoints at the given addresses.
  - `BreakpointsRef::remove(&mut self, remote_addrs: &[usize]) -> Result<(), Vec<BreakpointError>>` - removes all breakpoints at the given addresses.
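The batched read described for `ReadMemory` can be illustrated with a small, self-contained sketch. It runs against a hypothetical `FakeProcess` (a byte vector standing in for the debuggee's address space) rather than `process_vm_readv(2)` or `ptrace(2)`, and it restricts `T` to a hypothetical `Pod` marker trait so that copying raw bytes into a value is sound; the `std::mem::MaybeUninit` approach for non-primitive types is not shown. In this sketch, `read` takes `self` by value and returns it, which is what allows the calls to be chained.

```rust
use std::mem::size_of;

/// Marker for plain-old-data types: no padding, and every bit pattern is a
/// valid value. Hypothetical here; the `bytemuck` crate has a real `Pod` trait.
unsafe trait Pod: Copy {}
unsafe impl Pod for u8 {}
unsafe impl Pod for u32 {}
unsafe impl Pod for u64 {}

/// Views a `Pod` value as a mutable byte slice.
fn bytes_of_mut<T: Pod>(val: &mut T) -> &mut [u8] {
    // SAFETY: `T: Pod` means any bytes written here form a valid `T`, and the
    // slice covers exactly `size_of::<T>()` bytes of `*val`.
    unsafe { std::slice::from_raw_parts_mut(val as *mut T as *mut u8, size_of::<T>()) }
}

/// Stand-in for a debuggee's address space, mapped at `base`.
struct FakeProcess {
    base: usize,
    memory: Vec<u8>,
}

#[derive(Debug, PartialEq)]
struct ReadError {
    address: usize, // the remote address that could not be read
}

/// Queued reads; every destination stays borrowed for `'a`.
struct ReadMemory<'a> {
    process: &'a FakeProcess,
    ops: Vec<(usize, &'a mut [u8])>,
}

impl FakeProcess {
    fn read(&self) -> ReadMemory<'_> {
        ReadMemory { process: self, ops: Vec::new() }
    }

    /// Returns `len` bytes at `addr`, or `None` if any of them is unmapped.
    fn slice(&self, addr: usize, len: usize) -> Option<&[u8]> {
        let start = addr.checked_sub(self.base)?;
        self.memory.get(start..start.checked_add(len)?)
    }
}

impl<'a> ReadMemory<'a> {
    /// Queues a read of a `T` from `remote_address`; nothing is read yet.
    fn read<T: Pod>(mut self, val: &'a mut T, remote_address: usize) -> Self {
        self.ops.push((remote_address, bytes_of_mut(val)));
        self
    }

    /// Performs all queued reads. Failures are reported per operation, so
    /// some values may be filled in even if others fail.
    fn apply(self) -> Vec<Result<(), ReadError>> {
        let process = self.process;
        self.ops
            .into_iter()
            .map(|(address, dst)| match process.slice(address, dst.len()) {
                Some(src) => {
                    dst.copy_from_slice(src);
                    Ok(())
                }
                None => Err(ReadError { address }),
            })
            .collect()
    }
}

fn main() {
    let mut memory = vec![0u8; 16];
    memory[0..8].copy_from_slice(&0xdead_beef_u64.to_ne_bytes());
    memory[8..12].copy_from_slice(&7u32.to_ne_bytes());
    let process = FakeProcess { base: 0x1000, memory };

    let (mut a, mut b, mut c) = (0u64, 0u32, 0u64);
    let results = process
        .read()
        .read(&mut a, 0x1000)
        .read(&mut b, 0x1008)
        .read(&mut c, 0x2000) // unmapped in the fake process
        .apply();
    println!("a={:#x} b={} c={} results={:?}", a, b, c, results);
}
```

A `WriteMemory` sketch would mirror this structure, queuing `&'a T` sources instead of `&'a mut T` destinations.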
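The set, disable, enable, and remove operations on `BreakpointsRef` can be sketched with the classic x86_64 technique: overwrite the first byte of the target instruction with the one-byte `int3` opcode (`0xCC`) and keep the original byte so that it can be restored when the breakpoint is disabled or removed. This is a hypothetical, simplified model: `Breakpoints`, `Slot`, and the `BreakpointError` variants are made-up names, the debuggee's code is a local `Vec<u8>` rather than memory patched through `WriteMemory`, breakpoints are identified by address only, and conditional breakpoints are not modeled.

```rust
use std::collections::HashMap;

/// x86_64 `int3` opcode, the usual one-byte software breakpoint.
const INT3: u8 = 0xCC;

#[derive(Debug, PartialEq)]
enum BreakpointError {
    /// The address is outside the (fake) text segment.
    InvalidAddress(usize),
    /// No breakpoint is set at this address.
    NotFound(usize),
}

/// Bookkeeping for one software breakpoint.
struct Slot {
    original: u8, // the instruction byte that `int3` replaced
    enabled: bool,
}

/// Hypothetical breakpoint manager; `text` stands in for the debuggee's code.
struct Breakpoints {
    text: Vec<u8>,
    slots: HashMap<usize, Slot>,
}

fn to_result(errors: Vec<BreakpointError>) -> Result<(), Vec<BreakpointError>> {
    if errors.is_empty() { Ok(()) } else { Err(errors) }
}

impl Breakpoints {
    fn new(text: Vec<u8>) -> Self {
        Breakpoints { text, slots: HashMap::new() }
    }

    fn set(&mut self, addrs: &[usize]) -> Result<(), Vec<BreakpointError>> {
        let mut errors = Vec::new();
        for &addr in addrs {
            let byte = match self.text.get(addr) {
                Some(&b) => b,
                None => {
                    errors.push(BreakpointError::InvalidAddress(addr));
                    continue;
                }
            };
            // Save the original byte only the first time: if a breakpoint is
            // already enabled here, `byte` is `int3` itself.
            let slot = self.slots.entry(addr).or_insert(Slot { original: byte, enabled: false });
            slot.enabled = true;
            self.text[addr] = INT3;
        }
        to_result(errors)
    }

    /// Writes `int3` (enable) or restores the original byte (disable).
    fn patch(&mut self, addr: usize, enable: bool) -> Result<(), BreakpointError> {
        let slot = self.slots.get_mut(&addr).ok_or(BreakpointError::NotFound(addr))?;
        slot.enabled = enable;
        self.text[addr] = if enable { INT3 } else { slot.original };
        Ok(())
    }

    fn disable(&mut self, addrs: &[usize]) -> Result<(), Vec<BreakpointError>> {
        to_result(addrs.iter().filter_map(|&a| self.patch(a, false).err()).collect())
    }

    fn enable(&mut self, addrs: &[usize]) -> Result<(), Vec<BreakpointError>> {
        to_result(addrs.iter().filter_map(|&a| self.patch(a, true).err()).collect())
    }

    fn remove(&mut self, addrs: &[usize]) -> Result<(), Vec<BreakpointError>> {
        let errors: Vec<_> = addrs
            .iter()
            .filter_map(|&a| {
                let restored = self.patch(a, false); // put the original byte back
                self.slots.remove(&a);
                restored.err()
            })
            .collect();
        to_result(errors)
    }

    fn enabled(&self, addr: usize) -> Option<bool> {
        self.slots.get(&addr).map(|slot| slot.enabled)
    }
}

fn main() {
    // A fake text segment: nop; nop; ret
    let mut bps = Breakpoints::new(vec![0x90, 0x90, 0xC3]);
    bps.set(&[1]).expect("address 1 is valid");
    bps.disable(&[1]).expect("breakpoint exists");
    bps.enable(&[1]).expect("breakpoint exists");
    println!("{:?} enabled={:?}", bps.text, bps.enabled(1));
    println!("{:?}", bps.set(&[10]));
    println!("{:?}", bps.remove(&[1, 2]));
}
```

A real target has to do more work when a breakpoint is hit than this sketch shows: after the `int3` trap, the instruction pointer points one byte past the breakpoint and has to be moved back, and the original byte must be restored temporarily to single-step over the instruction before the breakpoint is re-armed.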
This section will be expanded in the future.