This is a hierarchical annotated bibliography of resources related to the development and functioning of debuggers, with a particular emphasis on debugging Go executables and, more specifically, on the Delve debugger.
Backtrace.io series on implementing a debugger
Blog series about writing a debugger for Linux using ptrace and DWARF debug symbols.
Michał Łowicki: making a debugger for Golang
Blog series about writing a simple debugger for Linux using ptrace and the Go debugging symbols (gosymtab and gopclntab). In a real debugger you will probably want to use DWARF debug symbols instead of those.
Liz Rice: Debuggers from Scratch (Gophercon UK 2018)
Recording of a talk (and written version of the talk) about writing a simple debugger for Linux using ptrace and the Go debugging symbols (gosymtab and gopclntab). In a real debugger you will probably want to use DWARF debug symbols instead of those.
Both this and Michał Łowicki’s blog series above suffer from a relatively common pitfall: golang/go#28315.
Microsoft: Creating a Basic Debugger
Microsoft tutorial on creating a debugger for Windows using ContinueDebugEvent/WaitForDebugEvent and other related Win32 APIs.
Jonathan B. Rosenberg: How Debuggers Work
The only book I could find about writing debuggers. It explains how to write a debugger for Windows using Win32 API calls and the STI debug format (aka CodeView debug format, which is what Microsoft compilers used to produce until Visual Studio 4). It’s pretty outdated (it was written in 1996), and the “step over” algorithm is, AFAICT, needlessly complicated and wrong.
Not recommended.
How Debuggers Work (Algorithms, Data Structures, and Architecture)
Jonathan B. Rosenberg, 1996
Wiley Computer Publishing (John Wiley & Sons, Inc.)
ISBN 0-471-14966-7
David J. Agans: Debugging
This doesn’t have anything to do with writing debuggers. Instead it’s a book about debugging that I really like. It isn’t even about using debuggers in particular, it just talks about how to approach a debugging problem in general. The 9 rules it lays out are crucial to using debuggers effectively.
Debugging—The Nine Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems
David J. Agans, 2002
American Management Association
ISBN 0-8144-7168-4
Tim Misiak’s stuff
A former Microsoft Debugger Platform engineer who worked on WinDbg and KD. He has a blog where he talks about debuggers (including a tutorial on implementing a toy debugger for Windows in Rust) and a YouTube channel with interviews (also about debuggers).
Derek Parker: Advanced Go debugging with Delve (Fosdem 2018)
Details of the Go runtime that make Delve necessary.
Alessandro Arzilli: Architecture of Delve (Gophercon Iceland 2018)
My talk about Delve internals. Also describes the three layer architecture of Delve (UI, symbolic, target) which is appropriate for other debuggers as well.
If you want to contribute to Delve this is probably the quickest introduction there is.
Also contains a description of a Step Over algorithm that actually works, unlike other algorithms you might find on the internet.
How to write a Delve Client
Tutorial on how to write a client for Delve (for example an editor plugin using Delve to debug code).
This section lists references useful for writing the “target layer” of a debugger, i.e. the part of the debugger that is responsible for managing the target process and manipulating its memory.
Intel® 64 and IA-32 Architectures Software Developer’s Manual
At the end of the day, everything boils down to assembly. So why not read this agile 5000-page booklet now, and get it over with?
Of particular interest to debuggers are the XSAVE format, the INT 3 instruction and the debug registers.
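To make the role of INT 3 concrete, here is a minimal sketch of how a software breakpoint is usually set and cleared: the debugger saves the byte at the target address and overwrites it with the one-byte INT 3 opcode (0xCC), then writes the saved byte back to remove the breakpoint. The `breakpoints` type and the byte slice standing in for target memory are assumptions made for illustration; a real target layer would read and write the memory of another process.

```go
package main

import (
	"errors"
	"fmt"
)

// int3 is the single-byte encoding of the x86 INT 3 instruction.
const int3 = 0xCC

// breakpoints maps an address to the original byte that INT 3 replaced.
// The name and layout are hypothetical, chosen for this sketch.
type breakpoints map[uint64]byte

// set saves the byte at addr and overwrites it with INT 3.
func (bps breakpoints) set(mem []byte, addr uint64) error {
	if addr >= uint64(len(mem)) {
		return errors.New("address out of range")
	}
	if _, ok := bps[addr]; ok {
		return errors.New("breakpoint already set")
	}
	bps[addr] = mem[addr]
	mem[addr] = int3
	return nil
}

// clear puts the saved byte back and forgets the breakpoint.
func (bps breakpoints) clear(mem []byte, addr uint64) error {
	orig, ok := bps[addr]
	if !ok {
		return errors.New("no breakpoint at address")
	}
	mem[addr] = orig
	delete(bps, addr)
	return nil
}

func main() {
	// mem stands in for target memory holding: push rbp; mov rbp, rsp; ret
	mem := []byte{0x55, 0x48, 0x89, 0xE5, 0xC3}
	bps := breakpoints{}
	if err := bps.set(mem, 1); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("with breakpoint:    % x\n", mem)
	if err := bps.clear(mem, 1); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("breakpoint removed: % x\n", mem)
}
```

When the target executes the INT 3, the program counter ends up one byte past the breakpoint address, so a debugger typically has to rewind it before restoring and re-executing the original instruction. The sketch leaves that part out.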
Hardware breakpoints
Microsoft: Basic Debugging
MSDN entry point for documentation about the Win32 debugging API.
Minidump file format
This is the file format used on Windows to record core dumps of running applications. It’s called minidump to distinguish it from crashdumps, which are full-system kernel dumps.
Unlike Linux and macOS, which use the same file format for executables and core dumps, the file format for core dumps on Windows is completely different from executables.
It can be produced automatically by Windows, by a WinDbg command or by using the Procdump utility.
A minidump is divided into streams: it has a header, followed by a stream directory and then a bunch of streams. Each stream either describes things about the process in general, about a thread in particular or contains a chunk of memory from the dumped process.
To read a minidump start by reading the header, get the offset of the stream directory from it, then read the stream directory and from that read the streams you need.
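The reading procedure described above can be sketched as follows. The struct layouts follow the MINIDUMP_HEADER and MINIDUMP_DIRECTORY structures as documented by Microsoft; the Go type names, the `maxStreams` limit and the `syntheticDump` helper (which builds a tiny fake dump so the example runs anywhere) are assumptions of this sketch, not part of any real library.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// minidumpSignature is the ASCII string "MDMP" read as a little-endian uint32.
const minidumpSignature = 0x504d444d

// header mirrors the fixed-size MINIDUMP_HEADER structure (32 bytes).
type header struct {
	Signature          uint32
	Version            uint32
	NumberOfStreams    uint32
	StreamDirectoryRVA uint32 // file offset of the stream directory
	CheckSum           uint32
	TimeDateStamp      uint32
	Flags              uint64
}

// dirEntry mirrors MINIDUMP_DIRECTORY (12 bytes): the stream type followed
// by the size and file offset of the stream's data.
type dirEntry struct {
	StreamType uint32
	DataSize   uint32
	RVA        uint32
}

const maxStreams = 4096 // arbitrary sanity limit for this sketch

// readDirectory reads the header, then the stream directory it points to.
func readDirectory(r io.ReaderAt) ([]dirEntry, error) {
	var h header
	if err := binary.Read(io.NewSectionReader(r, 0, 32), binary.LittleEndian, &h); err != nil {
		return nil, err
	}
	if h.Signature != minidumpSignature {
		return nil, errors.New("not a minidump")
	}
	if h.NumberOfStreams > maxStreams {
		return nil, errors.New("too many streams")
	}
	entries := make([]dirEntry, h.NumberOfStreams)
	dir := io.NewSectionReader(r, int64(h.StreamDirectoryRVA), int64(len(entries))*12)
	if err := binary.Read(dir, binary.LittleEndian, entries); err != nil {
		return nil, err
	}
	return entries, nil
}

// readStream returns the raw bytes of one stream.
func readStream(r io.ReaderAt, e dirEntry) ([]byte, error) {
	buf := make([]byte, e.DataSize)
	n, err := r.ReadAt(buf, int64(e.RVA))
	if err == io.EOF && n == len(buf) {
		err = nil // a full read that ends exactly at end of file is fine
	}
	return buf, err
}

// syntheticDump builds a fake dump: header, one directory entry, 4 data bytes.
func syntheticDump() *bytes.Reader {
	var buf bytes.Buffer
	binary.Write(&buf, binary.LittleEndian, header{
		Signature: minidumpSignature, NumberOfStreams: 1, StreamDirectoryRVA: 32,
	})
	// Stream type 3 is ThreadListStream; the payload here is just filler.
	binary.Write(&buf, binary.LittleEndian, dirEntry{StreamType: 3, DataSize: 4, RVA: 44})
	buf.Write([]byte{0xde, 0xad, 0xbe, 0xef})
	return bytes.NewReader(buf.Bytes())
}

func main() {
	r := syntheticDump()
	entries, err := readDirectory(r)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, e := range entries {
		data, err := readStream(r, e)
		fmt.Printf("stream type %d, %d bytes at offset %d: % x (err: %v)\n",
			e.StreamType, e.DataSize, e.RVA, data, err)
	}
}
```

Interpreting each stream's contents (thread list, module list, memory ranges and so on) requires the per-stream structures, which this sketch does not attempt.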
Thread Naming on Windows
Windows has a couple of facilities to give names to threads to aid debugging. Even if you don’t care about supporting this you should know about it, or you might get an exception that you don’t know what to do with.
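The older of these facilities works by raising an exception with code 0x406D1388, with the thread name passed in the exception arguments. A minimal sketch of how a debug event loop might treat it, assuming the debugger does not care about the name: report the exception as handled so the target keeps running, and leave every other exception to the target. The function name is hypothetical; the two status values are the ones accepted by ContinueDebugEvent.

```go
package main

import "fmt"

const (
	// msVCException is the exception code raised by the legacy
	// thread-naming convention.
	msVCException = 0x406D1388

	// Continue statuses accepted by ContinueDebugEvent.
	dbgContinue            = 0x00010002
	dbgExceptionNotHandled = 0x80010001
)

// continueStatus picks the status to pass to ContinueDebugEvent for an
// exception debug event. It is deliberately simplistic: a real debugger
// also has to recognize its own breakpoints and single steps.
func continueStatus(exceptionCode uint32) uint32 {
	if exceptionCode == msVCException {
		// Not an error: the target is only announcing a thread name.
		return dbgContinue
	}
	return dbgExceptionNotHandled
}

func main() {
	fmt.Printf("thread name exception: %#x\n", continueStatus(msVCException))
	fmt.Printf("access violation:      %#x\n", continueStatus(0xC0000005))
}
```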
Linux Multithreading implementation
Linux has a weird way of implementing threads. Basically, there are no threads. Instead a multithreaded process is actually a group of processes that share memory, file handles and signal handling.
As a Linux user you don’t need to care about this, because the user-space utilities and libc do a decent job of hiding the complexity. If you are writing a debugger backend for Linux, however, there is no way to avoid all the weirdness.
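One place where this shows up is thread enumeration: the kernel exposes the members of the group as numeric entries under /proc/&lt;pid&gt;/task, and a debugger backend generally has to deal with each of them individually. A minimal sketch, with hypothetical function names, that lists the thread IDs of the current process:

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strconv"
)

// parseTIDs keeps the names that look like thread IDs and returns them sorted.
func parseTIDs(names []string) []int {
	var tids []int
	for _, name := range names {
		if tid, err := strconv.Atoi(name); err == nil && tid > 0 {
			tids = append(tids, tid)
		}
	}
	sort.Ints(tids)
	return tids
}

// threadIDs lists the threads of a process by reading /proc/<pid>/task,
// which has one numeric subdirectory per thread.
func threadIDs(pid int) ([]int, error) {
	entries, err := os.ReadDir(fmt.Sprintf("/proc/%d/task", pid))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return parseTIDs(names), nil
}

func main() {
	tids, err := threadIDs(os.Getpid())
	if err != nil {
		fmt.Println("cannot list threads (not on Linux?):", err)
		return
	}
	fmt.Println("thread IDs:", tids)
}
```

Even a trivial Go program usually reports several IDs here, because the Go runtime starts additional OS threads on its own.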
Ptrace
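Ptrace is the name of the syscall used on Linux and other Unix-like systems to control a process you want to debug (despite its ubiquity it is not part of POSIX, and its semantics differ between systems). It’s very powerful but also very complicated and janky. Use with caution.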
Gdb Remote Serial Protocol
This protocol was originally devised to debug programs running in environments that were too constrained to host the full gdb program, such as embedded processors or operating system kernels. The idea was that you would embed a small assembly level debugger, implementing only what I call the “target layer”, and then connect gdb to it using this protocol and end up with a full symbolic debugger.
Notable programs implementing this protocol are gdbserver, lldb-server, debugserver (a stripped down version of lldb-server available on macOS) and Mozilla RR.
Beware that there are two different wire encodings for packets, the “binary” encoding and the “not-binary” encoding that differ on whether RLE compression is available and which character is the escape code. There is no good way to tell which packet uses which encoding and sometimes it isn’t even documented.
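As an illustration of the parts that are common to all packets, here is a minimal sketch of the framing (`$payload#checksum`, where the checksum is the modulo-256 sum of the payload bytes written as two hex digits) and of run-length decoding (a `*` followed by a character whose value minus 29 is the number of extra repetitions of the preceding character). The function names are hypothetical, and the sketch ignores acknowledgements and the escaping used for binary data.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// checksum is the modulo-256 sum of the payload bytes as transmitted.
func checksum(payload string) byte {
	var sum byte
	for i := 0; i < len(payload); i++ {
		sum += payload[i]
	}
	return sum
}

// frame wraps a payload as $<payload>#<two hex digits>.
func frame(payload string) string {
	return fmt.Sprintf("$%s#%02x", payload, checksum(payload))
}

// unframe checks the framing and checksum and returns the raw payload.
func unframe(packet string) (string, error) {
	if len(packet) < 4 || packet[0] != '$' || packet[len(packet)-3] != '#' {
		return "", errors.New("malformed packet")
	}
	payload := packet[1 : len(packet)-3]
	want, err := strconv.ParseUint(packet[len(packet)-2:], 16, 8)
	if err != nil {
		return "", err
	}
	if byte(want) != checksum(payload) {
		return "", errors.New("bad checksum")
	}
	return payload, nil
}

// expandRLE undoes run-length encoding: "c*n" stands for c followed by
// (n - 29) further copies of c.
func expandRLE(payload string) (string, error) {
	var sb strings.Builder
	for i := 0; i < len(payload); i++ {
		if payload[i] != '*' {
			sb.WriteByte(payload[i])
			continue
		}
		if i == 0 || i+1 >= len(payload) || payload[i+1] < 29 {
			return "", errors.New("malformed run-length sequence")
		}
		for n := int(payload[i+1]) - 29; n > 0; n-- {
			sb.WriteByte(payload[i-1])
		}
		i++ // skip the repeat count
	}
	return sb.String(), nil
}

func main() {
	fmt.Println(frame("g")) // request to read the general registers
	payload, err := unframe("$OK#9a")
	fmt.Println(payload, err)
	expanded, err := expandRLE("0* ")
	fmt.Println(expanded, err)
}
```

Note that the checksum covers the payload as it appears on the wire, so it has to be verified before run-length expansion, not after.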
This section contains anything pertaining to interpreting debug symbols and extracting them from executable files.
For a modern debugger you only need to be concerned with three executable formats (PE, ELF and Mach-O) and two debug formats (DWARF and PDB). Anything else is of historical interest only at this point.
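Telling the three executable formats apart only takes a look at the first few bytes of the file. The sketch below is a simplified illustration with a hypothetical function name: it recognizes ELF by its `\x7fELF` magic, thin Mach-O files by their four possible magic numbers, and PE by following the offset stored at 0x3c in the MS-DOS header to the `PE\0\0` signature. It does not handle Mach-O universal (fat) binaries.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// detectFormat guesses the executable format from the start of a file.
func detectFormat(b []byte) string {
	if len(b) < 4 {
		return "unknown"
	}
	if bytes.HasPrefix(b, []byte("\x7fELF")) {
		return "ELF"
	}
	// Mach-O magic numbers, 32 and 64 bit, in both byte orders.
	switch binary.BigEndian.Uint32(b) {
	case 0xfeedface, 0xfeedfacf, 0xcefaedfe, 0xcffaedfe:
		return "Mach-O"
	}
	// A PE file starts with an MS-DOS header ("MZ") whose field at 0x3c
	// holds the file offset of the "PE\0\0" signature.
	if bytes.HasPrefix(b, []byte("MZ")) && len(b) >= 0x40 {
		off := uint64(binary.LittleEndian.Uint32(b[0x3c:]))
		if off+4 <= uint64(len(b)) && bytes.Equal(b[off:off+4], []byte("PE\x00\x00")) {
			return "PE"
		}
	}
	return "unknown"
}

func main() {
	elfStart := []byte("\x7fELF\x02\x01\x01\x00")
	machoStart := []byte{0xcf, 0xfa, 0xed, 0xfe}
	fmt.Println(detectFormat(elfStart))
	fmt.Println(detectFormat(machoStart))
}
```

For actual parsing, Go's standard library already ships `debug/elf`, `debug/pe`, `debug/macho` and `debug/dwarf`.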
Practical Binary Analysis
This book has a good introduction to both ELF and PE, as well as other interesting things.
Practical Binary Analysis
Build Your Own Linux Tools for Binary Instrumentation, Analysis, and Disassembly
by Dennis Andriesse
No Starch Press, December 2018, 456 pp.
ISBN-13: 978-1-59327-912-7
Portable Executable (PE)
This is the executable file format used on Windows. It obsoletes the MS-DOS MZ file format and is derived from COFF (Common Object File Format), an older UNIX executable file format.
Program Database (PDB)
This is the debug format currently used in Windows. It is supported by Visual Studio, WinDbg and the DbgHelp library. Unfortunately it’s also largely undocumented. Gcc, LLVM and Go do not produce debug symbols in this format, instead they opt for embedding DWARF symbols inside PE files even on Windows.
Unlike stabs and DWARF, this debug format is not embedded inside the executable file and lives instead in separate .PDB files.
Executable and Linkable Format (ELF) and System V release 4 Application Binary Interface
This is the executable format used on Linux and most other unix-like operating systems. Originally introduced by UNIX System V release 4, it replaces COFF and the older a.out. It is used to represent executables, object files, shared objects and core dumps.
If you start reading the source code of GDB you’ll come across a file called solib-svr4.c. The name is a reference to the document introducing the ELF file format: “Shared Object LIBrary - System V Release 4”.
Mach-O
The file format used on macOS to represent executables, dynamic libraries and core dumps.
Examining executable files
To examine executable files you can use objdump on Linux or otool on macOS. My diexplorer can show the debug sections inside a browser window, with cross-references. Sometimes it is useful to examine executable files for an architecture other than the one you are using: diexplorer can do that, and objdump from GNU’s binutils can also do that, but only if it is built in a special way, which Linux distributions usually don’t do. See compiling a cross-platform objdump.
DWARF debug format
This is the debug format used on most unix-like systems, including Linux and macOS. It obsoletes stabs.
Software Exorcism: A Handbook for Debugging and Optimizing Legacy Code
If you are interested in obsolete things, like me.
Software Exorcism: A Handbook for Debugging and Optimizing Legacy Code
Reverend Bill Blunden, 2003
Apress
ISBN: 978-1-4302-0788-7
Embecosm: Howto: Porting the GNU Debugger
A detailed document describing how to port GDB to a different CPU architecture. Also a good introduction to the GDB architecture.
r_debug and link_map
The DT_DEBUG entry in the .dynamic section can be used to find out which shared libraries are used by a Linux program. Code using this entry is not portable.
/usr/include/elf/link.h
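At runtime the DT_DEBUG entry points to an r_debug structure, whose r_map field is the head of a linked list of link_map entries, one per loaded object. A minimal sketch of walking that list follows. It assumes a 64-bit little-endian target, and the `fakeMem` type and `demoMemory` helper are hypothetical stand-ins for reading the memory of a real process; the addresses in the demo are made up.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// fakeMem is a hypothetical stand-in for target memory, indexed by address.
// A real target layer would read from the debugged process instead.
type fakeMem []byte

func (m fakeMem) read(addr, n uint64) ([]byte, error) {
	if addr > uint64(len(m)) || n > uint64(len(m))-addr {
		return nil, errors.New("read outside of memory")
	}
	return m[addr : addr+n], nil
}

// readCString reads a NUL-terminated string of at most 4096 bytes.
func (m fakeMem) readCString(addr uint64) (string, error) {
	for n := uint64(0); n < 4096; n++ {
		b, err := m.read(addr+n, 1)
		if err != nil {
			return "", err
		}
		if b[0] == 0 {
			return string(m[addr : addr+n]), nil
		}
	}
	return "", errors.New("unterminated string")
}

// Field offsets for a 64-bit target (pointers and ElfW(Addr) are 8 bytes).
const (
	rDebugMap   = 8  // r_debug.r_map, after r_version and padding
	linkMapAddr = 0  // link_map.l_addr
	linkMapName = 8  // link_map.l_name
	linkMapNext = 24 // link_map.l_next, after l_ld
	linkMapSize = 40 // public part of link_map, through l_prev
)

type sharedObject struct {
	Addr uint64 // l_addr: difference between file and memory addresses
	Name string // l_name: path of the object, may be empty
}

// loadedObjects follows r_debug.r_map and then the l_next pointers.
func loadedObjects(m fakeMem, rDebugAddr uint64) ([]sharedObject, error) {
	b, err := m.read(rDebugAddr+rDebugMap, 8)
	if err != nil {
		return nil, err
	}
	var objs []sharedObject
	for lm := binary.LittleEndian.Uint64(b); lm != 0; {
		if len(objs) >= 1024 {
			return nil, errors.New("link_map list too long")
		}
		e, err := m.read(lm, linkMapSize)
		if err != nil {
			return nil, err
		}
		name, err := m.readCString(binary.LittleEndian.Uint64(e[linkMapName:]))
		if err != nil {
			return nil, err
		}
		objs = append(objs, sharedObject{binary.LittleEndian.Uint64(e[linkMapAddr:]), name})
		lm = binary.LittleEndian.Uint64(e[linkMapNext:])
	}
	return objs, nil
}

// demoMemory lays out an r_debug at 0x10 and two link_map entries.
func demoMemory() (fakeMem, uint64) {
	m := make(fakeMem, 256)
	put := func(addr, v uint64) { binary.LittleEndian.PutUint64(m[addr:], v) }
	put(0x10, 1)              // r_debug.r_version (plus padding)
	put(0x18, 0x40)           // r_debug.r_map -> first link_map
	put(0x40, 0x555555554000) // first entry: l_addr
	put(0x48, 0xC0)           // l_name -> empty string
	put(0x58, 0x80)           // l_next -> second link_map
	put(0x80, 0x7ffff7c00000) // second entry: l_addr
	put(0x88, 0xC1)           // l_name -> "/lib/libc.so.6"
	copy(m[0xC1:], "/lib/libc.so.6\x00")
	return m, 0x10
}

func main() {
	mem, rDebug := demoMemory()
	objs, err := loadedObjects(mem, rDebug)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, o := range objs {
		fmt.Printf("%#x %q\n", o.Addr, o.Name)
	}
}
```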
Mozilla RR Project
A time traveling debugger backend. Can be used as a backend of any debugger that speaks the Gdb Remote Serial Protocol. By default it starts GDB as its frontend.
Peter B. Kessler: Fast Breakpoints
Details of an implementation for fast conditional breakpoints using jumps to generated code.
Fast Breakpoints
Peter B. Kessler, 1990
Proceedings of the ACM SIGPLAN '90
White Plains, New York, June 20-22, 1990.
DOI: 10.1.1.90.2322
Acid: A Debugger Built From A Language
Plan 9’s debugger, built around a programming language. The same programming language is used as the command line of Acid, to build most of Acid’s functionality and by the compiler to describe debug symbols.
Acid: A Debugger Built From A Language
Phil Winterbottom, Lucent Technologies Inc.
Proc. of the Winter 1994 USENIX Conf., pp. 211-222, San Francisco, CA
DOI: 10.1.1.472.8070