easy debugging: higher-level gdb, full state recording #2910

synctext · 2017-04-25T08:11:36Z

towards fault-free software

The key problem we encounter while trying to create bug-free code is reproducing bugs.
Our users report bugs in sufficient number and with sufficient detail. However, reproducing bugs reported in the wild has become the main obstacle. Triggering them has proven extremely hard, given our dependencies on thread scheduling, OS version, library versions, incoming network packets, obscure firewall blocking software, and database content.

One idea is to ask users to record the full state of their Python interpreter in a ring buffer of many gigabytes. When users manage to trigger a bug, they stop the recording and submit a bug report to our system. Using the source code, we should then be able to piece together what happened.
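A minimal sketch of the ring-buffer idea, assuming we only record control-flow events (file, line, function) rather than full interpreter state. The class name `TraceRecorder` and its `capacity` parameter are hypothetical, chosen for illustration. It uses the standard `sys.settrace` hook and a bounded `collections.deque`, so the oldest events are dropped once the buffer is full.

```python
import collections
import sys


class TraceRecorder:
    """Hypothetical recorder: keeps only the most recent trace events."""

    def __init__(self, capacity=100_000):
        # A deque with maxlen acts as a ring buffer: appending to a full
        # deque silently discards the oldest entry.
        self.events = collections.deque(maxlen=capacity)

    def _trace(self, frame, event, arg):
        # Record where execution is, not the full state of every object.
        if event in ("call", "line", "return", "exception"):
            code = frame.f_code
            self.events.append(
                (event, code.co_filename, frame.f_lineno, code.co_name)
            )
        # Returning the trace function keeps line-level tracing enabled
        # for the new frame.
        return self._trace

    def start(self):
        # sys.settrace only affects the calling thread; other threads
        # would need threading.settrace before they are started.
        sys.settrace(self._trace)

    def stop(self):
        sys.settrace(None)
        # Snapshot that could be attached to a bug report.
        return list(self.events)
```

A user would call `start()` at launch and `stop()` after triggering the bug. Tracing every line slows the interpreter down considerably, so a real deployment would probably need sampling or a cheaper event filter.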

Prior work exists in this area. We have struggled with debugging in the past; see the ticket on Linux OS catching debug info.

synctext · 2018-12-03T15:40:31Z

OSDI 2018 paper on debugging. Microsoft uses "lightweight hardware tracing" to address this: "REPT: Reverse Debugging of Failures in Deployed Software". From the paper:
In this paper, we present REPT, a practical system that enables reverse debugging of software failures in deployed systems. REPT reconstructs the execution history with high fidelity by combining online lightweight hardware tracing of a program’s control flow with offline binary analysis that recovers its data flow. It is seemingly impossible to recover data values thousands of instructions before the failure due to information loss and concurrent execution. REPT tackles these challenges by constructing a partial execution order based on timestamps logged by hardware and iteratively performing forward and backward execution with error correction.

REPT leverages Intel Processor Trace (PT) to log control-flow and timing information of a program’s execution. Intel PT became available when the Broadwell architecture was released in 2014. Intel PT supports various program tracing modes, and REPT currently uses the per-thread circular buffer mode to trace user-space execution of all threads within a process.
https://github.com/01org/satt

synctext added long-term type: MSc Thesis Work labels Apr 25, 2017

qstokkink added this to the Backlog milestone Nov 27, 2018

xoriole mentioned this issue Nov 5, 2019

Debugging user issues better #4930

Closed

synctext closed this as completed Aug 20, 2024
