
easy debugging: higher-level gdb, full state recording #2910

Closed
synctext opened this issue Apr 25, 2017 · 1 comment

@synctext
Member

Towards fault-free software

The key problem we encounter while trying to create bug-free code is reproducing bugs.
Our users report bugs in sufficient numbers and with sufficient detail. However, reproducing bugs reported in the wild is now becoming the key problem. Triggering them has proven extremely hard, given our dependencies on thread scheduling, OS version, library versions, incoming network packets, obscure firewall software, and database content.

One idea is to ask users to record the full state of their Python interpreter in a ring buffer of many gigabytes. When users manage to trigger a bug, they press "stop recording" and submit a bug report to our system. Using the source code, we should then be able to piece together what happened.
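A minimal sketch of the ring-buffer idea, assuming pure-Python tracing via `sys.settrace`/`threading.settrace` is acceptable. This records only Python-level events (calls, lines, returns, exceptions) with a truncated `repr` of local variables, not the full interpreter state, and it slows execution considerably. `RingBufferTracer` and its `capacity` parameter are hypothetical names; capacity is counted in events, not gigabytes.

```python
import collections
import sys
import threading


def _safe_repr(value, limit=80):
    # repr() of user objects can fail; never let that break the traced program.
    try:
        return repr(value)[:limit]
    except Exception:
        return "<unrepresentable>"


class RingBufferTracer:
    """Hypothetical recorder that keeps only the most recent trace events."""

    def __init__(self, capacity=100_000):
        # A deque with maxlen drops the oldest event once it is full.
        self.events = collections.deque(maxlen=capacity)
        self._lock = threading.Lock()

    def _trace(self, frame, event, arg):
        if event in ("call", "line", "return", "exception"):
            code = frame.f_code
            snapshot = {k: _safe_repr(v) for k, v in frame.f_locals.items()}
            with self._lock:
                self.events.append((threading.get_ident(), event, code.co_filename,
                                    frame.f_lineno, code.co_name, snapshot))
        # Returning the tracer keeps line-level tracing on for this frame.
        return self._trace

    def start(self):
        threading.settrace(self._trace)  # threads started after this call
        sys.settrace(self._trace)        # the current thread

    def stop(self):
        # The user's "stop recording" button would call this.
        sys.settrace(None)
        threading.settrace(None)

    def dump(self):
        """Return a copy of the buffered events to attach to a bug report."""
        with self._lock:
            return list(self.events)
```

Usage: call `start()` at application launch and `stop()` followed by `dump()` when the user reports a bug; the dumped tuples (thread id, event, file, line, function, locals) could then be replayed against the source code.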

There is prior work in this area. We have struggled with debugging in the past; see the ticket on catching debug info on Linux.

@qstokkink qstokkink added this to the Backlog milestone Nov 27, 2018
@synctext
Member Author

synctext commented Dec 3, 2018

OSDI 2018 paper on debugging: Microsoft uses "lightweight hardware tracing" to address this problem in REPT: Reverse Debugging of Failures in Deployed Software.
In this paper, we present REPT, a practical system that enables reverse debugging of software failures in deployed systems. REPT reconstructs the execution history with high fidelity by combining online lightweight hardware tracing of a program’s control flow with offline binary analysis that recovers its data flow. It is seemingly impossible to recover data values thousands of instructions before the failure due to information loss and concurrent execution. REPT tackles these challenges by constructing a partial execution order based on timestamps logged by hardware and iteratively performing forward and backward execution with error correction.

REPT leverages Intel Processor Trace (PT) to log control-flow and timing information of a program’s execution. Intel PT became available when the Broadwell architecture was released in 2014. Intel PT supports various program tracing modes, and REPT currently uses the per-thread circular buffer mode to trace user-space execution of all threads within a process.
https://github.com/01org/satt
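REPT relies on Intel PT hardware, which Python cannot access directly. The following is only a software analogy, assuming we want the same two ingredients in our Python recorder: one circular buffer per thread, and a merge of those buffers by timestamp into a single history. `PerThreadRing`, `log` and `merged` are hypothetical names, and `time.perf_counter_ns()` stands in for the hardware timestamps REPT uses.

```python
import heapq
import threading
import time
from collections import deque


class PerThreadRing:
    """Hypothetical per-thread circular buffers with timestamped events."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._local = threading.local()
        self._buffers = []  # (thread id, deque) for every thread that logged
        self._lock = threading.Lock()

    def _buffer(self):
        buf = getattr(self._local, "buf", None)
        if buf is None:
            buf = deque(maxlen=self.capacity)  # oldest events are overwritten
            self._local.buf = buf
            with self._lock:
                self._buffers.append((threading.get_ident(), buf))
        return buf

    def log(self, what):
        # Monotonic clock: timestamps never decrease within one thread.
        self._buffer().append((time.perf_counter_ns(), what))

    def merged(self):
        """Merge all per-thread buffers into one timestamp-ordered history."""
        with self._lock:
            snapshots = [[(ts, tid, what) for ts, what in buf.copy()]
                         for tid, buf in self._buffers]
        # Each snapshot is already sorted, so heapq.merge suffices; the key
        # avoids comparing payloads when timestamps tie.
        return list(heapq.merge(*snapshots, key=lambda event: event[0]))
```

Events from different threads with equal timestamps are genuinely unordered; this sketch breaks such ties arbitrarily, whereas REPT builds a partial order and refines it during analysis. Calling `merged()` after the traced threads have stopped gives the most consistent snapshot.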
