Skip to content

Latest commit

 

History

History
650 lines (460 loc) · 28.2 KB

Debugging-and-Profiling.md

File metadata and controls

650 lines (460 loc) · 28.2 KB

Debugging and Profiling

[toc]

Testing

  • Test-driven development(TDD) 软件开发流程
    • Fake and mock object methods
  • 也可参考【C++笔记】《The Art of Writing Readable Code》- chpt 14
  • 也可参考【protobuf等基本工具笔记】gtest

MIT 6.NULL - Debugging and Profiling

Debugging

概要

Printf Debugging and Logging

脑力劳动debug,借助打印信息思考和推断问题所在

信息除了Printf,还可以Logging,更灵活(可以输出到文件、sockets、remote servers),可复用

Here is an example code that logs messages:

$ python logger.py
# Raw output as with just prints
python logger.py log
# Log formatted output
$ python logger.py log ERROR
# Print only ERROR levels and above
$ python logger.py color
# Color formatted output

利用颜色信息: ANSI escape codes

#!/usr/bin/env bash
for R in $(seq 0 20 255); do
    for G in $(seq 0 20 255); do
        for B in $(seq 0 20 255); do
            printf "\e[38;2;${R};${G};${B}m█\e[0m";
        done
    done
done

Third party logs

  • 一个简化版 Google's logging

  • UNIX系统中第三方库的log常存在/var/log

    • the NGINX webserver places its logs under /var/log/nginx
  • systemd, a system daemon that controls many things in your system such as which services are enabled and running

    • /var/log/journal,可用journalctl显示
    • macOS上var/log/system.log,但更多人用system log,用 log show 显示
    • 参考data wrangling,需要对log信息处理,也可以用lvav,体验更好
logger "Hello Logs"
# On macOS
log show --last 1m | grep Hello
# On Linux
journalctl --since "1m ago" | grep Hello

Debuggers

gdb

C++: gdb (and its quality of life modification pwndbg) and lldb

Debugging with GDB 教程

gdb:c(continue), l(ist), s(tep), n(ext), b(reak), p(rint), r(eturn), run, q(uit), watch

  • up, down

    • up n
    • up-silently n
  • 特有:start, finish, cond, disable, where

    • cond 3 this==0xXXX
  • 参数

    • gdb --args sleep 20 debug带参数的binary
    • -q quiet模式
    • -ex: 执行gdb命令
      • -ex "run"
  • bt(backtrace), frame X 进帧

  • info b

  • watch -l 同时监视表达式本身和表达式指向的内容

  • attach $pid debug正在运行的进程

  • ptype 打印变量类型;打印stl使用 python pretty print

  • gcore attach后制造core文件

  • call

//增加print的可读性
set print pretty on/off
//显示完整 STL 结构
set print elements 0

//显示智能指针对象指向的变量
p ((Object*) my_ptr)->attribute //利用类型转换
p *(my_ptr._M_ptr)
  
//显示vector内部值
p *(my_vec._M_impl._M_start)@my_vec.size()  //打印大小
p *(my_vec._M_impl._M_start+0)
p (my_vec._M_impl._M_start)[N-1] //打印第N个成员
p *(my_vec._M_impl._M_start)@N  //打印前N个成员
  
//pb相关
p *(std::string*)(X.rep_.elements) //repeated string, 字段X
  • Q: 都是问号
    • info sharedlibrary
    • set solib-search-path $PATH
  • info frame

  • info args

  • info locals

  • info threads:显示当前可调试的所有线程,GDB会给每一个线程都分配一个ID。前面有*的线程是当前正在调试的线程。

  • info reg

  • thread ID:切换当前调试的线程为指定ID的线程。

  • thread apply all command:让所有被调试的线程都执行command命令。

    • thread apply all bt 把各个线程的调用栈打出来
  • thread apply ID1 ID2 … command:让线程编号是ID1,ID2…等等的线程都执行command命令。

  • set scheduler-locking on|off|step:在使用step或continue命令调试当前被调试线程的时候,其他线程也是同时执行的,如果我们只想要被调试的线程执行,而其他线程停止等待,那就要锁定要调试的线程,只让他运行。

    • off:不锁定任何线程,所有线程都执行。
    • on:只有当前被调试的线程会执行。
    • step:阻止其他线程在当前线程单步调试的时候抢占当前线程。只有当next、continue、util以及finish的时候,其他线程才会获得重新运行的
  • show scheduler-locking:查看当前锁定线程的模式。

  • gdb -q --batch -ex "set height 0" -ex "thread apply all bt full" [可执行文件] [core文件]

    • -q: 不打印gdb的版权消息
    • --batch: 执行批处理,不进入交互模式
    • -ex: 执行gdb 命令
    • "set height 0": 不对输出进行分页

Gdb 的汇编级别调试

  • ni, si

打印变量到文件

set logging file $file_name
set logging on
thread apply all bt
set logging off
set logging overwrite on/off
set logging redirect on/off
show logging

Gdb STL support tools: https://sourceware.org/gdb/wiki/STLSupport

  • How to search memory for a byte sequence with GDB command find?
  • Debug mode
    • DWARF is a pun on ELF, which stands for Executable and Linking Format. DWARF goes with ELF.
    • dwarf的不同版本
      • -g2: dwarf4
      • -g3
      • -g3 -Og 平衡性能和debug信息
    • readelf --debug-dump a.out | less
    • readelf --debug-dump=loc a.out | less
    • print $pc
  • Split-dwarf
    • Even with incremental builds, the linker still needs to read all of the debug information in order to do things like remove duplications, for example.
100个gdb小技巧

https://wizardforcel.gitbooks.io/100-gdb-tips/content/index.html

  1. 打印
  • 打印变量

    • p/t 打印二进制变量
  • 打印内存的值

    • 格式为“x/nfu addr”。含义为以f格式打印从addr开始的n个长度单元为u的内存值
      • n:输出单元的个数。
      • f:是输出格式。比如x是以16进制形式输出,o是以8进制形式输出,t是二进制输出。
      • u:标明一个单元的长度。b是一个byteh是两个byte(halfword),w是四个byte(word),g是八个byte(giant word)
  • gdb with Gtest

    • --gtest_break_on_failure

Python: ipdb is an improved pdb that uses the IPython REPL enabling tab completion, syntax highlighting, better tracebacks, and better introspection while retaining the same interface as the pdb module.

CS107 GDB and Debugging教程

CS107 Software Testing Strategies

lldb的使用

macOS上配置VSCode的gdb调试环境

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "envFile": "${workspaceFolder}/.env",
            "console": "integratedTerminal",
            "env": {
              "ENV1":"123",
              "ENV2":"abc",
            },
            "args": ["-version","4"]
        }
    ]
}

Sanitizers

如何高效解决 C++内存问题,Apache Doris 实践之路|技术解析 TODO

  • ASan

    • -fsanitize=address
  • UBSan

  • -fsanitize=undefined

  • ThreadSan

    • -fsanitize=thread
  • Note

    • 在程序里使用 Pthreads 库有一个额外的好处: 分析工具认得它们,懂得其语意。 线程分析工具如 Intel Thread Checker 和 Valgrind-Helgrind 等能识别 Pthreads 调 用,并依据 happens-before (TODO) 关系分析程序有无 data race。

Specialized Tools

Under some circumstances, you may need to look at the network packets to figure out the issue in your program. Tools like tcpdump and Wireshark are network packet analyzers that let you read the contents of network packets and filter them based on different criteria.

For web development, the Chrome/Firefox developer tools are quite handy. They feature a large number of tools, including:

  • Source code - Inspect the HTML/CSS/JS source code of any website.
  • Live HTML, CSS, JS modification - Change the website content, styles and behavior to test (you can see for yourself that website screenshots are not valid proofs).
  • Javascript shell - Execute commands in the JS REPL.
  • Network - Analyze the requests timeline.
  • Storage - Look into the Cookies and local application storage.

Static Analysis

Coredump 相关

  • Linux 默认会把 core dump 写到当前目录,而且文件名是固定的 core。为了不让新的 core dump 文件冲掉旧的,我们可以通过 sysctl 设置 kernel.core_pattern 参数(也可以修改 /proc/sys/kernel/core_pattern),让每次 core dump 都产生不同的文件
    • core_pattern: |/usr/share/apport/apport %p %s %c %d %P
    • https://wiki.ubuntu.com/Apport
    • 修改core位置
      • echo "/data/coredump/core.%e.%p" > /proc/sys/kernel/core_pattern
      • %E:程序文件的完整路径(路径中的/会被!替代)
      • %p:进程 ID
      • %t:进程奔溃的时间戳
      • %s:哪个信号让进程奔溃
    • ulimit -c unlimited
    • 判断是否是core文件:readelf -h core

Dynamic Tracing

  • 动态追踪技术
    • 动态追踪技术允许我们使用非侵入式的方式,不用去修改我们的操作系统内核,不用去修改我们的应用程序,也不用去修改我们的业务代码或者任何配置,就可以快速高效地精确获取我们想要的信息,第一手的信息,从而帮助定位我们正在排查的各种问题
    • 调试技术需要贯通各个软件层次的抽象和封装
    • 火焰图:on-CPU, off-CPU
  • dtrace
    • Even if what you are trying to debug is a black box binary there are tools that can help you with that. Whenever programs need to perform actions that only the kernel can, they use System Calls. There are commands that let you trace the syscalls your program makes. In Linux there’s strace and macOS and BSD have dtrace. dtrace can be tricky to use because it uses its own D language, but there is a wrapper called dtruss that provides an interface more similar to strace (more details here).
    • strace入门
# On Linux
strace git status 2>&1 >/dev/null | grep index.lock
sudo strace [-e lstat] ls -l > /dev/null

# 多线程 strace,要显示 PPID
ps -efl | grep $task_name # 显示 PPID、PID
strace -p $PID

# 一些 flag
-tt   发生时刻
-T 		持续时间
-s 1024 print输入参数的长度限制
-e write=   -e read=     -e trace=file/desc			-e recvfrom
-f 监控所有子线程   -ff
# On macOS
sudo dtruss -t lstat64_extended ls -l > /dev/null

# 与之配合的技术
readlink /proc/22067/fd/3
lsof | grep /tmp/foobar.lock

Profiling

  • profilers和monitoring tools的意义:premature optimization is the root of all evil

  • CPU usage和CPU使用率

    • CPU Usage: CPU Usage是指单位时间内,进程使用了多少核心的CPU: 包括该进程用户态和内核态的开销。对于Pod来说,即单位时间内pod内所有进程使用了多少核心的CPU。
    • CPU使用率: CPU利用率是指单位时间内: CPU usage/CPU总核心数。
  • 时间概念:real/user/system time

    • real time(wall time): 真实时间, user: 用户态耗时, sys: 内核态耗时, user+sys: 实际用时

    • time指令

  • Jeff Dean推崇的back-of-the-envelope方法估算系统性能,这是一篇很好的文章,里面有各种实用数据

    • scalable counter

    • keep per user comment indexes when paging through comments

  • 性能优化思路:

    • CPU利用率 与 latency 的平衡
    • 提高CPU利用率再优化latency的思路
      • 暴露其他性能问题:在比较高的CPU使用率下,请求latency对CPU使用会变得比较敏感。
        • 引入糟糕的代码引入,会导致明显的latency上升,从而被及时发现。
        • 非预期的请求burst,会导致latency pct99明显上升。
        • 糟糕的流量调度,也会导致latency pct99明显上升。
      • 提升对基础组件的要求
        • 容器调度、超售
        • 容灾降级、auto scale
        • 流量动态调度
        • 网络和中间件
    • 分析CPU是否是瓶颈
    • CPU cache与TLB
      • 容器环境下,由于内核是共享的,租户之间很可能通过TLB和CPU cache进行相互影响。
      • pmc工具: tlbstat
        • K_CYCLES: CPU Cycles x 1000
        • K_INSTR: CPU Instructions x 1000
        • IPC: Instructions-Per-Cycle
        • DTLB_WALKS: Data TLB walks (count)
        • ITLB_WALKS: Instruction TLB walks (count)
        • K_DTLBCYC: Cycles at least one Page Miss Handler (PMH) is active with data TLB walks x 1000
        • K_ITLBCYC: Cycles at least one PMH is active with instr. TLB walks x 1000
        • DTLB%: Data TLB active cycles as a ratio of total cycles,即:DTLB miss花费的CPU开销。
        • ITLB%: Instruction TLB active cycles as a ratio of total cycles,即:ITLB miss花费的CPU开销。
      • pmc工具: pmcarch
        • K_CYCLES: CPU Cycles x 1000
        • K_INSTR: CPU Instructions x 1000
        • IPC: Instructions-Per-Cycle
        • BMR%: Branch Misprediction Ratio, as a percentage
        • LLC%: Last Level Cache hit ratio, as a percentage
      • 71e328e7-e873-4336-b029-740b38218e78
#!/usr/bin/env bpftrace

#include <linux/sched.h>

BEGIN
{
        printf("Count involutary context switch... Hit Ctrl-C to end.\n");
}

tracepoint:sched:sched_switch
{
        if (args->prev_state == TASK_RUNNING) { // involutary
                @involutary[args->prev_comm] = count();
        }
}
i:s:1
{
        print(@involutary);
        clear(@involutary);
}
  • QPS/单核QPS/latency等指标和CPU使用率的关系
    • CPU使用率从40%到60%阶段,Success QPS从2000左右增加到了3500(增加了约75%)
    • CPU利用率与mesh过载保护
    • 单核QPS随QPS上升而上升,呈抛物线状
      • 基础开销(mesh/go runtime调度, sysmon等)被逐渐均摊,基础开销边际成本变得更低
      • QPS上升,IPC变高; 但高峰期物理机负载高的时候IPC低
    • CPU利用率从60%到80%,latency p99变化很大

Profilers

CPU: 两种CPU profilers,tracing and sampling profilers

  • Python
    • cProfile: python -m cProfile -s tottime grep.py 1000 '^(import|\s*def)[^,]*$' *.py
    • line_profiler 可逐行输出,用@prifiledecorator标注函数, kernprof -l -v a.py
b = [2] * (2 * 10 ** 7)
del b

kernprof -l -v sorts.py
python -m line_profiler sorts.py.lprof

Event Profiling

perf

perf

perf的介绍与使用

perf documentation

sudo apt-get install linux-tools-$(uname -r) linux-tools-generic -y
  • perf list - List the events that can be traced with perf
  • perf stat COMMAND ARG1 ARG2 - Gets counts of different events related a process or command
  • perf record COMMAND ARG1 ARG2 - Records the run of a command and saves the statistical data into a file called perf.data
  • perf report - Formats and prints the data collected in perf.data
perf help
(sudo) perf top --call-graph graph
								-t $thread_id

sudo perf record stress -c 1 # record->stat
sudo perf report

perf stat -e cache-misses,cache-references,instructions,cycles,faults,branch-instructions,branch-misses,L1-dcache-stores,L1-dcache-store-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses,dTLB-loads,dTLB-load-misses,iTLB-loads,iTLB-load-misses -p $pid

perf record -e cache-misses -p $pid
perf report --sort comm,dso,symbol

sudo perf kmem --alloc --caller --slab stat
sudo perf sched script

Survey of C++ profiling techniques: gprof vs valgrind vs perf vs gperftools

https://stackoverflow.com/questions/375913/how-can-i-profile-c-code-running-on-linux/378024#378024

gprof

http://unix.ba/text/runtime-profiling-with-gprof/

http://sam.zoy.org/writings/programming/gprof.html

编译参数 -Og -pg,前者是debug mode,后者是动态trace

代码分析

clock_gettime(CLOCK_THREAD_CPUTIME_ID) 统计当前线程的CPU时间开销(不包含系统调用的开销)

内存泄露问题

VALGRIND_LIB=/usr/lib/valgrind valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes -v --instr-atstart=no ./bin
# 这条命令暂时还没跑通看到 output

Visualization

Resource Monitoring

  • General Monitoring - Probably the most popular is htop, which is an improved version of top. htop presents various statistics for the currently running processes on the system. htop has a myriad of options and keybinds, some useful ones are: <F6> to sort processes, t to show tree hierarchy and h to toggle threads. See also glances for similar implementation with a great UI. For getting aggregate measures across all processes, dstat is another nifty tool that computes real-time resource metrics for lots of different subsystems like I/O, networking, CPU utilization, context switches, &c.

    • dstat -nf,n表示网络,f表示看详细信息
    • lscpu
    • cat /proc/cpuinfo 查cpu信息,其中 flags 表示指令集支持
    • dmidecode | grep "Memory Device$" | wc -l 查内存条数量
  • I/O operations - iotop displays live I/O usage information and is handy to check if a process is doing heavy I/O disk operations

  • Disk Usage - df displays metrics per partitions and du displays disk usage per file for the current directory. In these tools the -h flag tells the program to print with human readable format. A more interactive version of du is ncdu which lets you navigate folders and delete files and folders as you navigate.

  • Memory Usage - free displays the total amount of free and used memory in the system. Memory is also displayed in tools like htop.

  • Open Files - lsof lists file information about files opened by processes. It can be quite useful for checking which process has opened a specific file.

  • Network Connections and Config - ss lets you monitor incoming and outgoing network packets statistics as well as interface statistics. A common use case of ss is figuring out what process is using a given port in a machine. For displaying routing, network devices and interfaces you can use ip. Note that netstat and ifconfig have been deprecated in favor of the former tools respectively.

    • ss的使用技巧,查端口占用通常用 -nlp,如果出现 (Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.) 是正常情况,意思就是没端口没被占用
  • Network Usage - nethogs and iftop are good interactive CLI tools for monitoring network usage.

If you want to test these tools you can also artificially impose loads on the machine using the stress command.

cat /etc/network/interfaces | egrep eth0 -A 1 | egrep address | awk '{print $2}' | egrep '^10'
Specialized tools

Sometimes, black box benchmarking is all you need to determine what software to use. Tools like hyperfine let you quickly benchmark command line programs. For instance, in the shell tools and scripting lecture we recommended fd over find. We can use hyperfine to compare them in tasks we run often. E.g. in the example below fd was 20x faster than find in my machine.

$ hyperfine --warmup 3 'fd -e jpg' 'find . -iname "*.jpg"'
Benchmark #1: fd -e jpg
  Time (mean ± σ):      51.4 ms ±   2.9 ms    [User: 121.0 ms, System: 160.5 ms]
  Range (min … max):    44.2 ms …  60.1 ms    56 runs

Benchmark #2: find . -iname "*.jpg"
  Time (mean ± σ):      1.126 s ±  0.101 s    [User: 141.1 ms, System: 956.1 ms]
  Range (min … max):    0.975 s …  1.287 s    10 runs

Summary
  'fd -e jpg' ran
   21.89 ± 2.33 times faster than 'find . -iname "*.jpg"'

As it was the case for debugging, browsers also come with a fantastic set of tools for profiling webpage loading, letting you figure out where time is being spent (loading, rendering, scripting, &c). More info for Firefox and Chrome.

C++ profiling functions
  • print memory
std::pair<size_t, size_t> MemoryUsage() {
  uint32_t size = 0, resident = 0, share = 0;
  std::ifstream buffer("/proc/self/statm");
  buffer >> size >> resident >> share;
  buffer.close();

  uint32_t page_size_kb = sysconf(_SC_PAGE_SIZE) / 1024;
  return std::make_pair(resident * page_size_kb / 1024,
                        share * page_size_kb / 1024);
}
  • 通过硬件实现任务组(tasks group)级的L3 Cache(LLC)和内存带宽(MBM)的监控和隔离限制的技术
Intel PCM

Exercises

(Advanced) Read about reversible debugging and get a simple example working using rr or RevPDB.

rr:

watch -l XX
reverse-cont

memory-profiler

pycallgraph

限制进程资源

taskset --cpu-list 0,2 stress -c 3

利用cgroup控制内存等资源