-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG Report] coctx_swap violates the Sys V ABI of i386 and AMD64 (违反Sys V ABI约定) #90
Comments
I have created a bug proof in this branch. To reproduce this bug: $ git clone -b tencent_libco_bug_report_and_coctx_swap_benchmark https://github.com/hnes/libaco.git
$ cd libaco
$ # require gcc >= 5.0 to progress
$ bash libco_bug_build.sh
$ ls *libco_bug_0
test_libco_bug_0 test_m32_libco_bug_0
$ # use objdump find the address of acosw (coctx_swap) in test_m32_libco_bug_0
08048786 <acosw>: # actually it is indeed coctx_swap, i just changed the name for convenient build
8048786: 8d 44 24 04 lea eax,[esp+0x4]
804878a: 8b 64 24 04 mov esp,DWORD PTR [esp+0x4]
804878e: 8d 64 24 20 lea esp,[esp+0x20]
8048792: 50 push eax
8048793: 55 push ebp
8048794: 56 push esi # <- breakpoint we would use later
8048795: 57 push edi
8048796: 52 push edx
8048797: 51 push ecx
8048798: 53 push ebx
8048799: ff 70 fc push DWORD PTR [eax-0x4]
804879c: 8b 60 04 mov esp,DWORD PTR [eax+0x4]
804879f: 58 pop eax
80487a0: 5b pop ebx
80487a1: 59 pop ecx
80487a2: 5a pop edx
80487a3: 5f pop edi
80487a4: 5e pop esi
80487a5: 5d pop ebp
80487a6: 5c pop esp
80487a7: 50 push eax
80487a8: 31 c0 xor eax,eax
80487aa: c3 ret
$ gdb ./test_m32_libco_bug_0
...
(gdb) break *0x8048794
Breakpoint 1 at 0x8048794: file coctx_swap.S, line 33.
(gdb) r
Starting program: /mnt/shr/codes/libaco/test_m32_libco_bug_0
main: sp:0xffffd1c8 ####<<<<!!!! main stack pointer
Breakpoint 1, acosw () at coctx_swap.S:33
33 pushl %esi
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.i686
(gdb) bt
#0 acosw () at coctx_swap.S:33
#1 0xffffd1c8 in ?? ()
#2 0xf7e121b3 in __libc_start_main () from /lib/libc.so.6
#3 0xf7ffcfbc in _DYNAMIC () from /lib/ld-linux.so.2
#4 0x00000001 in ?? ()
#5 0x08048693 in main () at test_libco_bug_0.c:61
(gdb) signal SIGINT
Continuing with signal SIGINT.
Received interrupt signal!
signal: sp:0x804b968 ####<<<<!!!! signal stack pointer -> HEAP!!! BUG!!!
signal: co:0x804c088 save_stack:0x804c0d8 share_stack:0xf7bf7000
signal: exit ####^<<<<!!!! `co` is on the heap
Breakpoint 1, acosw () at coctx_swap.S:33
33 pushl %esi
(gdb)
$ # same amd64 version, i.e. the test_libco_bug_0 |
BUG 1: 栈指针的使用方式违反Sys V ABI约定
ABI规范中规定用户空间程序的栈指针必须时刻指到运行栈的栈顶,而coctx_swap.S中却使用栈指针直接对位于堆中的数据结构进行寻址内存操作,这违反了ABI约定。
当coctx_swap正在用栈指针对位于堆中的数据结构进行寻址内存操作时,若此时执行线程收到了一个信号,接着内核抢占了该执行线程并开始准备接下来用户空间线程的信号处理执行环境,由于在默认情况下,内核将会选择主栈作为信号处理函数的执行栈,但此时栈已经被指向了堆中(用户空间的程序违反ABI约定在先),那么信号处理函数的执行栈就会被错误的放置到堆中,这样,堆中的数据结构在接下来就极有可能会被破坏。 BUG修复的方法可以参考libaco项目的上下文切换函数acosw实现(libaco BUG 2: x87与MXCSR控制字的保存与恢复违反Sys V ABI约定在Intel386及AMD64 Sys V ABI约定中,x87与MXCSR控制字是“callee saved”的,但是 根据libaco的性能基准测试报告得到的结论,在上下文切换代码中增加对x87与MXCSR控制字的保存与恢复只会给切换汇编带来约0.87%的性能损耗。 |
在libaco的这一分支中,有一个关于coctx_swap的bug最小复现实例。 Bug复现过程: $ git clone -b tencent_libco_bug_report_and_coctx_swap_benchmark https://github.com/hnes/libaco.git
$ cd libaco
$ # require gcc >= 5.0 to progress
$ # libaco在v1.1之后已经没有对gcc版本的限制了
$ bash libco_bug_build.sh
$ ls *libco_bug_0
test_libco_bug_0 test_m32_libco_bug_0
$ # use `objdump` find the address of acosw (coctx_swap) in test_m32_libco_bug_0
08048786 <acosw>: # 实际上它就是`coctx_swap`,这里为了方便构建直接改写成了libaco的切换函数`acosw`
8048786: 8d 44 24 04 lea eax,[esp+0x4]
804878a: 8b 64 24 04 mov esp,DWORD PTR [esp+0x4]
804878e: 8d 64 24 20 lea esp,[esp+0x20]
8048792: 50 push eax
8048793: 55 push ebp
8048794: 56 push esi # <- 我们选择的断点
8048795: 57 push edi
8048796: 52 push edx
8048797: 51 push ecx
8048798: 53 push ebx
8048799: ff 70 fc push DWORD PTR [eax-0x4]
804879c: 8b 60 04 mov esp,DWORD PTR [eax+0x4]
804879f: 58 pop eax
80487a0: 5b pop ebx
80487a1: 59 pop ecx
80487a2: 5a pop edx
80487a3: 5f pop edi
80487a4: 5e pop esi
80487a5: 5d pop ebp
80487a6: 5c pop esp
80487a7: 50 push eax
80487a8: 31 c0 xor eax,eax
80487aa: c3 ret
$ gdb ./test_m32_libco_bug_0
...
(gdb) break *0x8048794
Breakpoint 1 at 0x8048794: file coctx_swap.S, line 33.
(gdb) r
Starting program: /mnt/shr/codes/libaco/test_m32_libco_bug_0
main: sp:0xffffd1c8 ####<<<<!!!! 默认的线程主栈
Breakpoint 1, acosw () at coctx_swap.S:33
33 pushl %esi
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.i686
(gdb) bt
#0 acosw () at coctx_swap.S:33
#1 0xffffd1c8 in ?? ()
#2 0xf7e121b3 in __libc_start_main () from /lib/libc.so.6
#3 0xf7ffcfbc in _DYNAMIC () from /lib/ld-linux.so.2
#4 0x00000001 in ?? ()
#5 0x08048693 in main () at test_libco_bug_0.c:61
(gdb) signal SIGINT
Continuing with signal SIGINT.
Received interrupt signal!
signal: sp:0x804b968 ####<<<<!!!! 信号的执行栈被放置到了堆中 BUG复现
signal: co:0x804c088 save_stack:0x804c0d8 share_stack:0xf7bf7000
signal: exit ####^<<<<!!!! 这个`co`对象是在堆中分配的
Breakpoint 1, acosw () at coctx_swap.S:33
33 pushl %esi
(gdb)
$ # 对于amd64,可以使用与上述相同的方法对test_libco_bug_0进行bug复现,略。 |
人人都知道这个bug啦,微信好像内部版本都修复了似的。 |
您是指刚刚这几天修复的么?还是指很久以前libco(在这个issue以前)就已经修复了呢(只是一直以来都没有放出来)? 十分感谢 ;-) |
2016年听他们分享的时候他们就说他们修了。没想到这个开源版还停留在没修改的状态,不太了解具体情况。 |
十分感谢您提供的信息,对我很有用处 :D |
这是我对coctx_swap.S中__x86_64__部分的修改,欢迎指正(手头没有__i386__机器没法试验)
|
抱歉具体的没有仔细看,但是有两个建议哈:
pushq %rax
xorl %eax, %eax
ret
|
对了,如果要测试i386,可以使用gcc的 |
十分感谢!您的建议很好
改为
另附上__i386__实现
|
@Ananfa leaq 8(%rsp), %rax // rsp + 8 -> rax
movq %rax, 104(%rdi)
movq %rbx, 96(%rdi)
movq %rcx, 88(%rdi) # 缺少了 %rcx的拷贝?
movq %rdx, 80(%rdi) # 缺少了 %rdx的拷贝?
movq -8(%rax), %rdx
movq %rdx, 72(%rdi) // ret func addr
movq %rbp, 48(%rdi)
movq %r12, 24(%rdi)
movq %r13, 16(%rdi)
movq %r14, 8(%rdi)
movq %r15, 0(%rdi)
movq %rsi, 64(%rdi) // param
movq %rdi, 56(%rdi) // param
movq 0(%rsi), %r15
movq 8(%rsi), %r14
movq 16(%rsi), %r13
movq 24(%rsi), %r12
movq 48(%rsi), %rbp
movq 72(%rsi), %rax // ret func addr
movq 80(%rsi), %rdx # 缺少了 %rcx的恢复?
movq 88(%rsi), %rcx # 缺少了 %rcx的恢复?
movq 96(%rsi), %rbx
movq 104(%rsi), %rsp
movq 56(%rsi), %rdi // param
movq 64(%rsi), %rsi // param
pushq %rax
xorl %eax, %eax
ret 另外,__i386__的实现: movl 0(%esp), %edx // ret func addr
movl %edx, 0(%eax) 能直接这样吗: movl 0(%esp), 0(%eax) // ret func addr 对汇编不是很懂,如有错误,麻烦指正,谢谢。 |
能麻烦告知在当前有bug的实现中具体是哪句违反了Sys V ABI的约定吗,不是很懂~^o^~~^o^~,
leaq 112(%rdi),%rsp 这一句把rsp堆栈指针指向了堆内存resg[13] ,然后后面的操作都是在调用push指令,因为堆栈的生长方向是由高到低的所以push指令把对应寄存器中的值按顺序push到resg[13]==>resg[0]中。问题出在不该把rsp堆栈指针指向了堆内存吗?
可是在libaco的实现中mov rsp,rcx这句话不也同样把rsp堆栈指针指向了我们主动malloc出来的堆内存中吗?还是说问题出在leaq 112(%rdi),%rsp 后rsp堆栈指针没有指向用来模拟运行栈帧的堆内存的栈顶。 |
这里感觉回答有点乱,我整理一下: Bug2: |
Issue的最下方有BUG的详细中文描述,十分感谢。
Stack Pointer
The ABI states that the
(E|R)SP
should always point to the end of the latest allocated stack frame. But in file coctx_swap.S of libco, the(E|R)SP
had been used to address the memory on the heap.Terrible things may happend if the
(E|R)SP
is pointing to the data structure on the heap when signal comes. (Using thebreakpoint
andsignal
commands of gdb could produce such bug conveniently. Although by usingsigalstack
to change the default signal stack could alleviate the problem, but still, that kind of usage of(E|R)SP
still violates the ABI.)Control Words of x87 FPU and MXCSR
The control words of x87 FPU and MXCSR should be preserved across function calls (callee saved) in the Sys V ABI of i386 and AMD64. But in the current implementation of
coctx_swap
, there is no saving/restoring stuff about them.The text was updated successfully, but these errors were encountered: