runtime: deadlock in runtime.findrunnable #35532
Comments
Your Also, does this issue happen in 1.13.x as well? |
Binary was built on a build server. And no, the latest release works fine. |
Thanks. Your Copying some runtime people @aclements @mknyszek |
Fixed that, thanks. |
That suggests a possible connection to the timer refactoring that @ianlancetaylor did this cycle. |
See previously #35375. |
Commit 99957b6 was just this morning. Was that the same version used to build the deadlocking binary? (Several sources of deadlocks have been fixed over the past week, so the precise version is important.) |
Yes, the version in the first message is precise. |
Is there a way that we can recreate the problem ourselves? Are you sure that your program doesn't have a deadlock itself, such that there are no runnable goroutines? What leads you to think that this is a runtime problem? Can you try running the program with |
Just found it while trying the new timers; I will try to provide a reproducible example. That might not be easy because tests/benchmarks are fine and I haven't found anything that might trigger this deadlock yet.
Pretty sure. It's an app serving traffic in our advert network (30+ instances, around 20B requests daily), and this issue doesn't reproduce on 1.13.4. There are a lot of background goroutines that update data, send metrics, etc., and they stopped too.
Will do, and I will post here when it's ready. |
No luck getting the deadlock with schedtrace (it slows the app down a lot, maybe that's why it isn't triggered). Digging a little more into the locked app: it still shows a little CPU usage. Backtraces of the threads that are actually using some CPU:

0 0x0000000000469bb3 in runtime.futex at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1 0x0000000000465960 in runtime.systemstack_switch at /root/sdk/gotip/src/runtime/asm_amd64.s:330
2 0x000000000041cd24 in runtime.gcStart at /root/sdk/gotip/src/runtime/mgc.go:1307
3 0x000000000040e496 in runtime.mallocgc at /root/sdk/gotip/src/runtime/malloc.go:1129
4 0x000000000044d5bc in runtime.makeslice at /root/sdk/gotip/src/runtime/slice.go:49
...

0 0x0000000000469bb3 in runtime.futex at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1 0x00000000004322d6 in runtime.futexsleep at /root/sdk/gotip/src/runtime/os_linux.go:44
2 0x000000000040cb7a in runtime.lock at /root/sdk/gotip/src/runtime/lock_futex.go:102
3 0x000000000045530f in runtime.addInitializedTimer at /root/sdk/gotip/src/runtime/time.go:344
4 0x0000000000455b2f in runtime.modtimer at /root/sdk/gotip/src/runtime/time.go:614
5 0x0000000000430db2 in internal/poll.runtime_pollSetDeadline at /root/sdk/gotip/src/runtime/netpoll.go:264
6 0x00000000004d1e79 in internal/poll.setDeadlineImpl at /root/sdk/gotip/src/internal/poll/fd_poll_runtime.go:155
7 0x000000000062d5e9 in net.(*conn).SetReadDeadline
....

Examining more threads shows some other backtraces that hang locked:

0 0x0000000000469bb3 in runtime.futex at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1 0x00000000004322d6 in runtime.futexsleep at /root/sdk/gotip/src/runtime/os_linux.go:44
2 0x000000000040cdff in runtime.notesleep at /root/sdk/gotip/src/runtime/lock_futex.go:151
3 0x000000000043bcf0 in runtime.stopm at /root/sdk/gotip/src/runtime/proc.go:1856
4 0x000000000043c714 in runtime.gcstopm at /root/sdk/gotip/src/runtime/proc.go:2056
5 0x000000000043dfc7 in runtime.schedule at /root/sdk/gotip/src/runtime/proc.go:2497
6 0x000000000043ec66 in runtime.goexit0 at /root/sdk/gotip/src/runtime/proc.go:2844
7 0x000000000046594b in runtime.mcall at /root/sdk/gotip/src/runtime/asm_amd64.s:318

0 0x0000000000469bb3 in runtime.futex at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1 0x00000000004322d6 in runtime.futexsleep at /root/sdk/gotip/src/runtime/os_linux.go:44
2 0x000000000040cb7a in runtime.lock at /root/sdk/gotip/src/runtime/lock_futex.go:102
3 0x000000000043176a in runtime.netpolldeadlineimpl at /root/sdk/gotip/src/runtime/netpoll.go:459
4 0x0000000000431a0d in runtime.netpollReadDeadline at /root/sdk/gotip/src/runtime/netpoll.go:503
5 0x0000000000456c64 in runtime.runOneTimer at /root/sdk/gotip/src/runtime/time.go:1128
6 0x0000000000456895 in runtime.runtimer at /root/sdk/gotip/src/runtime/time.go:1030
7 0x000000000043e100 in runtime.checkTimers at /root/sdk/gotip/src/runtime/proc.go:2626
8 0x000000000043ca6a in runtime.findrunnable at /root/sdk/gotip/src/runtime/proc.go:2200
9 0x000000000043de2c in runtime.schedule at /root/sdk/gotip/src/runtime/proc.go:2548
10 0x000000000043e27d in runtime.park_m at /root/sdk/gotip/src/runtime/proc.go:2688
11 0x000000000046594b in runtime.mcall at /root/sdk/gotip/src/runtime/asm_amd64.s:318

0 0x0000000000469bb3 in runtime.futex at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1 0x00000000004322d6 in runtime.futexsleep at /root/sdk/gotip/src/runtime/os_linux.go:44
2 0x000000000040cdff in runtime.notesleep at /root/sdk/gotip/src/runtime/lock_futex.go:151
3 0x000000000043bcf0 in runtime.stopm at /root/sdk/gotip/src/runtime/proc.go:1856
4 0x000000000043c714 in runtime.gcstopm at /root/sdk/gotip/src/runtime/proc.go:2056
5 0x000000000043dfc7 in runtime.schedule at /root/sdk/gotip/src/runtime/proc.go:2497
6 0x000000000043e476 in runtime.goschedImpl at /root/sdk/gotip/src/runtime/proc.go:2703
7 0x000000000043e6c4 in runtime.gopreempt_m at /root/sdk/gotip/src/runtime/proc.go:2731
8 0x000000000044f9be in runtime.newstack at /root/sdk/gotip/src/runtime/stack.go:1025
9 0x0000000000465aaf in runtime.morestack at /root/sdk/gotip/src/runtime/asm_amd64.s:449

0 0x0000000000469bb3 in runtime.futex at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1 0x00000000004322d6 in runtime.futexsleep at /root/sdk/gotip/src/runtime/os_linux.go:44
2 0x000000000040cdff in runtime.notesleep at /root/sdk/gotip/src/runtime/lock_futex.go:151
3 0x000000000043bcf0 in runtime.stopm at /root/sdk/gotip/src/runtime/proc.go:1856
4 0x000000000043f9c1 in runtime.exitsyscall0 at /root/sdk/gotip/src/runtime/proc.go:3257
5 0x000000000046594b in runtime.mcall at /root/sdk/gotip/src/runtime/asm_amd64.s:318

0 0x0000000000469bb3 in runtime.futex at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1 0x00000000004322d6 in runtime.futexsleep at /root/sdk/gotip/src/runtime/os_linux.go:44
2 0x000000000040cdff in runtime.notesleep at /root/sdk/gotip/src/runtime/lock_futex.go:151
3 0x000000000043bc12 in runtime.templateThread at /root/sdk/gotip/src/runtime/proc.go:1834
4 0x000000000043a8c3 in runtime.mstart1 at /root/sdk/gotip/src/runtime/proc.go:1125
5 0x000000000043a7de in runtime.mstart at /root/sdk/gotip/src/runtime/proc.go:1072
6 0x00000000004019bc in ??? at ?:-1
7 0x00007f065da76e88 in ??? at ?:-1

0 0x0000000000469bb3 in runtime.futex at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1 0x00000000004322d6 in runtime.futexsleep at /root/sdk/gotip/src/runtime/os_linux.go:44
2 0x000000000040ced6 in runtime.notetsleep_internal at /root/sdk/gotip/src/runtime/lock_futex.go:174
3 0x000000000040d0dc in runtime.notetsleepg at /root/sdk/gotip/src/runtime/lock_futex.go:228
4 0x000000000044d3ac in os/signal.signal_recv at /root/sdk/gotip/src/runtime/sigqueue.go:147
5 0x00000000007a62d2 in os/signal.loop at /root/sdk/gotip/src/os/signal/signal_unix.go:23
6 0x0000000000467a71 in runtime.goexit at /root/sdk/gotip/src/runtime/asm_amd64.s:1375

0 0x0000000000469bb3 in runtime.futex at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1 0x00000000004322d6 in runtime.futexsleep at /root/sdk/gotip/src/runtime/os_linux.go:44
2 0x000000000040cb7a in runtime.lock at /root/sdk/gotip/src/runtime/lock_futex.go:102
3 0x0000000000456fa0 in runtime.timeSleepUntil at /root/sdk/gotip/src/runtime/time.go:1255
4 0x0000000000442991 in runtime.sysmon at /root/sdk/gotip/src/runtime/proc.go:4478
5 0x000000000043a8c3 in runtime.mstart1 at /root/sdk/gotip/src/runtime/proc.go:1125
6 0x000000000043a7de in runtime.mstart at /root/sdk/gotip/src/runtime/proc.go:1072
7 0x00000000004019bc in ??? at ?:-1
error: input/output error (truncated)

Will try to dig deeper. Tell me if there is anything I could try to get closer to the problem. |
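Two of the stacks above are parked in runtime.lock: one gets there through runtime_pollSetDeadline, modtimer and addInitializedTimer, the other through runOneTimer, netpollReadDeadline and netpolldeadlineimpl. One possible reading, which is an assumption here and not something the traces prove, is a lock-order inversion: each path already holds the lock the other one is waiting for. The sketch below models only that pattern in user code. The names timersMu, pollMu and inversionDetected are hypothetical stand-ins, not runtime identifiers, and sync.Mutex.TryLock (Go 1.18+) is used so the demo reports the conflict instead of hanging.

```go
package main

import (
	"fmt"
	"sync"
)

// inversionDetected models two code paths that take the same two locks in
// opposite order. timersMu and pollMu are hypothetical stand-ins for a
// timer-heap lock and a per-connection poll lock; they are not the runtime's
// own locks. It reports whether both paths found their second lock held.
func inversionDetected() bool {
	var timersMu, pollMu sync.Mutex
	var held, tried, done sync.WaitGroup
	held.Add(2)
	tried.Add(2)
	done.Add(2)

	var timerPathBlocked, deadlinePathBlocked bool

	path := func(first, second *sync.Mutex, blocked *bool) {
		defer done.Done()
		first.Lock()
		held.Done()
		held.Wait() // both paths now hold their first lock

		// A plain Lock here would never return; TryLock only reports it.
		if second.TryLock() {
			second.Unlock()
		} else {
			*blocked = true
		}

		tried.Done()
		tried.Wait() // keep the first lock until both attempts are finished
		first.Unlock()
	}

	// Assumed "timer expiry" path: timers lock first, then poll lock.
	go path(&timersMu, &pollMu, &timerPathBlocked)
	// Assumed "set deadline" path: poll lock first, then timers lock.
	go path(&pollMu, &timersMu, &deadlinePathBlocked)

	done.Wait()
	return timerPathBlocked && deadlinePathBlocked
}

func main() {
	fmt.Println("lock-order inversion detected:", inversionDetected())
}
```

With blocking Lock calls in place of TryLock, neither goroutine could make progress, which would match threads sleeping in futex as seen in the backtraces.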
upd: running with |
Thanks, I think I see it.
So if your program changes the deadline of some network connection exactly when the previous deadline expires, a deadlock seems possible at least in theory. Fortunately CL 207348 which I sent earlier today should fix this problem. |
Change https://golang.org/cl/207348 mentions this issue: |
Running tip |
Thanks for testing it. |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
No
What operating system and processor architecture are you using (go env)?
What did you do?
Built a production app with the latest Go tip and ran it on production traffic. The app is highly concurrent and uses timers a lot.
What did you expect to see?
Serving requests as usual
What did you see instead?
Deadlock. App not accepting connections and not responding to os signals. All threads stuck on
dlv
The deadlock happens after a random amount of time/served requests. I didn't find any pattern in when it happens. Increasing load (just setting a higher weight on the load balancer for that backend) helps trigger the deadlock faster.