Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A bogus STREAM is spinning at 169874 calls per second and refuses to die, aborting now! #91

Closed
slw opened this issue Jul 21, 2020 · 8 comments

Comments

@slw
Copy link

slw commented Jul 21, 2020

Jul 21 16:16:08 msk-ixlrt-lv1 nuster[25910]: A bogus STREAM [0x380dba8900] is spinning at 169874 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x380dba8900 src=85.117.100.10 fe=live be=live dst=unknown rqf=64400000 rqa=4c000 rpf=80000000 rpa=0 sif=EST,200008 sib=INI,10 af=0x0,0 csf=0x381b7fd1c0,104000 ab=0x0,0 csb=0x0,0 cof=0x381b7f90c0,80001300:H1(0x381b79f1e0)/SSL(0x381b7fc000)/tcpv4(75) cob=0x0,0:NONE(0x0)/NONE(0x0)/NONE(0) filters={0x381b7fd100="nuster cache id"}]
pid 25910 (nuster), jid 0, uid 80: exited on signal 6 (core dumped)

Core was generated by `/usr/local/sbin/nuster -q -f /usr/local/etc/nuster.conf -p /var/run/nuster.pid -'.
Program terminated with signal SIGABRT, Aborted.
#0 0x0000000800cc01ba in thr_kill () from /lib/libc.so.7
[Current thread is 1 (LWP 101116)]
(gdb) bt
#0 0x0000000800cc01ba in thr_kill () from /lib/libc.so.7
#1 0x0000000800cbe5e4 in raise () from /lib/libc.so.7
#2 0x0000000800c327e9 in abort () from /lib/libc.so.7
#3 0x00000000004234b7 in stream_dump_and_crash (obj=0x380dba8a66, rate=169874) at src/stream.c:2691
#4 0x000000000041c8e8 in process_stream (t=0x381b79f0a0, context=0x380dba8900, state=4100) at src/stream.c:1512
#5 0x0000000000568a96 in run_tasks_from_lists (budgets=0x7fffde5f0e0c) at src/task.c:476
#6 0x0000000000569992 in process_runnable_tasks () at src/task.c:674
#7 0x00000000004ee785 in run_poll_loop () at src/haproxy.c:2953
#8 0x00000000004f47d2 in run_thread_poll_loop (data=0xd) at src/haproxy.c:3120
#9 0x00000008006ef736 in ?? () from /lib/libthr.so.3
#10 0x0000000000000000 in ?? ()

@slw
Copy link
Author

slw commented Jul 21, 2020

Don't know -- I am don't controlled traffic.

global
        master-worker
        maxconn 81920
        nuster cache on data-size 192g
        nuster manager on uri /XXX purge-method MYPURGE
        tune.maxaccept -1
        tune.ssl.default-dh-param 2048
        tune.h2.max-concurrent-streams 8000
        nbproc 1
        nbthread 20
        cpu-map auto:1/1-20 0-19
        log /dev/log local0
        stats socket /var/run/haproxy.sock
        user www
        group www
        daemon

defaults
        mode http
        log global
        http-reuse always
        option http-keep-alive
        option httplog
        option redispatch
        retries 3
        maxconn 80000
        backlog 4096
        timeout connect 5000
        timeout client 50000
        timeout server 50000

frontend live
        mode http
        bind :80 backlog 1024 name clear
        bind :443 ssl crt /usr/local/etc/nginx/ssl/ backlog 1024 name clear
        http-request del-header ^X-Forwarded-For:.*
        option forwardfor
        maxconn 80000
        default_backend live

backend live
        mode http
        timeout connect 5s
        timeout server 5s
        nuster cache on
        nuster rule cacheRule ttl auto wait on use-stale 10
        server live1 XXXX:80

@slw
Copy link
Author

slw commented Jul 21, 2020

Last message in log from this IP is

Jul 21 16:16:07 msk-ixlrt-lv1 nuster[25910]: 85.117.100.10:40745 [21/Jul/2020:16:16:07.199] live~ live/<NUSTER.CACHE.ENGINE> 0/0/0/0/0 200 526 - - ---- 5/5/1/0/0 0/0 "GET /live/vanbur_720p/index.m3u8 HTTP/1.1"

@jiangwenyuan
Copy link
Owner

Can you share the output of thread apply all bt ?

@slw
Copy link
Author

slw commented Jul 22, 2020

Reading symbols from /usr/local/sbin/nuster...
[New LWP 101116]
[New LWP 100794]
[New LWP 101104]
[New LWP 101105]
[New LWP 101106]
[New LWP 101107]
[New LWP 101108]
[New LWP 101109]
[New LWP 101110]
[New LWP 101111]
[New LWP 101112]
[New LWP 101113]
[New LWP 101114]
[New LWP 101115]
[New LWP 101117]
[New LWP 101118]
[New LWP 101119]
[New LWP 101120]
[New LWP 101121]
[New LWP 101122]
Core was generated by `/usr/local/sbin/nuster -q -f /usr/local/etc/nuster.conf -p /var/run/nuster.pid -'.
Program terminated with signal SIGABRT, Aborted.
#0  0x0000000800cc01ba in thr_kill () from /lib/libc.so.7
[Current thread is 1 (LWP 101116)]
(gdb) thread apply all bt

Thread 20 (LWP 101122):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0x13) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdd9eb000

Thread 19 (LWP 101121):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0x12) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffddbec000

Thread 18 (LWP 101120):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0x11) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffddded000

Thread 17 (LWP 101119):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0x10) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffddfee000

Thread 16 (LWP 101118):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0xf) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffde1ef000

Thread 15 (LWP 101117):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0xe) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffde3f0000

Thread 14 (LWP 101115):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0xc) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffde7f2000

Thread 13 (LWP 101114):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0xb) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffde9f3000

Thread 12 (LWP 101113):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0xa) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdebf4000

Thread 11 (LWP 101112):
#0  0x0000000000567c85 in wake_expired_tasks () at src/task.c:276
#1  0x00000000004ee71e in run_poll_loop () at src/haproxy.c:2943
#2  0x00000000004f47d2 in run_thread_poll_loop (data=0x9) at src/haproxy.c:3120
--Type <RET> for more, q to quit, c to continue without paging--c
#3  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#4  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdedf5000

Thread 10 (LWP 101111):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0x8) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdeff6000

Thread 9 (LWP 101110):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0x7) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdf1f7000

Thread 8 (LWP 101109):
#0  0x0000000800c5293a in _nanosleep () from /lib/libc.so.7
#1  0x00000008006f211c in ?? () from /lib/libthr.so.3
#2  0x0000000800ca8356 in usleep () from /lib/libc.so.7
#3  0x00000000005c9aee in _nst_cache_filter_http_headers (s=0x380f7d6900, filter=0x3819a5c240, msg=0x3819b84330) at src/nuster/cache/filter.c:258
#4  0x00000000004c62b5 in flt_analyze_http_headers (s=0x380f7d6900, chn=0x380f7d6910, an_bit=16384) at src/filters.c:770
#5  0x000000000041e2bb in process_stream (t=0x3817ff71e0, context=0x380f7d6900, state=4100) at src/stream.c:1787
#6  0x0000000000568a96 in run_tasks_from_lists (budgets=0x7fffdf3f7e0c) at src/task.c:476
#7  0x0000000000569992 in process_runnable_tasks () at src/task.c:674
#8  0x00000000004ee785 in run_poll_loop () at src/haproxy.c:2953
#9  0x00000000004f47d2 in run_thread_poll_loop (data=0x6) at src/haproxy.c:3120
#10 0x00000008006ef736 in ?? () from /lib/libthr.so.3
#11 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdf3f8000

Thread 7 (LWP 101108):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0x5) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdf5f9000

Thread 6 (LWP 101107):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0x4) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdf7fa000

Thread 5 (LWP 101106):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0x3) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdf9fb000

Thread 4 (LWP 101105):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0x2) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfbfc000

Thread 3 (LWP 101104):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0x1) at src/haproxy.c:3120
#5  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#6  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfdfd000

Thread 2 (LWP 100794):
#0  0x0000000800cab17a in _kevent () from /lib/libc.so.7
#1  0x00000008006f27e3 in ?? () from /lib/libthr.so.3
#2  0x00000000002fe374 in _do_poll (p=0x69afb0 <cur_poller>, exp=1904504656, wake=0) at src/ev_kqueue.c:156
#3  0x00000000004eeb12 in run_poll_loop () at src/haproxy.c:3000
#4  0x00000000004f47d2 in run_thread_poll_loop (data=0x0) at src/haproxy.c:3120
#5  0x00000000004f0bb9 in main (argc=8, argv=0x7fffffffe650) at src/haproxy.c:3822

Thread 1 (LWP 101116):
#0  0x0000000800cc01ba in thr_kill () from /lib/libc.so.7
#1  0x0000000800cbe5e4 in raise () from /lib/libc.so.7
#2  0x0000000800c327e9 in abort () from /lib/libc.so.7
#3  0x00000000004234b7 in stream_dump_and_crash (obj=0x380dba8a66, rate=169874) at src/stream.c:2691
#4  0x000000000041c8e8 in process_stream (t=0x381b79f0a0, context=0x380dba8900, state=4100) at src/stream.c:1512
#5  0x0000000000568a96 in run_tasks_from_lists (budgets=0x7fffde5f0e0c) at src/task.c:476
#6  0x0000000000569992 in process_runnable_tasks () at src/task.c:674
#7  0x00000000004ee785 in run_poll_loop () at src/haproxy.c:2953
#8  0x00000000004f47d2 in run_thread_poll_loop (data=0xd) at src/haproxy.c:3120
#9  0x00000008006ef736 in ?? () from /lib/libthr.so.3
#10 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffde5f1000

@jiangwenyuan
Copy link
Owner

Seems wait on is the culprit , can you remove wait on and see what happens? Thanks

@slw
Copy link
Author

slw commented Jul 22, 2020

OK, I am try.
I am have question about wait on

And see in log file next lines:

Jul 15 13:03:18 msk-ixlrt-lv1 nuster[25910]: 109.252.64.201:13551 [15/Jul/2020:13:03:18.470] live~ live/live1 0/0/0/0/29 200 340247 - - ---- 3/2/1/1/0 0/0 "GET /live/vanbur_720p/1594807394758.ts HTTP/1.1"
Jul 15 13:03:18 msk-ixlrt-lv1 nuster[25910]: 31.173.86.5:36554 [15/Jul/2020:13:03:18.470] live~ live/live1 0/0/0/0/88 200 340247 - - ---- 3/2/0/0/0 0/0 "GET /live/vanbur_720p/1594807394758.ts HTTP/1.1"

Is this normal?

@jiangwenyuan
Copy link
Owner

jiangwenyuan commented Jul 22, 2020

It may happen.

For example,

A             B              C                D
|-------------|--------------|----------------|

A: the time haproxy accept the request
B: the time nuster create an internal representation of the cache object
C: the time the request is sent to backend server
D: the time the response finished

Assuming there are two identical requests.

If request 1 is between B and C, request 2 will wait as the internal representation of the cache object has beed created.

But if the request 1 is between A and B, request 2 will NOT wait as at this time there's no internal representation, so request 2 will not wait and will be sent to backend sever. Which is your case.

[15/Jul/2020:13:03:18.470]
[15/Jul/2020:13:03:18.470]

Typically the time from A to B is very very short(~microsecond).

@jiangwenyuan
Copy link
Owner

@slw Hi, do you have any updates on this? Does removing wait on solve the problem?

jiangwenyuan added a commit that referenced this issue Aug 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants