
Edge: After the last stream is disconnected, if the same stream is requested again within 3 seconds, it cannot be played smoothly. #2215

Closed
chia7 opened this issue Feb 25, 2021 · 4 comments · Fixed by #2707
Labels
Bug (It might be a bug.) · PullRequest (Has PR or solution in issue.) · TransByAI (Translated by AI/GPT.)

Comments

@chia7

chia7 commented Feb 25, 2021

Description

  1. SRS version: 4.0.65 (Leo)
  2. The log of SRS is as follows:
[2021-02-24 14:51:31.551][Trace][761][5806830r] HTTP #0 1.200.163.28:18569 GET http://4.3.2.1:8080/live/livestream.flv, content-length=-1
[2021-02-24 14:51:31.551][Trace][761][5806830r] new source, stream_url=/live/livestream
[2021-02-24 14:51:31.551][Trace][761][5806830r] http: mount flv stream for sid=/live/livestream, mount=/live/livestream.flv
[2021-02-24 14:51:31.551][Trace][761][5806830r] flv: source url=/live/livestream, is_edge=1, source_id=/
[2021-02-24 14:51:31.551][Trace][761][5806830r] ignore disabled exec for vhost=__defaultVhost__
[2021-02-24 14:51:31.552][Trace][761][5806830r] dispatch cached gop success. count=0, duration=-1
[2021-02-24 14:51:31.552][Trace][761][5806830r] create consumer, active=1, queue_size=-nan, jitter=10000000
[2021-02-24 14:51:31.552][Trace][761][5806830r] set fd=9 TCP_NODELAY 0=>1
[2021-02-24 14:51:31.552][Trace][761][5806830r] set fd=9, SO_SNDBUF=46080=>175000, buffer=350ms
[2021-02-24 14:51:31.552][Trace][761][5806830r] FLV /live/livestream.flv, encoder=FLV, nodelay=1, mw_sleep=350ms, cache=0, msgs=128
[2021-02-24 14:51:31.552][Trace][761][5806830r] update source_id=5806830r/5806830r
[2021-02-24 14:51:31.556][Trace][761][s57m2w6i] complex handshake success.
[2021-02-24 14:51:31.556][Trace][761][s57m2w6i] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
[2021-02-24 14:51:31.599][Trace][761][s57m2w6i] connected, version=4.0.65.0, ip=10.140.0.183, pid=17905, id=0, dsu=1
[2021-02-24 14:51:31.600][Trace][761][s57m2w6i] edge-pull publish url rtmp://4.3.2.2:1935/live/livestream, stream=livestream as livestream
[2021-02-24 14:51:31.600][Trace][761][s57m2w6i] edge change from 100 to state 101 (pull).
[2021-02-24 14:51:31.656][Trace][761][s57m2w6i] got metadata, width=352, height=624, vcodec=7, acodec=10
[2021-02-24 14:51:31.656][Trace][761][s57m2w6i] 4B audio sh, codec(10, profile=LC, 1channels, 0kbps, 44100HZ), flv(16bits, 1channels, 44100HZ)
[2021-02-24 14:51:31.656][Trace][761][s57m2w6i] 43B video sh, codec(7, profile=Baseline, level=3, 352x624, 0kbps, 0.0fps, 0.0s)
[2021-02-24 14:51:31.901][Trace][761][5806830r] update source_id=s57m2w6i/5806830r
[2021-02-24 14:51:31.901][Trace][761][5806830r] FLV: write header audio=1, video=1
[2021-02-24 14:51:37.506][Warn][761][s57m2w6i][4] origin disconnected, retry, error code=1007 : recv message : recv interlaced message : read basic header : basic header requires 1 bytes : read bytes : read
thread [761][s57m2w6i]: ingest() [src/app/srs_app_edge.cpp:339][errno=4]
thread [761][s57m2w6i]: recv_message() [src/protocol/srs_rtmp_stack.cpp:403][errno=4]
thread [761][s57m2w6i]: recv_interlaced_message() [src/protocol/srs_rtmp_stack.cpp:913][errno=4]
thread [761][s57m2w6i]: read_basic_header() [src/protocol/srs_rtmp_stack.cpp:1008][errno=4]
thread [761][s57m2w6i]: grow() [src/protocol/srs_protocol_stream.cpp:179][errno=4]
thread [761][s57m2w6i]: read() [src/protocol/srs_service_st.cpp:519][errno=4]
[2021-02-24 14:51:38.296][Trace][761][w1210580] HTTP #0 1.200.163.28:18713 GET http://4.3.2.1:8080/live/livestream.flv, content-length=-1
[2021-02-24 14:51:38.296][Trace][761][w1210580] dispatch cached gop success. count=94, duration=1052
[2021-02-24 14:51:38.296][Trace][761][w1210580] create consumer, active=1, queue_size=0.00, jitter=10000000
[2021-02-24 14:51:38.296][Trace][761][w1210580] set fd=11 TCP_NODELAY 0=>1
[2021-02-24 14:51:38.296][Trace][761][w1210580] set fd=11, SO_SNDBUF=46080=>175000, buffer=350ms
[2021-02-24 14:51:38.296][Trace][761][w1210580] FLV /live/livestream.flv, encoder=FLV, nodelay=1, mw_sleep=350ms, cache=0, msgs=128
[2021-02-24 14:51:38.296][Trace][761][w1210580] FLV: write header audio=1, video=1
[2021-02-24 14:51:40.507][Trace][761][5806830r] cleanup when unpublish
[2021-02-24 14:51:40.507][Trace][761][5806830r] edge change from 101 to 300 then 0 (init).
[2021-02-24 14:51:40.507][Trace][761][5806830r] TCP: before dispose resource(HttpStream)(0x275f0e0), conns=2, zombies=0, ign=0, inz=0, ind=0
[2021-02-24 14:51:40.507][Warn][761][5806830r][4] client disconnect peer. ret=1007
[2021-02-24 14:51:40.507][Trace][761][m4981k2y] TCP: clear zombies=1 resources, conns=2, removing=0, unsubs=0
[2021-02-24 14:51:40.507][Trace][761][5806830r] TCP: disposing #0 resource(HttpStream)(0x275f0e0), conns=2, disposing=1, zombies=0
[2021-02-24 14:51:40.749][Trace][761][w1210580] TCP: before dispose resource(HttpStream)(0x28011b0), conns=1, zombies=0, ign=0, inz=0, ind=0
[2021-02-24 14:51:40.749][Warn][761][w1210580][4] server disconnect. ret=4040
[2021-02-24 14:51:40.749][Trace][761][m4981k2y] TCP: clear zombies=1 resources, conns=1, removing=0, unsubs=0
[2021-02-24 14:51:40.749][Trace][761][w1210580] TCP: disposing #0 resource(HttpStream)(0x28011b0), conns=1, disposing=1, zombies=0
  3. The configuration of SRS is as follows:

Edge configuration:

listen              1935;
max_connections     1000;
daemon              off;
srs_log_tank        console;
srs_log_level       trace;

stats {
    network         0;
    disk            sda sdb xvda xvdb;
}

http_api {
    enabled         on;
    listen          1985;
}

http_server {
    enabled         on;
    listen          8080;
    dir             ./objs/nginx/html;
}

vhost __defaultVhost__ {
    cluster {
        mode            remote;
        origin          10.155.2.47:1935;
    }

    tcp_nodelay     on;
    min_latency     on;

    play {
        gop_cache       on;
        queue_length    10;
        mw_latency      100;
    }

    http_remux {
        enabled         on;
        mount           [vhost]/[app]/[stream].flv;
    }
}

Origin configuration:

listen              1935;
max_connections     1000;
daemon              off;
srs_log_tank        console;
srs_log_level       trace;

stats {
    network         0;
    disk            sda sdb xvda xvdb;
}

http_api {
    enabled         on;
    listen          1985;
}

vhost __defaultVhost__ {
    cluster {
        mode            local;
        origin_cluster  on;
    }

    tcp_nodelay     on;
    min_latency     on;

    publish {
        mr              off;
    }
}

Replay

Steps to reproduce the bug:

  1. Start streaming from the origin.
  2. Pull the stream from the edge.
  3. Disconnect the stream and restart pulling the same stream after 1 second.
  4. The second pull will be disconnected by the server.

Analysis of the cause:

When the last player disconnects, control reaches SrsPlayEdge::on_all_client_stop and then SrsEdgeIngester::stop.
At that point trd->stop() takes about 3 seconds to return, and if a new request for the same stream arrives during that window, playback fails.
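
To make the blocking concrete, here is a minimal, self-contained sketch of that stop path. This is not the real SRS code: CoroutineLike, EdgeIngesterLike and the std::thread simulation are stand-ins, and only SrsPlayEdge::on_all_client_stop, SrsEdgeIngester::stop, trd->stop() and the 3-second interval come from the description above.

// Minimal sketch (not the real SRS classes): simulates how stop() must wait for
// the ingest coroutine to leave its 3-second retry sleep before join() returns.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

static const std::chrono::seconds kRetryInterval(3); // stands in for SRS_EDGE_INGESTER_CIMS

struct CoroutineLike {
    std::thread th;
    std::atomic<bool> interrupted{false};

    template <typename F>
    void start(F f) { th = std::thread(f); }

    void stop() {
        interrupted = true;            // "interrupt"
        if (th.joinable()) th.join();  // "st_thread_join": waits for cycle() to return
    }
};

struct EdgeIngesterLike {
    CoroutineLike trd;

    void start() { trd.start([this] { cycle(); }); }

    // Called (via something like SrsPlayEdge::on_all_client_stop) when the last player leaves.
    void stop() { trd.stop(); }        // blocks until cycle() returns, i.e. up to ~3s

    void cycle() {
        while (!trd.interrupted) {
            // do_cycle(): pull the stream from origin (omitted).
            // After it returns, sleep before retrying; the sleep is not woken by stop().
            std::this_thread::sleep_for(kRetryInterval);
        }
    }
};

int main() {
    EdgeIngesterLike ingester;
    ingester.start();
    std::this_thread::sleep_for(std::chrono::milliseconds(100));

    auto t0 = std::chrono::steady_clock::now();
    ingester.stop(); // the last player disconnected
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - t0).count();
    std::printf("stop() blocked for ~%lld ms\n", (long long)ms);
    return 0;
}

This is only an approximation of the control flow, but it shows why a request arriving inside that window meets an ingester that is still shutting down.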

Expected Behavior

Even if the last player has just disconnected, requesting the same stream again within 3 seconds should still play smoothly.

TRANS_BY_GPT3

@winlinvip winlinvip added the Bug It might be a bug. label Mar 1, 2021
@winlinvip winlinvip added this to the SRS 3.0 release milestone Mar 1, 2021
@chia7

chia7 commented Apr 6, 2021

After further investigation, I found that too much time is spent in st_thread_join. @winlinvip, could you advise on how to fix this?

[2021-04-06 03:26:45.135][Trace][12139][008457c1] HTTP #0 1.200.163.28:46965 GET http://4.3.2.1:8080/live/live.flv, content-length=-1
[2021-04-06 03:26:45.136][Trace][12139][008457c1] flv: source url=/live/live, is_edge=1, source_id=/051s80ei
[2021-04-06 03:26:45.136][Warn][12139][008457c1][11] SrsSTCoroutine: interrupt start
[2021-04-06 03:26:45.136][Warn][12139][008457c1][11] SrsSTCoroutine: interrupt end
[2021-04-06 03:26:45.136][Warn][12139][008457c1][11] SrsSTCoroutine: st_thread_join start
[2021-04-06 03:26:45.136][Warn][12139][008457c1][4] SrsSTCoroutine: st_thread_join end
[2021-04-06 03:26:45.136][Trace][12139][008457c1] ignore disabled exec for vhost=__defaultVhost__
[2021-04-06 03:26:45.136][Trace][12139][008457c1] dispatch cached gop success. count=0, duration=-1
[2021-04-06 03:26:45.136][Trace][12139][008457c1] create consumer, active=1, queue_size=-nan, jitter=10000000
[2021-04-06 03:26:45.136][Trace][12139][008457c1] set fd=10 TCP_NODELAY 0=>1
[2021-04-06 03:26:45.136][Trace][12139][008457c1] set fd=10, SO_SNDBUF=87040=>50000, buffer=100ms
[2021-04-06 03:26:45.136][Trace][12139][008457c1] FLV /live/live.flv, encoder=FLV, nodelay=1, mw_sleep=100ms, cache=0, msgs=128
[2021-04-06 03:26:45.136][Trace][12139][008457c1] update source_id=008457c1/051s80ei
[2021-04-06 03:26:45.140][Trace][12139][44qkh929] complex handshake success.
[2021-04-06 03:26:45.140][Trace][12139][44qkh929] protocol in.buffer=0, in.ack=0, out.ack=0, in.chunk=128, out.chunk=128
[2021-04-06 03:26:45.183][Trace][12139][44qkh929] connected, version=4.0.89.0, ip=10.140.0.187, pid=17767, id=0, dsu=1
[2021-04-06 03:26:45.184][Trace][12139][44qkh929] edge-pull publish url rtmp://104.199.212.229:1935/live/live, stream=live as live
[2021-04-06 03:26:45.184][Trace][12139][44qkh929] edge change from 100 to state 101 (pull).
[2021-04-06 03:26:45.186][Trace][12139][44qkh929] got metadata, width=1280, height=720, vcodec=7, acodec=2
[2021-04-06 03:26:45.186][Warn][12139][44qkh929][11] VIDEO: stream not monotonically increase, please open mix_correct.
[2021-04-06 03:26:45.186][Trace][12139][44qkh929] 52B video sh,  codec(7, profile=High, level=3.1, 1280x720, 0kbps, 0.0fps, 0.0s)
[2021-04-06 03:26:45.235][Trace][12139][008457c1] update source_id=44qkh929/051s80ei
[2021-04-06 03:26:45.236][Trace][12139][008457c1] FLV: write header audio=1, video=1
[2021-04-06 03:26:47.448][Warn][12139][008457c1][32] SrsSTCoroutine: interrupt start
[2021-04-06 03:26:47.448][Warn][12139][008457c1][32] SrsSTCoroutine: interrupt end
[2021-04-06 03:26:47.448][Warn][12139][008457c1][32] SrsSTCoroutine: st_thread_join start
[2021-04-06 03:26:47.448][Warn][12139][008457c1][32] SrsSTCoroutine: st_thread_join end
[2021-04-06 03:26:47.449][Warn][12139][008457c1][32] SrsEdgeIngester: trd->stop start
[2021-04-06 03:26:47.449][Warn][12139][008457c1][32] SrsSTCoroutine: interrupt start
[2021-04-06 03:26:47.449][Warn][12139][008457c1][32] SrsSTCoroutine: interrupt end
[2021-04-06 03:26:47.449][Warn][12139][008457c1][32] SrsSTCoroutine: st_thread_join start
[2021-04-06 03:26:47.449][Warn][12139][44qkh929][4] origin disconnected, retry, error code=1007 : recv message : recv interlaced message : read basic header : basic header requires 1 bytes : read bytes : read
thread [12139][44qkh929]: ingest() [src/app/srs_app_edge.cpp:341][errno=4]
thread [12139][44qkh929]: recv_message() [src/protocol/srs_rtmp_stack.cpp:403][errno=4]
thread [12139][44qkh929]: recv_interlaced_message() [src/protocol/srs_rtmp_stack.cpp:913][errno=4]
thread [12139][44qkh929]: read_basic_header() [src/protocol/srs_rtmp_stack.cpp:1008][errno=4]
thread [12139][44qkh929]: grow() [src/protocol/srs_protocol_stream.cpp:179][errno=4]
thread [12139][44qkh929]: read() [src/protocol/srs_service_st.cpp:524][errno=4]
[2021-04-06 03:26:48.648][Trace][12139][m7e5xna0] HTTP #0 1.200.163.28:44327 GET http://4.3.2.1:8080/live/live.flv, content-length=-1
[2021-04-06 03:26:48.649][Trace][12139][m7e5xna0] dispatch cached gop success. count=395, duration=5777
[2021-04-06 03:26:48.649][Trace][12139][m7e5xna0] create consumer, active=1, queue_size=0.00, jitter=10000000
[2021-04-06 03:26:48.649][Trace][12139][m7e5xna0] set fd=12 TCP_NODELAY 0=>1
[2021-04-06 03:26:48.649][Trace][12139][m7e5xna0] set fd=12, SO_SNDBUF=87040=>50000, buffer=100ms
[2021-04-06 03:26:48.649][Trace][12139][m7e5xna0] FLV /live/live.flv, encoder=FLV, nodelay=1, mw_sleep=100ms, cache=0, msgs=128
[2021-04-06 03:26:48.649][Trace][12139][m7e5xna0] FLV: write header audio=1, video=1
[2021-04-06 03:26:50.448][Warn][12139][008457c1][11] SrsSTCoroutine: st_thread_join end
[2021-04-06 03:26:50.449][Warn][12139][008457c1][11] SrsEdgeIngester: strd->stop end
[2021-04-06 03:26:50.449][Warn][12139][008457c1][11] SrsSTCoroutine: interrupt start
[2021-04-06 03:26:50.449][Warn][12139][008457c1][11] SrsSTCoroutine: interrupt end
[2021-04-06 03:26:50.449][Warn][12139][008457c1][11] SrsSTCoroutine: st_thread_join start
[2021-04-06 03:26:50.449][Warn][12139][008457c1][4] SrsSTCoroutine: st_thread_join end
[2021-04-06 03:26:50.449][Trace][12139][008457c1] cleanup when unpublish
[2021-04-06 03:26:50.449][Trace][12139][008457c1] edge change from 101 to 300 then 0 (init).
[2021-04-06 03:26:50.449][Trace][12139][008457c1] TCP: before dispose resource(HttpStream)(0x28a5fc0), conns=2, zombies=0, ign=0, inz=0, ind=0
[2021-04-06 03:26:50.449][Warn][12139][008457c1][4] client disconnect peer. ret=1009
[2021-04-06 03:26:50.449][Trace][12139][5sh036w4] TCP: clear zombies=1 resources, conns=2, removing=0, unsubs=0
[2021-04-06 03:26:50.449][Trace][12139][008457c1] TCP: disposing #0 resource(HttpStream)(0x28a5fc0), conns=2, disposing=1, zombies=0
[2021-04-06 03:26:50.449][Warn][12139][008457c1][4] SrsSTCoroutine: interrupt start
[2021-04-06 03:26:50.449][Warn][12139][008457c1][4] SrsSTCoroutine: interrupt end
[2021-04-06 03:26:50.449][Warn][12139][008457c1][4] SrsSTCoroutine: st_thread_join start
[2021-04-06 03:26:50.449][Warn][12139][008457c1][4] SrsSTCoroutine: st_thread_join end
[2021-04-06 03:26:51.688][Warn][12139][m7e5xna0][11] SrsSTCoroutine: interrupt start
[2021-04-06 03:26:51.688][Warn][12139][m7e5xna0][11] SrsSTCoroutine: interrupt end
[2021-04-06 03:26:51.688][Warn][12139][m7e5xna0][11] SrsSTCoroutine: st_thread_join start
[2021-04-06 03:26:51.688][Warn][12139][m7e5xna0][4] SrsSTCoroutine: st_thread_join end
[2021-04-06 03:26:51.688][Trace][12139][m7e5xna0] TCP: before dispose resource(HttpStream)(0x2ae7aa0), conns=1, zombies=0, ign=0, inz=0, ind=0
[2021-04-06 03:26:51.688][Warn][12139][m7e5xna0][4] server disconnect. ret=4040
[2021-04-06 03:26:51.688][Trace][12139][5sh036w4] TCP: clear zombies=1 resources, conns=1, removing=0, unsubs=0
[2021-04-06 03:26:51.688][Trace][12139][m7e5xna0] TCP: disposing #0 resource(HttpStream)(0x2ae7aa0), conns=1, disposing=1, zombies=0
[2021-04-06 03:26:51.688][Warn][12139][m7e5xna0][4] SrsSTCoroutine: interrupt start
[2021-04-06 03:26:51.688][Warn][12139][m7e5xna0][4] SrsSTCoroutine: interrupt end
[2021-04-06 03:26:51.688][Warn][12139][m7e5xna0][4] SrsSTCoroutine: st_thread_join start
[2021-04-06 03:26:51.688][Warn][12139][m7e5xna0][4] SrsSTCoroutine: st_thread_join end

TRANS_BY_GPT3

@chia7

chia7 commented Apr 6, 2021

@winlinvip Digging further, the 3-second delay is caused by srs_usleep(SRS_EDGE_INGESTER_CIMS). Would removing this line outright cause any other problems, or do you have a suggestion for how to fix it?

// when error, edge ingester sleep for a while and retry.
#define SRS_EDGE_INGESTER_CIMS (3 * SRS_UTIME_SECONDS)

srs_error_t SrsEdgeIngester::cycle()
{
    srs_error_t err = srs_success;
    
    while (true) {
        // We always check status first.
        // @see https://github.com/ossrs/srs/issues/1634#issuecomment-597571561
        if ((err = trd->pull()) != srs_success) {
            return srs_error_wrap(err, "edge ingester");
        }

        if ((err = do_cycle()) != srs_success) {
            srs_warn("EdgeIngester: Ignore error, %s", srs_error_desc(err).c_str());
            srs_freep(err);
        }

        srs_usleep(SRS_EDGE_INGESTER_CIMS);
    }
    
    return err;
}

TRANS_BY_GPT3

@zhouxiaojun2008

zhouxiaojun2008 commented Jun 15, 2021

I ran into the same problem, and the cause is similar to the original poster's. Searching GitHub, I did find others who had hit a similar issue. Changing the srs_usleep interval is not a good solution: a window in which the problem can still occur remains, even if the probability is lower.

TRANS_BY_GPT3

@HowQuitVim
Copy link

HowQuitVim commented Aug 19, 2021

@winlinvip
Can this be resolved in this way?


srs_error_t SrsEdgeIngester::cycle()
{
    srs_error_t err = srs_success;
    
    while (true) {
        // We always check status first.
        // @see https://github.com/ossrs/srs/issues/1634#issuecomment-597571561
        if ((err = trd->pull()) != srs_success) {
            return srs_error_wrap(err, "edge ingester");
        }

        if ((err = do_cycle()) != srs_success) {
            srs_warn("EdgeIngester: Ignore error, %s", srs_error_desc(err).c_str());
            srs_freep(err);
            // Fix issue #2215: sleep before retrying only when an error occurred.
            srs_usleep(SRS_EDGE_INGESTER_CIMS);
        }
        // Fix issue #2215: the unconditional sleep was removed from here.
        //srs_usleep(SRS_EDGE_INGESTER_CIMS);
    }
    
    return err;
}
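
If I read the original code correctly, moving srs_usleep(SRS_EDGE_INGESTER_CIMS) into the error branch keeps the retry backoff when pulling from the origin fails, but no longer delays the normal stop path, so trd->stop() should return almost immediately when the last player disconnects. I have not verified whether this also fully closes the window mentioned above, so please treat it as a suggestion rather than a confirmed fix.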

TRANS_BY_GPT3

@winlinvip winlinvip changed the title from "SRS cluster edge mode: clients fail to pull the stream normally" to "Edge: After the last stream is disconnected, if the same stream is requested again within 3 seconds, it cannot be played smoothly" Aug 21, 2021
@winlinvip winlinvip self-assigned this Aug 21, 2021
@duiniuluantanqin duiniuluantanqin added the PullRequest Has PR or solution in issue. label Nov 3, 2021
@duiniuluantanqin duiniuluantanqin linked a pull request Nov 3, 2021 that will close this issue
@winlinvip winlinvip changed the title from the Chinese original to "Edge: After the last stream is disconnected, if the same stream is requested again within 3 seconds, it cannot be played smoothly." Jul 28, 2023
@winlinvip winlinvip added the TransByAI Translated by AI/GPT. label Jul 28, 2023