SRT: There are performance bottlenecks and crashes during stress testing. #1944

Closed
zhouxiaojun2008 opened this issue Sep 9, 2020 · 4 comments · Fixed by #2429
Labels
Enhancement Improvement or enhancement. PullRequest Has PR or solution in issue. SRT It's about SRT protocol. TransByAI Translated by AI/GPT.

@zhouxiaojun2008
Contributor

zhouxiaojun2008 commented Sep 9, 2020

**Description**

1. SRS version: XCORE-SRS/4.0.37(Leo)
2. SRS log:
   xxxxxxxxxxxx
3. SRS configuration: the default `srt.conf`
   xxxxxxxxxxxx

**Replay**
The command used to publish each stream is as follows:

```shell
ffmpeg -loglevel error -re -stream_loop -1 -i test264_1M.mp4 -vcodec copy -acodec copy -f mpegts 'srt://10.10.0.62:10080?streamid=#!::h=live/test264_1M.mp4_16,m=publish'
```

When I run a script that pushes 60 streams in a loop, SRS crashes easily once there are roughly 50 or more streams. The more streams there are, the more likely SRS is to crash.
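The stress script itself was not posted. As a hypothetical sketch of how it might generate the 60 publish commands, the helper below follows the single command above and assumes each stream differs only by a `_<index>` suffix on the stream name; `build_publish_cmd` is an illustrative name, not part of SRS or FFmpeg.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (the real stress script was not posted): builds the ffmpeg
// publish command for stream number `index`, following the single command above.
// The "_<index>" suffix, host and port are assumptions copied from that example.
std::string build_publish_cmd(int index) {
    assert(index >= 0);
    const std::string url = "srt://10.10.0.62:10080?streamid=#!::h=live/test264_1M.mp4_" +
                            std::to_string(index) + ",m=publish";
    // Single quotes stop the shell from treating '?' as a glob or '!' as history expansion.
    return "ffmpeg -loglevel error -re -stream_loop -1 -i test264_1M.mp4 "
           "-vcodec copy -acodec copy -f mpegts '" + url + "'";
}
```

A driver loop would call `build_publish_cmd(i)` for `i` from 0 to 59 and start each command in the background. Quoting the URL matters because, unquoted, `!:` can trigger history expansion in interactive bash.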

I have mostly narrowed the crash down to the following location:

```cpp
srs_error_t rtmp_client::rtmp_write_packet(char type, uint32_t timestamp, char* data, int size) {
    srs_error_t err = srs_success;
    SrsSharedPtrMessage* msg = NULL;

    if ((err = srs_rtmp_create_msg(type, timestamp, data, size, _rtmp_conn_ptr->sid(), &msg)) != srs_success) {
        return srs_error_wrap(err, "create message");
    }
    srs_assert(msg);

    // send out encoded msg.
    if ((err = _rtmp_conn_ptr->send_and_free_message(msg)) != srs_success) {
        close();
        return srs_error_wrap(err, "send messages");
    }

    return err;
}
```

`send_and_free_message` fails when there are too many streams. The existing logic then calls `close()`, which releases `_rtmp_conn_ptr`, but later code keeps using the now-null `_rtmp_conn_ptr`, which causes the crash. The error handling here is therefore incorrect.
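A minimal sketch of one way to avoid the use-after-close: have the write path check that the connection still exists before dereferencing it. `RtmpConn`, `RtmpClient`, and the string-based `Error` type are hypothetical stand-ins for the SRS types in the snippet above, and this is not necessarily what the change that closed this issue (#2429) does.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the SRS types in the snippet above. Errors are plain
// strings here (empty string == success) to keep the sketch self-contained.
using Error = std::string;

struct RtmpConn {
    bool fail_sends = false;  // test hook: simulate send_and_free_message failing
    int sid() const { return 1; }
    Error send_and_free_message(const std::string& msg) {
        (void)msg;  // a real connection would write the encoded message here
        return fail_sends ? "writev timeout" : "";
    }
};

class RtmpClient {
public:
    RtmpClient() : conn_(std::make_unique<RtmpConn>()) {}

    RtmpConn* conn() { return conn_.get(); }

    // Safer variant of rtmp_write_packet: check that the connection still exists
    // before dereferencing it, so a write after close() returns an error instead
    // of crashing on a null pointer.
    Error write_packet(const std::string& payload) {
        if (!conn_) {
            return "rtmp connection already closed";
        }
        const std::string msg = std::to_string(conn_->sid()) + ":" + payload;
        assert(!msg.empty());
        Error err = conn_->send_and_free_message(msg);
        if (!err.empty()) {
            close();  // releases the connection; conn_ is now null
            return "send messages: " + err;
        }
        return "";
    }

    void close() { conn_.reset(); }

private:
    std::unique_ptr<RtmpConn> conn_;
};
```

The guard turns the crash into an ordinary error; the caller would still need to stop or reconnect the `srt2rtmp` forwarding for that stream.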

Looking further, the root cause is the error returned by `send_and_free_message`. Tracing the logs and analyzing the packet capture, it seems the built-in RTMP server cannot keep up with the load, so the client's `writev` hits its 500 ms timeout. This suggests a performance bottleneck, but where exactly is it?
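To illustrate how a slow RTMP consumer turns into a `writev` timeout on the sending side, here is a plain-POSIX sketch of a `writev` bounded by a deadline. It assumes a non-blocking descriptor; SRS itself performs socket I/O on State Threads coroutines with per-socket timeouts, so this is an analogy rather than the actual code path. `writev_with_timeout` and `set_nonblocking` are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <fcntl.h>
#include <poll.h>
#include <sys/uio.h>
#include <unistd.h>

// Put a descriptor into non-blocking mode; returns false if fcntl fails.
bool set_nonblocking(int fd) {
    int flags = ::fcntl(fd, F_GETFL, 0);
    return flags >= 0 && ::fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// writev bounded by a total deadline, assuming `fd` is non-blocking.
// Advances the caller's iovec array as data is written. Returns the total number
// of bytes written, or -1 with errno set (ETIMEDOUT if the reader does not drain
// the buffer in time; bytes written before the timeout are not reported).
ssize_t writev_with_timeout(int fd, struct iovec* iov, int iovcnt, int timeout_ms) {
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + std::chrono::milliseconds(timeout_ms);
    ssize_t total = 0;
    while (iovcnt > 0) {
        ssize_t n = ::writev(fd, iov, iovcnt);
        if (n >= 0) {
            total += n;
            // Drop fully written iovecs, then trim the partially written one.
            while (iovcnt > 0 && static_cast<size_t>(n) >= iov->iov_len) {
                n -= static_cast<ssize_t>(iov->iov_len);
                ++iov;
                --iovcnt;
            }
            if (iovcnt > 0) {
                iov->iov_base = static_cast<char*>(iov->iov_base) + n;
                iov->iov_len -= static_cast<size_t>(n);
            }
            continue;
        }
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR) {
            return -1;  // real I/O error, e.g. EPIPE
        }
        // Kernel buffer is full: wait until it is writable or the deadline passes.
        const auto left = std::chrono::duration_cast<std::chrono::milliseconds>(
                              deadline - Clock::now()).count();
        if (left <= 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        struct pollfd pfd = {fd, POLLOUT, 0};
        const int r = ::poll(&pfd, 1, static_cast<int>(left));
        if (r == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (r < 0 && errno != EINTR) {
            return -1;
        }
    }
    return total;
}
```

If the reader never drains the socket (or pipe), the kernel buffer fills, `writev` returns `EAGAIN`, and `poll` eventually expires, which is the same symptom as the 500 ms timeout described above.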

If the `srt2rtmp` module is disabled and only SRT publishing and playing are performed under the same test conditions, neither CPU utilization (mainly that of the SRS main thread, since SRT forwarding runs in a separate thread and `libsrt.so` also creates its own threads) nor memory usage is high. Does this suggest, at least to some extent, that the bottleneck lies in the built-in RTMP client?



@winlinvip
Member

winlinvip commented Dec 1, 2020

It would be even better if there were a PR.


@zhouxiaojun2008
Contributor Author

zhouxiaojun2008 commented Dec 2, 2020

> It would be even better if there were a PR.

There is indeed a performance bottleneck in srt2rtmp. I profiled it with perf; as the graph below shows, most of the time is spent parsing H.264 NAL units.
[Figure: perf profile of srt2rtmp (Snipaste_2020-12-01_19-21-15)]
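The profile points at H.264 NAL unit parsing. One way to keep that step cheap is to split the Annex B byte stream in a single linear pass without copying payload bytes. The sketch below is an illustration of that technique only; `split_annexb` is a hypothetical name, and it is not the parser SRS uses nor the change that was merged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Minimal Annex B splitter: returns {offset, length} of each NAL unit payload
// (start codes excluded) in a buffer of H.264 elementary stream data.
// It makes a single linear pass over the data and never copies payload bytes.
std::vector<std::pair<size_t, size_t>> split_annexb(const uint8_t* data, size_t size) {
    std::vector<std::pair<size_t, size_t>> nalus;
    size_t i = 0;
    size_t nal_start = SIZE_MAX;  // SIZE_MAX: no NAL unit open yet
    while (i + 3 <= size) {
        // 0x000001 marks a start code; a preceding 0x00 makes it the 4-byte form.
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            if (nal_start != SIZE_MAX) {
                size_t end = i;
                if (end > nal_start && data[end - 1] == 0) {
                    --end;  // exclude the leading zero of a 4-byte start code
                }
                nalus.emplace_back(nal_start, end - nal_start);
            }
            i += 3;
            nal_start = i;
        } else {
            ++i;
        }
    }
    if (nal_start != SIZE_MAX && nal_start < size) {
        nalus.emplace_back(nal_start, size - nal_start);
    }
    return nalus;
}
```

Because emulation prevention bytes guarantee that `0x000001` never appears inside a NAL unit payload, one forward scan is enough; callers can then read the NAL type as `data[offset] & 0x1F`.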


@winlinvip winlinvip added Enhancement Improvement or enhancement. SRT It's about SRT protocol. labels Aug 30, 2021
@winlinvip winlinvip added this to the 5.0 milestone Aug 30, 2021
@winlinvip winlinvip added the PullRequest Has PR or solution in issue. label Aug 30, 2021
@winlinvip winlinvip changed the title from "muli pusher test srt performance will crash,and maybe have bottleneck?" to "SRT:压力测试时有性能瓶颈,并且会崩溃" ("SRT: performance bottlenecks and crashes during stress testing") Aug 30, 2021
@winlinvip winlinvip reopened this Aug 30, 2021
@winlinvip winlinvip modified the milestones: 5.0, 4.0 Aug 30, 2021
@runner365 runner365 linked a pull request Aug 31, 2021 that will close this issue
@runner365
Contributor

runner365 commented Aug 31, 2021

Hi, thank you for your PR. The crash issue has been resolved: link


@runner365
Contributor

runner365 commented Oct 22, 2021

As mentioned above, the problem has been resolved, so I am closing this issue.


@winlinvip winlinvip changed the title from "SRT:压力测试时有性能瓶颈,并且会崩溃" to its English translation "SRT: There are performance bottlenecks and crashes during stress testing." Jul 28, 2023
@winlinvip winlinvip added the TransByAI Translated by AI/GPT. label Jul 28, 2023