There is a problem that hostcfgd blocks forever and does not react to SIGTERM causing a delay in warm boot. #603
Comments
ACK, I will start investigating this issue. |
I found 2 places that have an infinite-loop issue: We need to exit these infinite loops when SIGTERM is received, by adding a SIGTERM handler and checking the SIGTERM status inside the loop. We may also need to check whether other signals need to be handled, for example SIGKILL. We also need to check whether other places in swss-common have the same issue. |
@qiluo-msft Is there any reason the issue is not opened on buildimage for tracking? I believe we only track issues there and not in submodules. |
To minimize code changes and keep the code scalable, my proposal is: |
Here is a draft PR for this issue:
|
Update: here is the latest example:
^C>>> |
If it is obviously an issue in a submodule, we will move it to the submodule's repo. |
@liuh-80, @qiluo-msft This one is blocking sonic-net/sonic-buildimage#10510. Can you please see how we can expedite the solution and the review? |
…ystemd (#2133)" (#2161) This reverts commit 23e9398. - What I did: Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)". The reverted PR is part of a story that refactors the warm/fast shutdown sequence to stop services gracefully instead of killing them without any ordering or dependency requirements, which creates several issues and is error-prone for the future. This PR must come together with sonic-net/sonic-buildimage#10510. However, #10510 is blocked by an issue in swss-common (sonic-net/sonic-swss-common#603), and a fix by MSFT is in review (sonic-net/sonic-swss-common#606). I am reverting it because its dependency is still blocked and we cannot update the submodule pointer. Once the dependency of the reverted PR is resolved, it shall be re-committed.
…ystemd (#2133)" (#2166) - What I did: This reverts commit a5f55aa. Revert "[scripts/fast-reboot] Shutdown remaining containers through systemd (#2133)". The reverted PR is part of a story that refactors the warm/fast shutdown sequence to stop services gracefully instead of killing them without any ordering or dependency requirements, which creates several issues and is error-prone for the future. This PR must come together with sonic-net/sonic-buildimage#10510. However, #10510 is blocked by an issue in swss-common (sonic-net/sonic-swss-common#603), and a fix by MSFT is in review (sonic-net/sonic-swss-common#606). I am reverting it because its dependency is still blocked and we cannot update the submodule pointer. Once the dependency of the reverted PR is resolved, it shall be re-committed. - How I did it: git revert a5f55aa - How to verify it: Run tests
…token support. (#606) Why I did it: There are infinite loops inside the PubSub::listen() method, so an application using this method can't handle SIGTERM correctly (#603). How I did it: Add the following classes: 1. CancellationToken: this class helps exit the infinite loops when SIGTERM or another signal arrives. 2. SignalHandlerHelper: provides a native signal handler. How to verify it: 1. Manual test. 2. Pass all test cases.
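To illustrate the idea behind the two classes named in this commit message, here is a rough Python analogue. It is a sketch under stated assumptions and does not reproduce the swss-common implementation or its API: the `CancellationToken` methods, `cancel_on_signal`, and the `listen(token, receive, handle_message)` signature are all hypothetical, and `receive` is assumed to return `None` when its timeout expires.

```python
import signal


class CancellationToken:
    """Hypothetical token: a flag that long-running loops can poll."""

    def __init__(self):
        self._cancelled = False

    def cancel(self):
        self._cancelled = True

    def is_cancelled(self):
        return self._cancelled


def cancel_on_signal(token, signum=signal.SIGTERM):
    """Hypothetical helper: make `signum` cancel `token`.

    Must be called from the main thread. Returns the previous handler
    so the caller can restore it later.
    """
    return signal.signal(signum, lambda _signum, _frame: token.cancel())


def listen(token, receive, handle_message, timeout_s=0.1):
    """Dispatch messages until the token is cancelled.

    receive(timeout_s) is assumed to return a message, or None when
    the timeout expires without one.
    """
    while not token.is_cancelled():
        message = receive(timeout_s)
        if message is not None:
            handle_message(message)
```

Passing the token into the loop, rather than reading a global flag, lets several loops share one cancellation source and lets tests cancel a loop without sending a real signal.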
There is a problem that hostcfgd blocks forever and does not react to SIGTERM causing a delay in warm boot.
It happens when execution reaches this line: https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-host-services/scripts/hostcfgd#L1241
Example:
@qiluo-msft This looks like a swss-common issue.
For some reason, this wasn't seen during 500 warm boot tests on 202012, since warm reboot is executed early enough, before hostcfgd gets stuck in listen().
Originally posted by @stepanblyschak in sonic-net/sonic-buildimage#10510 (comment)