SONiC docker containers show as "Exited" after config-reload #7180
Comments
I observed the same issue and also traced it down to systemd/systemd#13124. In my scenario the switch got stuck at "systemctl try-restart systemd-timesyncd.service". @lguohan How can we take the fix from systemd/systemd#13124 into SONiC? We currently do not build systemd from source; should we use a newer version from buster-backports?
@stepanblyschak - could you please look into this issue, thanks.
PR - #7228
Fix #7180
Update systemd to v247 in order to pick the fix for "core: coldplug possible nop_job" systemd/systemd#13124
Install systemd, systemd-sysv from buster-backports. Pass "systemd.unified_cgroup_hierarchy=0" as kernel argument to force systemd to not use unified cgroup hierarchy, otherwise dockerd won't start moby/moby#16238.
Also, chown $FILSYSTEM_ROOT for root, otherwise apt systemd installation complains, see similar https://unix.stackexchange.com/questions/593529/can-not-configure-systemd-inside-a-chrooted-environment
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
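The commit message above relies on a kernel argument, so a quick way to confirm it took effect on a booted image is to inspect the kernel command line. The following is a minimal sketch, not part of the PR: the function names are hypothetical, the sample command line is invented for illustration, and the parser splits on whitespace only (it ignores quoted arguments). It only recognizes the literal value "0" used in the commit message.

```python
from typing import Dict, Optional


def parse_kernel_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into a {name: value} mapping.

    Flags without "=" map to None. Quoting is not handled (assumption:
    the arguments of interest contain no spaces).
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def unified_hierarchy_disabled(cmdline: str) -> bool:
    """True if systemd.unified_cgroup_hierarchy=0 is present."""
    params = parse_kernel_cmdline(cmdline)
    return params.get("systemd.unified_cgroup_hierarchy") == "0"


# Illustrative command line, not taken from a real SONiC image.
SAMPLE_CMDLINE = "root=/dev/sda3 rw quiet systemd.unified_cgroup_hierarchy=0"
print(unified_hierarchy_disabled(SAMPLE_CMDLINE))
```

On a running system the input would typically come from reading /proc/cmdline, for example `unified_hierarchy_disabled(open("/proc/cmdline").read())`.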
Relevant for 202012 as systemd 247 was reverted.
…#7228) Fix sonic-net#7180 Update systemd to v247 in order to pick the fix for "core: coldplug possible nop_job" systemd/systemd#13124 Install systemd, systemd-sysv from buster-backports. Pass "systemd.unified_cgroup_hierarchy=0" as kernel argument to force systemd to not use unified cgroup hierarchy, otherwise dockerd won't start moby/moby#16238. Also, chown $FILSYSTEM_ROOT for root, otherwise apt systemd installation complains, see similar https://unix.stackexchange.com/questions/593529/can-not-configure-systemd-inside-a-chrooted-environment Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Description
Occasionally on `config reload` we have observed that the docker containers fail to properly restart and show as "Exited" when `docker ps` is run.
I have debugged this and the containers that fail to start all depend on the systemd service `interfaces-config.service`, which has a start job that is hanging when this bug appears. This start job has an indefinite timeout, so the entire system hangs indefinitely waiting on it to start.
I determined that the start job is hanging where it executes an `ifupdown` command which then executes `systemctl try-restart ntp.service`. When you execute `systemctl list-jobs` you can see that a "nop" job has been inserted into the queue because the ntp service is not running. However, it seems to get stuck in the queue and never execute.
Execution trace of the locked up interfaces-config.service start job.
Systemd jobs showing the held up ntp nop job.
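The check described above (looking for a "nop" job in the `systemctl list-jobs` output) can be sketched in a few lines. This is only an illustration under stated assumptions: the `Job` type and function names are hypothetical, the sample text is invented to match the usual JOB/UNIT/TYPE/STATE column layout and is not the output captured from the affected switch, and the job IDs and states are made up.

```python
from typing import List, NamedTuple


class Job(NamedTuple):
    job_id: int
    unit: str
    job_type: str
    state: str


def parse_list_jobs(text: str) -> List[Job]:
    """Parse the table printed by `systemctl list-jobs` into Job tuples."""
    jobs: List[Job] = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four columns and start with a numeric job ID.
        # This skips the header, blank lines, and the "N jobs listed." footer.
        if len(fields) == 4 and fields[0].isdigit():
            jobs.append(Job(int(fields[0]), fields[1], fields[2], fields[3]))
    return jobs


def find_nop_jobs(jobs: List[Job]) -> List[Job]:
    """Return only the jobs whose type is "nop"."""
    return [job for job in jobs if job.job_type == "nop"]


# Illustrative sample only (assumed IDs and states).
SAMPLE = """\
JOB UNIT                      TYPE  STATE
812 ntp.service               nop   waiting
640 interfaces-config.service start running

2 jobs listed.
"""

for job in find_nop_jobs(parse_list_jobs(SAMPLE)):
    print(f"nop job {job.job_id} queued for {job.unit} ({job.state})")
```

A nop job that is still listed after repeated polling would be consistent with the hang described here, though the sketch itself does not measure how long a job has been queued.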
I found the following systemd issue which I believe this to be an instance of:
systemd/systemd#13124
I confirmed that the systemd version that we are running, 241, was released before this bug was reported and fixed. Additionally, the infrequent nature of the systemd bug explains why we rarely see this occur in sonic in practice.
I would recommend upgrading the version of systemd used by sonic to resolve this issue.
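To check which images are still on an older systemd, the version can be read from the first line of `systemctl --version`. The sketch below is an assumption-laden helper, not something from the SONiC tree: the function names are hypothetical, and the target of 247 is simply the version the linked PR moved to, which is not necessarily the first release containing the fix.

```python
import re
from typing import Optional


def parse_systemd_version(version_output: str) -> Optional[int]:
    """Return the major version from `systemctl --version` output, or None."""
    match = re.match(r"systemd\s+(\d+)", version_output.strip())
    return int(match.group(1)) if match else None


def is_older_than(version_output: str, target: int) -> bool:
    """True if the reported systemd version is below `target`."""
    version = parse_systemd_version(version_output)
    if version is None:
        raise ValueError("unrecognized `systemctl --version` output")
    return version < target


# 247 is the version adopted by the linked PR (assumed threshold).
print(is_older_than("systemd 241 (241)", 247))
```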
Attached below is the `show inventory` output
sonic_dump_r-lionfish-07_20210329_195525.tar.gz
Steps to reproduce the issue:
Describe the results you received:
The test fails because many of the docker containers are shown as "exited", with only the `bgp` and `database` containers running.
Describe the results you expected:
All containers back up and running, test passes successfully.
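The expected result can be checked mechanically by listing containers whose status starts with "Exited". The sketch below assumes the input was produced with `docker ps -a --format '{{.Names}}\t{{.Status}}'`, which prints one tab-separated name and status per line. The function name is hypothetical and the sample rows are illustrative, not taken from the attached dump.

```python
from typing import List


def exited_containers(ps_output: str) -> List[str]:
    """Return names of containers whose status begins with "Exited".

    Expects tab-separated "name<TAB>status" lines.
    """
    exited: List[str] = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if status.startswith("Exited"):
            exited.append(name)
    return exited


# Illustrative sample only (assumed container names and statuses).
SAMPLE_PS = (
    "bgp\tUp 2 hours\n"
    "database\tUp 2 hours\n"
    "swss\tExited (0) 5 minutes ago\n"
    "syncd\tExited (0) 5 minutes ago\n"
)

print(exited_containers(SAMPLE_PS))
```

An empty list after `config reload` would correspond to the expected result of all containers being back up.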
Output of `show version`: