During soft-reboot, syncd is stopped while orchagent is still pushing requests to it. A bulk request then gets no response (expected, since syncd is down) and `waitForBulkResponse` throws a `std::runtime_error`. orchagent does not handle this exception, so the process aborts with SIGABRT and supervisord reports the unexpected exit.
Jul 13 01:44:02.117199 ixs-7215-pizza4 INFO ansible-command: Invoked with creates=None executable=None _uses_shell=False strip_empty_ends=True _raw_params=soft-reboot removes=None argv=None warn=True chdir=None stdin_add_newline=True stdin=None
Jul 13 01:44:04.570998 ixs-7215-pizza4 NOTICE admin: Collecting logs to check ssd health before soft-reboot...
**syncd down log**
Jul 13 01:44:10.133423 ixs-7215-pizza4 NOTICE syncd#syncd_request_shutdown: :- loadFromFile: no context config specified, will load default context config
Jul 13 01:44:10.133423 ixs-7215-pizza4 NOTICE syncd#syncd_request_shutdown: :- insert: added switch: 0:
Jul 13 01:44:10.144254 ixs-7215-pizza4 NOTICE syncd#syncd_request_shutdown: :- send: requested COLD shutdown
**Orchagent pushing requests and timing out**
Jul 13 01:50:13.555322 ixs-7215-pizza4 ERR swss#orchagent: :- wait: SELECT operation result: TIMEOUT on getresponse
Jul 13 01:50:13.555322 ixs-7215-pizza4 ERR swss#orchagent: :- wait: failed to get response for getresponse
Jul 13 01:50:13.555322 ixs-7215-pizza4 ERR swss#orchagent: :- set: set status: SAI_STATUS_FAILURE
Jul 13 01:50:13.555322 ixs-7215-pizza4 ERR swss#orchagent: :- setRouterIntfsMtu: Failed to set router interface PortChannel0004 MTU to 9100, rv:-1
Jul 13 01:50:16.451985 ixs-7215-pizza4 ERR monit[451]: 'container_checker' status failed (3) -- Expected containers not running: pmon
Jul 13 01:50:26.487275 ixs-7215-pizza4 ERR monit[451]: 'container_checker' status failed (3) -- Expected containers not running: pmon
Jul 13 01:50:36.516905 ixs-7215-pizza4 ERR monit[451]: 'container_checker' status failed (3) -- Expected containers not running: pmon
Jul 13 01:50:46.548083 ixs-7215-pizza4 ERR monit[451]: 'container_checker' status failed (3) -- Expected containers not running: pmon
Jul 13 01:50:46.769762 ixs-7215-pizza4 DEBUG /disk_check.py: /etc is Read-Write
Jul 13 01:50:46.770963 ixs-7215-pizza4 DEBUG /disk_check.py: /home is Read-Write
Jul 13 01:50:56.600778 ixs-7215-pizza4 ERR monit[451]: 'container_checker' status failed (3) -- Expected containers not running: pmon
Jul 13 01:51:06.625046 ixs-7215-pizza4 ERR monit[451]: 'container_checker' status failed (3) -- Expected containers not running: pmon
Jul 13 01:51:13.600351 ixs-7215-pizza4 ERR swss#orchagent: :- wait: SELECT operation result: TIMEOUT on getresponse
Jul 13 01:51:13.600351 ixs-7215-pizza4 ERR swss#orchagent: :- wait: failed to get response for getresponse
Jul 13 01:51:13.600351 ixs-7215-pizza4 ERR swss#orchagent: :- remove: remove status: SAI_STATUS_FAILURE
Jul 13 01:51:13.600351 ixs-7215-pizza4 ERR swss#orchagent: :- removeLagMember: Failed to remove member Ethernet51 from LAG PortChannel0004 lid:2000000000594 lmid:1b000000000598
Jul 13 01:52:14.035422 ixs-7215-pizza4 ERR swss#orchagent: :- wait: SELECT operation result: TIMEOUT on getresponse
Jul 13 01:52:14.035814 ixs-7215-pizza4 ERR swss#orchagent: :- wait: failed to get response for getresponse
Jul 13 01:52:14.036177 ixs-7215-pizza4 ERR swss#orchagent: :- waitForBulkResponse: wrong number of counters, got 0, expected 1000
Jul 13 01:52:14.036422 ixs-7215-pizza4 INFO swss#/supervisord: orchagent terminate called after throwing an instance of 'std::runtime_error'
Jul 13 01:52:14.036626 ixs-7215-pizza4 INFO swss#/supervisord: orchagent what(): :- waitForBulkResponse: wrong number of counters, got 0, expected 1000
Jul 13 01:52:15.447453 ixs-7215-pizza4 INFO ansible-command: Invoked with creates=None executable=None _uses_shell=False strip_empty_ends=True _raw_params=date +"%Y-%m-%d %H:%M:%S" removes=None argv=None warn=True chdir=None stdin_add_newline=True stdin=None
Jul 13 01:52:15.987562 ixs-7215-pizza4 INFO swss#supervisord 2021-07-13 01:52:15,986 INFO exited: orchagent (terminated by SIGABRT (core dumped); not expected)
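The last orchagent lines above show the failure mode: `waitForBulkResponse` throws a `std::runtime_error` that nothing catches, so the process aborts. orchagent is written in C++; the Python sketch below only models the control flow and is not taken from sonic-swss. Every name in it (`BulkResponseError`, `wait_for_bulk_response`, `flush_bulk`) is hypothetical, and it assumes one possible mitigation: treating a missing bulk response as a reportable failure instead of letting the exception escape.

```python
class BulkResponseError(RuntimeError):
    """Stand-in for the std::runtime_error seen in the log (hypothetical name)."""


def wait_for_bulk_response(received, expected):
    # Mirrors the logged check: "wrong number of counters, got 0, expected 1000".
    if len(received) != expected:
        raise BulkResponseError(
            f"wrong number of counters, got {len(received)}, expected {expected}"
        )
    return received


def flush_bulk(received, expected):
    """Return (ok, detail) instead of letting the exception propagate."""
    try:
        statuses = wait_for_bulk_response(received, expected)
    except BulkResponseError as exc:
        # The peer (syncd in this report) did not answer; report it and
        # let the caller decide what to do next.
        return False, str(exc)
    return True, f"{len(statuses)} statuses received"


if __name__ == "__main__":
    print(flush_bulk([], 1000))       # no response, as when syncd is down
    print(flush_bulk(["ok"] * 3, 3))  # complete response
```

Whether orchagent should retry, skip the request, or exit cleanly in this state is a design decision for the maintainers; the sketch only shows that the error can be surfaced without an abort.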
**root@str-marvell-acs-1:~# show version**
SONiC Software Version: SONiC.202012.23973-1d3939b7f
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-armmp
Build commit: 1d3939b7f
Build date: Sat Jul 17 06:38:27 UTC 2021
Built by: AzDevOps@sonic-build-workers-0000DC
@lguohan and @rajkumar38 the syncd shutdown came from the cold reboot script, which soft-reboot was based on. As I understand it, the shutdown was added to prevent some rare reboot failures; sonic-net/sonic-utilities#223 is one of the changes that introduced the syncd shutdown.
The issue occurs during test-bed testing in the t0-52 topology, while executing the reboot test suite.
Attaching complete syslog.
syslog.txt
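When working through the attached syslog, it can help to pull out only the orchagent timeouts and the abort message. This is a minimal sketch, assuming the file uses the same line layout as the excerpts in this report; the function name `summarize` is illustrative, and the sample lines are copied from the log above.

```python
import re

# Layout assumed from the excerpts: "<Mon> <day> <time> <host> <LEVEL> <source> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:.]+)\s+(?P<host>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"(?P<source>\S+)\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count orchagent getresponse timeouts and capture the first abort message."""
    timeouts = 0
    abort = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not follow the assumed layout
        source = m.group("source").rstrip(":")
        msg = m.group("msg")
        if source == "swss#orchagent" and "TIMEOUT on getresponse" in msg:
            timeouts += 1
        elif abort is None and "terminate called after throwing" in msg:
            abort = (m.group("ts"), msg)
    return {"timeouts": timeouts, "abort": abort}


SAMPLE = [
    "Jul 13 01:52:14.035422 ixs-7215-pizza4 ERR swss#orchagent: :- wait: SELECT operation result: TIMEOUT on getresponse",
    "Jul 13 01:52:14.036177 ixs-7215-pizza4 ERR swss#orchagent: :- waitForBulkResponse: wrong number of counters, got 0, expected 1000",
    "Jul 13 01:52:14.036422 ixs-7215-pizza4 INFO swss#/supervisord: orchagent terminate called after throwing an instance of 'std::runtime_error'",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it over the attachment, pass an open file object instead of `SAMPLE`, for example `summarize(open("syslog.txt"))`.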