[warm-reboot] INIT_VIEW failure: syncd crash due to SIGABRT in VendorSai::create #8300

Closed
vaibhavhd opened this issue Jul 30, 2021 · 2 comments · Fixed by #8684

@vaibhavhd
Contributor

Description

Warm reboot fails at INIT_VIEW: syncd crashes inside syncd::VendorSai::create.

Based on coredump and syslog analysis, the create call aborted because the SAI failed to parse a JSON file:
SAI_API_UNSPECIFIED:syncdb_data_file_read:2230 Failed to parse JSON: error -2

syncd crashes with SIGABRT and keeps warm restarting; every attempt crashes and generates a coredump.

SAI version: 5.0.0.6-1

Steps to reproduce the issue:

  1. Install the latest master image on a Broadcom device.
  2. Run the warm-reboot test with traffic.
  3. The test fails; check the logs to confirm that syncd crashed during INIT_VIEW.
  4. Check the coredump file.

Describe the results you received:

SAI_REDIS

2021-07-30.20:45:32.273267|a|INIT_VIEW
2021-07-30.20:46:32.330114|A|SAI_STATUS_FAILURE
2021-07-30.20:47:17.179906|a|INIT_VIEW
2021-07-30.20:48:17.194151|A|SAI_STATUS_FAILURE
2021-07-30.20:49:01.843366|a|INIT_VIEW
2021-07-30.20:50:01.902074|A|SAI_STATUS_FAILURE
2021-07-30.20:50:57.679887|a|INIT_VIEW
2021-07-30.20:50:57.680702|A|SAI_STATUS_SUCCESS
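
The request/response pairs above can be timed with a short script. This is a minimal sketch in Python, assuming the record layout `timestamp|op|payload` seen in the excerpt, with `a` marking a notify request and `A` its response. `pair_requests` is a hypothetical helper name, not part of sairedis.

```python
from datetime import datetime

# Records copied from the SAI_REDIS excerpt in this issue.
RECORDS = """\
2021-07-30.20:45:32.273267|a|INIT_VIEW
2021-07-30.20:46:32.330114|A|SAI_STATUS_FAILURE
2021-07-30.20:47:17.179906|a|INIT_VIEW
2021-07-30.20:48:17.194151|A|SAI_STATUS_FAILURE
2021-07-30.20:49:01.843366|a|INIT_VIEW
2021-07-30.20:50:01.902074|A|SAI_STATUS_FAILURE
2021-07-30.20:50:57.679887|a|INIT_VIEW
2021-07-30.20:50:57.680702|A|SAI_STATUS_SUCCESS
""".splitlines()


def pair_requests(records):
    """Pair each 'a' request with the 'A' response that follows it.

    Assumption: op 'a' is a request and 'A' is its response, as the
    excerpt suggests. Returns (request, status, seconds_elapsed) tuples.
    """
    pairs = []
    pending = None
    for line in records:
        stamp, op, payload = line.split("|", 2)
        when = datetime.strptime(stamp, "%Y-%m-%d.%H:%M:%S.%f")
        if op == "a":
            pending = (when, payload)
        elif op == "A" and pending is not None:
            sent, request = pending
            pairs.append((request, payload, (when - sent).total_seconds()))
            pending = None
    return pairs


for request, status, seconds in pair_requests(RECORDS):
    print(f"{request}: {status} after {seconds:.3f} s")
```

On this excerpt, each failed attempt is answered roughly 60 seconds after the request, while the final attempt succeeds in under a millisecond. That pattern suggests the failures are response timeouts while syncd is down, not explicit rejections.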

SYSLOG

Jul 30 20:45:32.260646 str-7260cx3-acs-1 NOTICE swss#orchagent: :- main: --- Starting Orchestration Agent ---
Jul 30 20:45:32.272899 str-7260cx3-acs-1 NOTICE swss#orchagent: :- serverThreadFunction: begin

Jul 30 20:45:32.273437 str-7260cx3-acs-1 NOTICE swss#orchagent: :- notifySyncd: sending syncd: INIT_VIEW
Jul 30 20:45:32.949934 str-7260cx3-acs-1 NOTICE syncd#syncd: :- checkWarmStart: syncd doing warm start, restore count 1
Jul 30 20:45:46.214799 str-7260cx3-acs-1 ERR syncd#syncd: [none] SAI_API_UNSPECIFIED:syncdb_data_file_read:2230 Failed to parse JSON: error -2
Jul 30 20:45:49.952759 str-7260cx3-acs-1 NOTICE syncd#dsserve: child /usr/bin/syncd exited status: 134
Jul 30 20:45:49.952900 str-7260cx3-acs-1 INFO syncd#/supervisord: syncd [5] child /usr/bin/syncd exited status: 134
Jul 30 20:45:49.953438 str-7260cx3-acs-1 INFO syncd#supervisord 2021-07-30 20:45:49,953 INFO exited: syncd (exit status 3; not expected)
Jul 30 20:45:50.965118 str-7260cx3-acs-1 INFO syncd#/supervisor-proc-exit-listener: Process 'syncd' exited unexpectedly. Terminating supervisor 'syncd'

Jul 30 20:46:32.330549 str-7260cx3-acs-1 ERR swss#orchagent: :- initSaiRedis: Failed to notify syncd INIT_VIEW, rv:-1 gSwitchId 0
..
Jul 30 20:48:17.194373 str-7260cx3-acs-1 ERR swss#orchagent: :- initSaiRedis: Failed to notify syncd INIT_VIEW, rv:-1 gSwitchId 0
..
Jul 30 20:50:01.902245 str-7260cx3-acs-1 ERR swss#orchagent: :- initSaiRedis: Failed to notify syncd INIT_VIEW, rv:-1 gSwitchId 0
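
The exit status 134 reported by dsserve can be decoded into a signal. The sketch below is in Python and assumes the shell-style convention of 128 plus the signal number; `EXIT_RE` and `signal_from_exit_status` are hypothetical names introduced for illustration, and the regular expression only targets the dsserve line format seen in this excerpt.

```python
import re
import signal

# Hypothetical pattern for the dsserve line format shown in this issue.
EXIT_RE = re.compile(r"child (\S+) exited status: (\d+)")


def signal_from_exit_status(status):
    """Return the signal implied by a 128 + signo exit status, or None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128)
    except ValueError:
        # Not a valid signal number on this platform.
        return None


line = (
    "Jul 30 20:45:49.952759 str-7260cx3-acs-1 NOTICE syncd#dsserve: "
    "child /usr/bin/syncd exited status: 134"
)
match = EXIT_RE.search(line)
if match:
    program, status = match.group(1), int(match.group(2))
    sig = signal_from_exit_status(status)
    print(program, status, sig.name if sig else "no signal")
```

On Linux, 134 - 128 = 6, which is SIGABRT, matching the signal reported in the core analysis.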

CORE ANALYSIS

Core was generated by `/usr/bin/syncd --diag -u -s -p /etc/sai.d/sai.profile'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f5caeb6e7bb in raise () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7f5cadf9c040 (LWP 31))]
(gdb) info threads
  Id   Target Id                      Frame 
* 1    Thread 0x7f5cadf9c040 (LWP 31) 0x00007f5caeb6e7bb in raise () from /lib/x86_64-linux-gnu/libc.so.6
  2    Thread 0x7f5cad794700 (LWP 38) 0x00007f5caf0ce00c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  3    Thread 0x7f5cadf95700 (LWP 34) 0x00007f5caec307ef in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
  4    Thread 0x7f5cac538700 (LWP 44) 0x00007f5caf0ce00c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  5    Thread 0x7f5c9bed6700 (LWP 53) 0x00007f5caf0ce00c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  6    Thread 0x7f5c93ecd700 (LWP 47) 0x00007f5caf0ce3f9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  7    Thread 0x7f5ca229d700 (LWP 55) 0x00007f5caf0ce3f9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  8    Thread 0x7f5ca22c1700 (LWP 48) 0x00007f5caf0ce3f9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  9    Thread 0x7f5cac598680 (LWP 43) 0x00007f5caec27427 in ioctl () from /lib/x86_64-linux-gnu/libc.so.6
  10   Thread 0x7f5ca22b8700 (LWP 49) 0x00007f5caf0ce3f9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  11   Thread 0x7f5ca22af700 (LWP 52) 0x00007f5caf0ce3f9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  12   Thread 0x7f5cac00b700 (LWP 45) 0x00007f5caec27427 in ioctl () from /lib/x86_64-linux-gnu/libc.so.6
  13   Thread 0x7f5ca2294700 (LWP 56) 0x00007f5caf0ce3f9 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
  14   Thread 0x7f5ca22a6700 (LWP 54) 0x00007f5caf0ce00c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) 
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f5cadf9c040 (LWP 31))]
#0  0x00007f5caeb6e7bb in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f5caeb6e7bb in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f5caeb59535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f5cb352a9fa in syncdb_data_file_read () from /usr/lib/libsai.so.1
#3  0x00007f5cb3533b08 in syncdb_main () from /usr/lib/libsai.so.1
#4  0x00007f5cb35a5556 in _brcm_sai_dm_init () from /usr/lib/libsai.so.1
#5  0x00007f5cb33a8ae9 in ?? () from /usr/lib/libsai.so.1
#6  0x0000563c4a59cd53 in syncd::VendorSai::create (this=0x563c4ac6f8d0, objectType=SAI_OBJECT_TYPE_SWITCH, objectId=0x7ffdef107728, switchId=0, attr_count=4, attr_list=0x563c4ac7dbe0) at VendorSai.cpp:139
#7  0x0000563c4a553479 in syncd::Syncd::performWarmRestartSingleSwitch (this=0x563c4ac70d10, key=...) at /usr/include/c++/8/bits/stl_vector.h:805
#8  0x0000563c4a5538eb in syncd::Syncd::performWarmRestart (this=0x563c4ac70d10) at Syncd.cpp:3977
#9  0x0000563c4a553cd8 in syncd::Syncd::onSyncdStart (this=0x563c4ac70d10, warmStart=<optimized out>) at Syncd.cpp:3637
#10 0x0000563c4a553e8f in syncd::Syncd::run (this=this@entry=0x563c4ac70d10) at /usr/include/c++/8/bits/shared_ptr_base.h:1018
#11 0x0000563c4a540d78 in syncd_main (argc=argc@entry=6, argv=argv@entry=0x7ffdef107f98) at syncd_main.cpp:71
#12 0x0000563c4a53f31e in main (argc=6, argv=0x7ffdef107f98) at main.cpp:9
(gdb) 
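
To show where the abort originates, the backtrace can be split into vendor-library frames and syncd frames. This is a rough Python sketch that assumes the default gdb `bt` line format shown above; `FRAME_RE` and `parse_backtrace` are hypothetical names, and only the top seven frames are reproduced.

```python
import re

# Hypothetical pattern for gdb "bt" lines of the form:
#   #N  0xADDR in FUNC (ARGS) from LIB
#   #N  0xADDR in FUNC (ARGS) at FILE:LINE
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+0x[0-9a-f]+ in (?P<func>\S+) \(.*\)"
    r"(?: from (?P<lib>\S+)| at (?P<src>\S+))?$"
)

# Frames #0 to #6 copied from the backtrace in this issue.
BT = """\
#0  0x00007f5caeb6e7bb in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f5caeb59535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f5cb352a9fa in syncdb_data_file_read () from /usr/lib/libsai.so.1
#3  0x00007f5cb3533b08 in syncdb_main () from /usr/lib/libsai.so.1
#4  0x00007f5cb35a5556 in _brcm_sai_dm_init () from /usr/lib/libsai.so.1
#5  0x00007f5cb33a8ae9 in ?? () from /usr/lib/libsai.so.1
#6  0x0000563c4a59cd53 in syncd::VendorSai::create (this=0x563c4ac6f8d0, objectType=SAI_OBJECT_TYPE_SWITCH, objectId=0x7ffdef107728, switchId=0, attr_count=4, attr_list=0x563c4ac7dbe0) at VendorSai.cpp:139
"""


def parse_backtrace(text):
    """Return one dict per recognised frame (num, func, lib, src)."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(m.groupdict())
    return frames


frames = parse_backtrace(BT)
# Frames that live in the vendor SAI library.
vendor = [f["func"] for f in frames if f["lib"] and "libsai" in f["lib"]]
# First frame with source information, i.e. the syncd call into the SAI.
entry = next(f for f in frames if f["src"])

print("vendor frames:", vendor)
print("called from:", entry["func"], "at", entry["src"])
```

Read this way, `abort()` is called from `syncdb_data_file_read` inside libsai.so, and syncd enters the library through `syncd::VendorSai::create` while creating the switch object.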

Describe the results you expected:

Output of show version:

# show ver

SONiC Software Version: SONiC.master.25538-5e435e05a
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: 5e435e05a
Build date: Mon Jul 26 14:39:47 UTC 2021
Built by: AzDevOps@sonic-build-workers-000J8D

Platform: x86_64-arista_7260cx3_64
HwSKU: Arista-7260CX3-D108C8
# docker exec -it syncd dpkg -s libsaibcm | head
Package: libsaibcm
Status: install ok installed
Priority: extra
Section: libs
Installed-Size: 245525
Maintainer: Guohan Lu <gulv@microsoft.com>
Architecture: amd64
Source: saibcm
Version: 5.0.0.6-1
Provides: libsai

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@gechiang
Collaborator

Case CS00012201827 has been filed with BRCM.

@vaibhavhd
Contributor Author

BRCM reported that the fix should be present in the latest SAI version, 5.0.0.7. Leaving this issue open until an image with the latest SAI has been verified.
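
A quick way to check whether an image carries a SAI new enough to include the fix is to compare the `Version:` field from `dpkg -s libsaibcm` against 5.0.0.7. The Python sketch below is only an approximation: it compares the dotted upstream part numerically and ignores full Debian version ordering rules. `parse_version` and `has_fix` are hypothetical helper names.

```python
def parse_version(text):
    """Split '5.0.0.6-1' into ((5, 0, 0, 6), 1).

    Simplification: assumes a purely numeric dotted upstream version and
    an optional numeric revision, which is not the full Debian scheme.
    """
    upstream, _, revision = text.partition("-")
    return tuple(int(part) for part in upstream.split(".")), int(revision or 0)


INSTALLED = "5.0.0.6-1"  # from the dpkg output in this issue
FIRST_FIXED = "5.0.0.7"  # per the BRCM statement; package revision unknown


def has_fix(installed, first_fixed=FIRST_FIXED):
    """True if the installed upstream version is at least first_fixed."""
    return parse_version(installed)[0] >= parse_version(first_fixed)[0]


print(INSTALLED, "has fix:", has_fix(INSTALLED))
```

For the image in this report, the check is false, since 5.0.0.6 precedes 5.0.0.7.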

lguohan pushed a commit that referenced this issue Sep 8, 2021
Catch up on fixes from the BRCM SUG repo, picking up everything after 5.0.0.6 up to 5.0.0.8.
Fixes include the following:
```
  CS00012201827: Warmreboot causes syncd crash with  SAI_API_UNSPECIFIED:syncdb_data_file_read:2230 Failed to parse JSON: error -2
  DNX: Fix for ACL table create with v6 next hdr attr
  and many unspecified changes that also went into 5.0.0.8
```
#### How to verify it
Preliminary tests look fine on both XGS (gechiang) and DNX (judyjoseph).
On XGS, testing was done as follows:
BGP neighbors were all up with proper routes programmed.
Interfaces were all up.
Manually ran the following test cases on a 7260CX3 T0 DUT and all passed:
```
     ipfwd/test_dir_bcast.py
     fdb/test_fdb.py
     fib/test_fib.py
     vlan/test_vlan.py
```
Also validated the fix for CS00012201827 (#8300).