[fast-reboot] I2C failure after executed fast-reboot #267

Open
LiaoNeil opened this issue Jun 13, 2018 · 6 comments

@LiaoNeil

Description

Sometimes the I2C topology fails to come up after running "fast-reboot".

This happens because fast-reboot, unlike a normal reboot, does not run the release procedure (e.g. /etc/rc1~6.d/K*) and boots directly into the new kernel.

Switch vendors usually implement a control or monitoring mechanism that handles their devices via I2C and generates I2C traffic periodically.
If something is using I2C at the moment fast-reboot switches kernels, the new kernel may fail to initialize the I2C topology, or some devices may be left in a busy state.

Therefore, I suggest modifying /usr/bin/fast-reboot to do one of the following:
(1) Run /etc/rc1~6.d/K* before the reboot step of fast-reboot.

OR

(2) Run the switch vendor's release procedure (/etc/init.d/platform-module-xxxxx stop) before the reboot step of fast-reboot.
=> Usually, platform-module-xxxxx is responsible for starting and stopping the I2C topology and I2C-related drivers.
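
As a rough illustration of option (1), the sketch below runs the K* links of one runlevel directory with `stop`, in lexical order. The function name `run_kill_scripts` and its skip list are hypothetical and not part of the real fast-reboot script. The skip list exists on the assumption that a runlevel directory may also hold links that must not run at this point (for example one that performs the reboot itself).

```shell
# Hypothetical helper, not taken from /usr/bin/fast-reboot.
# $1: runlevel directory (assumed default: /etc/rc6.d)
# $2: space-separated link names to skip (assumption, may be empty)
run_kill_scripts() {
    rc_dir="${1:-/etc/rc6.d}"
    skip_list="${2:-}"
    for script in "$rc_dir"/K*; do
        # Skip when the glob matched nothing or the entry is not executable.
        [ -x "$script" ] || continue
        name="${script##*/}"
        case " $skip_list " in
            *" $name "*) continue ;;
        esac
        # A failing stop script is reported but does not abort the loop.
        "$script" stop || echo "warning: $name stop failed" >&2
    done
}
```

Running every kill script stops far more than the I2C users, so option (2) is the narrower change of the two.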


Steps to reproduce the issue

To observe the behavior of SONiC fast-reboot in isolation,
this reproduction procedure first excludes the switch vendor's modules.

  1. Disable the switch vendor's init script
    => Comment out the body of the "start" case in /etc/init.d/platform-module-xxxxx, or remove the script.
    [Ex]
    root@SONiC-Inventec-d7054:~# cat /etc/init.d/platform-modules-d7054q28b
    ...
    start)
    echo -n "Setting up board... "
    depmod -a
    # /usr/local/bin/inventec_d7054_util.py -f install <<<<<<< comment out here!
    echo "done."
    ;;
    ...

  2. Reboot the system so that no switch vendor module is loaded
    [Ex]
    root@SONiC-Inventec-d7054:# reboot

  3. Load the I2C modules and set up the I2C topology manually
    [Ex]
    root@SONiC-Inventec-d7054:# modprobe i2c-mux
    root@SONiC-Inventec-d7054:# modprobe i2c-mux-pca954x
    root@SONiC-Inventec-d7054:# modprobe i2c-dev
    root@SONiC-Inventec-d7054:# echo pca9548 0x71 > /sys/bus/i2c/devices/i2c-0/new_device

  4. Check that I2C can be accessed
    [Ex]
    root@SONiC-Inventec-d7054:# ls /sys/bus/i2c/devices/
    0-0071 i2c-0 i2c-1 i2c-2 i2c-3 i2c-4 i2c-5 i2c-6 i2c-7 i2c-8
    root@SONiC-Inventec-d7054:#
    root@SONiC-Inventec-d7054:# i2cget -y 3 0x20 0
    0xff
    root@SONiC-Inventec-d7054:# i2cget -y 6 0x20 0
    0xff

  5. Prepare an I2C stress script
    [Ex]
    root@SONiC-Inventec-d7054:# cat stress_i2c.sh
    #!/bin/bash
    # Keep reading from two I2C buses until the process is killed.
    while true
    do
        i2cget -y 3 0x20 0 > /dev/null
        i2cget -y 6 0x20 0 > /dev/null
    done

  6. Run several instances of the stress script in the background
    [Ex]
    root@SONiC-Inventec-d7054:# sh stress_i2c.sh &
    [1] 2430
    root@SONiC-Inventec-d7054:# sh stress_i2c.sh &
    [2] 2699
    root@SONiC-Inventec-d7054:# sh stress_i2c.sh &
    [3] 3614
    root@SONiC-Inventec-d7054:# sh stress_i2c.sh &
    [4] 4400
    root@SONiC-Inventec-d7054:# sh stress_i2c.sh &
    [5] 5323
    root@SONiC-Inventec-d7054:#

  7. Execute fast-reboot
    [Ex]
    root@SONiC-Inventec-d7054:# fast-reboot

  8. After fast-reboot, load the I2C modules and set up the I2C topology manually
    => Same as step 3

  9. Observe the failure
    [Ex]
    root@SONiC-Inventec-d7054:# i2cget -y 3 0x20 0
    [ 125.419639] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
    Error: Read failed
    root@SONiC-Inventec-d7054:# i2cget -y 6 0x20 0
    [ 131.274139] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
    Error: Read failed
    root@SONiC-Inventec-d7054:#

Note:

  1. The kernel log shows errors such as "SMBus is busy".
  2. If the issue does not appear, please try again.
    => Condition for the issue: something is using I2C during fast-reboot
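
Since the failure is intermittent, it may help to check for it mechanically after each attempt. The sketch below counts kernel log lines that contain the "SMBus is busy" message shown in step 9. The function name `count_smbus_busy` is hypothetical, and reading from stdin is a choice made here so that it works on both live `dmesg` output and a saved log.

```shell
# Hypothetical helper: print how many input lines contain the
# "SMBus is busy" message. Reads the log from stdin.
count_smbus_busy() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so the exit status is ignored on purpose.
    grep -c "SMBus is busy" || true
}
```

A possible use after fast-reboot is `dmesg | count_smbus_busy`; a non-zero count suggests the issue was hit on that attempt.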

Describe the results you received

I2C cannot be accessed after running fast-reboot

root@SONiC-Inventec-d7054:# i2cget -y 3 0x20 0
[ 125.419639] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
Error: Read failed
root@SONiC-Inventec-d7054:# i2cget -y 6 0x20 0
[ 131.274139] i801_smbus 0000:00:1f.3: SMBus is busy, can't use it!
Error: Read failed
root@SONiC-Inventec-d7054:#


Describe the results you expected

I2C should be accessible

root@SONiC-Inventec-d7054:# i2cget -y 3 0x20 0
0xff
root@SONiC-Inventec-d7054:# i2cget -y 6 0x20 0
0xff


Additional information you deem important (e.g. issue happens only occasionally)

(1) This issue happens only occasionally,
because it requires that something is using I2C during fast-reboot.

(2) The issue is resolved if "/etc/init.d/platform-module-xxxxx stop" is invoked before the final step (the reboot itself) of fast-reboot. Invoking /etc/rc6.d/Kxxxx also works.
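
The ordering described in (2) can be sketched as a small wrapper: run the release command first, then the final reboot command. This is only an illustration. The name `release_then_reboot` is hypothetical, both commands are passed in as strings rather than taken from the real fast-reboot script, and continuing with the reboot when the release step fails is an assumption, not confirmed behavior.

```shell
# Hypothetical wrapper, not taken from /usr/bin/fast-reboot.
# $1: release command, e.g. the platform module's "stop" action
# $2: final reboot command
release_then_reboot() {
    release_cmd="$1"
    reboot_cmd="$2"
    # Assumption: a failed release step is reported but does not block the reboot.
    if ! sh -c "$release_cmd"; then
        echo "warning: release step failed, continuing with reboot" >&2
    fi
    sh -c "$reboot_cmd"
}
```

For example, the first argument could be `/etc/init.d/platform-module-xxxxx stop` and the second whatever command fast-reboot uses for its last step.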


Output of show version

root@SONiC-Inventec-d7054:~# show version
SONiC Software Version: SONiC.HEAD.603-a917517
Distribution: Debian 8.10
Kernel: 3.16.0-5-amd64
Build commit: a917517
Build date: Sun May 27 07:05:24 UTC 2018
Built by: johnar@jenkins-worker-4

Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-syncd-brcm HEAD.603-a917517 6ea2d437d2af 331.8 MB
docker-syncd-brcm latest 6ea2d437d2af 331.8 MB
docker-orchagent-brcm HEAD.603-a917517 bf49606a6932 252.6 MB
docker-orchagent-brcm latest bf49606a6932 252.6 MB
docker-lldp-sv2 HEAD.603-a917517 b1e74bb1fae3 265.8 MB
docker-lldp-sv2 latest b1e74bb1fae3 265.8 MB
docker-dhcp-relay HEAD.603-a917517 084ac6122760 249.1 MB
docker-dhcp-relay latest 084ac6122760 249.1 MB
docker-database HEAD.603-a917517 1891a9d3a27d 247.8 MB
docker-database latest 1891a9d3a27d 247.8 MB
docker-teamd HEAD.603-a917517 2f5e9cfa61cb 252.3 MB
docker-teamd latest 2f5e9cfa61cb 252.3 MB
docker-snmp-sv2 HEAD.603-a917517 9f02dc564068 286.7 MB
docker-snmp-sv2 latest 9f02dc564068 286.7 MB
docker-router-advertiser HEAD.603-a917517 ea28efd33902 245.4 MB
docker-router-advertiser latest ea28efd33902 245.4 MB
docker-platform-monitor HEAD.603-a917517 b02bc911d236 276.7 MB
docker-platform-monitor latest b02bc911d236 276.7 MB
docker-fpm-quagga HEAD.603-a917517 f543a3f6da39 259.1 MB
docker-fpm-quagga latest f543a3f6da39 259.1 MB

root@SONiC-Inventec-d7054:~#

PS:
This log was dumped after running the reproduction procedure.
Because the switch vendor's init script was commented out,
some services may not be running normally.
sonic_dump_SONiC-Inventec-d7054_20180613_131514.tar.gz

@shivanangi

Is the issue resolved?

@pavel-shirshov
Contributor

Hey @jleveque, what do you think of @LiaoNeil's suggestions?

@lguohan
Contributor

lguohan commented Sep 28, 2019

Switch vendors usually implement a control or monitoring mechanism that handles their devices via I2C and generates I2C traffic periodically.
If something is using I2C at the moment fast-reboot switches kernels, the new kernel may fail to initialize the I2C topology, or some devices may be left in a busy state.

I do not quite understand who is using the I2C bus during the fast reboot. The statement above says switch vendors, which is not clear to me. Who? The ASIC? The platform driver?

@lguohan
Contributor

lguohan commented Sep 28, 2019

During the fast reboot, after we do kexec, no user-space program is running. During this period, who is using the I2C tree?

@pavel-shirshov
Contributor

@lguohan
It is possible to send an I2C request and then kexec into the new kernel before getting a response.
We had this with the ASIC drivers.
We fixed that by unloading the drivers before fast-reboot.
I think we should shut down all I2C drivers before kexec.

@jleveque
Contributor

I don't think it would hurt to shut down all platform drivers before kexec to ensure that devices aren't left in a bad state.
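
For the manual topology used in the reproduction steps, such a shutdown could look roughly like the sketch below, which undoes step 3 in reverse: it removes the mux through the adapter's `delete_device` sysfs file and then unloads the modules. The function name `teardown_i2c_topology` and the `SYSFS_ROOT` and `MODPROBE` overrides are hypothetical and exist only so the logic can be tried without real hardware. Whether this alone prevents the busy state is not confirmed in this thread, and any process still generating I2C traffic would have to be stopped first.

```shell
# Hypothetical overrides; the defaults point at the live system.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
MODPROBE="${MODPROBE:-modprobe}"

# $1: parent bus number (0 in the reproduction steps)
# $2: mux address (0x71 in the reproduction steps)
teardown_i2c_topology() {
    bus="$1"
    addr="$2"
    # Remove the manually instantiated mux from its parent adapter.
    echo "$addr" > "$SYSFS_ROOT/bus/i2c/devices/i2c-$bus/delete_device" || return 1
    # Unload the modules in the reverse order of step 3.
    for mod in i2c-dev i2c-mux-pca954x i2c-mux; do
        $MODPROBE -r "$mod" || echo "warning: could not unload $mod" >&2
    done
}
```

With the defaults, `teardown_i2c_topology 0 0x71` would target the same bus and address as step 3.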

stepanblyschak pushed a commit to stepanblyschak/sonic-utilities that referenced this issue Apr 18, 2022
Update sonic-linux-kernel submodule to updated 202012 branch. This brings in the following commits....

```
e97f9fc [202012] Add upstreamed patches which backport support for registers for CPLD PNs (sonic-net#275)
58abcdc Merge pull request sonic-net#267 from Staphylo/202012-log-buf-len
3f16f4f Merge pull request sonic-net#268 from Staphylo/202012-emmc-fixes
a120ae7 Apply kernel patches to fix emmc unreliability
5f4a3f3 Increase log_buf_len to 1M for all architecture
```
mihirpat1 pushed a commit to mihirpat1/sonic-utilities that referenced this issue Sep 15, 2023
Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>