Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIQ interrupt not triggering IRQ interrupt causing USB to dropout #151

Closed
gholling opened this issue Nov 3, 2012 · 9 comments
Closed

FIQ interrupt not triggering IRQ interrupt causing USB to dropout #151

gholling opened this issue Nov 3, 2012 · 9 comments

Comments

@gholling
Copy link

gholling commented Nov 3, 2012

This is the ethernet drop out many people are seeing when running their boxes for extended periods...

To test this wait until it drops out and then do:

sudo -s
echo 08 > /sys/devices/platform/bcm2708_usb/regoffset
cat /sys/devices/platform/bcm2708_usb/regvalue

I expect the value to be 0x00000031

If the value is 0x00000030 then do this:

echo 0x31 > /sys/devices/platform/bcm2708_usb/regvalue

If this fixes the ethernet it will be very useful to know...

@tgghtp
Copy link

tgghtp commented Nov 5, 2012

it fixes,

a test script found elsewhere, modified, running every minute from cron:

#--------------------------------------------------
#!/bin/sh
# cron script for checking wlan connectivity
IP_FOR_TEST="a.b.c.d"
PING_COUNT=1
PING="/bin/ping"
INTERFACE="eth0"
FFLAG="/opt/check_lan/stuck.fflg"
# ping test
$PING -c $PING_COUNT $IP_FOR_TEST > /dev/null 2> /dev/null
if [ $? -ge 1 ]
then
    logger "check_lan: $INTERFACE is down"
touch $FFLAG
date >> $FFLAG
echo 08 > /sys/devices/platform/bcm2708_usb/regoffset
cat /sys/devices/platform/bcm2708_usb/regvalue >> $FFLAG
echo 0x31 > /sys/devices/platform/bcm2708_usb/regvalue
cat /sys/devices/platform/bcm2708_usb/regvalue >> $FFLAG
else
    logger "check_lan: $INTERFACE is up"
#    rm -f $FFLAG 2>/dev/null
fi
#---------------------------------------------------------------

the resulting file:

root@raspberrypi:~# cat /opt/check_lan/stuck.fflg
Mo 5. Nov 06:58:11 CET 2012
Reg@0x000008 = 0x00000030
Reg@0x000008 = 0x00000031
Mo 5. Nov 08:42:01 CET 2012
Reg@0x000008 = 0x00000030
Reg@0x000008 = 0x00000031
Mo 5. Nov 09:21:11 CET 2012
Reg@0x000008 = 0x00000030
Reg@0x000008 = 0x00000031
Mo 5. Nov 09:57:11 CET 2012
Reg@0x000008 = 0x00000030
Reg@0x000008 = 0x00000031
Mo 5. Nov 10:24:11 CET 2012
Reg@0x000008 = 0x00000030
Reg@0x000008 = 0x00000031
Mo 5. Nov 11:35:04 CET 2012
Reg@0x000008 = 0x00000030
Reg@0x000008 = 0x00000031

parts from syslog:

Nov  5 11:34:01 raspberrypi logger: check_lan: eth0 is up
Nov  5 11:34:10 ..................
Nov  5 11:34:20 raspberrypi kernel: [75050.559308] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Nov  5 11:34:25 raspberrypi kernel: [75055.559353] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000114
Nov  5 11:34:30 raspberrypi kernel: [75060.559384] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Nov  5 11:34:35 raspberrypi kernel: [75065.559438] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000118
Nov  5 11:34:40 raspberrypi kernel: [75070.559486] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Nov  5 11:34:45 raspberrypi kernel: [75075.559530] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000114
Nov  5 11:34:50 raspberrypi kernel: [75080.559585] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Nov  5 11:34:55 raspberrypi kernel: [75085.559620] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000118
Nov  5 11:35:01 raspberrypi kernel: [75091.659677] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Nov  5 11:35:04 raspberrypi logger: check_lan: eth0 is down
Nov  5 11:35:04 ............................
Nov  5 11:35:05 ............................
Nov  5 11:35:11 ............................
Nov  5 11:36:01 raspberrypi logger: check_lan: eth0 is up
Nov  5 11:34:01 raspberrypi logger: check_lan: eth0 is up
Nov  5 11:34:10 ..................
Nov  5 11:34:20 raspberrypi kernel: [75050.559308] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Nov  5 11:34:25 raspberrypi kernel: [75055.559353] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000114
Nov  5 11:34:30 raspberrypi kernel: [75060.559384] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Nov  5 11:34:35 raspberrypi kernel: [75065.559438] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000118
Nov  5 11:34:40 raspberrypi kernel: [75070.559486] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Nov  5 11:34:45 raspberrypi kernel: [75075.559530] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000114
Nov  5 11:34:50 raspberrypi kernel: [75080.559585] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Nov  5 11:34:55 raspberrypi kernel: [75085.559620] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000118
Nov  5 11:35:01 raspberrypi kernel: [75091.659677] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Nov  5 11:35:04 raspberrypi logger: check_lan: eth0 is down
Nov  5 11:35:04 ........................
Nov  5 11:36:01 raspberrypi logger: check_lan: eth0 is up

hope this helps

@ghollingworth
Copy link

Can you now update and check again?

I believe this is now fixed, can you run a long term test and make sure the
problem has gone away?

On 5 November 2012 13:25, tgghtp notifications@github.com wrote:

it fixes,

a test script running evry minute from cron:

#--------------------------------------------------
#!/bin/sh
cron script for checking wlan connectivity

IP_FOR_TEST="a.b.c.d"
PING_COUNT=1

PING="/bin/ping"

INTERFACE="eth0"

FFLAG="/opt/check_lan/stuck.fflg"
ping test

$PING -c $PING_COUNT $IP_FOR_TEST > /dev/null 2> /dev/null
if [ $? -ge 1 ]
then
logger "check_lan: $INTERFACE is down"

touch $FFLAG
date >> $FFLAG
echo 08 > /sys/devices/platform/bcm2708_usb/regoffset
cat /sys/devices/platform/bcm2708_usb/regvalue >> $FFLAG

echo 0x31 > /sys/devices/platform/bcm2708_usb/regvalue
cat /sys/devices/platform/bcm2708_usb/regvalue >> $FFLAG

else
logger "check_lan: $INTERFACE is up"
rm -f $FFLAG 2>/dev/null

fi
#---------------------------------------------------------------

the resulting file:

root@raspberrypi:~# cat /opt/check_lan/stuck.fflg
Mo 5. Nov 06:58:11 CET 2012
Reg@0x000008 = 0x00000030
Reg@0x000008 = 0x00000031
Mo 5. Nov 08:42:01 CET 2012
Reg@0x000008 = 0x00000030
Reg@0x000008 = 0x00000031
Mo 5. Nov 09:21:11 CET 2012
Reg@0x000008 = 0x00000030
Reg@0x000008 = 0x00000031
Mo 5. Nov 09:57:11 CET 2012
Reg@0x000008 = 0x00000030
Reg@0x000008 = 0x00000031
Mo 5. Nov 10:24:11 CET 2012
Reg@0x000008 = 0x00000030
Reg@0x000008 = 0x00000031
Mo 5. Nov 11:35:04 CET 2012
Reg@0x000008 = 0x00000030
Reg@0x000008 = 0x00000031

parts from syslog:

Nov 5 11:34:01 raspberrypi logger: check_lan: eth0 is up
Nov 5 11:34:10 ..................
Nov 5 11:34:20 raspberrypi kernel: [75050.559308] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000114
Nov 5 11:34:25 raspberrypi kernel: [75055.559353] smsc95xx 1-1.1:1.0:
eth0: Failed to write register index 0x00000114
Nov 5 11:34:30 raspberrypi kernel: [75060.559384] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000114
Nov 5 11:34:35 raspberrypi kernel: [75065.559438] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000118
Nov 5 11:34:40 raspberrypi kernel: [75070.559486] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000114
Nov 5 11:34:45 raspberrypi kernel: [75075.559530] smsc95xx 1-1.1:1.0:
eth0: Failed to write register index 0x00000114
Nov 5 11:34:50 raspberrypi kernel: [75080.559585] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000114
Nov 5 11:34:55 raspberrypi kernel: [75085.559620] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000118
Nov 5 11:35:01 raspberrypi kernel: [75091.659677] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000114
Nov 5 11:35:04 raspberrypi logger: check_lan: eth0 is down
Nov 5 11:35:04 ............................
Nov 5 11:35:05 ............................
Nov 5 11:35:11 ............................
Nov 5 11:36:01 raspberrypi logger: check_lan: eth0 is up

Nov 5 11:34:01 raspberrypi logger: check_lan: eth0 is up
Nov 5 11:34:10 ..................
Nov 5 11:34:20 raspberrypi kernel: [75050.559308] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000114
Nov 5 11:34:25 raspberrypi kernel: [75055.559353] smsc95xx 1-1.1:1.0:
eth0: Failed to write register index 0x00000114
Nov 5 11:34:30 raspberrypi kernel: [75060.559384] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000114
Nov 5 11:34:35 raspberrypi kernel: [75065.559438] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000118
Nov 5 11:34:40 raspberrypi kernel: [75070.559486] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000114
Nov 5 11:34:45 raspberrypi kernel: [75075.559530] smsc95xx 1-1.1:1.0:
eth0: Failed to write register index 0x00000114
Nov 5 11:34:50 raspberrypi kernel: [75080.559585] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000114
Nov 5 11:34:55 raspberrypi kernel: [75085.559620] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000118
Nov 5 11:35:01 raspberrypi kernel: [75091.659677] smsc95xx 1-1.1:1.0:
eth0: Failed to read register index 0x00000114
Nov 5 11:35:04 raspberrypi logger: check_lan: eth0 is down
Nov 5 11:35:04 ........................
Nov 5 11:36:01 raspberrypi logger: check_lan: eth0 is up

hope this helps


Reply to this email directly or view it on GitHubhttps://github.com//issues/151#issuecomment-10070210.

@tgghtp
Copy link

tgghtp commented Nov 6, 2012

Update: from where ?

root@raspberrypi:~# cat /etc/issue
Debian GNU/Linux wheezy/sid \n \l
root@raspberrypi:~# uname -a
Linux raspberrypi 3.2.27+ #250 PREEMPT Thu Oct 18 19:03:02 BST 2012 armv6l GNU/Linux
root@raspberrypi:~# apt-get update
Holen: 1 http://mirrordirector.raspbian.org wheezy InRelease [12,5 kB]
OK   http://archive.raspberrypi.org wheezy InRelease
OK   http://archive.raspberrypi.org wheezy/main armhf Packages
Holen: 2 http://mirrordirector.raspbian.org wheezy/main armhf Packages [7.377 kB]
Ign http://archive.raspberrypi.org wheezy/main Translation-de_DE
Ign http://archive.raspberrypi.org wheezy/main Translation-de
Ign http://archive.raspberrypi.org wheezy/main Translation-en
Holen: 3 http://mirrordirector.raspbian.org wheezy/contrib armhf Packages [23,3 kB]
Holen: 4 http://mirrordirector.raspbian.org wheezy/non-free armhf Packages [47,3 kB]
Holen: 5 http://mirrordirector.raspbian.org wheezy/rpi armhf Packages [14 B]
Ign http://mirrordirector.raspbian.org wheezy/contrib Translation-de_DE
Ign http://mirrordirector.raspbian.org wheezy/contrib Translation-de
Ign http://mirrordirector.raspbian.org wheezy/contrib Translation-en
Ign http://mirrordirector.raspbian.org wheezy/main Translation-de_DE
Ign http://mirrordirector.raspbian.org wheezy/main Translation-de
Ign http://mirrordirector.raspbian.org wheezy/main Translation-en
Ign http://mirrordirector.raspbian.org wheezy/non-free Translation-de_DE
Ign http://mirrordirector.raspbian.org wheezy/non-free Translation-de
Ign http://mirrordirector.raspbian.org wheezy/non-free Translation-en
Ign http://mirrordirector.raspbian.org wheezy/rpi Translation-de_DE
Ign http://mirrordirector.raspbian.org wheezy/rpi Translation-de
Ign http://mirrordirector.raspbian.org wheezy/rpi Translation-en
Es wurden 7.460 kB in 58 s geholt (128 kB/s).
Paketlisten werden gelesen... Fertig

@asb
Copy link

asb commented Nov 6, 2012

From the git firmware repo, using e.g. rpi-update.

On 6 November 2012 16:58, tgghtp notifications@github.com wrote:

Update: from where ?

root@raspberrypi:~# cat /etc/issue
Debian GNU/Linux wheezy/sid \n \l

root@raspberrypi:~# uname -a
Linux raspberrypi 3.2.27+ #250 PREEMPT Thu Oct 18 19:03:02 BST 2012 armv6l GNU/Linux

root@raspberrypi:~# apt-get update
Holen: 1 http://mirrordirector.raspbian.org wheezy InRelease [12,5 kB]
OK http://archive.raspberrypi.org wheezy InRelease
OK http://archive.raspberrypi.org wheezy/main armhf Packages
Holen: 2 http://mirrordirector.raspbian.org wheezy/main armhf Packages [7.377 kB]
Ign http://archive.raspberrypi.org wheezy/main Translation-de_DE
Ign http://archive.raspberrypi.org wheezy/main Translation-de
Ign http://archive.raspberrypi.org wheezy/main Translation-en
Holen: 3 http://mirrordirector.raspbian.org wheezy/contrib armhf Packages [23,3 kB]
Holen: 4 http://mirrordirector.raspbian.org wheezy/non-free armhf Packages [47,3 kB]
Holen: 5 http://mirrordirector.raspbian.org wheezy/rpi armhf Packages [14 B]
Ign http://mirrordirector.raspbian.org wheezy/contrib Translation-de_DE
Ign http://mirrordirector.raspbian.org wheezy/contrib Translation-de
Ign http://mirrordirector.raspbian.org wheezy/contrib Translation-en
Ign http://mirrordirector.raspbian.org wheezy/main Translation-de_DE
Ign http://mirrordirector.raspbian.org wheezy/main Translation-de
Ign http://mirrordirector.raspbian.org wheezy/main Translation-en
Ign http://mirrordirector.raspbian.org wheezy/non-free Translation-de_DE
Ign http://mirrordirector.raspbian.org wheezy/non-free Translation-de
Ign http://mirrordirector.raspbian.org wheezy/non-free Translation-en
Ign http://mirrordirector.raspbian.org wheezy/rpi Translation-de_DE
Ign http://mirrordirector.raspbian.org wheezy/rpi Translation-de
Ign http://mirrordirector.raspbian.org wheezy/rpi Translation-en
Es wurden 7.460 kB in 58 s geholt (128 kB/s).
Paketlisten werden gelesen... Fertig


Reply to this email directly or view it on GitHubhttps://github.com//issues/151#issuecomment-10118176.

@ghollingworth
Copy link

This seems to work as expected and fixes the ethernet dropping out problem

@RussNelson
Copy link
Contributor

This is a good fix for me, too. Any idea when this is going to get into the Raspbian distribution? Or should I just bite the bullet and start using rpi-update?

@licaon-kter
Copy link

Just wait for it to update in 1 or 2 months? :D Oh, come on,gogogo https://github.com/Hexxeh/rpi-update

@popcornmix
Copy link
Collaborator

I think apt-get will be updated to current rpi-update in the next few days.

popcornmix pushed a commit that referenced this issue Oct 8, 2014
Soothes the following checkpatch warnings:

    WARNING: line over 80 characters
    #151: FILE: drivers/mfd/ab8500-core.c:151:
    +	0, 1, 2, 3, 4, -1, -1, -1, -1, 11, 18, 19, 20, 21, 12, 13, 24, 5, 22, 23,

    ERROR: spaces required around that '=' (ctx:VxW)
    #325: FILE: drivers/mfd/ab8500-core.c:325:
    +	ret= mask_and_set_register_interruptible(ab8500, bank, reg,
     	   ^

    WARNING: line over 80 characters
    #418: FILE: drivers/mfd/ab8500-core.c:418:
    +		else if (offset >= AB9540_INT_GPIO50R && offset <= AB9540_INT_GPIO54R)

    WARNING: line over 80 characters
    #420: FILE: drivers/mfd/ab8500-core.c:420:
    +		else if (offset == AB8540_INT_GPIO43R || offset == AB8540_INT_GPIO44R)

    ERROR: spaces required around that '==' (ctx:VxV)
    #454: FILE: drivers/mfd/ab8500-core.c:454:
    +	if ((i==3) && (*offset >= 24))
     	      ^

    ERROR: code indent should use tabs where possible
    #576: FILE: drivers/mfd/ab8500-core.c:576:
    +        .map    = ab8500_irq_map,$

    WARNING: please, no spaces at the start of a line
    #576: FILE: drivers/mfd/ab8500-core.c:576:
    +        .map    = ab8500_irq_map,$

    ERROR: code indent should use tabs where possible
    #577: FILE: drivers/mfd/ab8500-core.c:577:
    +        .xlate  = irq_domain_xlate_twocell,$

    WARNING: please, no spaces at the start of a line
    #577: FILE: drivers/mfd/ab8500-core.c:577:
    +        .xlate  = irq_domain_xlate_twocell,$

    WARNING: char * array declaration might be better as static const
    #1554: FILE: drivers/mfd/ab8500-core.c:1554:
    +	static char *switch_off_status[] = {

    WARNING: char * array declaration might be better as static const
    #1563: FILE: drivers/mfd/ab8500-core.c:1563:
    +	static char *turn_on_status[] = {

    WARNING: sizeof *ab8500 should be sizeof(*ab8500)
    #1582: FILE: drivers/mfd/ab8500-core.c:1582:
    +	ab8500 = devm_kzalloc(&pdev->dev, sizeof *ab8500, GFP_KERNEL);

    ERROR: space required after that close brace '}'
    #1639: FILE: drivers/mfd/ab8500-core.c:1639:
    +	}/* Configure AB8500 or AB9540 IRQ */

    WARNING: line over 80 characters
    #1652: FILE: drivers/mfd/ab8500-core.c:1652:
    +	ab8500->oldmask = devm_kzalloc(&pdev->dev, ab8500->mask_size, GFP_KERNEL);

    WARNING: Prefer [subsystem eg: netdev]_cont([subsystem]dev, ... then dev_cont(dev, ... then pr_cont(...  to printk(KERN_CONT ...
    #1677: FILE: drivers/mfd/ab8500-core.c:1677:
    +				printk(KERN_CONT " \"%s\"",

    WARNING: Prefer [subsystem eg: netdev]_cont([subsystem]dev, ... then dev_cont(dev, ... then pr_cont(...  to printk(KERN_CONT ...
    #1682: FILE: drivers/mfd/ab8500-core.c:1682:
    +		printk(KERN_CONT "\n");

    WARNING: Prefer [subsystem eg: netdev]_cont([subsystem]dev, ... then dev_cont(dev, ... then pr_cont(...  to printk(KERN_CONT ...
    #1684: FILE: drivers/mfd/ab8500-core.c:1684:
    +		printk(KERN_CONT " None\n");

    WARNING: printk() should include KERN_ facility level
    #1695: FILE: drivers/mfd/ab8500-core.c:1695:
    +				printk("\"%s\" ", turn_on_status[i]);

    WARNING: printk() should include KERN_ facility level
    #1700: FILE: drivers/mfd/ab8500-core.c:1700:
    +		printk("None\n");

    total: 5 errors, 14 warnings, 1869 lines checked

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
@Fabio72
Copy link

Fabio72 commented Mar 3, 2015

I'm on xbian
Distribution: XBian 1.0 (knockout)
Kernel version: Linux 3.18.8+ armv6l
Firmware: 115f63aa0915cdb5b6dff23822eb16699b72ae8b (4 Feb 2015 21:04:27)

Sometimes I'm facing system freezes without errors in logs. I don't know it they are system or network/usb freezes.
I'm getting

cat /sys/devices/platform/bcm2708_usb/regoffset
0xffffffff
cat /sys/devices/platform/bcm2708_usb/regvalue
invalid offset

I'm trying to fix following this article
https://raspberrypispot.wordpress.com/2013/07/08/wifi-and-ethernet-dropout-problems-in-raspberry-pi/

I opened a thread about this problem in
http://forum.xbian.org/thread-2785-post-27814.html

popcornmix pushed a commit that referenced this issue Mar 13, 2017
mem_cgroup_free() indirectly calls wb_domain_exit() which is not
prepared to deal with a struct wb_domain object that hasn't executed
wb_domain_init().  For instance, the following warning message is
printed by lockdep if alloc_percpu() fails in mem_cgroup_alloc():

  INFO: trying to register non-static key.
  the code is fine but needs lockdep annotation.
  turning off the locking correctness validator.
  CPU: 1 PID: 1950 Comm: mkdir Not tainted 4.10.0+ #151
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Call Trace:
   dump_stack+0x67/0x99
   register_lock_class+0x36d/0x540
   __lock_acquire+0x7f/0x1a30
   lock_acquire+0xcc/0x200
   del_timer_sync+0x3c/0xc0
   wb_domain_exit+0x14/0x20
   mem_cgroup_free+0x14/0x40
   mem_cgroup_css_alloc+0x3f9/0x620
   cgroup_apply_control_enable+0x190/0x390
   cgroup_mkdir+0x290/0x3d0
   kernfs_iop_mkdir+0x58/0x80
   vfs_mkdir+0x10e/0x1a0
   SyS_mkdirat+0xa8/0xd0
   SyS_mkdir+0x14/0x20
   entry_SYSCALL_64_fastpath+0x18/0xad

Add __mem_cgroup_free() which skips wb_domain_exit().  This is used by
both mem_cgroup_free() and mem_cgroup_alloc() clean up.

Fixes: 0b8f73e ("mm: memcontrol: clean up alloc, online, offline, free functions")
Link: http://lkml.kernel.org/r/20170306192122.24262-1-tahsin@google.com
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ED6E0F17 pushed a commit to ED6E0F17/linux that referenced this issue Mar 15, 2017
commit 40e952f upstream.

mem_cgroup_free() indirectly calls wb_domain_exit() which is not
prepared to deal with a struct wb_domain object that hasn't executed
wb_domain_init().  For instance, the following warning message is
printed by lockdep if alloc_percpu() fails in mem_cgroup_alloc():

  INFO: trying to register non-static key.
  the code is fine but needs lockdep annotation.
  turning off the locking correctness validator.
  CPU: 1 PID: 1950 Comm: mkdir Not tainted 4.10.0+ raspberrypi#151
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Call Trace:
   dump_stack+0x67/0x99
   register_lock_class+0x36d/0x540
   __lock_acquire+0x7f/0x1a30
   lock_acquire+0xcc/0x200
   del_timer_sync+0x3c/0xc0
   wb_domain_exit+0x14/0x20
   mem_cgroup_free+0x14/0x40
   mem_cgroup_css_alloc+0x3f9/0x620
   cgroup_apply_control_enable+0x190/0x390
   cgroup_mkdir+0x290/0x3d0
   kernfs_iop_mkdir+0x58/0x80
   vfs_mkdir+0x10e/0x1a0
   SyS_mkdirat+0xa8/0xd0
   SyS_mkdir+0x14/0x20
   entry_SYSCALL_64_fastpath+0x18/0xad

Add __mem_cgroup_free() which skips wb_domain_exit().  This is used by
both mem_cgroup_free() and mem_cgroup_alloc() clean up.

Fixes: 0b8f73e ("mm: memcontrol: clean up alloc, online, offline, free functions")
Link: http://lkml.kernel.org/r/20170306192122.24262-1-tahsin@google.com
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Mar 17, 2017
commit 40e952f upstream.

mem_cgroup_free() indirectly calls wb_domain_exit() which is not
prepared to deal with a struct wb_domain object that hasn't executed
wb_domain_init().  For instance, the following warning message is
printed by lockdep if alloc_percpu() fails in mem_cgroup_alloc():

  INFO: trying to register non-static key.
  the code is fine but needs lockdep annotation.
  turning off the locking correctness validator.
  CPU: 1 PID: 1950 Comm: mkdir Not tainted 4.10.0+ #151
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Call Trace:
   dump_stack+0x67/0x99
   register_lock_class+0x36d/0x540
   __lock_acquire+0x7f/0x1a30
   lock_acquire+0xcc/0x200
   del_timer_sync+0x3c/0xc0
   wb_domain_exit+0x14/0x20
   mem_cgroup_free+0x14/0x40
   mem_cgroup_css_alloc+0x3f9/0x620
   cgroup_apply_control_enable+0x190/0x390
   cgroup_mkdir+0x290/0x3d0
   kernfs_iop_mkdir+0x58/0x80
   vfs_mkdir+0x10e/0x1a0
   SyS_mkdirat+0xa8/0xd0
   SyS_mkdir+0x14/0x20
   entry_SYSCALL_64_fastpath+0x18/0xad

Add __mem_cgroup_free() which skips wb_domain_exit().  This is used by
both mem_cgroup_free() and mem_cgroup_alloc() clean up.

Fixes: 0b8f73e ("mm: memcontrol: clean up alloc, online, offline, free functions")
Link: http://lkml.kernel.org/r/20170306192122.24262-1-tahsin@google.com
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Jan 15, 2018
In fib6_add(), pn could be NULL if fib6_add_1() failed to return a fib6
node. Checking pn != fn before accessing pn->leaf makes sure pn is not
NULL.
This fixes the following GPF reported by syzkaller:
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 3201 Comm: syzkaller001778 Not tainted 4.15.0-rc5+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:fib6_add+0x736/0x15a0 net/ipv6/ip6_fib.c:1244
RSP: 0018:ffff8801c7626a70 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000020 RCX: ffffffff84794465
RDX: 0000000000000004 RSI: ffff8801d38935f0 RDI: 0000000000000282
RBP: ffff8801c7626da0 R08: 1ffff10038ec4c35 R09: 0000000000000000
R10: ffff8801c7626c68 R11: 0000000000000000 R12: 00000000fffffffe
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000009
FS:  0000000000000000(0000) GS:ffff8801db200000(0063) knlGS:0000000009b70840
CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000020be1000 CR3: 00000001d585a006 CR4: 00000000001606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1006
 ip6_route_multipath_add+0xd14/0x16c0 net/ipv6/route.c:3833
 inet6_rtm_newroute+0xdc/0x160 net/ipv6/route.c:3957
 rtnetlink_rcv_msg+0x733/0x1020 net/core/rtnetlink.c:4411
 netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2408
 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4423
 netlink_unicast_kernel net/netlink/af_netlink.c:1275 [inline]
 netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1301
 netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1864
 sock_sendmsg_nosec net/socket.c:636 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:646
 sock_write_iter+0x31a/0x5d0 net/socket.c:915
 call_write_iter include/linux/fs.h:1772 [inline]
 do_iter_readv_writev+0x525/0x7f0 fs/read_write.c:653
 do_iter_write+0x154/0x540 fs/read_write.c:932
 compat_writev+0x225/0x420 fs/read_write.c:1246
 do_compat_writev+0x115/0x220 fs/read_write.c:1267
 C_SYSC_writev fs/read_write.c:1278 [inline]
 compat_SyS_writev+0x26/0x30 fs/read_write.c:1274
 do_syscall_32_irqs_on arch/x86/entry/common.c:327 [inline]
 do_fast_syscall_32+0x3ee/0xf9d arch/x86/entry/common.c:389
 entry_SYSENTER_compat+0x54/0x63 arch/x86/entry/entry_64_compat.S:125

Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 66f5d6c ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
popcornmix pushed a commit that referenced this issue Jan 22, 2019
bd_set_size() updates also block device's block size. This is somewhat
unexpected from its name and at this point, only blkdev_open() uses this
functionality. Furthermore, this can result in changing block size under
a filesystem mounted on a loop device which leads to livelocks inside
__getblk_gfp() like:

Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
01/01/2011
RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
...
Call Trace:
 init_page_buffers+0x3e2/0x530 fs/buffer.c:904
 grow_dev_page fs/buffer.c:947 [inline]
 grow_buffers fs/buffer.c:1009 [inline]
 __getblk_slow fs/buffer.c:1036 [inline]
 __getblk_gfp+0x906/0xb10 fs/buffer.c:1313
 __bread_gfp+0x2d/0x310 fs/buffer.c:1347
 sb_bread include/linux/buffer_head.h:307 [inline]
 fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
 fat_ent_read_block fs/fat/fatent.c:441 [inline]
 fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
 fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
 __fat_get_block fs/fat/inode.c:148 [inline]
...

Trivial reproducer for the problem looks like:

truncate -s 1G /tmp/image
losetup /dev/loop0 /tmp/image
mkfs.ext4 -b 1024 /dev/loop0
mount -t ext4 /dev/loop0 /mnt
losetup -c /dev/loop0
l /mnt

Fix the problem by moving initialization of a block device block size
into a separate function and call it when needed.

Thanks to Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> for help with
debugging the problem.

Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
popcornmix pushed a commit that referenced this issue Jan 23, 2019
commit 04906b2 upstream.

bd_set_size() updates also block device's block size. This is somewhat
unexpected from its name and at this point, only blkdev_open() uses this
functionality. Furthermore, this can result in changing block size under
a filesystem mounted on a loop device which leads to livelocks inside
__getblk_gfp() like:

Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
01/01/2011
RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
...
Call Trace:
 init_page_buffers+0x3e2/0x530 fs/buffer.c:904
 grow_dev_page fs/buffer.c:947 [inline]
 grow_buffers fs/buffer.c:1009 [inline]
 __getblk_slow fs/buffer.c:1036 [inline]
 __getblk_gfp+0x906/0xb10 fs/buffer.c:1313
 __bread_gfp+0x2d/0x310 fs/buffer.c:1347
 sb_bread include/linux/buffer_head.h:307 [inline]
 fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
 fat_ent_read_block fs/fat/fatent.c:441 [inline]
 fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
 fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
 __fat_get_block fs/fat/inode.c:148 [inline]
...

Trivial reproducer for the problem looks like:

truncate -s 1G /tmp/image
losetup /dev/loop0 /tmp/image
mkfs.ext4 -b 1024 /dev/loop0
mount -t ext4 /dev/loop0 /mnt
losetup -c /dev/loop0
l /mnt

Fix the problem by moving initialization of a block device block size
into a separate function and call it when needed.

Thanks to Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> for help with
debugging the problem.

Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Jan 24, 2019
commit 04906b2 upstream.

bd_set_size() updates also block device's block size. This is somewhat
unexpected from its name and at this point, only blkdev_open() uses this
functionality. Furthermore, this can result in changing block size under
a filesystem mounted on a loop device which leads to livelocks inside
__getblk_gfp() like:

Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
01/01/2011
RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
...
Call Trace:
 init_page_buffers+0x3e2/0x530 fs/buffer.c:904
 grow_dev_page fs/buffer.c:947 [inline]
 grow_buffers fs/buffer.c:1009 [inline]
 __getblk_slow fs/buffer.c:1036 [inline]
 __getblk_gfp+0x906/0xb10 fs/buffer.c:1313
 __bread_gfp+0x2d/0x310 fs/buffer.c:1347
 sb_bread include/linux/buffer_head.h:307 [inline]
 fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
 fat_ent_read_block fs/fat/fatent.c:441 [inline]
 fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
 fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
 __fat_get_block fs/fat/inode.c:148 [inline]
...

Trivial reproducer for the problem looks like:

truncate -s 1G /tmp/image
losetup /dev/loop0 /tmp/image
mkfs.ext4 -b 1024 /dev/loop0
mount -t ext4 /dev/loop0 /mnt
losetup -c /dev/loop0
l /mnt

Fix the problem by moving initialization of a block device block size
into a separate function and call it when needed.

Thanks to Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> for help with
debugging the problem.

Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Jan 24, 2019
commit 04906b2 upstream.

bd_set_size() updates also block device's block size. This is somewhat
unexpected from its name and at this point, only blkdev_open() uses this
functionality. Furthermore, this can result in changing block size under
a filesystem mounted on a loop device which leads to livelocks inside
__getblk_gfp() like:

Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 10863 Comm: syz-executor0 Not tainted 4.18.0-rc5+ #151
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
01/01/2011
RIP: 0010:__sanitizer_cov_trace_pc+0x3f/0x50 kernel/kcov.c:106
...
Call Trace:
 init_page_buffers+0x3e2/0x530 fs/buffer.c:904
 grow_dev_page fs/buffer.c:947 [inline]
 grow_buffers fs/buffer.c:1009 [inline]
 __getblk_slow fs/buffer.c:1036 [inline]
 __getblk_gfp+0x906/0xb10 fs/buffer.c:1313
 __bread_gfp+0x2d/0x310 fs/buffer.c:1347
 sb_bread include/linux/buffer_head.h:307 [inline]
 fat12_ent_bread+0x14e/0x3d0 fs/fat/fatent.c:75
 fat_ent_read_block fs/fat/fatent.c:441 [inline]
 fat_alloc_clusters+0x8ce/0x16e0 fs/fat/fatent.c:489
 fat_add_cluster+0x7a/0x150 fs/fat/inode.c:101
 __fat_get_block fs/fat/inode.c:148 [inline]
...

Trivial reproducer for the problem looks like:

truncate -s 1G /tmp/image
losetup /dev/loop0 /tmp/image
mkfs.ext4 -b 1024 /dev/loop0
mount -t ext4 /dev/loop0 /mnt
losetup -c /dev/loop0
l /mnt

Fix the problem by moving initialization of a block device block size
into a separate function and call it when needed.

Thanks to Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> for help with
debugging the problem.

Reported-by: syzbot+9933e4476f365f5d5a1b@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Sep 19, 2023
commit f1187ef upstream.

Fix a goof where KVM tries to grab source vCPUs from the destination VM
when doing intrahost migration.  Grabbing the wrong vCPU not only hoses
the guest, it also crashes the host due to the VMSA pointer being left
NULL.

  BUG: unable to handle page fault for address: ffffe38687000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP NOPTI
  CPU: 39 PID: 17143 Comm: sev_migrate_tes Tainted: GO       6.5.0-smp--fff2e47e6c3b-next #151
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.28.0 07/10/2023
  RIP: 0010:__free_pages+0x15/0xd0
  RSP: 0018:ffff923fcf6e3c78 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffffe38687000000 RCX: 0000000000000100
  RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffe38687000000
  RBP: ffff923fcf6e3c88 R08: ffff923fcafb0000 R09: 0000000000000000
  R10: 0000000000000000 R11: ffffffff83619b90 R12: ffff923fa9540000
  R13: 0000000000080007 R14: ffff923f6d35d000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff929d0d7c0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffe38687000000 CR3: 0000005224c34005 CR4: 0000000000770ee0
  PKRU: 55555554
  Call Trace:
   <TASK>
   sev_free_vcpu+0xcb/0x110 [kvm_amd]
   svm_vcpu_free+0x75/0xf0 [kvm_amd]
   kvm_arch_vcpu_destroy+0x36/0x140 [kvm]
   kvm_destroy_vcpus+0x67/0x100 [kvm]
   kvm_arch_destroy_vm+0x161/0x1d0 [kvm]
   kvm_put_kvm+0x276/0x560 [kvm]
   kvm_vm_release+0x25/0x30 [kvm]
   __fput+0x106/0x280
   ____fput+0x12/0x20
   task_work_run+0x86/0xb0
   do_exit+0x2e3/0x9c0
   do_group_exit+0xb1/0xc0
   __x64_sys_exit_group+0x1b/0x20
   do_syscall_64+0x41/0x90
   entry_SYSCALL_64_after_hwframe+0x63/0xcd
   </TASK>
  CR2: ffffe38687000000

Fixes: 6defa24 ("KVM: SEV: Init target VMCBs in sev_migrate_from")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Reviewed-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Link: https://lore.kernel.org/r/20230825022357.2852133-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Sep 19, 2023
commit f1187ef upstream.

Fix a goof where KVM tries to grab source vCPUs from the destination VM
when doing intrahost migration.  Grabbing the wrong vCPU not only hoses
the guest, it also crashes the host due to the VMSA pointer being left
NULL.

  BUG: unable to handle page fault for address: ffffe38687000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] SMP NOPTI
  CPU: 39 PID: 17143 Comm: sev_migrate_tes Tainted: GO       6.5.0-smp--fff2e47e6c3b-next #151
  Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 34.28.0 07/10/2023
  RIP: 0010:__free_pages+0x15/0xd0
  RSP: 0018:ffff923fcf6e3c78 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffffe38687000000 RCX: 0000000000000100
  RDX: 0000000000000100 RSI: 0000000000000000 RDI: ffffe38687000000
  RBP: ffff923fcf6e3c88 R08: ffff923fcafb0000 R09: 0000000000000000
  R10: 0000000000000000 R11: ffffffff83619b90 R12: ffff923fa9540000
  R13: 0000000000080007 R14: ffff923f6d35d000 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff929d0d7c0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffe38687000000 CR3: 0000005224c34005 CR4: 0000000000770ee0
  PKRU: 55555554
  Call Trace:
   <TASK>
   sev_free_vcpu+0xcb/0x110 [kvm_amd]
   svm_vcpu_free+0x75/0xf0 [kvm_amd]
   kvm_arch_vcpu_destroy+0x36/0x140 [kvm]
   kvm_destroy_vcpus+0x67/0x100 [kvm]
   kvm_arch_destroy_vm+0x161/0x1d0 [kvm]
   kvm_put_kvm+0x276/0x560 [kvm]
   kvm_vm_release+0x25/0x30 [kvm]
   __fput+0x106/0x280
   ____fput+0x12/0x20
   task_work_run+0x86/0xb0
   do_exit+0x2e3/0x9c0
   do_group_exit+0xb1/0xc0
   __x64_sys_exit_group+0x1b/0x20
   do_syscall_64+0x41/0x90
   entry_SYSCALL_64_after_hwframe+0x63/0xcd
   </TASK>
  CR2: ffffe38687000000

Fixes: 6defa24 ("KVM: SEV: Init target VMCBs in sev_migrate_from")
Cc: stable@vger.kernel.org
Cc: Peter Gonda <pgonda@google.com>
Reviewed-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Link: https://lore.kernel.org/r/20230825022357.2852133-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Aug 22, 2024
When enabling UBSAN on Raspberry Pi 5, we get the following warning:

[  387.894977] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/v3d/v3d_sched.c:320:3
[  387.903868] index 7 is out of range for type '__u32 [7]'
[  387.909692] CPU: 0 PID: 1207 Comm: kworker/u16:2 Tainted: G        WC         6.10.3-v8-16k-numa #151
[  387.919166] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[  387.925961] Workqueue: v3d_csd drm_sched_run_job_work [gpu_sched]
[  387.932525] Call trace:
[  387.935296]  dump_backtrace+0x170/0x1b8
[  387.939403]  show_stack+0x20/0x38
[  387.942907]  dump_stack_lvl+0x90/0xd0
[  387.946785]  dump_stack+0x18/0x28
[  387.950301]  __ubsan_handle_out_of_bounds+0x98/0xd0
[  387.955383]  v3d_csd_job_run+0x3a8/0x438 [v3d]
[  387.960707]  drm_sched_run_job_work+0x520/0x6d0 [gpu_sched]
[  387.966862]  process_one_work+0x62c/0xb48
[  387.971296]  worker_thread+0x468/0x5b0
[  387.975317]  kthread+0x1c4/0x1e0
[  387.978818]  ret_from_fork+0x10/0x20
[  387.983014] ---[ end trace ]---

This happens because the UAPI provides only seven configuration
registers and we are reading the eighth position of this u32 array.

Therefore, fix the out-of-bounds read in `v3d_csd_job_run()` by
accessing only seven positions on the '__u32 [7]' array. The eighth
register exists indeed on V3D 7.1, but it isn't currently used. That
being so, let's guarantee that it remains unused and add a note that it
could be set in a future patch.

Fixes: 0ad5bc1 ("drm/v3d: fix up register addresses for V3D 7.x")
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240809152001.668314-1-mcanal@igalia.com
popcornmix pushed a commit that referenced this issue Aug 30, 2024
[ Upstream commit 497d370 ]

When enabling UBSAN on Raspberry Pi 5, we get the following warning:

[  387.894977] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/v3d/v3d_sched.c:320:3
[  387.903868] index 7 is out of range for type '__u32 [7]'
[  387.909692] CPU: 0 PID: 1207 Comm: kworker/u16:2 Tainted: G        WC         6.10.3-v8-16k-numa #151
[  387.919166] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[  387.925961] Workqueue: v3d_csd drm_sched_run_job_work [gpu_sched]
[  387.932525] Call trace:
[  387.935296]  dump_backtrace+0x170/0x1b8
[  387.939403]  show_stack+0x20/0x38
[  387.942907]  dump_stack_lvl+0x90/0xd0
[  387.946785]  dump_stack+0x18/0x28
[  387.950301]  __ubsan_handle_out_of_bounds+0x98/0xd0
[  387.955383]  v3d_csd_job_run+0x3a8/0x438 [v3d]
[  387.960707]  drm_sched_run_job_work+0x520/0x6d0 [gpu_sched]
[  387.966862]  process_one_work+0x62c/0xb48
[  387.971296]  worker_thread+0x468/0x5b0
[  387.975317]  kthread+0x1c4/0x1e0
[  387.978818]  ret_from_fork+0x10/0x20
[  387.983014] ---[ end trace ]---

This happens because the UAPI provides only seven configuration
registers and we are reading the eighth position of this u32 array.

Therefore, fix the out-of-bounds read in `v3d_csd_job_run()` by
accessing only seven positions on the '__u32 [7]' array. The eighth
register exists indeed on V3D 7.1, but it isn't currently used. That
being so, let's guarantee that it remains unused and add a note that it
could be set in a future patch.

Fixes: 0ad5bc1 ("drm/v3d: fix up register addresses for V3D 7.x")
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240809152001.668314-1-mcanal@igalia.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Nov 5, 2024
The device stores IPv6 addresses that are used for encapsulation in
linear memory that is managed by the driver.

Changing the remote address of an ip6gre net device never worked
properly, but since cited commit the following reproducer [1] would
result in a warning [2] and a memory leak [3]. The problem is that the
new remote address is never added by the driver to its hash table (and
therefore the device) and the old address is never removed from it.

Fix by programming the new address when the configuration of the ip6gre
net device changes and removing the old one. If the address did not
change, then the above would result in increasing the reference count of
the address and then decreasing it.

[1]
 # ip link add name bla up type ip6gre local 2001:db8:1::1 remote 2001:db8:2::1 tos inherit ttl inherit
 # ip link set dev bla type ip6gre remote 2001:db8:3::1
 # ip link del dev bla
 # devlink dev reload pci/0000:01:00.0

[2]
WARNING: CPU: 0 PID: 1682 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3002 mlxsw_sp_ipv6_addr_put+0x140/0x1d0
Modules linked in:
CPU: 0 UID: 0 PID: 1682 Comm: ip Not tainted 6.12.0-rc3-custom-g86b5b55bc835 #151
Hardware name: Nvidia SN5600/VMOD0013, BIOS 5.13 05/31/2023
RIP: 0010:mlxsw_sp_ipv6_addr_put+0x140/0x1d0
[...]
Call Trace:
 <TASK>
 mlxsw_sp_router_netdevice_event+0x55f/0x1240
 notifier_call_chain+0x5a/0xd0
 call_netdevice_notifiers_info+0x39/0x90
 unregister_netdevice_many_notify+0x63e/0x9d0
 rtnl_dellink+0x16b/0x3a0
 rtnetlink_rcv_msg+0x142/0x3f0
 netlink_rcv_skb+0x50/0x100
 netlink_unicast+0x242/0x390
 netlink_sendmsg+0x1de/0x420
 ____sys_sendmsg+0x2bd/0x320
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xd0
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

[3]
unreferenced object 0xffff898081f597a0 (size 32):
  comm "ip", pid 1626, jiffies 4294719324
  hex dump (first 32 bytes):
    20 01 0d b8 00 02 00 00 00 00 00 00 00 00 00 01   ...............
    21 49 61 83 80 89 ff ff 00 00 00 00 01 00 00 00  !Ia.............
  backtrace (crc fd9be911):
    [<00000000df89c55d>] __kmalloc_cache_noprof+0x1da/0x260
    [<00000000ff2a1ddb>] mlxsw_sp_ipv6_addr_kvdl_index_get+0x281/0x340
    [<000000009ddd445d>] mlxsw_sp_router_netdevice_event+0x47b/0x1240
    [<00000000743e7757>] notifier_call_chain+0x5a/0xd0
    [<000000007c7b9e13>] call_netdevice_notifiers_info+0x39/0x90
    [<000000002509645d>] register_netdevice+0x5f7/0x7a0
    [<00000000c2e7d2a9>] ip6gre_newlink_common.isra.0+0x65/0x130
    [<0000000087cd6d8d>] ip6gre_newlink+0x72/0x120
    [<000000004df7c7cc>] rtnl_newlink+0x471/0xa20
    [<0000000057ed632a>] rtnetlink_rcv_msg+0x142/0x3f0
    [<0000000032e0d5b5>] netlink_rcv_skb+0x50/0x100
    [<00000000908bca63>] netlink_unicast+0x242/0x390
    [<00000000cdbe1c87>] netlink_sendmsg+0x1de/0x420
    [<0000000011db153e>] ____sys_sendmsg+0x2bd/0x320
    [<000000003b6d53eb>] ___sys_sendmsg+0x9a/0xe0
    [<00000000cae27c62>] __sys_sendmsg+0x7a/0xd0

Fixes: cf42911 ("mlxsw: spectrum_ipip: Use common hash table for IPv6 address mapping")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/e91012edc5a6cb9df37b78fd377f669381facfcb.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
popcornmix pushed a commit that referenced this issue Nov 8, 2024
[ Upstream commit 12ae97c ]

The device stores IPv6 addresses that are used for encapsulation in
linear memory that is managed by the driver.

Changing the remote address of an ip6gre net device never worked
properly, but since cited commit the following reproducer [1] would
result in a warning [2] and a memory leak [3]. The problem is that the
new remote address is never added by the driver to its hash table (and
therefore the device) and the old address is never removed from it.

Fix by programming the new address when the configuration of the ip6gre
net device changes and removing the old one. If the address did not
change, then the above would result in increasing the reference count of
the address and then decreasing it.

[1]
 # ip link add name bla up type ip6gre local 2001:db8:1::1 remote 2001:db8:2::1 tos inherit ttl inherit
 # ip link set dev bla type ip6gre remote 2001:db8:3::1
 # ip link del dev bla
 # devlink dev reload pci/0000:01:00.0

[2]
WARNING: CPU: 0 PID: 1682 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3002 mlxsw_sp_ipv6_addr_put+0x140/0x1d0
Modules linked in:
CPU: 0 UID: 0 PID: 1682 Comm: ip Not tainted 6.12.0-rc3-custom-g86b5b55bc835 #151
Hardware name: Nvidia SN5600/VMOD0013, BIOS 5.13 05/31/2023
RIP: 0010:mlxsw_sp_ipv6_addr_put+0x140/0x1d0
[...]
Call Trace:
 <TASK>
 mlxsw_sp_router_netdevice_event+0x55f/0x1240
 notifier_call_chain+0x5a/0xd0
 call_netdevice_notifiers_info+0x39/0x90
 unregister_netdevice_many_notify+0x63e/0x9d0
 rtnl_dellink+0x16b/0x3a0
 rtnetlink_rcv_msg+0x142/0x3f0
 netlink_rcv_skb+0x50/0x100
 netlink_unicast+0x242/0x390
 netlink_sendmsg+0x1de/0x420
 ____sys_sendmsg+0x2bd/0x320
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xd0
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

[3]
unreferenced object 0xffff898081f597a0 (size 32):
  comm "ip", pid 1626, jiffies 4294719324
  hex dump (first 32 bytes):
    20 01 0d b8 00 02 00 00 00 00 00 00 00 00 00 01   ...............
    21 49 61 83 80 89 ff ff 00 00 00 00 01 00 00 00  !Ia.............
  backtrace (crc fd9be911):
    [<00000000df89c55d>] __kmalloc_cache_noprof+0x1da/0x260
    [<00000000ff2a1ddb>] mlxsw_sp_ipv6_addr_kvdl_index_get+0x281/0x340
    [<000000009ddd445d>] mlxsw_sp_router_netdevice_event+0x47b/0x1240
    [<00000000743e7757>] notifier_call_chain+0x5a/0xd0
    [<000000007c7b9e13>] call_netdevice_notifiers_info+0x39/0x90
    [<000000002509645d>] register_netdevice+0x5f7/0x7a0
    [<00000000c2e7d2a9>] ip6gre_newlink_common.isra.0+0x65/0x130
    [<0000000087cd6d8d>] ip6gre_newlink+0x72/0x120
    [<000000004df7c7cc>] rtnl_newlink+0x471/0xa20
    [<0000000057ed632a>] rtnetlink_rcv_msg+0x142/0x3f0
    [<0000000032e0d5b5>] netlink_rcv_skb+0x50/0x100
    [<00000000908bca63>] netlink_unicast+0x242/0x390
    [<00000000cdbe1c87>] netlink_sendmsg+0x1de/0x420
    [<0000000011db153e>] ____sys_sendmsg+0x2bd/0x320
    [<000000003b6d53eb>] ___sys_sendmsg+0x9a/0xe0
    [<00000000cae27c62>] __sys_sendmsg+0x7a/0xd0

Fixes: cf42911 ("mlxsw: spectrum_ipip: Use common hash table for IPv6 address mapping")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/e91012edc5a6cb9df37b78fd377f669381facfcb.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Nov 8, 2024
[ Upstream commit 12ae97c ]

The device stores IPv6 addresses that are used for encapsulation in
linear memory that is managed by the driver.

Changing the remote address of an ip6gre net device never worked
properly, but since cited commit the following reproducer [1] would
result in a warning [2] and a memory leak [3]. The problem is that the
new remote address is never added by the driver to its hash table (and
therefore the device) and the old address is never removed from it.

Fix by programming the new address when the configuration of the ip6gre
net device changes and removing the old one. If the address did not
change, then the above would result in increasing the reference count of
the address and then decreasing it.

[1]
 # ip link add name bla up type ip6gre local 2001:db8:1::1 remote 2001:db8:2::1 tos inherit ttl inherit
 # ip link set dev bla type ip6gre remote 2001:db8:3::1
 # ip link del dev bla
 # devlink dev reload pci/0000:01:00.0

[2]
WARNING: CPU: 0 PID: 1682 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3002 mlxsw_sp_ipv6_addr_put+0x140/0x1d0
Modules linked in:
CPU: 0 UID: 0 PID: 1682 Comm: ip Not tainted 6.12.0-rc3-custom-g86b5b55bc835 #151
Hardware name: Nvidia SN5600/VMOD0013, BIOS 5.13 05/31/2023
RIP: 0010:mlxsw_sp_ipv6_addr_put+0x140/0x1d0
[...]
Call Trace:
 <TASK>
 mlxsw_sp_router_netdevice_event+0x55f/0x1240
 notifier_call_chain+0x5a/0xd0
 call_netdevice_notifiers_info+0x39/0x90
 unregister_netdevice_many_notify+0x63e/0x9d0
 rtnl_dellink+0x16b/0x3a0
 rtnetlink_rcv_msg+0x142/0x3f0
 netlink_rcv_skb+0x50/0x100
 netlink_unicast+0x242/0x390
 netlink_sendmsg+0x1de/0x420
 ____sys_sendmsg+0x2bd/0x320
 ___sys_sendmsg+0x9a/0xe0
 __sys_sendmsg+0x7a/0xd0
 do_syscall_64+0x9e/0x1a0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

[3]
unreferenced object 0xffff898081f597a0 (size 32):
  comm "ip", pid 1626, jiffies 4294719324
  hex dump (first 32 bytes):
    20 01 0d b8 00 02 00 00 00 00 00 00 00 00 00 01   ...............
    21 49 61 83 80 89 ff ff 00 00 00 00 01 00 00 00  !Ia.............
  backtrace (crc fd9be911):
    [<00000000df89c55d>] __kmalloc_cache_noprof+0x1da/0x260
    [<00000000ff2a1ddb>] mlxsw_sp_ipv6_addr_kvdl_index_get+0x281/0x340
    [<000000009ddd445d>] mlxsw_sp_router_netdevice_event+0x47b/0x1240
    [<00000000743e7757>] notifier_call_chain+0x5a/0xd0
    [<000000007c7b9e13>] call_netdevice_notifiers_info+0x39/0x90
    [<000000002509645d>] register_netdevice+0x5f7/0x7a0
    [<00000000c2e7d2a9>] ip6gre_newlink_common.isra.0+0x65/0x130
    [<0000000087cd6d8d>] ip6gre_newlink+0x72/0x120
    [<000000004df7c7cc>] rtnl_newlink+0x471/0xa20
    [<0000000057ed632a>] rtnetlink_rcv_msg+0x142/0x3f0
    [<0000000032e0d5b5>] netlink_rcv_skb+0x50/0x100
    [<00000000908bca63>] netlink_unicast+0x242/0x390
    [<00000000cdbe1c87>] netlink_sendmsg+0x1de/0x420
    [<0000000011db153e>] ____sys_sendmsg+0x2bd/0x320
    [<000000003b6d53eb>] ___sys_sendmsg+0x9a/0xe0
    [<00000000cae27c62>] __sys_sendmsg+0x7a/0xd0

Fixes: cf42911 ("mlxsw: spectrum_ipip: Use common hash table for IPv6 address mapping")
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/e91012edc5a6cb9df37b78fd377f669381facfcb.1729866134.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants