usb-storage kernel oops #268
Comments
The stack trace dump appears corrupted after posting, here's the source for it: |
Seems to be a null pointer dereference in dwc_otg/usb. |
kernel 3.8.4 compile CC [M] drivers/usb/host/dwc_otg/dwc_otg_pcd_intr.o |
I've seen several OOPSes during driver shutdown, port disconnect, device reset and the like. Sporadic and hard to find. I note that you are using quite wide frequency ranges and an overclock at 850MHz - does this happen with a fixed 700MHz default frequency? |
I also run at 850 MHz on my type B Pis. That's why I am here :-). Note: I had oopses when using an external HDD without a powered hub. |
@P33M it happens also when not overclocked, but I didn't test with a fixed 700 MHz. I will and let you know. |
Running plain 700 MHz and got the following in dmesg. On a first look it seems unrelated (br), but then again eth runs off the USB, so I thought you might want to have a look: [33612.297448] usb 1-1.3.6: reset high-speed USB device number 6 using dwc_otg |
sorry, corrupted again: http://www.dawidwrobel.com/files/br_oops.txt |
OK, had another USB storage kernel oops this morning (same as initially reported) with fixed 700 MHz frequency, so frequency scaling is not the cause here. Funny thing is that it always happens in the morning. The only pattern that comes to mind and makes sense with this scenario is that after not being used for a long time at night, some memory swapping occurs (?) and whenever I connect to WiFi in the morning, it oops exactly at this very moment. Is there an equivalent to "dmesg -T" in kdb that would help me to confirm the timing of the oops? |
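On the timing question: whatever kdb itself offers, a captured dump can be post-processed, because each dmesg line carries seconds since boot. The following is a minimal sketch under that assumption; `to_wall_clock` and the boot time are hypothetical and only illustrate the arithmetic. The result is approximate, since the kernel timestamp can drift from wall-clock time.

```python
import re
from datetime import datetime, timedelta

# dmesg prefixes each line with seconds since boot, e.g. "[33612.297448] ..."
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def to_wall_clock(line, boot_time):
    """Return (datetime, message) for a dmesg line, or None if it has no stamp."""
    match = STAMP.match(line)
    if match is None:
        return None
    seconds = float(match.group(1))
    return boot_time + timedelta(seconds=seconds), match.group(2)


# Hypothetical boot time; on a live system it could be derived from /proc/uptime.
boot = datetime(2013, 4, 15, 12, 0, 0)
when, message = to_wall_clock(
    "[33612.297448] usb 1-1.3.6: reset high-speed USB device number 6 using dwc_otg",
    boot,
)
print(when.isoformat(), message)
```

Comparing the converted time of the oops line against the time the WiFi client connected would confirm or rule out the "happens when I connect in the morning" pattern.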
did you disable any sort of power management? say USB? or WiFi? |
@wrobelda Are you using a POWERED USB hub? The Model B itself requires ~900 mA under load for stable operation. With an external HDD, a standard (1000 mA) power source is exhausted. |
@licaon-kter I did not. @remsnet "Plugable 7 Port" hub that I use is powered and said to be one of the most compatible with Raspberry. The HDD is 2,5" inch USB-powered with no external power source connector, so I seriously doubt it would exhaust the standard USB limits. BTW, something interesting I sometimes also notice: [pon kwi 15 22:04:54 2013] sd 0:0:0:0: [sda] Device not ready |
@wrobelda, the problem looks like the USB POWER is EXHAUSTED!! To me it looks like the kernel starts to load, and at the point where the kernel accesses the drive UNDER LOAD (kernel boot I/O load) the USB power is exhausted. This should never happen. The same issue happens on some old laptops that don't deliver enough power. I HAD the same with 64 GB and 128 GB USB sticks on a Pi with a 3.x kernel, likewise without extra power for the Pi. I use a 7-port Belkin POWERED hub with 2500 mA and have 7 x 64 GB sticks attached with RAID5 on fuse-zfs. As written: 900 mA for the Pi and an additional roughly 1500 mA for the external disk; see your HDD's specs. |
"Your 2.5" HDD seems to exhaust what your Pi's power source can deliver via the Pi's USB connector." |
Reconnect the external disk and SOUND card to the HUB and try again. http://www.ti.com/lit/ds/symlink/pcm2900c.pdf says +400 mA for the USB sound card, if I am not reading that wrong. It is KNOWN that the >>Pi's connector<< can't deliver that amount. |
@remsnet the external disk IS connected to the HUB. |
Just to estimate the power required under full load, your RPi environment would look like
|
@remsnet: I believe your numbers may be a little exaggerated. For example, I measured my Raspberry Pi model B under load and it never reaches 500mA. @wrobelda: Are you sure your HDD can be run from one USB 2.0 port? Maybe it's designed to be used with a Y-shaped cable? Or for USB 3.0, where you have 900mA per port? Even then, however, it should be able to run on 1000mA max. It does seem to me that the HDD may be the problem (there must be a reason for those I/O errors). It may go into power-save mode at night and then try to spin up in the morning (and draw much more power at that time). This may cause some voltage drop. Maybe you should consider powering the Raspberry Pi from a separate power supply just to see if there is any difference? |
@kadamski it's 2.0 and designed to be used with one port only. I can try to power the Raspberry with an external USB power source, but even if this worked it still wouldn't explain why it started to fail all of a sudden after upgrading the firmware, whereas previously I could get it running without problems even when overclocked quite noticeably. Also, on a related note, it seems that one of these crashes also corrupted one of the HDD's partitions (XFS filesystem) to the extent of causing an immediate kernel oops upon attempting to mount it on both the Raspberry Pi and my Ubuntu 13.04 laptop. Just mind-blowing. |
So your RPi shouldn't use more than 500mA, your HDD shouldn't use more than 500mA and you have 2500mA power adapter so I believe you should be safe. But of course you can never trust that those cheap power adapters work as they should. |
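The two competing estimates in this thread can be put side by side. This is only a sketch of the arithmetic, using the figures quoted above (none of them measured on the affected setup); `headroom_ma` is a hypothetical helper, and steady-state sums say nothing about the short current peak while a disk spins up.

```python
def headroom_ma(supply_ma, loads_ma):
    """Return the supply current left over after all loads, in mA."""
    return supply_ma - sum(loads_ma.values())


# Figures quoted in this thread, not measurements of the affected setup.
optimistic = {"raspberry_pi": 500, "usb_hdd": 500}
pessimistic = {"raspberry_pi": 900, "usb_hdd": 1500}

for name, loads in (("optimistic", optimistic), ("pessimistic", pessimistic)):
    print(name, headroom_ma(2500, loads), "mA spare")
```

Even the pessimistic figures leave some margin on a 2500mA supply, which is why the quality of the adapter and the spin-up peak are the more interesting suspects.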
@wrobelda About hard drives - if a disk comes with a single-link USB cable, it should work fine with it. A disk uses more power during spin-up, so that could be connected with your comment (quote "that it always happens in the morning"). |
Kernel 3.8.7 build error: LD drivers/usb/built-in.o ^C[1]+ Exit 2 nohup make CPPFLAGS="-Ofast -mfpu=vfp -mfloat-abi=hard -march=armv6zk -mtune=arm1176jzf-s" CFLAGS="-Ofast -mfpu=vfp -mfloat-abi=hard -march=armv6zk -mtune=arm1176jzf-s" dep zImage Modules I tried to force hard-float. Any hints? |
The kernel does not use floating point anyway, afaik, so forcing has no effect. |
@licaon-kter OK, issue resolved then; I closed MY issue, see #276 |
Issue resolved - was due to a faulty USB drive. Everything is stable now. |
Unfortunately, this issue is still present. I assumed it was gone because the system was stable after I temporarily replaced the HDD with a pendrive so I could RMA the HDD. I received a new disk a couple of days ago and have experienced the same issue every day since. I just looked into dmesg and noticed a lot of the following again: [ 1302.346947] sd 0:0:0:0: [sda] Unhandled error code Let me repeat that the USB HDD is attached to the HUB. The Pi is not overclocked - in fact, no configuration option is set at all except for GPU memory adjusted to 128 MB. No USB device is attached to the Pi except for the HUB itself. To be honest, I have lost faith in EVER having this configuration stable. I had previously run an Allwinner A10 based Mele A1000 device for almost a year without any of these issues, and it only had community support for its kernel and firmware, to which I happily contributed. I am not a fan of the RPi anymore - it's hard to stay positive about the concept when the basics fail to work. My USB serial console cable literally broke today from constantly being plugged in and out to grab the kdb dump. I just wanted to have a nice, stable, low-power HTPC with a low-power 1TB drive attached. It's hard to justify the amount of time I have spent so far trying to get this **** working. |
It could be a problem with USB dequeuing, since it seems to make the Ethernet drop out. That would mean there must be something opening and closing one of the USB devices. What are you doing whilst this is happening? Is it sat doing nothing or are you running some kind of script? Gordon On 10 May 2013, at 21:58, "Dawid Wróbel" wrote: This issue is still on unfortunately. I assumed it was gone because the system was stable after I replaced HDD with pendrive temporarily, so I could RMA the HDD. I received a new disk couple of days ago and have experienced the same issue every day since. I just looked into dmesg and noticed a lot of the following again: [ 1302.346947] sd 0:0:0:0: [sda] Unhandled error code [ 1302.347024] Result: hostbyte=0x00 driverbyte=0x00 To be honest, I lost my faith in having this configuration stable EVER. I had previously run Allwinner A10 based Mele A1000 device for almost a year without any of these issues, and it only had a community support for its kernel and firmware. |
@wrobelda |
The enclosure was new, but it could still be a refurbished hdd inside. I will test it and get back to you. |
I tested the replaced disk some time ago and it turned out to be OK. Something interesting happened today, though. After turning my PC on in the morning, I could not access the HDD connected to the Raspberry. dmesg showed the following: http://www.dawidwrobel.com/files/usb_storage_oops_3.txt This is a bit different from the previous reports in that it does not report any I/O issues, just the usb-storage process hanging. So I disconnected the drive from the USB port and connected it to my laptop - it was detected just fine, and there were no FS errors in dmesg whatsoever. After attempting to mount the FS, the serial console hung, but I still had a working WiFi connection, so the Raspberry did not hang. |
There's too much going on here to begin to determine what the root cause is. Yes there are still issues with the USB driver. It's just a question of whether you are seeing something new or something I already know about and am working on. You should note that crashes related to dwc_otg_hcd_urb_dequeue and friends are known about and are on my to-do list. The thread of the issue has evolved somewhat since you first reported kernel OOPSes. Your kernel version appears to also have changed. There have also been several commits to USB since you first reported the problem. What is the minimum set of circumstances to replicate this broken behaviour on your USB HDD? I assume that is the one device that is the common thread throughout. Please post lsusb -v for the devices in question. Please try to replicate with
|
My issues are gone with 3.11.6; close it if you wish. |
Liu Bo <bo.li.liu@oracle.com> reported a lockdep warning of delayed_iput_sem in xfstests generic/241: [ 2061.345955] ============================================= [ 2061.346027] [ INFO: possible recursive locking detected ] [ 2061.346027] 4.1.0+ #268 Tainted: G W [ 2061.346027] --------------------------------------------- [ 2061.346027] btrfs-cleaner/3045 is trying to acquire lock: [ 2061.346027] (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100 [ 2061.346027] but task is already holding lock: [ 2061.346027] (&fs_info->delayed_iput_sem){++++..}, at: [<ffffffff814063ab>] btrfs_run_delayed_iputs+0x6b/0x100 [ 2061.346027] other info that might help us debug this: [ 2061.346027] Possible unsafe locking scenario: [ 2061.346027] CPU0 [ 2061.346027] ---- [ 2061.346027] lock(&fs_info->delayed_iput_sem); [ 2061.346027] lock(&fs_info->delayed_iput_sem); [ 2061.346027] *** DEADLOCK *** It is rarely happened, about 1/400 in my test env. The reason is recursion of btrfs_run_delayed_iputs(): cleaner_kthread -> btrfs_run_delayed_iputs() *1 -> get delayed_iput_sem lock *2 -> iput() -> ... -> btrfs_commit_transaction() -> btrfs_run_delayed_iputs() *1 -> get delayed_iput_sem lock (dead lock) *2 *1: recursion of btrfs_run_delayed_iputs() *2: warning of lockdep about delayed_iput_sem When fs is in high stress, new iputs may added into fs_info->delayed_iputs list when btrfs_run_delayed_iputs() is running, which cause second btrfs_run_delayed_iputs() run into down_read(&fs_info->delayed_iput_sem) again, and cause above lockdep warning. Actually, it will not cause real problem because both locks are read lock, but to avoid lockdep warning, we can do a fix. Fix: Don't do btrfs_run_delayed_iputs() in btrfs_commit_transaction() for cleaner_kthread thread to break above recursion path. 
cleaner_kthread is calling btrfs_run_delayed_iputs() explicitly in code, and don't need to call btrfs_run_delayed_iputs() again in btrfs_commit_transaction(), it also give us a bonus to avoid stack overflow. Test: No above lockdep warning after patch in 1200 generic/241 tests. Reported-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
Add various tests to check maximum number of supported programs being attached: # ./vmtest.sh -- ./test_progs -t tc_opts [...] ./test_progs -t tc_opts [ 1.185325] bpf_testmod: loading out-of-tree module taints kernel. [ 1.186826] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel [ 1.270123] tsc: Refined TSC clocksource calibration: 3407.988 MHz [ 1.272428] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc932722, max_idle_ns: 440795381586 ns [ 1.276408] clocksource: Switched to clocksource tsc #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK <--- (new test) #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_replace:OK #269 tc_opts_revision:OK Summary: 18/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230929204121.20305-2-daniel@iogearbox.net
Add a new test case which performs double query of the bpf_mprog through libbpf API, but also via raw bpf(2) syscall. This is testing to gather first the count and then in a subsequent probe the full information with the program array without clearing passed structs in between. # ./vmtest.sh -- ./test_progs -t tc_opts [...] ./test_progs -t tc_opts [ 1.398818] tsc: Refined TSC clocksource calibration: 3407.999 MHz [ 1.400263] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fd336761, max_idle_ns: 440795243819 ns [ 1.402734] clocksource: Switched to clocksource tsc [ 1.426639] bpf_testmod: loading out-of-tree module taints kernel. [ 1.428112] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_query:OK <--- (new test) #269 tc_opts_replace:OK #270 tc_opts_revision:OK Summary: 19/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-4-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Add a new test case to query on an empty bpf_mprog and pass the revision directly into expected_revision for attachment to assert that this does succeed. ./test_progs -t tc_opts [ 1.406778] tsc: Refined TSC clocksource calibration: 3407.990 MHz [ 1.408863] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcaf6eb0, max_idle_ns: 440795321766 ns [ 1.412419] clocksource: Switched to clocksource tsc [ 1.428671] bpf_testmod: loading out-of-tree module taints kernel. [ 1.430260] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_query:OK #269 tc_opts_query_attach:OK <--- (new test) #270 tc_opts_replace:OK #271 tc_opts_revision:OK Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20231006220655.1653-6-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
commit 0199d2f upstream. MSGF_LEG_MASK is laid out with INTA in bit 0, INTB in bit 1, INTC in bit 2, and INTD in bit 3. Hardware IRQ numbers start at 0, and we register PCI_NUM_INTX IRQs. So to enable INTA (aka hwirq 0) we should set bit 0. Remove the subtraction of one. This bug would cause INTx interrupts not to be delivered, as enabling INTB would actually enable INTA, and enabling INTA wouldn't enable anything at all. It is likely that this got overlooked for so long since most PCIe hardware uses MSIs. This fixes the following UBSAN error: UBSAN: shift-out-of-bounds in ../drivers/pci/controller/pcie-xilinx-nwl.c:389:11 shift exponent 18446744073709551615 is too large for 32-bit type 'int' CPU: 1 PID: 61 Comm: kworker/u10:1 Not tainted 6.6.20+ #268 Hardware name: xlnx,zynqmp (DT) Workqueue: events_unbound deferred_probe_work_func Call trace: dump_backtrace (arch/arm64/kernel/stacktrace.c:235) show_stack (arch/arm64/kernel/stacktrace.c:242) dump_stack_lvl (lib/dump_stack.c:107) dump_stack (lib/dump_stack.c:114) __ubsan_handle_shift_out_of_bounds (lib/ubsan.c:218 lib/ubsan.c:387) nwl_unmask_leg_irq (drivers/pci/controller/pcie-xilinx-nwl.c:389 (discriminator 1)) irq_enable (kernel/irq/internals.h:234 kernel/irq/chip.c:170 kernel/irq/chip.c:439 kernel/irq/chip.c:432 kernel/irq/chip.c:345) __irq_startup (kernel/irq/internals.h:239 kernel/irq/chip.c:180 kernel/irq/chip.c:250) irq_startup (kernel/irq/chip.c:270) __setup_irq (kernel/irq/manage.c:1800) request_threaded_irq (kernel/irq/manage.c:2206) pcie_pme_probe (include/linux/interrupt.h:168 drivers/pci/pcie/pme.c:348) Fixes: 9a181e1 ("PCI: xilinx-nwl: Modify IRQ chip for legacy interrupts") Link: https://lore.kernel.org/r/20240531161337.864994-3-sean.anderson@linux.dev Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
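The off-by-one described in this commit message can be modeled in a few lines. This is only an illustrative sketch, not the driver code: the function names are hypothetical, and because Python rejects a negative shift count (where C leaves it undefined), the buggy hwirq 0 case is modeled as "no bit set", matching the behaviour the commit message describes.

```python
INTX_NAMES = ["INTA", "INTB", "INTC", "INTD"]  # hwirq 0..3


def leg_mask_fixed(hwirq):
    """Mask bit when INTA (hwirq 0) lives in bit 0, as the commit describes."""
    return 1 << hwirq


def leg_mask_buggy(hwirq):
    """Model of the old behaviour: one is subtracted before shifting.

    A shift by -1 is modeled as enabling nothing (assumption for illustration).
    """
    shift = hwirq - 1
    return 0 if shift < 0 else 1 << shift


for hwirq, name in enumerate(INTX_NAMES):
    print(name, format(leg_mask_buggy(hwirq), "04b"), format(leg_mask_fixed(hwirq), "04b"))
```

In the buggy column every interrupt enables the bit of its predecessor, and INTA enables none, which is the misdelivery the commit message reports.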
Hi.
After upgrading the firmware and kernel packages to 20130405, I experience random kernel oops when my USB HDD wakes up from idle.
The hardware is a Raspberry Pi type B, 512MB, powered by a Plugable 7 Port High Speed USB 2.0 Hub - commonly recommended for compatibility with the Raspberry. The HUB is connected to a Seagate 2.5" HDD and an RTL3072 WiFi card that is operating in Master mode (AP) using hostapd. Additionally, a USB sound card is connected to the Raspberry's second USB port.
Software wise, the system runs raspbian off the HDD. SD is only being used for /boot partition.