Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hung_task_timeout_secs and blocked for more than 120 seconds #783

Closed
dhimantaa opened this issue Mar 28, 2017 · 24 comments
Closed

hung_task_timeout_secs and blocked for more than 120 seconds #783

dhimantaa opened this issue Mar 28, 2017 · 24 comments

Comments

@dhimantaa
Copy link

dhimantaa commented Mar 28, 2017

Hi Everyone,

I am facing an issue, I am unable to boot the SD card properly

My Configuration
RPi 2 and Linux raspberrypi 3.18.7-v7+ #755 SMP PREEMPT Thu Feb 12 17:20:48 GMT 2015 armv7l GNU/Linux.

So i connected FTDI to my Raspberry pi. And checking the logs, why SD card is unable to boot

Here is the boot logs

[    2.987644] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    2.997666] hub 1-1:1.0: USB hub found
[    3.003300] hub 1-1:1.0: 5 ports detected
[    3.288862] usb 1-1.1: new high-speed USB device number 3 using dwc_otg
[    3.409102] usb 1-1.1: New USB device found, idVendor=0424, idProduct=ec00
[    3.417701] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    3.429727] smsc95xx v1.0.4
[    3.492810] smsc95xx 1-1.1:1.0 eth0: register 'smsc95xx' at usb-bcm2708_usb-1.1, smsc95xx USB 2.0 Ethernet, b8:27:eb:f0:2f:99
[    3.598863] usb 1-1.4: new high-speed USB device number 4 using dwc_otg
[    3.719492] usb 1-1.4: New USB device found, idVendor=0bda, idProduct=8179
[    3.728191] usb 1-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[    3.737327] usb 1-1.4: Product: 802.11n NIC
[    3.743299] usb 1-1.4: Manufacturer: Realtek
[    3.749342] usb 1-1.4: SerialNumber: 00E04C0001
[   13.844359] random: nonblocking pool is initialized
[  241.348762] INFO: task swapper/0:1 blocked for more than 120 seconds.
[  241.356974]       Not tainted 3.18.7-v7+ #755
[  241.363065] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  241.374362] swapper/0       D 8052fcb8     0     1      0 0x00000000
[  241.382605] [<8052fcb8>] (__schedule) from [<805301a8>] (schedule+0x40/0x8c)
[  241.391498] [<805301a8>] (schedule) from [<8053049c>] (io_schedule+0xb0/0x11c)
[  241.400614] [<8053049c>] (io_schedule) from [<80530c50>] (bit_wait_io+0x44/0x68)
[  241.411669] [<80530c50>] (bit_wait_io) from [<80530904>] (__wait_on_bit+0x90/0xc8)
[  241.423033] [<80530904>] (__wait_on_bit) from [<800f1458>] (wait_on_page_bit+0xc0/0xd8)
[  241.434892] [<800f1458>] (wait_on_page_bit) from [<800f15b0>] (filemap_fdatawait_range+0xdc/0x138)
[  241.447766] [<800f15b0>] (filemap_fdatawait_range) from [<800f1670>] (filemap_fdatawait+0x64/0x70)
[  241.460791] [<800f1670>] (filemap_fdatawait) from [<800f3010>] (filemap_write_and_wait+0x5c/0x84)
[  241.473972] [<800f3010>] (filemap_write_and_wait) from [<80179174>] (__sync_blockdev+0x30/0x3c)
[  241.487151] [<80179174>] (__sync_blockdev) from [<8017919c>] (sync_blockdev+0x1c/0x20)
[  241.499611] [<8017919c>] (sync_blockdev) from [<8021085c>] (jbd2_journal_recover+0x100/0x10c)
[  241.512735] [<8021085c>] (jbd2_journal_recover) from [<80215a6c>] (jbd2_journal_load+0xcc/0x314)
[  241.526216] [<80215a6c>] (jbd2_journal_load) from [<801e3c78>] (ext4_fill_super+0x1cd8/0x2f60)
[  241.539592] [<801e3c78>] (ext4_fill_super) from [<8014668c>] (mount_bdev+0x164/0x19c)
[  241.552171] [<8014668c>] (mount_bdev) from [<801dcd90>] (ext4_mount+0x24/0x30)
[  241.561841] [<801dcd90>] (ext4_mount) from [<80146ffc>] (mount_fs+0x24/0xd4)
[  241.571295] [<80146ffc>] (mount_fs) from [<801609d0>] (vfs_kern_mount+0x58/0x100)
[  241.583396] [<801609d0>] (vfs_kern_mount) from [<801638c8>] (do_mount+0x194/0xa18)
[  241.595584] [<801638c8>] (do_mount) from [<801644a0>] (SyS_mount+0x84/0xb0)
[  241.604924] [<801644a0>] (SyS_mount) from [<80768210>] (mount_block_root+0x124/0x250)
[  241.617330] [<80768210>] (mount_block_root) from [<80768514>] (mount_root+0xec/0x110)
[  241.629728] [<80768514>] (mount_root) from [<807686a0>] (prepare_namespace+0x168/0x1c8)
[  241.642301] [<807686a0>] (prepare_namespace) from [<80767ea4>] (kernel_init_freeable+0x188/0x1d0)
[  241.655756] [<80767ea4>] (kernel_init_freeable) from [<8052a60c>] (kernel_init+0x18/0xfc)
[  241.668503] [<8052a60c>] (kernel_init) from [<8000ec88>] (ret_from_fork+0x14/0x20)
[  241.680635] INFO: task kworker/u8:2:54 blocked for more than 120 seconds.
[  241.689754]       Not tainted 3.18.7-v7+ #755
[  241.696367] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  241.708784] kworker/u8:2    D 8052fcb8     0    54      2 0x00000000
[  241.717503] Workqueue: kmmcd mmc_rescan
[  241.723623] [<8052fcb8>] (__schedule) from [<805301a8>] (schedule+0x40/0x8c)
[  241.732969] [<805301a8>] (schedule) from [<80407548>] (__mmc_claim_host+0xac/0x1c8)
[  241.745059] [<80407548>] (__mmc_claim_host) from [<80407684>] (mmc_get_card+0x20/0x24)
[  241.757438] [<80407684>] (mmc_get_card) from [<8040eb60>] (mmc_sd_detect+0x2c/0x80)
[  241.769560] [<8040eb60>] (mmc_sd_detect) from [<80409a9c>] (mmc_rescan+0xc8/0x318)
[  241.781587] [<80409a9c>] (mmc_rescan) from [<8003b2d0>] (process_one_work+0x138/0x408)
[  241.793968] [<8003b2d0>] (process_one_work) from [<8003bd14>] (worker_thread+0x4c/0x4d0)
[  241.806526] [<8003bd14>] (worker_thread) from [<80040530>] (kthread+0xe0/0xfc)
[  241.816054] [<80040530>] (kthread) from [<8000ec88>] (ret_from_fork+0x14/0x20)
[  361.818761] INFO: task swapper/0:1 blocked for more than 120 seconds.
[  361.818768]       Not tainted 3.18.7-v7+ #755
[  361.818781] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  361.818797] swapper/0       D 8052fcb8     0     1      0 0x00000000
[  361.818822] [<8052fcb8>] (__schedule) from [<805301a8>] (schedule+0x40/0x8c)
[  361.818840] [<805301a8>] (schedule) from [<8053049c>] (io_schedule+0xb0/0x11c)
[  361.818858] [<8053049c>] (io_schedule) from [<80530c50>] (bit_wait_io+0x44/0x68)
[  361.818876] [<80530c50>] (bit_wait_io) from [<80530904>] (__wait_on_bit+0x90/0xc8)
[  361.818892] [<80530904>] (__wait_on_bit) from [<800f1458>] (wait_on_page_bit+0xc0/0xd8)
[  361.818909] [<800f1458>] (wait_on_page_bit) from [<800f15b0>] (filemap_fdatawait_range+0xdc/0x138)
[  361.818923] [<800f15b0>] (filemap_fdatawait_range) from [<800f1670>] (filemap_fdatawait+0x64/0x70)
[  361.818938] [<800f1670>] (filemap_fdatawait) from [<800f3010>] (filemap_write_and_wait+0x5c/0x84)
[  361.818957] [<800f3010>] (filemap_write_and_wait) from [<80179174>] (__sync_blockdev+0x30/0x3c)
[  361.818976] [<80179174>] (__sync_blockdev) from [<8017919c>] (sync_blockdev+0x1c/0x20)
[  361.818995] [<8017919c>] (sync_blockdev) from [<8021085c>] (jbd2_journal_recover+0x100/0x10c)
[  361.819013] [<8021085c>] (jbd2_journal_recover) from [<80215a6c>] (jbd2_journal_load+0xcc/0x314)
[  361.819036] [<80215a6c>] (jbd2_journal_load) from [<801e3c78>] (ext4_fill_super+0x1cd8/0x2f60)
[  361.819056] [<801e3c78>] (ext4_fill_super) from [<8014668c>] (mount_bdev+0x164/0x19c)
[  361.819074] [<8014668c>] (mount_bdev) from [<801dcd90>] (ext4_mount+0x24/0x30)
[  361.819091] [<801dcd90>] (ext4_mount) from [<80146ffc>] (mount_fs+0x24/0xd4)
[  361.819111] [<80146ffc>] (mount_fs) from [<801609d0>] (vfs_kern_mount+0x58/0x100)
[  361.819131] [<801609d0>] (vfs_kern_mount) from [<801638c8>] (do_mount+0x194/0xa18)
[  361.819152] [<801638c8>] (do_mount) from [<801644a0>] (SyS_mount+0x84/0xb0)
[  361.819172] [<801644a0>] (SyS_mount) from [<80768210>] (mount_block_root+0x124/0x250)
[  361.819188] [<80768210>] (mount_block_root) from [<80768514>] (mount_root+0xec/0x110)
[  361.819204] [<80768514>] (mount_root) from [<807686a0>] (prepare_namespace+0x168/0x1c8)
[  361.819221] [<807686a0>] (prepare_namespace) from [<80767ea4>] (kernel_init_freeable+0x188/0x1d0)
[  361.819237] [<80767ea4>] (kernel_init_freeable) from [<8052a60c>] (kernel_init+0x18/0xfc)
[  361.819254] [<8052a60c>] (kernel_init) from [<8000ec88>] (ret_from_fork+0x14/0x20)
@pelwell
Copy link
Contributor

pelwell commented Mar 28, 2017

  1. When did this start failing?

  2. What did you change before it started to fail?

  3. Have you tried a fresh image on a different card?

  4. Why are you running such an old kernel image (over two years old)? We are on 4.4 now (soon to switch to 4.9), and we can't support firmware from so long ago.

@dhimantaa
Copy link
Author

Thanks for quick reply!

  • Today i checked my RPi and unable to login. Then i restarted the the RPi after that unable to boot.

  • No changes before it started to fail.

  • Yes the same image is working well with different card.

  • This kernel image is working well for me, So i did not switched to new one. If this issue is fixed in new kernel, I am happy to move.

@pelwell
Copy link
Contributor

pelwell commented Mar 28, 2017

It sounds to me like the card has become corrupted. Unfortunately your trace starts too late to be able to see what is failing. You could try adding earlyprintk loglevel=8 to the kernel command line in cmdline.txt (all in one line).

However, even if we do find what is failing to load, it is still likely that you will have to reflash the card with a new image. I hope you have a backup of your data. If not, there are ways to mount non-FAT filing systems on Windows, so it may be possible to recover some or all of it.

@dhimantaa
Copy link
Author

I added this earlyprintk loglevel=8 to cmdline.txt.

But the changes are not retaining after safely removing the SD card.
Same issue (Sync issue) i am also facing with different SD card as well.
#774

If these both issue are related to each other. Can anyone please clear my doubts about

  • Is both of the issue are of SD card hardware issue (In this case should change my SD card Brand) ?
  • Is both issue are because i am using old kernel image ?

@pelwell
Copy link
Contributor

pelwell commented Mar 29, 2017

To answer your questions:

  • I think it sounds like an SD card hardware problem. You could test your cards using one of the testing applications out there. We recommend h2testw (Windows/Mac) or f3write/f3read (Linux), but other tools are available. Note that these tests are likely to be destructive - you will probably lose any data on the cards.
  • The new kernels are likely to be better in many ways. In particular we now use the other SD interface on the chip (known as SDHOST), which gives better performance and has proven to be compatible with more cards, i.e. we aren't aware of any (non-fake) cards it doesn't work with.

@rodizio1
Copy link

I also get this messages, see here raspberrypi/linux#1777

Tried several different SDcards and also the sdhost interface, same problem.

@hankapsu
Copy link

I also have this message
858.082616] INFO: task HTV Notify:3465 blocked for more than 120 seconds.
[ 858.082622] Not tainted 4.9.28-v7+ #998
[ 858.082624] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 858.082627] HTV Notify D 0 3465 3458 0x00000004
[ 858.082651] [<80713b70>] (__schedule) from [<80714094>] (schedule+0x50/0xa8)
[ 858.082658] [<80714094>] (schedule) from [<80717304>] (schedule_timeout+0x220/0x364)
[ 858.082666] [<80717304>] (schedule_timeout) from [<80715d48>] (__down+0x88/0xc0)
[ 858.082675] [<80715d48>] (__down) from [<8016ace8>] (down+0x54/0x68)
[ 858.082684] [<8016ace8>] (down) from [<805ff958>] (vchiq_release+0x134/0x31c)
[ 858.082693] [<805ff958>] (vchiq_release) from [<80271d4c>] (__fput+0x9c/0x1e8)
[ 858.082700] [<80271d4c>] (__fput) from [<80271f08>] (____fput+0x18/0x1c)
[ 858.082709] [<80271f08>] (____fput) from [<8013aad0>] (task_work_run+0xcc/0xfc)
[ 858.082716] [<8013aad0>] (task_work_run) from [<801212d4>] (do_exit+0x370/0xaec)
[ 858.082723] [<801212d4>] (do_exit) from [<80121ae8>] (do_group_exit+0x4c/0xe4)
[ 858.082729] [<80121ae8>] (do_group_exit) from [<8012cb68>] (get_signal+0x370/0x6d4)
[ 858.082737] [<8012cb68>] (get_signal) from [<8010b2cc>] (do_signal+0x74/0x3e4)
[ 858.082745] [<8010b2cc>] (do_signal) from [<8010b824>] (do_work_pending+0xb8/0xd0)
[ 858.082753] [<8010b824>] (do_work_pending) from [<801080e8>] (slow_work_pending+0xc/0x20)

I use 10 RPIs and it happens randomly.
Anyone solved this problem?

@JamesH65
Copy link
Contributor

Do any of the posters still encounter this when using the latest release (4.9) or even the latest dev kernel (4.14), on fresh SD cards?

@karneaud
Copy link

Just experienced it 2hrs ago and found this thread

I was running ubuntu 16.0.4 on Pi3 B model with a PiDrive usb

@SilvioRafaelLS
Copy link

SilvioRafaelLS commented Jan 19, 2018

I also faced this same issue today, and i'm running 4.9.40-v7+.(Pi3)
Did anyone reach same conclusion about this problem?
Thx

@JamesH65
Copy link
Contributor

It's a little difficult to diagnose from the reports given, and we are a bit busy at RPFT at the moment. Note that in order to look at it we need the ability to reproduce it on a Raspbian system (not Ubuntu - not our distro), with the latest kernel version (4.9), so if anyone can find a way of reliably replicating that would be very useful.

@geekycow
Copy link

On the current 13th March 2018 Stretch version of Raspbian Desktop on a new Pi 3 B+ with various known good MicroSD cards (a Samsung EVO+ 32Gb Class 10 and a new SanDisk Ultra 16Gb Class 10) and the both the older 5V 2A and the newer 5V 2.1A official PSUs, I can reliably reproduce this issue by running-

sudo dd if=/dev/mmcblk0 bs=4M status=progress >/dev/null

Interestingly, I've encouraged others to try it and they're seeing it too, so it does genuinely seem easy to reproduce.

@geekycow
Copy link

It also happens if running the OS from USB and using any card in the sdcard slot to run this test.

@pelwell
Copy link
Contributor

pelwell commented Mar 19, 2018

Thanks for the update - we'll investigate.

@geekycow
Copy link

geekycow commented Mar 19, 2018 via email

@pelwell
Copy link
Contributor

pelwell commented Mar 20, 2018

I've not seen any problems with the Pi 3B+ idle (apart from the dd) (ARM @ 600MHz), busy (ARM @ 1.4GHz), or oscillating between the two states. I'm not saying there isn't a problem of some kind, but so far this test hasn't triggered it for me.

@pelwell
Copy link
Contributor

pelwell commented Mar 20, 2018

Spoke too soon - my card just got trashed.

@dimovnike
Copy link

dimovnike commented Oct 30, 2018

hello, is there a workaround for this? For some reason this doesn't happen in release 1.20171029 (4.9.59-v7) but quite often in 1.20180417 (4.14.34-v7+) and 1.20180919. May be some defaults changed in the kernel config?

EDIT: turns out this is not completely true. The "task kworker/0:1:37 blocked" happens in all versions but on the newer firmwares kernel also says about user apps being blocked.

I'm testing with a QT app that plays video in background and on the old firmware, when I copy a large file, the QT app works ok, I occasionaly get "kworker blocked" and that's all. On the new firmware - the video playback is affected it often stops and waits, and when kernel logs the "qt thread blocked" the video stops completely and only reboot helps.

@lowflyerUK
Copy link

I have just had something similar to this on a 3B+ running raspbian stretch lite with 4.14.71-v7+ #1145 SMP Fri Sep 21 15:38:35 BST 2018 armv7l GNU/Linux. I updated to this a few days ago.

The SDcard is Sandisk man:0x000003 oem:0x5344 name:SC16G hwrev:0x8 fwrev:0x0

There were a few processes going on that were reading and writing to SDcard at the time.

syslog (on a different disk) is normal up until this:
[76920.994254] INFO: task mmcqd/0:68 blocked for more than 120 seconds.
[76920.994263] Tainted: G C 4.14.71-v7+ #1145
[76920.994266] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[76920.994270] mmcqd/0 D 0 68 2 0x00000000
[76920.994299] [<8079ded0>] (__schedule) from [<8079e538>] (schedule+0x50/0xa8)
[76920.994307] [<8079e538>] (schedule) from [<8079e9b0>] (schedule_preempt_disabled+0x18/0x1c)
[76920.994316] [<8079e9b0>] (schedule_preempt_disabled) from [<807a02b8>] (__mutex_lock.constprop.3+0x190/0x58c)
[76920.994325] [<807a02b8>] (__mutex_lock.constprop.3) from [<807a07d0>] (__mutex_lock_slowpath+0x1c/0x20)
[76920.994332] [<807a07d0>] (__mutex_lock_slowpath) from [<807a0830>] (mutex_lock+0x5c/0x60)
[76920.994341] [<807a0830>] (mutex_lock) from [<802861a4>] (cma_alloc+0xe8/0x288)
[76920.994351] [<802861a4>] (cma_alloc) from [<8054c3ec>] (dma_alloc_from_contiguous+0x48/0x4c)
[76920.994361] [<8054c3ec>] (dma_alloc_from_contiguous) from [<801155b0>] (__alloc_from_contiguous+0x40/0xc8)
[76920.994368] [<801155b0>] (__alloc_from_contiguous) from [<80115680>] (cma_allocator_alloc+0x48/0x50)
[76920.994374] [<80115680>] (cma_allocator_alloc) from [<8011585c>] (__dma_alloc+0x1d4/0x32c)
[76920.994380] [<8011585c>] (__dma_alloc) from [<80115a4c>] (arm_dma_alloc+0x50/0x58)
[76920.994387] [<80115a4c>] (arm_dma_alloc) from [<802729e0>] (dma_pool_alloc+0x208/0x23c)
[76920.994397] [<802729e0>] (dma_pool_alloc) from [<804fb694>] (bcm2835_dma_create_cb_chain+0x1b4/0x214)
[76920.994406] [<804fb694>] (bcm2835_dma_create_cb_chain) from [<804fbb9c>] (bcm2835_dma_prep_slave_sg+0x12c/0x2a8)
[76920.994416] [<804fbb9c>] (bcm2835_dma_prep_slave_sg) from [<806385d8>] (bcm2835_sdhost_request+0x418/0x5fc)
[76920.994426] [<806385d8>] (bcm2835_sdhost_request) from [<806194bc>] (__mmc_start_request+0x78/0x1a8)
[76920.994434] [<806194bc>] (__mmc_start_request) from [<8061a030>] (mmc_start_request+0x158/0x184)
[76920.994440] [<8061a030>] (mmc_start_request) from [<8061a2d8>] (mmc_start_areq+0x27c/0x3d0)
[76920.994449] [<8061a2d8>] (mmc_start_areq) from [<8062b864>] (mmc_blk_issue_rw_rq+0xa8/0x3ac)
[76920.994457] [<8062b864>] (mmc_blk_issue_rw_rq) from [<8062c838>] (mmc_blk_issue_rq+0x2b4/0x74c)
[76920.994465] [<8062c838>] (mmc_blk_issue_rq) from [<8062cd48>] (mmc_queue_thread+0x78/0x14c)
[76920.994473] [<8062cd48>] (mmc_queue_thread) from [<8013dad4>] (kthread+0x13c/0x16c)
[76920.994483] [<8013dad4>] (kthread) from [<8010810c>] (ret_from_fork+0x14/0x28)

Then there were a few other similar blocks with:
[76920.994489] INFO: task jbd2/mmcblk0p2-:69 blocked for more than 120 seconds.

[76920.994609] INFO: task VCHIQ completio:639 blocked for more than 120 seconds.

[76920.994800] INFO: task kworker/0:1:9932 blocked for more than 120 seconds.

[76920.994894] INFO: task python3:9951 blocked for more than 120 seconds.

Then it completely froze. Fortunately the systemd reboot worked with this in /etc/systemd/system.conf:

CrashReboot=yes
RuntimeWatchdogSec=10s
ShutdownWatchdogSec=5min

Any suggestions? As ever, many thanks!

@pelwell
Copy link
Contributor

pelwell commented Nov 9, 2018

This looks a lot like another issue (raspberrypi/linux#2557). The problem appears to be fragmentation of the CMA area, caused by a mixture of interleaved large and small allocations. The latest rpi-update kernel includes a patch to use a pool for the small allocations, which appears to be quite effective. Please run sudo rpi-update and see if it makes a difference.

@lowflyerUK
Copy link

@pelwell Thanks! I have done that and I will let you know how it goes.

@lowflyerUK
Copy link

@pelwell So far no further problems. This is good, although it only happened once before, and I never discovered how to force it to happen.

@pelwell
Copy link
Contributor

pelwell commented Nov 27, 2018

Thanks for the feedback. I'm going to tentatively close this issue - reopen it or comment if the problem recurs.

@pelwell pelwell closed this as completed Nov 27, 2018
@tg12
Copy link

tg12 commented Apr 25, 2020

I've just come here because I have this issue on a new RPI4 and a fresh SD card.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests