Assertion failure in Debug mode while trying to write to a zvol with mkswap #978
This trace has the HW_INVALID_HOSTID sentinel in it, which indicates that the host lacks some important system components or has broken packaging. An undefined hostid can cause subtle problems.
The purpose of this installation (a VMware guest) is only to search for those critical KM_SLEEP calls which lead to deadlocks in the memory management. This is a test case only, with debug mode enabled and kmem.h edited. I found no way to insert an edited kmem.h and switch on debug mode for zfs/spl in the regular installation routines, so I compiled it manually (autogen, configure, make and make install), tested it, and caught this shortly before a total deadlock while writing to the zvol. I'm happy to have found this occurrence; maybe it helps to find the reason for the remaining deadlocks when using zvols as swap devices. The reason the hostid is set to 0xffffffff may indeed be a crude installation method of the system (there were some missing and broken packages, but I don't care); otherwise I can't imagine how this happens, and I hope it doesn't influence the assertion failure.
Can you rerun the test with debugging enabled but drop the kmem.h patch? This shouldn't be needed with Ubuntu's default kernel. In fact it might not be safe; I'd need to verify against the exact source.
@pyavdr The assertion was almost certainly caused by the change to kmem.h, which is right for SLES but not for Ubuntu. If you rerun without it you shouldn't see this assert.
OK, I reverted kmem.h to the original version, compiled and rebooted. Running mkswap on the zvol gives the same assertion. Additionally, I added the last 100 lines of /tmp/spl xxxx.log.
SPL: Loaded module v0.6.0-rc11 (DEBUG mode)
root@u-test:/tmp# tail -100 spl*
I got another one: [ 60.392770] SPL: Loaded module v0.6.0-rc11 (DEBUG mode)
In fact, after a reboot, running any ZFS-related command (zfs list, zpool status, ...) made this one show up: [ 67.235804] SPL: Loaded module v0.6.0-rc11 (DEBUG mode)
The second set of stack traces was fixed in master and is related to dedup. You should grab the latest source and retry. Also, do you have dedup enabled for your zvol swap dataset? I've never tested this, and I could easily see it causing severe slowness but no outright failures. If so, you should disable it and see if your results improve.
OK, although I was sure I was using the latest source, I pulled it again from GitHub and verified the changes in ddt.c. I did have dedup enabled, but I have dropped it now. I created a new pool, compiled with --enable-debug, created a zvol and formatted it with mkswap. It runs through with no messages. But using this zvol as swap space while stressing the system with Python processes leads immediately to a deadlock once the system starts using the swap space. No messages in syslog. So the situation is pretty much the same as with openSUSE.
Interesting, it sounds like there are still some gremlins here. I was able to run a Python test case with RHEL 6.2, and I know @ryao has been using it successfully with Gentoo. But clearly there's still work to be done here. If you're game to keep experimenting you might try two more things:
@behlendorf Is there anything we can do to make this smoother? Having a system hang for several minutes isn't optimal on real boxes ... it will take a very long time (more than 60 minutes) to recover with more than 16 GB of RAM. It looks like the bs of 4k is mandatory; if so, there should be a hint to use zvol swap only with this 4k bs. I think we can close this issue, but keep in mind that using zvol swap is only OK for very small RAM sizes.
With Ubuntu 12.04 (kernel 3.2) with the latest patches and the latest SPL/ZFS from GitHub, I can't format a zvol with
mkswap; the following assertion failure appears in messages. mkswap hangs, and I needed to reset.
The same assertion failure can be seen while using a zvol as swap space on the same configuration.
I set #define PF_NOFS 0x00080000 in kmem.h as suggested.
[ 68.138204] SPL: Loaded module v0.6.0-rc11 (DEBUG mode)
[ 68.158312] zunicode: module license 'CDDL' taints kernel.
[ 68.158314] Disabling lock debugging due to kernel taint
[ 68.296506] ZFS: Loaded module v0.6.0-rc11 (DEBUG mode), ZFS pool version 28, ZFS filesystem version 5
[ 68.506516] SPL: using hostid 0xffffffff
[ 68.761104] zd0: unknown partition table
[ 96.599895] zd0: unknown partition table
[ 167.347091] SPLError: 2136:0:(zvol.c:548:zvol_write()) ASSERTION(!(current->flags & PF_NOFS)) failed
[ 167.347100] SPLError: 2136:0:(zvol.c:548:zvol_write()) SPL PANIC
[ 167.347104] SPL: Showing stack for process 2136
[ 167.347109] Pid: 2136, comm: zvol/15 Tainted: P O 3.2.0-30-generic #48-Ubuntu
[ 167.347112] Call Trace:
[ 167.347132] [] spl_debug_dumpstack+0x27/0x40 [spl]
[ 167.347141] [] spl_debug_bug+0x82/0xe0 [spl]
[ 167.347200] [] zvol_write+0x49c/0x4b0 [zfs]
[ 167.347210] [] ? default_spin_lock_flags+0x9/0x10
[ 167.347222] [] taskq_thread+0x266/0x840 [spl]
[ 167.347229] [] ? finish_task_switch+0x4a/0xf0
[ 167.347237] [] ? try_to_wake_up+0x200/0x200
[ 167.347248] [] ? __taskq_create+0x6e0/0x6e0 [spl]
[ 167.347254] [] kthread+0x8c/0xa0
[ 167.347263] [] kernel_thread_helper+0x4/0x10
[ 167.347268] [] ? flush_kthread_worker+0xa0/0xa0
[ 167.347273] [] ? gs_change+0x13/0x13
[ 167.347672] SPL: Dumping log to /tmp/spl-log.1348134249.2136
[ 368.026212] INFO: task zvol/15:2136 blocked for more than 120 seconds.
[ 368.026217] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 368.026221] zvol/15 D ffffffff81806200 0 2136 2 0x00000000
[ 368.026229] ffff880036781d30 0000000000000046 0000000000000000 0000000000001000
[ 368.026237] ffff880036781fd8 ffff880036781fd8 ffff880036781fd8 00000000000137c0
[ 368.026242] ffff880137c01700 ffff88003677ae00 ffffffff00000001 0000000000000000
[ 368.026249] Call Trace:
[ 368.026263] [] schedule+0x3f/0x60
[ 368.026279] [] spl_debug_bug+0xbd/0xe0 [spl]
[ 368.026339] [] zvol_write+0x49c/0x4b0 [zfs]
[ 368.026348] [] ? default_spin_lock_flags+0x9/0x10
[ 368.026360] [] taskq_thread+0x266/0x840 [spl]
[ 368.026367] [] ? finish_task_switch+0x4a/0xf0
[ 368.026374] [] ? try_to_wake_up+0x200/0x200
[ 368.026385] [] ? __taskq_create+0x6e0/0x6e0 [spl]
[ 368.026391] [] kthread+0x8c/0xa0
[ 368.026398] [] kernel_thread_helper+0x4/0x10
[ 368.026403] [] ? flush_kthread_worker+0xa0/0xa0
[ 368.026409] [] ? gs_change+0x13/0x13
[ 368.026426] INFO: task mkswap:2474 blocked for more than 120 seconds.
[ 368.026429] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 368.026432] mkswap D 0000000000000000 0 2474 2417 0x00000000
[ 368.026438] ffff880100ef7d18 0000000000000086 ffff8800b9ba0000 ffff8800b9ba0000
[ 368.026444] ffff880100ef7fd8 ffff880100ef7fd8 ffff880100ef7fd8 00000000000137c0
[ 368.026449] ffff8801383c0000 ffff88010a4f8000 ffff880100ef7cf8 ffff88013ae14080
[ 368.026455] Call Trace:
[ 368.026463] [] ? __lock_page+0x70/0x70
[ 368.026470] [] schedule+0x3f/0x60
[ 368.026475] [] io_schedule+0x8f/0xd0
[ 368.026480] [] sleep_on_page+0xe/0x20
[ 368.026486] [] __wait_on_bit+0x5f/0x90
[ 368.026492] [] wait_on_page_bit+0x78/0x80
[ 368.026497] [] ? autoremove_wake_function+0x40/0x40
[ 368.026503] [] filemap_fdatawait_range+0x10c/0x1a0
[ 368.026510] [] ? do_writepages+0x21/0x40
[ 368.026517] [] filemap_write_and_wait_range+0x68/0x80
[ 368.026524] [] blkdev_fsync+0x24/0x50
[ 368.026531] [] do_fsync+0x56/0x80
[ 368.026536] [] sys_fsync+0x10/0x20
[ 368.026541] [] system_call_fastpath+0x16/0x1b