-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PANIC on already held dp_config_rwlock after zvol_set_volsize #2039
Comments
I would assume that dsl_pool_hold gets called on zfs create pool/foo. I did that, and it doesn't trigger that assert. Also changing other attributes of the zvol doesn't trigger this assert, e.g. changing compression. |
Ah, it's line 344 of zvol.c that's most likely causing the problem:
I mistook the dsl_prop_get_integer invocation for line 319 above, but the offset into zvol_set_volsize makes this unlikely (zvol_set_volsize+0x1e9/0x3c0). Moving the readonly check (Line 344-348) outside the dmu_objset_hold/dmu_objset_rele scope should fix the problem. I'll test that tomorrow. |
Fixing that bumps me into another lock related panic. [89598.753361] SPLError: 21252:0:(dsl_pool.c:1032:dsl_pool_config_enter()) ASSERTION(!rrw_held(&dp->dp_config_rwlock, RW_READER)) failed [89598.756105] SPLError: 21252:0:(dsl_pool.c:1032:dsl_pool_config_enter()) SPL PANIC [89598.757486] SPL: Showing stack for process 21252 [89598.757489] CPU: 1 PID: 21252 Comm: zfs Tainted: P O 3.10.25-1-lts #1 [89598.757491] Hardware name: /DQ45CB, BIOS CBQ4510H.86A.0079.2009.0414.1340 04/14/2009 [89598.757492] ffffffffa0363654 ffff88022fa2dbb0 ffffffff814b1e2d ffff88022fa2dbc0 [89598.757496] ffffffffa01876d7 ffff88022fa2dbe8 ffffffffa0188a2f ffffffffa01a1a51 [89598.757498] ffff88022fa2dc98 ffff88022ea2ddd8 ffff88022fa2dc10 ffffffffa02a4117 [89598.757501] Call Trace: [89598.757521] [] dump_stack+0x19/0x1b [89598.757528] [] spl_debug_dumpstack+0x27/0x40 [spl] [89598.757533] [] spl_debug_bug+0x7f/0xe0 [spl] [89598.757553] [] dsl_pool_config_enter+0x87/0x90 [zfs] [89598.757570] [] dsl_pool_hold+0x42/0x50 [zfs] [89598.757586] [] dmu_objset_hold+0x26/0xb0 [zfs] [89598.757604] [] dsl_prop_get+0x35/0x80 [zfs] [89598.757620] [] dsl_prop_get_integer+0x1e/0x20 [zfs] [89598.757632] [] zvol_set_volsize+0x1e9/0x3c0 [zfs] [89598.757639] [] ? nvpair_value_common.part.21+0xc2/0x150 [znvpair] [89598.757653] [] zfs_prop_set_special+0x144/0x4f0 [zfs] [89598.757668] [] ? zfs_check_settable.isra.9+0x100/0x450 [zfs] [89598.757674] [] ? nvpair_value_uint64+0x33/0x40 [znvpair] [89598.757680] [] ? fnvpair_value_uint64+0x1a/0x90 [znvpair] [89598.757694] [] zfs_set_prop_nvlist+0x113/0x370 [zfs] [89598.757709] [] zfs_ioc_set_prop+0xdf/0x120 [zfs] [89598.757723] [] zfsdev_ioctl+0x501/0x570 [zfs] [89598.757726] [] ? mutex_unlock+0xe/0x10 [89598.757730] [] do_vfs_ioctl+0x2dd/0x4b0 [89598.757733] [] SyS_ioctl+0x81/0xa0 [89598.757736] [] ? do_page_fault+0xe/0x10 [89598.757739] [] system_call_fastpath+0x1a/0x1f which is due to zvol_update_volsize calling dmu_tx_assign with TXG_WAIT. The first panic is properly fixed by using dsl_prop_get_int_ds. The second panic, I worked around by switching to TXG_NOWAIT. See pull request. |
Under Linux the zvol_set_volsize() function was originally written to use dmu_objset_hold()/dmu_objset_rele(). Subsequently, the dmu_objset_own()/dmu_objset_disown() interfaces were added but the ZVOL code wasn't updated to take advantage of them. This was never an issue but after the dsl_pool_config changes the code now takes the config lock twice. The cleanest solution is to shift to using dmu_objset_own() which takes a long hold on the dataset and does not hold the dsl pool lock. This patch also slightly restructures the existing code such that it more closely resembles the upstream Illumos code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#2039
@clefru Thanks for getting to the root cause of this issue. Your fix does resolve the issue but it has some minor downsides such as switching to TXG_NOWAIT. I'd like to take this opportunity fix this issue cleanly and sync up with the Illumos version of |
Under Linux the zvol_set_volsize() function was originally written to use dmu_objset_hold()/dmu_objset_rele(). Subsequently, the dmu_objset_own()/dmu_objset_disown() interfaces were added but the ZVOL code wasn't updated to take advantage of them. This was never an issue but after the dsl_pool_config changes the code now takes the config lock twice. The cleanest solution is to shift to using dmu_objset_own() which takes a long hold on the dataset and does not hold the dsl pool lock. This patch also slightly restructures the existing code such that it more closely resembles the upstream Illumos code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#2039
With a debug build I get the following ASSERT reliably triggered on
zfs set volsize=somebiggernummber pool/zvol
Apparently something already holds the reader lock when entering dsl_pool_config_enter, but I couldn't spot anything obvious in the call path down from zfs_ioc_set_prop. I didn't check all the "side calls" that happen before this stack descent.
I am running 5d862cb zfs and 921a35a spl on linux-lts 3.10.25
The text was updated successfully, but these errors were encountered: