
ZFS kernel panic VERIFY3(sa.sa_magic == SA_MAGIC) in zpl_get_file_info() on Ubuntu 22.04 #16276

Open
siebenmann opened this issue Jun 17, 2024 · 12 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@siebenmann
Contributor

System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  22.04
Kernel Version        5.15.0-88-generic
Architecture          x86_64
OpenZFS Version       v2.1.5-1ubuntu6~22.04.1

Describe the problem you're observing

We operate a number of ZFS fileservers exporting filesystems via NFS; these are the descendants of what were originally Solaris fileservers set up in 2008, with filesystems moved from generation to generation with 'zfs send | zfs recv'. As a result, we wound up with some filesystems that were still on ZFS filesystem version 4. In early December of 2023, we ran 'zfs upgrade' to bring all of those to ZFS filesystem version 5. Since then, two fileservers have experienced a ZFS kernel panic due to an assertion failure: one of them once and the other twice. These panics can't be provoked by simply reading the filesystems (we do this regularly for backups, and we did it again for testing). We are not using encryption.

It's possible that this is related to #12659, #13144, and/or #13937; however, our situation seems somewhat different from all of these (e.g., we are not using encryption).

Include any warning/errors/backtraces from the system logs

This is from our most recent panic and crash (followed by an automatic reboot, because we configure ZFS panics to be kernel panics and reboot on panic); it is the second panic on this particular fileserver. Past panics have had different values in the actual sa.sa_magic field: 1446876386, 1682687197, and 1682672016. The stack traces are otherwise consistent with each other (at least from zpl_get_file_info() backward).

[8956269.023527] VERIFY3(sa.sa_magic == SA_MAGIC) failed (1690920705 == 3100762)
[8956269.030785] PANIC at zfs_quota.c:89:zpl_get_file_info()
[8956269.036291] Kernel panic - not syncing: VERIFY3(sa.sa_magic == SA_MAGIC) failed (1690920705 == 3100762)
[8956269.046001] CPU: 4 PID: 4569 Comm: dp_sync_taskq Tainted: P           O      5.15.0-88-generic #98-Ubuntu
[8956269.055873] Hardware name: Supermicro Super Server/X11SPH-nCTF, BIOS 2.0 11/29/2017
[8956269.063830] Call Trace:
[8956269.066536]  <TASK>
[8956269.068894]  show_stack+0x52/0x5c
[8956269.072480]  dump_stack_lvl+0x4a/0x63
[8956269.076413]  dump_stack+0x10/0x16
[8956269.080001]  panic+0x15c/0x334
[8956269.083327]  spl_panic+0xcc/0xe9 [spl]
[8956269.087361]  ? zio_execute_stack_check.constprop.0+0x20/0x20 [zfs]
[8956269.094042]  ? __wake_up+0x13/0x20
[8956269.097708]  ? taskq_dispatch_ent+0xec/0x150 [spl]
[8956269.102778]  ? select_idle_sibling+0x2b/0xa60
[8956269.107407]  ? wake_affine+0x74/0x310
[8956269.111332]  ? newidle_balance+0x12e/0x470
[8956269.115692]  zpl_get_file_info+0x1fe/0x240 [zfs]
[8956269.120781]  dmu_objset_userquota_get_ids+0x382/0x490 [zfs]
[8956269.126761]  dnode_sync+0x557/0x620 [zfs]
[8956269.131179]  ? do_raw_spin_unlock+0x9/0x10 [zfs]
[8956269.136233]  dmu_objset_sync_dnodes+0x68/0x90 [zfs]
[8956269.141526]  sync_dnodes_task+0x2b/0x50 [zfs]
[8956269.146294]  taskq_thread+0x21e/0x400 [spl]
[8956269.150756]  ? wake_up_q+0x90/0x90
[8956269.154428]  ? param_set_taskq_kick+0xf0/0xf0 [spl]
[8956269.159581]  kthread+0x127/0x150
[8956269.163073]  ? set_kthread_struct+0x50/0x50
[8956269.167518]  ret_from_fork+0x1f/0x30
[8956269.171362]  </TASK>
[8956269.173874] Kernel Offset: 0x24800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[8956269.246646] Rebooting in 10 seconds..
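
For reference, the 3100762 on the right-hand side of the assertion is SA_MAGIC, i.e. 0x2F505A, and the value being compared against it is the first 32-bit field of the system-attribute header at the start of the dnode's bonus buffer. Here is a minimal stand-alone sketch of the comparison the VERIFY3 is making (the sa_hdr_phys layout is transcribed by hand from the 2.1-era sa_impl.h, and the real code has additional handling such as byte-swapped buffers, so treat this as an illustration rather than an excerpt):

/* Stand-alone illustration of the failed check; not actual OpenZFS code. */
#include <stdint.h>
#include <stdio.h>

#define SA_MAGIC 0x2F505AU	/* 3100762, per the assertion message above */

/* Header at the start of an SA bonus buffer, per the 2.1-era sa_impl.h. */
typedef struct sa_hdr_phys {
	uint32_t sa_magic;		/* must be SA_MAGIC for a valid SA */
	uint16_t sa_layout_info;	/* encodes header size and layout number */
	uint16_t sa_lengths[1];		/* lengths of variable-size attributes */
} sa_hdr_phys_t;

/* Returns 0 if the buffer starts with a plausible SA header, -1 if not. */
static int
check_sa_bonus(const void *bonus)
{
	const sa_hdr_phys_t *hdr = bonus;

	if (hdr->sa_magic != SA_MAGIC) {
		fprintf(stderr, "bad sa_magic: %u != %u\n",
		    hdr->sa_magic, SA_MAGIC);
		return (-1);
	}
	return (0);
}

int
main(void)
{
	/* The value from the panic above, standing in for a bonus buffer. */
	sa_hdr_phys_t corrupt = { .sa_magic = 1690920705 };

	return (check_sa_bonus(&corrupt) == 0 ? 0 : 1);
}
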
@rincebrain
Contributor

I am curious if you're using Lustre or not, since one of those seemed to be Lustre calling the interface wrong. (I'm assuming not, or you'd have mentioned it, but it seemed like a reasonable question to be explicit about.)

It reminds me of the illumos bug where zfs upgrade on a pre-v5 clone with native encryption would panic - I know you're not using native encryption, but I'm wondering if it's still doing something unsafe with quotas all the same.

Those SA_MAGIC values being similarly wild makes me wonder if it picked up the same kind of wild value from something it shouldn't have, e.g. a stray buffer, wrote it out happily, and now reading it back is going "oh no".

I suppose, in theory, if this is reading quota metadata and it's calculated quota data, we could do the same thing I added for the native-encryption failure-to-decrypt-quota-metadata case and just trigger regenerating it. But that stack trace suggests it's finding an insane SA, which I don't think the quota code uses directly; it just reads it to try to figure out how much space the object is using, I imagine...

So it'd probably be useful to backport the newer changes to ASSERTs that let you print custom info when they trigger, and then have this one print the object it's on when it panics like that, so that you can go poking around with zdb.
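
(Roughly this shape, as a stand-alone user-space sketch: a VERIFY3-style macro that also prints caller-supplied context when it trips. The macros that eventually went in upstream have their own names and plumbing, so this is only an illustration of the idea, not a backport.)

/* Sketch only; not the upstream macros. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* A VERIFY3-like check that also prints caller-supplied context on failure. */
#define VERIFY3U_MSG(left, op, right, fmt, ...)				\
	do {								\
		uint64_t l_ = (left), r_ = (right);			\
		if (!(l_ op r_)) {					\
			fprintf(stderr, "VERIFY3(" #left " " #op " "	\
			    #right ") failed (%llu %s %llu): " fmt "\n",\
			    (unsigned long long)l_, #op,		\
			    (unsigned long long)r_, __VA_ARGS__);	\
			abort();					\
		}							\
	} while (0)

int
main(void)
{
	uint64_t sa_magic = 1690920705;
	uint64_t object = 0x1234;	/* made-up object number */

	/* On failure this also reports which object the bad SA belongs to. */
	VERIFY3U_MSG(sa_magic, ==, 0x2F505AU, "object=%#llx",
	    (unsigned long long)object);
	return (0);
}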

Alternately, one imagines that asking zdb to walk the relevant dataset with flags that make it attempt similar things will fail in a similar way... assuming it's an actual on-disk problem and not an in-memory bug, and seeing which object id it's complaining about would help tell those apart.

If it's a dnode update race, then some of the fixes around that might be useful, assuming Ubuntu hasn't already cherry-picked them.

@siebenmann
Contributor Author

We're not using Lustre; it's all straight NFS (originally v3, now mostly v4). We routinely ask ZFS for all user quota information, so I think the raw quota information must be good. Based on the code, it looks like the object id isn't directly available at the VERIFY3, as zpl_get_file_info() is only passed the bonus type and the hypothetical data for the SA (to reach this VERIFY3 we must be working on an SA, not a dnode). Only dmu_objset_userquota_get_ids() has the dnode itself. So in theory, if I can find a zdb operation that will walk a ZFS filesystem reading all SAs for all dnodes, it should trigger this failure in user space.

The infrequency of this panic is puzzling to me. Regardless of whether this is on-disk corruption or (consistent) in-memory corruption, almost all of the ZFS filesystem activity is NFS activity, and NFS clients should retry any operation that fails because the fileserver crashes and restarts, which I would expect to trigger the panic again.

I doubt Ubuntu 22.04 has cherry-picked very much into their ZFS, and I can't spot signs of it in their (confusing) ZFS git repository.

@rincebrain
Contributor

They've written their own data-loss-inducing patches before, so I wouldn't recommend trusting them not to break things, in general.

For debugging, I'd probably smuggle the object id in through one of the fields of the zfs_file_info_t, since it's just going to get overwritten later on in normal operation, so things that don't trip the assert shouldn't care.
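
(Something like this, as a stand-alone user-space sketch with all the names made up; in the real code the scratch field would be one of the zfs_file_info_t members, the caller with the dnode in hand would be dmu_objset_userquota_get_ids(), and the callee would be zpl_get_file_info().)

/* Sketch of smuggling an object id through a field the callback overwrites. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define SA_MAGIC 0x2F505AU

/* Stand-in for zfs_file_info_t; field names are invented for the sketch. */
typedef struct file_info {
	uint64_t fi_user;
	uint64_t fi_group;
	uint64_t fi_generation;	/* caller pre-loads this with the object id */
} file_info_t;

/* Stand-in for zpl_get_file_info(): sees only the bonus data, not the dnode. */
static void
get_file_info(uint32_t sa_magic, file_info_t *fi)
{
	if (sa_magic != SA_MAGIC) {
		/* The failure path can now say which object it was handed. */
		fprintf(stderr, "bad SA magic %u on object %llu\n",
		    sa_magic, (unsigned long long)fi->fi_generation);
		abort();
	}
	/* The normal path overwrites the smuggled value, so nothing else cares. */
	fi->fi_generation = 42;
}

/* Stand-in for dmu_objset_userquota_get_ids(): it does know the object id. */
static void
sync_one_dnode(uint64_t object, uint32_t sa_magic)
{
	file_info_t fi = { .fi_generation = object };

	get_file_info(sa_magic, &fi);
}

int
main(void)
{
	sync_one_dnode(0x1234, SA_MAGIC);	/* fine */
	sync_one_dnode(0x1234, 1690920705U);	/* reports the object, then aborts */
	return (0);
}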

@siebenmann
Contributor Author

If I'm reading the code correctly, using zdb to search for this requires running zdb -Gddd ... and then looking at the end of the output for the message from module/zfs/sa.c's sa_build_index(), which I believe will look like Buffer Header: %x != SA_MAGIC:%x object=%#llx. It appears that the SA_MAGIC value is only verified in zpl_get_file_info() and here; the latter is called from sa_handle_get_from_db() and then sa_handle_get(), which zdb calls for every znode. However, I'm not sure I understand what is going on in these two places and why one panics and the other doesn't.

@rincebrain
Contributor

rincebrain commented Jun 17, 2024

If zdb doesn't panic, or the object isn't the same every time this breaks, my guess would be that it's mangling something in memory (and panicking before it makes it out to disk).

You could try a kASAN kernel and see if something is reaching out and mangling memory it shouldn't be, potentially.

I don't recall or see many changes that might be immediately obviously relevant, though I wouldn't be astonished if this is somehow a race in dnode_sync.

You could also try 2.1.15 or 2.2.4 and see if they play any better here, potentially.

@siebenmann
Contributor Author

Here is what I think is happening in the code, to keep track of this.

  • the panic happens when a dnode's bonus data is supposed to be system attributes but does not have SA_MAGIC in the right place.
  • because the call chain goes through dnode_sync(), we are looking at the new dnode (and bonus data) that is supposed to be written out as part of the transaction. As a corollary, this panic is likely a good thing for us, because it is probably preventing corrupted SA bonus data from being written to disk.
  • the obvious thing that starts happening when you upgrade a filesystem from ZFS filesystem version 4 to version 5 is that filesystem objects start getting their system attributes converted from the old v4 form to the new, better v5 form whenever they are updated anyway.

One possible theory is that there is a bug in the attribute update code, one that probably only triggers some of the time (perhaps depending on what else is already in memory).

@snajpa
Contributor

snajpa commented Nov 28, 2024

Can you try to put together a reproducer? I'd look into this if I can reproduce it locally.

@siebenmann
Contributor Author

siebenmann commented Nov 28, 2024 via email

@snajpa
Contributor

snajpa commented Nov 28, 2024

Mhm, got it... would it be possible for you to try current git master, to rule out the possibility that this has already been fixed there?

@siebenmann
Contributor Author

siebenmann commented Nov 28, 2024 via email

@snajpa
Contributor

snajpa commented Nov 28, 2024

OK :) One last question: when you say you tried to figure out a reproducer, did you also try creating the test pool with ZoL 0.6.x or the original OI/OpenSolaris?

@siebenmann
Contributor Author

siebenmann commented Nov 28, 2024 via email
