Panic in NFS filehandle decoding for snapshots [code issue identified and reproducible] #7764
awesome.... please could one of the developers fix this as soon as possible? we would be very grateful.
Regards
Brandon
…On Tue, 28 Aug 2018, 03:34 Chris Siebenmann, ***@***.***> wrote:
I believe that this is a bug in zfsctl_snapdir_vget(). The erroneous dentry
returned is 0x28, or decimal 40, which is ELOOP, which is very suggestive
of a confusion between the ZFS code (where error returns are positive
values, eg ELOOP) and the Linux kernel code (where error returns are
negative values, eg -ELOOP). In zpl_fh_to_dentry(), ZFS converts any
non-zero return from zfs_vget(), assumed to be a ZFS error (positive) to
a kernel error (negative) by negating it, in the 0.7 code here
<https://github.com/zfsonlinux/zfs/blob/zfs-0.7-release/module/zfs/zpl_export.c#L106>.
I believe there is a path through zfs_vget() that will return a *kernel*
(negative) error, which will then be negated into a small positive number
by this code.
This path is zfs_vget() -> zfsctl_snapdir_vget() -> a failed call to
kern_path() with the comment 'trigger automount' (in the 0.7 code here
<https://github.com/zfsonlinux/zfs/blob/zfs-0.7-release/module/zfs/zfs_ctldir.c#L1183>).
As the comment in zfsctl_snapshot_mount() covers, in highly contended
situations the VFS lookup can return ELOOP, which is really -ELOOP because
this is a kernel (negative) error. This negative kernel error will be
returned as-is up through zfs_vget() and then we explode.
(Any other kernel errors returned from kern_path() would cause similar
explosions.)
I think the direct fix would be to change zfsctl_snapdir_vget() to do:
/* Trigger automount */
error = -kern_path(mnt, LOOKUP_FOLLOW|LOOKUP_DIRECTORY, &path);
if (error)
goto out;
This should turn a negative kernel error into a positive ZFS error if
there is one.
(As an additional safety check, one could make zpl_fh_to_dentry()
explicitly verify that the error is not negative before it negates it.)
A quick grep of the source code suggests that this is the only use of
kern_path() in the ZoL code, but similar issues with directly returning
kernel errors as ZFS errors may be lurking in other spots.
This of course doesn't explain why the ELOOP error occurs in the first place. There is definitely some behaviour under high load and concurrent access to snapshots via samba/nfs that causes the ELOOP; we are seeing this currently all the time. Interestingly, an fuser -c shows the same processes accessing all the snaps in the .zfs/snapshot directory, and it is itself causing the ELOOP: the act of testing the directory causes the loop (shades of the Heisenberg principle).
hi
Chris this is awesome.
are we getting closer to possibly explaining the "too many levels of
symbolic links" fiasco, where a snap is pointed to by a symlink and the link
is in an nfs/samba shared file system?
in our experience (a major production server is crashing once a day now) it
is always preceded by a snap becoming unreadable with this error, but
there is no evidence of it being mounted on the server at that point. as
though it was being accessed when the mount was destroyed, and the symlink
becomes recursive thereafter.
any attempt to release the link (say by restarting nfsd) and boom, the
server panics.
i am desperately trying to save zol's reputation in my organization right now
and we are bleeding so badly with this bug.
regards
bh
ps: i noted in the code that the original coders of the snapshot module are
still active in this community... hoping they will jump in soon and assist
with a fix.
…On Tue, 28 Aug 2018, 21:48 Chris Siebenmann, ***@***.***> wrote:
By the way, it turns out that there is a simple reproduction for this,
because we can artificially induce an ELOOP error by mounting the
snapshot by hand:
1. NFS export a filesystem, say /tank/test, with the crossmnt option.
Mount it on a client machine somewhere, say at /mnt.
2. make a snapshot: zfs snapshot ***@***.***
3. manually mount the snapshot on the NFS server at a convenient but
non-default location: mount -t zfs -o
ro,nosuid,nodev,relatime,xattr,noacl ***@***.*** /mnt
4. On the client machine, attempt to access the snapshot: cd
/mnt/.zfs/snapshot/boom
The NFS server will immediately panic with this error. This happens
completely reliably because ZFS refuses to mount an already-mounted
snapshot a second time, so the kernel's upcall to try to mount the snapshot
on its normal location of /tank/test/.zfs/snapshot/boom will always fail.
@siebenmann I agree, this looks like it's entirely due to failing to properly negate the return value of kern_path().
As a regular kernel function, kern_path() returns errors as negative errnos, such as -ELOOP. zfsctl_snapdir_vget() must convert these into the positive errnos used throughout the ZFS code when it returns them to other ZFS functions so that the ZFS code properly sees them as errors.

Signed-off-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Closes openzfs#7764
As a regular kernel function, kern_path() returns errors as negative errnos, such as -ELOOP. zfsctl_snapdir_vget() must convert these into the positive errnos used throughout the ZFS code when it returns them to other ZFS functions so that the ZFS code properly sees them as errors.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Siebenmann <cks.git01@cs.toronto.edu>
Closes #7764
Closes #7864
Case Title : Kernel Panic BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
Case Number : 02151530
Case Open Date : 2018-07-30 18:05:51
Severity : 3 (Normal)
Problem Type : Account / Customer Service Request
Most recent comment: On 2018-08-03 09:48:15, Red Hat Engineer commented:
"Hello,
I have analyzed the vmcore; my findings below outline the details of the cause of the crash and the discrepancies.
The next steps are mentioned towards the end.
Summary of the crash context:
The crash occurred due to an invalid dentry dereference. The code was attempting to dereference the inode off the dentry structure and obtain its mode for a condition check.
The dentry corruption happened a few instructions prior to the crash. The function pointer responsible for returning the dentry points into the zfs file system. The ZFS module is involved in the crash, as it is returning a dentry which is invalid, and the subsequent dereference of that dentry caused the crash.
------ vmcore analysis --
crash> sys
KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.21.1.el7.x86_64/vmlinux
DUMPFILE: /cores/retrace/tasks/290988954/crash/vmcore [PARTIAL DUMP]
CPUS: 20
DATE: Mon Jul 30 06:48:55 2018
UPTIME: 4 days, 23:11:30
LOAD AVERAGE: 1.21, 1.41, 1.70
TASKS: 1292
NODENAME: xxxxxxxxxx
RELEASE: 3.10.0-693.21.1.el7.x86_64
VERSION: #1 SMP Fri Feb 23 18:54:16 UTC 2018
MACHINE: x86_64 (2599 Mhz)
MEMORY: 32 GB
PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000058"
xxxxxxxxxx:root:/hubvol> modinfo zfs | grep -iw version
version: 0.7.7-1
xxxxxxxxxx:root:/hubvol> modinfo spl | grep -iw version
version: 0.7.7-1
Backtrace of the crashed task:
crash> bt
PID: 9151 TASK: ffff8807c8bedee0 CPU: 16 COMMAND: "nfsd"
#0 [ffff8808085f7898] machine_kexec at ffffffff8105d77b
#1 [ffff8808085f78f8] __crash_kexec at ffffffff81108732
#2 [ffff8808085f79c8] crash_kexec at ffffffff81108820
#3 [ffff8808085f79e0] oops_end at ffffffff816b8778
#4 [ffff8808085f7a08] no_context at ffffffff816a7c7a
#5 [ffff8808085f7a58] __bad_area_nosemaphore at ffffffff816a7d10
#6 [ffff8808085f7aa0] bad_area_nosemaphore at ffffffff816a7e7a
#7 [ffff8808085f7ab0] __do_page_fault at ffffffff816bb68e
#8 [ffff8808085f7b10] do_page_fault at ffffffff816bb835
#9 [ffff8808085f7b40] page_fault at ffffffff816b7768
[exception RIP: exportfs_decode_fh+0x8d]
RIP: ffffffff812947fd RSP: ffff8808085f7bf0 RFLAGS: 00010207
RAX: 0000000000000028 RBX: ffff8808172b3aa0 RCX: 0000000006f457e4
RDX: 0000000000000028 RSI: 0000000000001000 RDI: ffff88017fc03300
RBP: ffff8808085f7d38 R8: 000000000001ba40 R9: ffffffffc0444f65
R10: ffff88081ea1ba40 R11: ffffea001f179800 R12: ffffffffc03c6540
R13: 0000000000000028 R14: ffffffffc08365e0 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff8808085f7d40] fh_verify at ffffffffc03c72b8 [nfsd]
#11 [ffff8808085f7db0] nfsd3_proc_fsinfo at ffffffffc03d13e5 [nfsd]
#12 [ffff8808085f7de0] nfsd_dispatch at ffffffffc03c35a0 [nfsd]
#13 [ffff8808085f7e18] svc_process_common at ffffffffc0366646 [sunrpc]
#14 [ffff8808085f7e78] svc_process at ffffffffc03669f3 [sunrpc]
#15 [ffff8808085f7ea0] nfsd at ffffffffc03c2eff [nfsd]
#16 [ffff8808085f7ec8] kthread at ffffffff810b4031
#17 [ffff8808085f7f50] ret_from_fork at ffffffff816c0577
crash>
== Disassembly of crash IP
crash> dis -lr exportfs_decode_fh+0x8d | tail -15
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 392
0xffffffff812947d3 <exportfs_decode_fh+0x63>: mov -0x144(%rbp),%edx
0xffffffff812947d9 <exportfs_decode_fh+0x69>: mov -0x140(%rbp),%rsi
0xffffffff812947e0 <exportfs_decode_fh+0x70>: callq 0xffffffff81337db0 <__x86_indirect_thunk_rax>
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 393
0xffffffff812947e5 <exportfs_decode_fh+0x75>: test %rax,%rax
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 392
0xffffffff812947e8 <exportfs_decode_fh+0x78>: mov %rax,%r13
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 393
0xffffffff812947eb <exportfs_decode_fh+0x7b>: je 0xffffffff812948b0 <exportfs_decode_fh+0x140>
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 395
0xffffffff812947f1 <exportfs_decode_fh+0x81>: cmp $0xfffffffffffff000,%rax
0xffffffff812947f7 <exportfs_decode_fh+0x87>: ja 0xffffffff812948a2 <exportfs_decode_fh+0x132>
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 398
0xffffffff812947fd <exportfs_decode_fh+0x8d>: mov 0x30(%rax),%rax << Crashed here
=== source code of crash context
struct dentry *exportfs_decode_fh(struct vfsmount *mnt, struct fid *fid,
int fh_len, int fileid_type,
int (*acceptable)(void *, struct dentry *), void *context)
{
const struct export_operations *nop = mnt->mnt_sb->s_export_op; << I:[1] export operations of zfs
struct dentry *result, *alias;
char nbuf[NAME_MAX+1];
int err;
/*
*/
if (!nop || !nop->fh_to_dentry)
return ERR_PTR(-ESTALE);
result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type); << I:[2] to return the dentry of zfs
if (!result)
result = ERR_PTR(-ESTALE);
if (IS_ERR(result))
return result;
if (S_ISDIR(result->d_inode->i_mode)) { << Crashed here I:[3] Crashed here
/*
*/
<..>
Source code with assembly snippet for the checks
387 /*
388 * Try to get any dentry for the given file handle from the filesystem.
389 */
390 if (!nop || !nop->fh_to_dentry)
391 return ERR_PTR(-ESTALE);
392 result = nop->fh_to_dentry(mnt->mnt_sb, fid, fh_len, fileid_type); => mov %rax,%r13
=> The dentry returned for this zfs filesystem was corrupted and held the value shown in RAX, which is result
R13: 0000000000000028 RAX: 0000000000000028
393 if (!result) << condition check passed as it was not 0x0
394 result = ERR_PTR(-ESTALE);
395 if (IS_ERR(result))
396 return result;
397
398 if (S_ISDIR(result->d_inode->i_mode)) { << Crashed
crashed here, as the (dentry *) in result is invalid, and hence the dereference is invalid.
<exportfs_decode_fh+0x8d>: mov 0x30(%rax),%rax
RAX: 0000000000000028
RAX + 0x30 = 0x58, which was the address the code tried to dereference; the crash follows because this is an invalid address:
PANIC: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000058"
== Obtain the export_operations in this context
crash> dis -lr exportfs_decode_fh+0x8d | tail -30
0xffffffff81294797 <exportfs_decode_fh+0x27>: mov %gs:0x28,%rax
0xffffffff812947a0 <exportfs_decode_fh+0x30>: mov %rax,-0x30(%rbp)
0xffffffff812947a4 <exportfs_decode_fh+0x34>: xor %eax,%eax
0xffffffff812947a6 <exportfs_decode_fh+0x36>: mov %edx,-0x144(%rbp)
0xffffffff812947ac <exportfs_decode_fh+0x3c>: mov %r9,-0x138(%rbp)
/usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/fs/exportfs/expfs.c: 382
0xffffffff812947b3 <exportfs_decode_fh+0x43>: mov 0x48(%rdi),%r14
crash> bt | grep R14
R13: 0000000000000028 R14: ffffffffc08365e0 R15: 0000000000000001
crash>
The export_operations pointer passed to exportfs_decode_fh() was ffffffffc08365e0,
which points to zpl_export_operations [zfs] from the zfs module.
The zfs module used in this context taints the kernel.
crash> sym ffffffffc08365e0
ffffffffc08365e0 (r) zpl_export_operations [zfs] << I:[1] as mentioned above
crash> export_operations ffffffffc08365e0
struct export_operations {
encode_fh = 0xffffffffc07b94b0,
fh_to_dentry = 0xffffffffc07b9560, << I:[2] Function ptr invoked for dentry
fh_to_parent = 0x0,
get_name = 0x0,
get_parent = 0xffffffffc07b93f0,
commit_metadata = 0xffffffffc07b9370,
get_uuid = 0x0,
map_blocks = 0x0,
commit_blocks = 0x0
}
crash> sym 0xffffffffc07b9560 << I:[2]
ffffffffc07b9560 (t) zpl_fh_to_dentry [zfs]
This zpl_fh_to_dentry() returned an invalid dentry when called from exportfs_decode_fh() via the export_operations of this zfs filesystem, which nfsd invoked. zpl_fh_to_dentry() is called because the operation needed the dentry of the zfs file for the given file handle (nop->fh_to_dentry).
crash> sym zpl_export_operations
ffffffffc08365e0 (r) zpl_export_operations [zfs]
crash> rd -S ffffffffc08365e0 32
ffffffffc08365e0: zpl_encode_fh zpl_fh_to_dentry
ffffffffc08365f0: 0000000000000000 0000000000000000
ffffffffc0836600: zpl_get_parent zpl_commit_metadata
ffffffffc0836610: 0000000000000000 0000000000000000
ffffffffc0836620: 0000000000000000 0000000000000000
ffffffffc0836630: 0000000000000000 0000000000000000
ffffffffc0836640: 746972775f6c707a 0000736567617065
ffffffffc0836650: 0000000000000000 0000000000000000
ffffffffc0836660: 0000000000000000 generic_file_llseek
ffffffffc0836670: generic_read_dir 0000000000000000
ffffffffc0836680: 0000000000000000 0000000000000000
ffffffffc0836690: zpl_readdir 0000000000000000
ffffffffc08366a0: zpl_ioctl zpl_compat_ioctl
ffffffffc08366b0: 0000000000000000 0000000000000000
ffffffffc08366c0: 0000000000000000 0000000000000000
ffffffffc08366d0: zpl_fsync 0000000000000000
As you can note, zpl_export_operations (ffffffffc08365e0) appears in the stack of the crashed task 9151:
crash> search -t ffffffffc08365e0
PID: 9151 TASK: ffff8807c8bedee0 CPU: 16 COMMAND: "nfsd"
ffff8808085f77a8: ffffffffc08365e0
ffff8808085f7908: ffffffffc08365e0
ffff8808085f7a88: ffffffffc08365e0
ffff8808085f7af8: ffffffffc08365e0
ffff8808085f7b50: ffffffffc08365e0
crash> bt -f | grep fh_verify -A10
#10 [ffff8808085f7d40] fh_verify at ffffffffc03c72b8 [nfsd]
ffff8808085f7d48: ffff880807d82018 ffff880800000006
ffff8808085f7d58: 00000100bada3ac0 ffff8808085f7dd8
ffff8808085f7d68: ffffffffc03ced94 ffff8808085f7e00
ffff8808085f7d78: 00000000af705346 ffff880807d81000
ffff8808085f7d88: 0000000000100000 ffff880807d82000
ffff8808085f7d98: ffff8800bada2000 000000000000001c
ffff8808085f7da8: ffff8808085f7dd8 ffffffffc03d13e5
#11 [ffff8808085f7db0] nfsd3_proc_fsinfo at ffffffffc03d13e5 [nfsd]
r14: svc_rqst ffff8800bada2000
r13: svc_fh ffff880807d82000
=== The ZFS file in context of the crash was
crash> files -d ffff8807f48a5800
DENTRY INODE SUPERBLK TYPE PATH
ffff8807f48a5800 ffff880433360a00 ffff880819935800 DIR /hubvol/Staging/Hub/Dynamic/Apps/ZA/.zfs/snapshot
crash> p_dentry ffff8807f48a5800
(struct dentry *)0xffff8807f48a5800
/.zfs/snapshot
.d_flags = 0x01000084 - negative dentry - OP_REVALIDATE | RCUACCESS (undecoded bits: 0x0000000001000000)
.d_parent - (struct dentry *)0xffff8807f48a46c0
/.zfs
.d_flags = 0x01000084 - negative dentry - OP_REVALIDATE | RCUACCESS (undecoded bits: 0x0000000001000000)
.d_parent - (struct dentry *)0xffff88081e189800
.d_inode - (struct inode *)0xffff8807f258a6a8
.d_sb - (struct super_block *)0xffff880819935800 zfs
.d_inode - (struct inode *)0xffff880433360a00
.i_mode - 40777, .i_size - 0, .i_uid - 0, .i_uid - 0
.i_state = (nil) -
.i_flags = 0x00000000 -
.i_sb - (struct super_block *)0xffff880819935800 zfs
=== ZFS Mount details:
crash> mount | grep "/hubvol/Staging/Hub/Dynamic/Apps/ZA"
ffff8808172b3a80 ffff880819935800 zfs hubvol/Staging/ZA /hubvol/Staging/Hub/Dynamic/Apps/ZA
== ZFS module version details:
crash> module.version,name,srcversion ffffffffc0873220
version = 0xffff880819e01330 "0.7.7-1"
name = "zfs"
=== Tainted modules on this system:
crash> sys -t
TAINTED_MASK: 3001 POE
crash> mod -t
NAME TAINTS
spl OE
zcommon POE
zavl POE
znvpair POE
icp POE
zunicode POE
zfs POE