Attempting to access a snapshot over NFS fails with a stale file handle error; deleting the snapshot then fails as well and blocks use of the zfs tools. The filesystem itself is still alive and well, fulfilling requests from NFS and locally, but any attempt to issue a zfs command hangs.
On systems with snapshots being created/deleted, like many with automated frequent/hourly/... snapshots, and remote NFS users, this means a remote user can wedge the server's ZFS management interfaces (for any purpose, not just on the particular dataset) just by listing the contents of a snapshot that is later scheduled for deletion.
I initially (June) ran into this with automated snapshot expiration and (attempted) deletion, where I directly observed the issue due to zfs sends no longer working; I didn't connect the dots between the failed NFS access, later snapshot deletion, and subsequent wedging of the server's zfs commands until Michel's [bug report](https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266236).
Reproducing
Mount NFS-exported zfs filesystem on client.
Try to enter a snapshot (.zfs/snapshot/foo) directory on client -> Stale file handle error.
Additional steps I checked at this point to see if it illuminated anything; not required to reproduce:
a) Unmount on client; stop nfsd on server
b) mount -v on server shows the requested snapshot as mounted
c) Try explicit unmount of the snapshot path -> umount hangs (but the snapshot path no longer shows up in mount -v)
At this point no zfs or zpool commands succeed. (Or at least, none that I tried; all hang.)
Restart required to unwedge.
Edit: Prior to the 13.1 upgrade, this system had been running 12.1 (and earlier) with all of these actions (user NFS snapshot access, which is very useful for letting users recover files; snapshot rotations; etc.) working beautifully for years.
Additional context
I initially experienced this on a custom kernel, but have reproduced it with GENERIC; users on IRC have reproduced it on CURRENT. Multiple other users have also reported it on the FreeBSD bug report.
- Add a zfs_exit() call in an error path, otherwise a lock is leaked.
- Remove the fid_gen > 1 check. That appears to be Linux-specific:
zfsctl_snapdir_fid() sets fid_gen to 0 or 1 depending on whether the
snapshot directory is mounted. On FreeBSD it fails, making snapshot
dirs inaccessible via NFS.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Andriy Gapon <avg@FreeBSD.org>
Signed-off-by: Mark Johnston <markj@FreeBSD.org>
Fixes: 43dbf88 ("FreeBSD: vfsops: use setgen for error case")
Closes openzfs#14001
Closes openzfs#13974
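For readers following along, the first bullet of the commit message (a missing exit call on an error path leaking a lock) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the OpenZFS code: a plain reader count stands in for the real lock, and `toy_enter`, `toy_exit`, `toy_lookup_leaky`, `toy_lookup_fixed` and `toy_admin_can_proceed` are hypothetical names invented for the example. It also assumes, as the symptoms in this report suggest but the commit message does not state, that management operations need the lock to be free of readers.

```c
#include <errno.h>

/*
 * Toy model only: a reader count stands in for a filesystem-wide
 * read/write lock. Every name here is hypothetical, not an OpenZFS symbol.
 */
int toy_readers = 0;

void toy_enter(void) { toy_readers++; }	/* take the lock as a reader */
void toy_exit(void)  { toy_readers--; }	/* drop the reader hold */

/* Buggy shape: the error path returns while still holding the lock. */
int toy_lookup_leaky(int fid_is_valid)
{
	toy_enter();
	if (!fid_is_valid)
		return EINVAL;	/* leak: no toy_exit() before returning */
	toy_exit();
	return 0;
}

/* Fixed shape: every return path releases what it acquired. */
int toy_lookup_fixed(int fid_is_valid)
{
	toy_enter();
	if (!fid_is_valid) {
		toy_exit();
		return EINVAL;
	}
	toy_exit();
	return 0;
}

/*
 * Assumption for this model: an administrative operation (unmount,
 * destroy, ...) needs exclusive access, so it can only proceed when
 * no reader hold is outstanding.
 */
int toy_admin_can_proceed(void)
{
	return toy_readers == 0;
}
```

In this model, a rejected handle passed to `toy_lookup_fixed` leaves `toy_admin_can_proceed()` true, while a single rejected handle passed to `toy_lookup_leaky` leaves a reader hold behind that nothing ever releases, so every later exclusive request would wait indefinitely.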
ghost pushed a commit to truenas/zfs that referenced this issue Oct 21, 2022
(cherry picked from commit ed566bf)
A suggestion was made on the FreeBSD bugzilla to have an OpenZFS bug report, so here I am.