zpool remove cache device blocks pool with no device I/O #11635

Closed
stuartthebruce opened this issue Feb 22, 2021 · 7 comments
Labels: Status: Triage Needed, Type: Defect


@stuartthebruce

System information

Type                  Version/Name
Distribution Name     Scientific Linux
Distribution Version  7.9
Linux Kernel          3.10.0-1160.15.1.el7.test
Architecture          x86_64
ZFS Version           0.8.6
SPL Version           0.8.6

Describe the problem you're observing

Removing a cache device causes the underlying pool to reproducibly lock up for at least a few minutes until Pacemaker fences off the server.

Describe how to reproduce the problem

zpool remove home1 nvme9n1p1
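
For completeness, a throwaway-pool sketch of the same removal sequence (file-backed vdevs with placeholder paths, not the production devices; whether it actually triggers the hang likely depends on pool scale and how full the L2ARC is):

truncate -s 1G /var/tmp/vdev0 /var/tmp/vdev1 /var/tmp/vdev2 /var/tmp/l2arc0
zpool create testpool raidz /var/tmp/vdev0 /var/tmp/vdev1 /var/tmp/vdev2
zpool add testpool cache /var/tmp/l2arc0
# generate some read traffic so the cache device warms up, then:
zpool remove testpool /var/tmp/l2arc0   # the step that hangs on the production pool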

Include any warning/errors/backtraces from the system logs

While waiting for Pacemaker to STONITH, iostat reports no activity on any of the pool devices, and a system reset is required to restore access to the pool. After a reboot all of the original devices are still present, and a second attempt at zpool remove results in the same behavior. However, I was able to run "zpool offline", and I have deliberately faulted the 4 cache devices below to prepare them for physical removal from this system.
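
For reference, the FAULTED state shown in the status output below can be reached with the force flag to "zpool offline"; the exact invocation is not recorded in this report, so treat the following as an illustration of that workaround rather than the commands actually run:

zpool offline -f home1 nvme9n1p1    # -f puts the device into a faulted state instead of merely offlining it
zpool offline -f home1 nvme10n1p1
zpool offline -f home1 nvme11n1p1
zpool offline -f home1 nvme12n1p1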

This has also been seen by others, as reported here.

[root@cascade2 ~]# zpool status
  pool: home1
 state: ONLINE
status: One or more devices are faulted in response to persistent errors.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
	repaired.
  scan: resilvered 0B in 0 days 03:24:45 with 0 errors on Sun Feb  7 18:49:50 2021
config:

	NAME                      STATE     READ WRITE CKSUM
	home1                     ONLINE       0     0     0
	  raidz3-0                ONLINE       0     0     0
	    35000cca253134c28     ONLINE       0     0     0
	    35000cca253146b40     ONLINE       0     0     0
	    35000cca253155e44     ONLINE       0     0     0
	    35000cca25319f9ac     ONLINE       0     0     0
	    35000cca2531a6ba0     ONLINE       0     0     0
	    35000cca2531b8108     ONLINE       0     0     0
	    35000cca2531bcadc     ONLINE       0     0     0
	    35000cca2531d41f4     ONLINE       0     0     0
	    35000cca2531d46c8     ONLINE       0     0     0
	    35000cca2531d4cac     ONLINE       0     0     0
	    35000cca2531da728     ONLINE       0     0     0
	    35000cca2531da880     ONLINE       0     0     0
	    35000cca2531dad74     ONLINE       0     0     0
	    35000cca2531ff2bc     ONLINE       0     0     0
	    35000cca2531ffff8     ONLINE       0     0     0
	  raidz3-1                ONLINE       0     0     0
	    35000cca253204e9c     ONLINE       0     0     0
	    35000cca253205ffc     ONLINE       0     0     0
	    35000cca2532067e0     ONLINE       0     0     0
	    35000cca253207fdc     ONLINE       0     0     0
	    35000cca253207ff4     ONLINE       0     0     0
	    35000cca2532081b0     ONLINE       0     0     0
	    35000cca25320d79c     ONLINE       0     0     0
	    35000cca25320dad0     ONLINE       0     0     0
	    35000cca25320e460     ONLINE       0     0     0
	    35000cca2532105a8     ONLINE       0     0     0
	    35000cca253217370     ONLINE       0     0     0
	    35000cca2532176f4     ONLINE       0     0     0
	    35000cca2532178d8     ONLINE       0     0     0
	    35000cca25321b168     ONLINE       0     0     0
	    35000cca25321b5f8     ONLINE       0     0     0
	  raidz3-2                ONLINE       0     0     0
	    35000cca25321b774     ONLINE       0     0     0
	    35000cca25321c2e0     ONLINE       0     0     0
	    35000cca25321c61c     ONLINE       0     0     0
	    35000cca25321c804     ONLINE       0     0     0
	    35000cca25321c870     ONLINE       0     0     0
	    35000cca25321c898     ONLINE       0     0     0
	    35000cca25321c910     ONLINE       0     0     0
	    35000cca25321c938     ONLINE       0     0     0
	    35000cca25321ca74     ONLINE       0     0     0
	    35000cca25323980c     ONLINE       0     0     0
	    35000cca253241428     ONLINE       0     0     0
	    35000cca253241574     ONLINE       0     0     0
	    35000cca253246560     ONLINE       0     0     0
	    35000cca2532479a4     ONLINE       0     0     0
	    35000cca253247c68     ONLINE       0     0     0
	  raidz3-3                ONLINE       0     0     0
	    35000cca25324a360     ONLINE       0     0     0
	    35000cca25324b7c0     ONLINE       0     0     0
	    35000cca25324d8e8     ONLINE       0     0     0
	    35000cca25324dc4c     ONLINE       0     0     0
	    35000cca253251828     ONLINE       0     0     0
	    35000cca253256f0c     ONLINE       0     0     0
	    35000cca253257210     ONLINE       0     0     0
	    35000cca2532572e4     ONLINE       0     0     0
	    35000cca2532586ec     ONLINE       0     0     0
	    35000cca25325c5f4     ONLINE       0     0     0
	    35000cca25325c610     ONLINE       0     0     0
	    35000cca25325c76c     ONLINE       0     0     0
	    35000cca25325fb38     ONLINE       0     0     0
	    35000cca25325fb5c     ONLINE       0     0     0
	    35000cca25325fb78     ONLINE       0     0     0
	special
	  mirror-6                ONLINE       0     0     0
	    zfs-64e840b01f4e178c  ONLINE       0     0     0
	    zfs-2901cad643f112c3  ONLINE       0     0     0
	    zfs-8fa1031490ad0ab2  ONLINE       0     0     0
	    zfs-3cf59d1d145ab04b  ONLINE       0     0     0
	  mirror-7                ONLINE       0     0     0
	    zfs-cc86d88f575882ba  ONLINE       0     0     0
	    zfs-039251109fb434af  ONLINE       0     0     0
	    zfs-e7c241bb7dcd4fb8  ONLINE       0     0     0
	    zfs-b13cf73ec91bb5d2  ONLINE       0     0     0
	logs
	  mirror-4                ONLINE       0     0     0
	    zfs-b30c96b1eb59e20f  ONLINE       0     0     0
	    zfs-0f8de84266666364  ONLINE       0     0     0
	  mirror-5                ONLINE       0     0     0
	    zfs-72606fae4de92cf2  ONLINE       0     0     0
	    zfs-0978488aae2a7c56  ONLINE       0     0     0
	cache
	  nvme4n1p1               ONLINE       0     0     0
	  nvme5n1p1               ONLINE       0     0     0
	  nvme9n1p1               FAULTED      0     0     0  external device fault
	  nvme10n1p1              FAULTED      0     0     0  external device fault
	  nvme11n1p1              FAULTED      0     0     0  external device fault
	  nvme12n1p1              FAULTED      0     0     0  external device fault

errors: No known data errors

Note: this is a fairly large pool, and the 4 devices to be removed are the same model (Intel 7.6TB DC P4610 NVMe) as the 2 remaining cache devices.

[root@cascade2 ~]# zpool list -v
NAME                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
home1                      669T   487T   182T        -         -     9%    72%  1.00x    ONLINE  -
  raidz3                   164T   119T  44.5T        -         -     9%  72.8%      -  ONLINE
    35000cca253134c28         -      -      -        -         -      -      -      -  ONLINE
    35000cca253146b40         -      -      -        -         -      -      -      -  ONLINE
    35000cca253155e44         -      -      -        -         -      -      -      -  ONLINE
    35000cca25319f9ac         -      -      -        -         -      -      -      -  ONLINE
    35000cca2531a6ba0         -      -      -        -         -      -      -      -  ONLINE
    35000cca2531b8108         -      -      -        -         -      -      -      -  ONLINE
    35000cca2531bcadc         -      -      -        -         -      -      -      -  ONLINE
    35000cca2531d41f4         -      -      -        -         -      -      -      -  ONLINE
    35000cca2531d46c8         -      -      -        -         -      -      -      -  ONLINE
    35000cca2531d4cac         -      -      -        -         -      -      -      -  ONLINE
    35000cca2531da728         -      -      -        -         -      -      -      -  ONLINE
    35000cca2531da880         -      -      -        -         -      -      -      -  ONLINE
    35000cca2531dad74         -      -      -        -         -      -      -      -  ONLINE
    35000cca2531ff2bc         -      -      -        -         -      -      -      -  ONLINE
    35000cca2531ffff8         -      -      -        -         -      -      -      -  ONLINE
  raidz3                   164T   119T  44.5T        -         -     9%  72.8%      -  ONLINE
    35000cca253204e9c         -      -      -        -         -      -      -      -  ONLINE
    35000cca253205ffc         -      -      -        -         -      -      -      -  ONLINE
    35000cca2532067e0         -      -      -        -         -      -      -      -  ONLINE
    35000cca253207fdc         -      -      -        -         -      -      -      -  ONLINE
    35000cca253207ff4         -      -      -        -         -      -      -      -  ONLINE
    35000cca2532081b0         -      -      -        -         -      -      -      -  ONLINE
    35000cca25320d79c         -      -      -        -         -      -      -      -  ONLINE
    35000cca25320dad0         -      -      -        -         -      -      -      -  ONLINE
    35000cca25320e460         -      -      -        -         -      -      -      -  ONLINE
    35000cca2532105a8         -      -      -        -         -      -      -      -  ONLINE
    35000cca253217370         -      -      -        -         -      -      -      -  ONLINE
    35000cca2532176f4         -      -      -        -         -      -      -      -  ONLINE
    35000cca2532178d8         -      -      -        -         -      -      -      -  ONLINE
    35000cca25321b168         -      -      -        -         -      -      -      -  ONLINE
    35000cca25321b5f8         -      -      -        -         -      -      -      -  ONLINE
  raidz3                   164T   119T  44.5T        -         -     9%  72.8%      -  ONLINE
    35000cca25321b774         -      -      -        -         -      -      -      -  ONLINE
    35000cca25321c2e0         -      -      -        -         -      -      -      -  ONLINE
    35000cca25321c61c         -      -      -        -         -      -      -      -  ONLINE
    35000cca25321c804         -      -      -        -         -      -      -      -  ONLINE
    35000cca25321c870         -      -      -        -         -      -      -      -  ONLINE
    35000cca25321c898         -      -      -        -         -      -      -      -  ONLINE
    35000cca25321c910         -      -      -        -         -      -      -      -  ONLINE
    35000cca25321c938         -      -      -        -         -      -      -      -  ONLINE
    35000cca25321ca74         -      -      -        -         -      -      -      -  ONLINE
    35000cca25323980c         -      -      -        -         -      -      -      -  ONLINE
    35000cca253241428         -      -      -        -         -      -      -      -  ONLINE
    35000cca253241574         -      -      -        -         -      -      -      -  ONLINE
    35000cca253246560         -      -      -        -         -      -      -      -  ONLINE
    35000cca2532479a4         -      -      -        -         -      -      -      -  ONLINE
    35000cca253247c68         -      -      -        -         -      -      -      -  ONLINE
  raidz3                   164T   119T  44.5T        -         -     9%  72.8%      -  ONLINE
    35000cca25324a360         -      -      -        -         -      -      -      -  ONLINE
    35000cca25324b7c0         -      -      -        -         -      -      -      -  ONLINE
    35000cca25324d8e8         -      -      -        -         -      -      -      -  ONLINE
    35000cca25324dc4c         -      -      -        -         -      -      -      -  ONLINE
    35000cca253251828         -      -      -        -         -      -      -      -  ONLINE
    35000cca253256f0c         -      -      -        -         -      -      -      -  ONLINE
    35000cca253257210         -      -      -        -         -      -      -      -  ONLINE
    35000cca2532572e4         -      -      -        -         -      -      -      -  ONLINE
    35000cca2532586ec         -      -      -        -         -      -      -      -  ONLINE
    35000cca25325c5f4         -      -      -        -         -      -      -      -  ONLINE
    35000cca25325c610         -      -      -        -         -      -      -      -  ONLINE
    35000cca25325c76c         -      -      -        -         -      -      -      -  ONLINE
    35000cca25325fb38         -      -      -        -         -      -      -      -  ONLINE
    35000cca25325fb5c         -      -      -        -         -      -      -      -  ONLINE
    35000cca25325fb78         -      -      -        -         -      -      -      -  ONLINE
special                       -      -      -        -         -      -      -      -  -
  mirror                  6.98T  5.10T  1.88T        -         -    75%  73.0%      -  ONLINE
    zfs-64e840b01f4e178c      -      -      -        -         -      -      -      -  ONLINE
    zfs-2901cad643f112c3      -      -      -        -         -      -      -      -  ONLINE
    zfs-8fa1031490ad0ab2      -      -      -        -         -      -      -      -  ONLINE
    zfs-3cf59d1d145ab04b      -      -      -        -         -      -      -      -  ONLINE
  mirror                  7.27T  5.38T  1.88T        -         -    75%  74.1%      -  ONLINE
    zfs-cc86d88f575882ba      -      -      -        -         -      -      -      -  ONLINE
    zfs-039251109fb434af      -      -      -        -         -      -      -      -  ONLINE
    zfs-e7c241bb7dcd4fb8      -      -      -        -         -      -      -      -  ONLINE
    zfs-b13cf73ec91bb5d2      -      -      -        -         -      -      -      -  ONLINE
logs                          -      -      -        -         -      -      -      -  -
  mirror                   348G  19.3M   348G        -         -     0%  0.00%      -  ONLINE
    zfs-b30c96b1eb59e20f      -      -      -        -         -      -      -      -  ONLINE
    zfs-0f8de84266666364      -      -      -        -         -      -      -      -  ONLINE
  mirror                   348G  17.7M   348G        -         -     0%  0.00%      -  ONLINE
    zfs-72606fae4de92cf2      -      -      -        -         -      -      -      -  ONLINE
    zfs-0978488aae2a7c56      -      -      -        -         -      -      -      -  ONLINE
cache                         -      -      -        -         -      -      -      -  -
  nvme4n1p1               6.99T   246G  6.75T        -         -     0%  3.44%      -  ONLINE
  nvme5n1p1               6.99T   247G  6.75T        -         -     0%  3.44%      -  ONLINE
  nvme9n1p1                   -      -      -        -         -      -      -      -  FAULTED
  nvme10n1p1                  -      -      -        -         -      -      -      -  FAULTED
  nvme11n1p1                  -      -      -        -         -      -      -      -  FAULTED
  nvme12n1p1                  -      -      -        -         -      -      -      -  FAULTED
[root@cascade2 ~]# zpool get all home1
NAME   PROPERTY                       VALUE                          SOURCE
home1  size                           669T                           -
home1  capacity                       72%                            -
home1  altroot                        -                              default
home1  health                         ONLINE                         -
home1  guid                           13906513988257520913           -
home1  version                        -                              default
home1  bootfs                         -                              default
home1  delegation                     on                             default
home1  autoreplace                    off                            default
home1  cachefile                      none                           local
home1  failmode                       wait                           default
home1  listsnapshots                  off                            default
home1  autoexpand                     off                            default
home1  dedupditto                     0                              default
home1  dedupratio                     1.00x                          -
home1  free                           182T                           -
home1  allocated                      487T                           -
home1  readonly                       off                            -
home1  ashift                         0                              default
home1  comment                        -                              default
home1  expandsize                     -                              -
home1  freeing                        0                              -
home1  fragmentation                  9%                             -
home1  leaked                         0                              -
home1  multihost                      on                             local
home1  checkpoint                     -                              -
home1  load_guid                      9767428333831169282            -
home1  autotrim                       on                             local
home1  feature@async_destroy          enabled                        local
home1  feature@empty_bpobj            active                         local
home1  feature@lz4_compress           active                         local
home1  feature@multi_vdev_crash_dump  enabled                        local
home1  feature@spacemap_histogram     active                         local
home1  feature@enabled_txg            active                         local
home1  feature@hole_birth             active                         local
home1  feature@extensible_dataset     active                         local
home1  feature@embedded_data          active                         local
home1  feature@bookmarks              enabled                        local
home1  feature@filesystem_limits      enabled                        local
home1  feature@large_blocks           enabled                        local
home1  feature@large_dnode            enabled                        local
home1  feature@sha512                 enabled                        local
home1  feature@skein                  enabled                        local
home1  feature@edonr                  enabled                        local
home1  feature@userobj_accounting     active                         local
home1  feature@encryption             enabled                        local
home1  feature@project_quota          active                         local
home1  feature@device_removal         enabled                        local
home1  feature@obsolete_counts        enabled                        local
home1  feature@zpool_checkpoint       enabled                        local
home1  feature@spacemap_v2            active                         local
home1  feature@allocation_classes     active                         local
home1  feature@resilver_defer         enabled                        local
home1  feature@bookmark_v2            enabled                        local
stuartthebruce added the "Status: Triage Needed" and "Type: Defect" labels on Feb 22, 2021
@stuartthebruce (Author)

I have reproduced this problem on another server running ZFS 2.0.3. It has been >10 minutes since I ran "[root@zfs1 ~]# time zpool remove home2 nvme5n8" on the following pool. This time I was watching a bit more closely with iostat: I/O continued for a while, but has now ground to a halt at less than 1 read op/sec.
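
Not part of the original report, but two other quick ways to confirm whether transaction groups are still syncing on a pool that looks wedged (note that these reads may themselves block if they need the pool configuration lock):

cat /proc/spl/kstat/zfs/home2/txgs | tail -5   # per-pool txg history; new entries stop appearing if syncing is stuck
cat /proc/spl/kstat/zfs/dbgmsg | tail -20      # internal ZFS debug log (the same file tailed further below)
zpool status home2                             # whether the cache vdev still appears in the pool configuration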

[root@zfs1 ~]# zpool list -v
NAME                    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
home2                   660T   177T   482T        -         -     0%    26%  1.00x    ONLINE  -
  raidz3                164T  43.5T   120T        -         -     0%  26.6%      -  ONLINE
    35000cca291272940      -      -      -        -         -      -      -      -  ONLINE
    35000cca291272aa4      -      -      -        -         -      -      -      -  ONLINE
    35000cca29127b180      -      -      -        -         -      -      -      -  ONLINE
    35000cca29127b1dc      -      -      -        -         -      -      -      -  ONLINE
    35000cca29127c258      -      -      -        -         -      -      -      -  ONLINE
    35000cca29127d0fc      -      -      -        -         -      -      -      -  ONLINE
    35000cca29128022c      -      -      -        -         -      -      -      -  ONLINE
    35000cca291280230      -      -      -        -         -      -      -      -  ONLINE
    35000cca291280358      -      -      -        -         -      -      -      -  ONLINE
    35000cca2912803f4      -      -      -        -         -      -      -      -  ONLINE
    35000cca291282824      -      -      -        -         -      -      -      -  ONLINE
    35000cca2912846f0      -      -      -        -         -      -      -      -  ONLINE
    35000cca291284a0c      -      -      -        -         -      -      -      -  ONLINE
    35000cca2912851dc      -      -      -        -         -      -      -      -  ONLINE
    35000cca291285258      -      -      -        -         -      -      -      -  ONLINE
  raidz3                164T  43.6T   120T        -         -     0%  26.6%      -  ONLINE
    35000cca291264b38      -      -      -        -         -      -      -      -  ONLINE
    35000cca291265d04      -      -      -        -         -      -      -      -  ONLINE
    35000cca291269770      -      -      -        -         -      -      -      -  ONLINE
    35000cca2912699a8      -      -      -        -         -      -      -      -  ONLINE
    35000cca29126c6cc      -      -      -        -         -      -      -      -  ONLINE
    35000cca29126c8ac      -      -      -        -         -      -      -      -  ONLINE
    35000cca29126f6cc      -      -      -        -         -      -      -      -  ONLINE
    35000cca291272148      -      -      -        -         -      -      -      -  ONLINE
    35000cca2912723ac      -      -      -        -         -      -      -      -  ONLINE
    35000cca291272538      -      -      -        -         -      -      -      -  ONLINE
    35000cca291272590      -      -      -        -         -      -      -      -  ONLINE
    35000cca2912725e0      -      -      -        -         -      -      -      -  ONLINE
    35000cca291272640      -      -      -        -         -      -      -      -  ONLINE
    35000cca29127265c      -      -      -        -         -      -      -      -  ONLINE
    35000cca2912728d8      -      -      -        -         -      -      -      -  ONLINE
  raidz3                164T  43.2T   121T        -         -     0%  26.4%      -  ONLINE
    35000cca291244390      -      -      -        -         -      -      -      -  ONLINE
    35000cca291245f38      -      -      -        -         -      -      -      -  ONLINE
    35000cca29124b6e8      -      -      -        -         -      -      -      -  ONLINE
    35000cca29124cc7c      -      -      -        -         -      -      -      -  ONLINE
    35000cca29124d5a0      -      -      -        -         -      -      -      -  ONLINE
    35000cca29124def8      -      -      -        -         -      -      -      -  ONLINE
    35000cca29124e12c      -      -      -        -         -      -      -      -  ONLINE
    35000cca291255ad8      -      -      -        -         -      -      -      -  ONLINE
    35000cca291257244      -      -      -        -         -      -      -      -  ONLINE
    35000cca2912580ec      -      -      -        -         -      -      -      -  ONLINE
    35000cca291258568      -      -      -        -         -      -      -      -  ONLINE
    35000cca29125a9a0      -      -      -        -         -      -      -      -  ONLINE
    35000cca29125aca8      -      -      -        -         -      -      -      -  ONLINE
    35000cca291261cc8      -      -      -        -         -      -      -      -  ONLINE
    35000cca291262210      -      -      -        -         -      -      -      -  ONLINE
  raidz3                164T  43.3T   120T        -         -     0%  26.5%      -  ONLINE
    35000cca2910fb21c      -      -      -        -         -      -      -      -  ONLINE
    35000cca2911694b4      -      -      -        -         -      -      -      -  ONLINE
    35000cca29118cf24      -      -      -        -         -      -      -      -  ONLINE
    35000cca2911f6a34      -      -      -        -         -      -      -      -  ONLINE
    35000cca2911fdf60      -      -      -        -         -      -      -      -  ONLINE
    35000cca2911ff4e4      -      -      -        -         -      -      -      -  ONLINE
    35000cca291208da4      -      -      -        -         -      -      -      -  ONLINE
    35000cca29120aaf8      -      -      -        -         -      -      -      -  ONLINE
    35000cca29121f8f4      -      -      -        -         -      -      -      -  ONLINE
    35000cca29122cf3c      -      -      -        -         -      -      -      -  ONLINE
    35000cca291231cf0      -      -      -        -         -      -      -      -  ONLINE
    35000cca29123433c      -      -      -        -         -      -      -      -  ONLINE
    35000cca291238740      -      -      -        -         -      -      -      -  ONLINE
    35000cca29123f2c0      -      -      -        -         -      -      -      -  ONLINE
    35000cca29123f3ec      -      -      -        -         -      -      -      -  ONLINE
special                    -      -      -        -         -      -      -      -  -
  mirror               3.48T  2.89T   608G        -         -    65%  83.0%      -  ONLINE
    nvme0n1                -      -      -        -         -      -      -      -  ONLINE
    nvme1n1                -      -      -        -         -      -      -      -  ONLINE
  system-zfs           1.30T   750G   578G        -         -    53%  56.4%      -  ONLINE
logs                       -      -      -        -         -      -      -      -  -
  system-slog          63.5G    76K  63.5G        -         -     0%  0.00%      -  ONLINE
cache                      -      -      -        -         -      -      -      -  -
  nvme5n7               447G   447G   265M        -         -     0%  99.9%      -  ONLINE
  nvme5n8               447G   447G   264M        -         -     0%  99.9%      -  ONLINE
[root@zfs1 ~]# iostat -xzm 3
...
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.21    0.00    0.22    0.00    0.00   99.56

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sdn              0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdaz             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   2.00   0.07
sdam             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdbg             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    1.00    0.00   0.00     0.00     0.00   1.00   0.03
sdo              0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    1.00    0.00   0.00     0.00     0.00   1.00   0.03
sdcn             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    2.00    0.00   0.00     0.00     0.00   2.00   0.07
sdv              0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdab             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdaa             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    1.00    0.00   0.00     0.00     0.00   1.00   0.03
sdd              0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdbr             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdbz             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdbf             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdm              0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdak             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdct             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sds              0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdda             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdde             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdaw             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   2.00   0.07
sdcw             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdcs             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   2.00   0.07
sddj             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    1.00    0.00   0.00     0.00     0.00   1.00   0.03
sdfb             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   2.00   0.07
sdfe             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdeg             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    1.00    0.00   0.00     0.00     0.00   1.00   0.03
sdfd             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sden             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00   11.00    0.00   0.00     0.00     0.00  11.00   0.37
sdep             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdej             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   2.00   0.07
sder             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    1.00    0.00   0.00     0.00     0.00   1.00   0.03
sdfu             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdgg             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdgp             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    8.00    0.00   0.00     0.00     0.00   8.00   0.27
sdhg             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdhh             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdhq             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdhz             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03
sdhu             0.33    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   1.00   0.03

And here is the end of the kernel debug buffer, which is not growing in size:

[root@zfs1 ~]# tail -20 /proc/spl/kstat/zfs/dbgmsg
1614042017   spa_history.c:304:spa_history_log_sync(): txg 9569947 snapshot home2/peter.couvares@autosnap_2021-02-23_01:00:15_hourly (id 126251)
1614042017   spa_history.c:329:spa_history_log_sync(): ioctl snapshot
1614042017   spa_history.c:296:spa_history_log_sync(): command: zfs snapshot home2/peter.couvares@autosnap_2021-02-23_01:00:15_hourly
1614042017   spa_history.c:304:spa_history_log_sync(): txg 9569949 snapshot home2/juan.barayoga@autosnap_2021-02-23_01:00:15_hourly (id 124725)
1614042018   spa_history.c:329:spa_history_log_sync(): ioctl snapshot
1614042018   spa_history.c:296:spa_history_log_sync(): command: zfs snapshot home2/juan.barayoga@autosnap_2021-02-23_01:00:15_hourly
1614042035   metaslab.c:2422:metaslab_load_impl(): metaslab_load: txg 9570067, spa home2, vdev_id 1, ms_id 7790, smp_length 32, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 526279084 ms, loading_time 1 ms, ms_max_size 17179869184, max size error 17179869184, old_weight 880000000000001, new_weight 880000000000001
1614042116   metaslab.c:2422:metaslab_load_impl(): metaslab_load: txg 9570350, spa home2, vdev_id 3, ms_id 7731, smp_length 32, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 526360211 ms, loading_time 2 ms, ms_max_size 17179869184, max size error 17179869184, old_weight 880000000000001, new_weight 880000000000001
1614042216   metaslab.c:2422:metaslab_load_impl(): metaslab_load: txg 9570878, spa home2, vdev_id 2, ms_id 7771, smp_length 1376, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 526460319 ms, loading_time 2 ms, ms_max_size 17179869184, max size error 17179869184, old_weight 880000000000001, new_weight 880000000000001
1614042268   metaslab.c:2526:metaslab_unload(): metaslab_unload: txg 9571069, spa home2, vdev_id 1, ms_id 7763, weight 840000000000001, selected txg 9568384 (600114 ms ago), alloc_txg 9568384, loaded 9098892 ms ago, max_size 8937308160
1614042330   metaslab.c:2422:metaslab_load_impl(): metaslab_load: txg 9571329, spa home2, vdev_id 3, ms_id 7732, smp_length 32, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 526574012 ms, loading_time 2 ms, ms_max_size 17179869184, max size error 17179869184, old_weight 880000000000001, new_weight 880000000000001
1614042338   metaslab.c:2422:metaslab_load_impl(): metaslab_load: txg 9571348, spa home2, vdev_id 2, ms_id 7772, smp_length 864, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 526581651 ms, loading_time 1 ms, ms_max_size 17179869184, max size error 17179869184, old_weight 880000000000001, new_weight 880000000000001
1614042338   metaslab.c:2422:metaslab_load_impl(): metaslab_load: txg 9571351, spa home2, vdev_id 0, ms_id 7809, smp_length 1416, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 526582061 ms, loading_time 1 ms, ms_max_size 17179869184, max size error 17179869184, old_weight 880000000000001, new_weight 880000000000001
1614042360   metaslab.c:2526:metaslab_unload(): metaslab_unload: txg 9571437, spa home2, vdev_id 0, ms_id 7788, weight 840000000000001, selected txg 9568688 (601094 ms ago), alloc_txg 9568688, loaded 9933985 ms ago, max_size 9233711104
1614042420   metaslab.c:3590:metaslab_condense(): condensing: txg 9571737, msp[31] ffff907db5657000, vdev id 5, spa home2, smp size 1562456, segments 84124, forcing condense=FALSE
1614042439   metaslab.c:2422:metaslab_load_impl(): metaslab_load: txg 9571862, spa home2, vdev_id 1, ms_id 7791, smp_length 832, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 526683244 ms, loading_time 1 ms, ms_max_size 17179869184, max size error 17179869184, old_weight 880000000000001, new_weight 880000000000001
1614042476   metaslab.c:2422:metaslab_load_impl(): metaslab_load: txg 9571964, spa home2, vdev_id 1, ms_id 7792, smp_length 672, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 526719428 ms, loading_time 1 ms, ms_max_size 17179869184, max size error 17179869184, old_weight 880000000000001, new_weight 880000000000001
1614042535   metaslab.c:2422:metaslab_load_impl(): metaslab_load: txg 9572183, spa home2, vdev_id 2, ms_id 7773, smp_length 3160, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 526779098 ms, loading_time 1 ms, ms_max_size 17179869184, max size error 17179869184, old_weight 880000000000001, new_weight 880000000000001
1614042624   metaslab.c:2422:metaslab_load_impl(): metaslab_load: txg 9572626, spa home2, vdev_id 0, ms_id 7810, smp_length 896, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 526867851 ms, loading_time 2 ms, ms_max_size 17179869184, max size error 17179869184, old_weight 880000000000001, new_weight 880000000000001
1614042637   metaslab.c:2526:metaslab_unload(): metaslab_unload: txg 9572686, spa home2, vdev_id 1, ms_id 7778, weight 840000000000001, selected txg 9570067 (602227 ms ago), alloc_txg 9570067, loaded 6517920 ms ago, max_size 16188702720
[root@zfs1 ~]# date +%s
1614043262

At this point I can find no indication of forward progress being made and I will likely have to reset this system.
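
Before resetting, the stacks of the blocked kernel threads can be captured for the report with a sysrq "w" dump (hypothetical commands, not from the original post, and they require sysrq to be enabled; the hung-task messages in the next comment carry the same information once the 120-second timer fires):

echo w > /proc/sysrq-trigger   # dump stacks of all uninterruptible (D-state) tasks to the kernel log
dmesg | tail -n 300            # or journalctl -k, to collect the resulting traces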

@stuartthebruce (Author)

Feb 22 17:13:07 zfs1.ldas.cit kernel: INFO: task dbuf_evict:7746 blocked for more than 120 seconds.
Feb 22 17:13:07 zfs1.ldas.cit kernel:      Tainted: P           OE    --------- -t - 4.18.0-240.10.1.el8_3.x86_64 #1
Feb 22 17:13:07 zfs1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_evict      D    0  7746      2 0x80004000
Feb 22 17:13:07 zfs1.ldas.cit kernel: Call Trace:
Feb 22 17:13:07 zfs1.ldas.cit kernel: __schedule+0x2a6/0x700
Feb 22 17:13:07 zfs1.ldas.cit kernel: schedule+0x38/0xa0
Feb 22 17:13:07 zfs1.ldas.cit kernel: schedule_preempt_disabled+0xa/0x10
Feb 22 17:13:07 zfs1.ldas.cit kernel: __mutex_lock.isra.5+0x2d0/0x4a0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __thread_exit+0x20/0x20 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: arc_buf_destroy+0x5f/0x100 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_destroy+0x2e/0x3f0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __thread_exit+0x20/0x20 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_evict_one+0xfa/0x120 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_evict_thread+0x122/0x1d0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? dbuf_evict_one+0x120/0x120 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: thread_generic_wrapper+0x6f/0x80 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: kthread+0x112/0x130
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? kthread_flush_work_fn+0x10/0x10
Feb 22 17:13:07 zfs1.ldas.cit kernel: ret_from_fork+0x22/0x40
Feb 22 17:13:07 zfs1.ldas.cit kernel: INFO: task l2arc_feed:7794 blocked for more than 120 seconds.
Feb 22 17:13:07 zfs1.ldas.cit kernel:      Tainted: P           OE    --------- -t - 4.18.0-240.10.1.el8_3.x86_64 #1
Feb 22 17:13:07 zfs1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 17:13:07 zfs1.ldas.cit kernel: l2arc_feed      D    0  7794      2 0x80004000
Feb 22 17:13:07 zfs1.ldas.cit kernel: Call Trace:
Feb 22 17:13:07 zfs1.ldas.cit kernel: __schedule+0x2a6/0x700
Feb 22 17:13:07 zfs1.ldas.cit kernel: schedule+0x38/0xa0
Feb 22 17:13:07 zfs1.ldas.cit kernel: schedule_preempt_disabled+0xa/0x10
Feb 22 17:13:07 zfs1.ldas.cit kernel: __mutex_lock.isra.5+0x2d0/0x4a0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __set_current_blocked+0x3d/0x60
Feb 22 17:13:07 zfs1.ldas.cit kernel: l2arc_feed_thread+0xe1/0x5b0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? l2arc_remove_vdev+0x240/0x240 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __thread_exit+0x20/0x20 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: thread_generic_wrapper+0x6f/0x80 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: kthread+0x112/0x130
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? kthread_flush_work_fn+0x10/0x10
Feb 22 17:13:07 zfs1.ldas.cit kernel: ret_from_fork+0x22/0x40
Feb 22 17:13:07 zfs1.ldas.cit kernel: INFO: task txg_sync:17372 blocked for more than 120 seconds.
Feb 22 17:13:07 zfs1.ldas.cit kernel:      Tainted: P           OE    --------- -t - 4.18.0-240.10.1.el8_3.x86_64 #1
Feb 22 17:13:07 zfs1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 17:13:07 zfs1.ldas.cit kernel: txg_sync        D    0 17372      2 0x80004000
Feb 22 17:13:07 zfs1.ldas.cit kernel: Call Trace:
Feb 22 17:13:07 zfs1.ldas.cit kernel: __schedule+0x2a6/0x700
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? try_to_wake_up+0x31c/0x540
Feb 22 17:13:07 zfs1.ldas.cit kernel: schedule+0x38/0xa0
Feb 22 17:13:07 zfs1.ldas.cit kernel: cv_wait_common+0xfb/0x130 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? finish_wait+0x80/0x80
Feb 22 17:13:07 zfs1.ldas.cit kernel: spa_config_enter+0xed/0x100 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: spa_txg_history_init_io+0x6c/0x110 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: txg_sync_thread+0x285/0x480 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __switch_to_asm+0x41/0x70
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? txg_thread_exit.isra.10+0x60/0x60 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __thread_exit+0x20/0x20 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: thread_generic_wrapper+0x6f/0x80 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: kthread+0x112/0x130
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? kthread_flush_work_fn+0x10/0x10
Feb 22 17:13:07 zfs1.ldas.cit kernel: ret_from_fork+0x22/0x40
Feb 22 17:13:07 zfs1.ldas.cit kernel: INFO: task mmp:17373 blocked for more than 120 seconds.
Feb 22 17:13:07 zfs1.ldas.cit kernel:      Tainted: P           OE    --------- -t - 4.18.0-240.10.1.el8_3.x86_64 #1
Feb 22 17:13:07 zfs1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 17:13:07 zfs1.ldas.cit kernel: mmp             D    0 17373      2 0x80004000
Feb 22 17:13:07 zfs1.ldas.cit kernel: Call Trace:
Feb 22 17:13:07 zfs1.ldas.cit kernel: __schedule+0x2a6/0x700
Feb 22 17:13:07 zfs1.ldas.cit kernel: schedule+0x38/0xa0
Feb 22 17:13:07 zfs1.ldas.cit kernel: cv_wait_common+0xfb/0x130 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? finish_wait+0x80/0x80
Feb 22 17:13:07 zfs1.ldas.cit kernel: spa_config_enter+0xed/0x100 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: vdev_count_leaves+0x20/0x50 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: mmp_thread+0x359/0x710 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? mmp_write_uberblock+0x700/0x700 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __thread_exit+0x20/0x20 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: thread_generic_wrapper+0x6f/0x80 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: kthread+0x112/0x130
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? kthread_flush_work_fn+0x10/0x10
Feb 22 17:13:07 zfs1.ldas.cit kernel: ret_from_fork+0x22/0x40
Feb 22 17:13:07 zfs1.ldas.cit kernel: INFO: task send_reader:2193009 blocked for more than 120 seconds.
Feb 22 17:13:07 zfs1.ldas.cit kernel:      Tainted: P           OE    --------- -t - 4.18.0-240.10.1.el8_3.x86_64 #1
Feb 22 17:13:07 zfs1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 17:13:07 zfs1.ldas.cit kernel: send_reader     D    0 2193009      2 0x80004080
Feb 22 17:13:07 zfs1.ldas.cit kernel: Call Trace:
Feb 22 17:13:07 zfs1.ldas.cit kernel: __schedule+0x2a6/0x700
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __switch_to_asm+0x35/0x70
Feb 22 17:13:07 zfs1.ldas.cit kernel: schedule+0x38/0xa0
Feb 22 17:13:07 zfs1.ldas.cit kernel: cv_wait_common+0xfb/0x130 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? finish_wait+0x80/0x80
Feb 22 17:13:07 zfs1.ldas.cit kernel: spa_config_enter+0xed/0x100 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfs_blkptr_verify+0x3dc/0x440 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? kmem_cache_alloc+0x14d/0x1b0
Feb 22 17:13:07 zfs1.ldas.cit kernel: zio_read+0x42/0xc0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? do_dump+0x900/0x900 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: issue_data_read+0x271/0x290 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? find_next_range+0x270/0x270 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: send_reader_thread+0xbc/0x420 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? set_next_entity+0x99/0x1b0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? set_next_task_fair+0x30/0xd0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? set_user_nice.part.68+0x10f/0x1b0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? find_next_range+0x270/0x270 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __thread_exit+0x20/0x20 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: thread_generic_wrapper+0x6f/0x80 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: kthread+0x112/0x130
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? kthread_flush_work_fn+0x10/0x10
Feb 22 17:13:07 zfs1.ldas.cit kernel: ret_from_fork+0x22/0x40
Feb 22 17:13:07 zfs1.ldas.cit kernel: INFO: task rsync:2372520 blocked for more than 120 seconds.
Feb 22 17:13:07 zfs1.ldas.cit kernel:      Tainted: P           OE    --------- -t - 4.18.0-240.10.1.el8_3.x86_64 #1
Feb 22 17:13:07 zfs1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 17:13:07 zfs1.ldas.cit kernel: rsync           D    0 2372520 2372018 0x00000080
Feb 22 17:13:07 zfs1.ldas.cit kernel: Call Trace:
Feb 22 17:13:07 zfs1.ldas.cit kernel: __schedule+0x2a6/0x700
Feb 22 17:13:07 zfs1.ldas.cit kernel: schedule+0x38/0xa0
Feb 22 17:13:07 zfs1.ldas.cit kernel: cv_wait_common+0xfb/0x130 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? finish_wait+0x80/0x80
Feb 22 17:13:07 zfs1.ldas.cit kernel: spa_config_enter+0xed/0x100 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfs_blkptr_verify+0x3dc/0x440 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zio_read+0x42/0xc0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? arc_read+0x1200/0x1200 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: arc_read+0xb9e/0x1200 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? dbuf_rele_and_unlock+0x660/0x660 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_read_impl.constprop.29+0x29f/0x6b0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? spl_kmem_cache_alloc+0x11f/0x160 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_read+0x1b2/0x520 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dmu_buf_hold+0x56/0x80 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zap_lockdir+0x4e/0xb0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? _cond_resched+0x15/0x30
Feb 22 17:13:07 zfs1.ldas.cit kernel: zap_lookup_norm+0x5d/0xd0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? spl_kmem_alloc+0xd5/0x120 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zap_lookup+0x12/0x20 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfs_dirent_lock+0x550/0x6c0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfs_create+0x29a/0x910 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? _cond_resched+0x15/0x30
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? _cond_resched+0x15/0x30
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __kmalloc_node+0x1d9/0x2b0
Feb 22 17:13:07 zfs1.ldas.cit kernel: zpl_create+0xae/0x180 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: path_openat+0x11f2/0x14f0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? walk_component+0x101/0x2f0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? kmem_cache_free+0x100/0x1c0
Feb 22 17:13:07 zfs1.ldas.cit kernel: do_filp_open+0x93/0x100
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __check_object_size+0xa8/0x16b
Feb 22 17:13:07 zfs1.ldas.cit kernel: do_sys_open+0x184/0x220
Feb 22 17:13:07 zfs1.ldas.cit kernel: do_syscall_64+0x5b/0x1a0
Feb 22 17:13:07 zfs1.ldas.cit kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca
Feb 22 17:13:07 zfs1.ldas.cit kernel: RIP: 0033:0x7f58e9fc9552
Feb 22 17:13:07 zfs1.ldas.cit kernel: Code: Bad RIP value.
Feb 22 17:13:07 zfs1.ldas.cit kernel: RSP: 002b:00007ffcc4b13f00 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Feb 22 17:13:07 zfs1.ldas.cit kernel: RAX: ffffffffffffffda RBX: 000055cd12c46e08 RCX: 00007f58e9fc9552
Feb 22 17:13:07 zfs1.ldas.cit kernel: RDX: 0000000000000041 RSI: 00007ffcc4b14070 RDI: 00000000ffffff9c
Feb 22 17:13:07 zfs1.ldas.cit kernel: RBP: 00007ffcc4b14070 R08: 0000000000000001 R09: 0000000000000000
Feb 22 17:13:07 zfs1.ldas.cit kernel: R10: 0000000000000180 R11: 0000000000000246 R12: 00000000ffffffff
Feb 22 17:13:07 zfs1.ldas.cit kernel: R13: 00007ffcc4b14070 R14: 0000000000000000 R15: 0000000000156f39
Feb 22 17:13:07 zfs1.ldas.cit kernel: INFO: task rsync:3055672 blocked for more than 120 seconds.
Feb 22 17:13:07 zfs1.ldas.cit kernel:      Tainted: P           OE    --------- -t - 4.18.0-240.10.1.el8_3.x86_64 #1
Feb 22 17:13:07 zfs1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 17:13:07 zfs1.ldas.cit kernel: rsync           D    0 3055672 3055665 0x00000080
Feb 22 17:13:07 zfs1.ldas.cit kernel: Call Trace:
Feb 22 17:13:07 zfs1.ldas.cit kernel: __schedule+0x2a6/0x700
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __switch_to_asm+0x35/0x70
Feb 22 17:13:07 zfs1.ldas.cit kernel: schedule+0x38/0xa0
Feb 22 17:13:07 zfs1.ldas.cit kernel: cv_wait_common+0xfb/0x130 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? finish_wait+0x80/0x80
Feb 22 17:13:07 zfs1.ldas.cit kernel: spa_config_enter+0xed/0x100 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfs_blkptr_verify+0x3dc/0x440 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zio_read+0x42/0xc0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? arc_read+0x1200/0x1200 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: arc_read+0xb9e/0x1200 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? dbuf_rele_and_unlock+0x660/0x660 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_read_impl.constprop.29+0x29f/0x6b0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? spl_kmem_cache_alloc+0x11f/0x160 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_read+0x1b2/0x520 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dmu_buf_hold+0x56/0x80 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zap_lockdir+0x4e/0xb0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zap_cursor_retrieve+0x17e/0x2e0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __check_object_size+0xa8/0x16b
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? _copy_to_user+0x26/0x30
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? filldir64+0xce/0x130
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfs_readdir+0x134/0x440 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? filename_lookup.part.64+0xe0/0x170
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? _cond_resched+0x15/0x30
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? _cond_resched+0x15/0x30
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? mutex_lock+0xe/0x30
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? _copy_to_user+0x26/0x30
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? cp_new_stat+0x150/0x180
Feb 22 17:13:07 zfs1.ldas.cit kernel: zpl_iterate+0x4c/0x70 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: iterate_dir+0x13c/0x190
Feb 22 17:13:07 zfs1.ldas.cit kernel: ksys_getdents64+0x9c/0x130
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? iterate_dir+0x190/0x190
Feb 22 17:13:07 zfs1.ldas.cit kernel: __x64_sys_getdents64+0x16/0x20
Feb 22 17:13:07 zfs1.ldas.cit kernel: do_syscall_64+0x5b/0x1a0
Feb 22 17:13:07 zfs1.ldas.cit kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca
Feb 22 17:13:07 zfs1.ldas.cit kernel: RIP: 0033:0x7fd4c75f25ab
Feb 22 17:13:07 zfs1.ldas.cit kernel: Code: Bad RIP value.
Feb 22 17:13:07 zfs1.ldas.cit kernel: RSP: 002b:00007ffc42795ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Feb 22 17:13:07 zfs1.ldas.cit kernel: RAX: ffffffffffffffda RBX: 00005583ac26c330 RCX: 00007fd4c75f25ab
Feb 22 17:13:07 zfs1.ldas.cit kernel: RDX: 0000000000008000 RSI: 00005583ac26c360 RDI: 0000000000000005
Feb 22 17:13:07 zfs1.ldas.cit kernel: RBP: 00005583ac26c360 R08: 00005583ac26c320 R09: 0000000000000003
Feb 22 17:13:07 zfs1.ldas.cit kernel: R10: 0000000000000001 R11: 0000000000000246 R12: ffffffffffffff80
Feb 22 17:13:07 zfs1.ldas.cit kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
Feb 22 17:13:07 zfs1.ldas.cit kernel: INFO: task rsync:3863037 blocked for more than 120 seconds.
Feb 22 17:13:07 zfs1.ldas.cit kernel:      Tainted: P           OE    --------- -t - 4.18.0-240.10.1.el8_3.x86_64 #1
Feb 22 17:13:07 zfs1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 17:13:07 zfs1.ldas.cit kernel: rsync           D    0 3863037 3862679 0x00000080
Feb 22 17:13:07 zfs1.ldas.cit kernel: Call Trace:
Feb 22 17:13:07 zfs1.ldas.cit kernel: __schedule+0x2a6/0x700
Feb 22 17:13:07 zfs1.ldas.cit kernel: schedule+0x38/0xa0
Feb 22 17:13:07 zfs1.ldas.cit kernel: cv_wait_common+0xfb/0x130 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? finish_wait+0x80/0x80
Feb 22 17:13:07 zfs1.ldas.cit kernel: spa_config_enter+0xed/0x100 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfs_blkptr_verify+0x3dc/0x440 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zio_read+0x42/0xc0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? arc_read+0x1200/0x1200 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: arc_read+0xb9e/0x1200 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? dbuf_rele_and_unlock+0x660/0x660 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_read_impl.constprop.29+0x29f/0x6b0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? spl_kmem_cache_alloc+0x11f/0x160 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_read+0x1b2/0x520 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dmu_buf_hold+0x56/0x80 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zap_lockdir+0x4e/0xb0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? _cond_resched+0x15/0x30
Feb 22 17:13:07 zfs1.ldas.cit kernel: zap_lookup_norm+0x5d/0xd0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? spl_kmem_alloc+0xd5/0x120 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zap_lookup+0x12/0x20 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfs_dirent_lock+0x550/0x6c0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfs_create+0x29a/0x910 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? _cond_resched+0x15/0x30
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? _cond_resched+0x15/0x30
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __kmalloc_node+0x1d9/0x2b0
Feb 22 17:13:07 zfs1.ldas.cit kernel: zpl_create+0xae/0x180 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: path_openat+0x11f2/0x14f0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? walk_component+0x101/0x2f0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? kmem_cache_free+0x100/0x1c0
Feb 22 17:13:07 zfs1.ldas.cit kernel: do_filp_open+0x93/0x100
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __check_object_size+0xa8/0x16b
Feb 22 17:13:07 zfs1.ldas.cit kernel: do_sys_open+0x184/0x220
Feb 22 17:13:07 zfs1.ldas.cit kernel: do_syscall_64+0x5b/0x1a0
Feb 22 17:13:07 zfs1.ldas.cit kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca
Feb 22 17:13:07 zfs1.ldas.cit kernel: RIP: 0033:0x7f9bfa7d2552
Feb 22 17:13:07 zfs1.ldas.cit kernel: Code: Bad RIP value.
Feb 22 17:13:07 zfs1.ldas.cit kernel: RSP: 002b:00007ffe115a1110 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Feb 22 17:13:07 zfs1.ldas.cit kernel: RAX: ffffffffffffffda RBX: 000055fb7bfdad20 RCX: 00007f9bfa7d2552
Feb 22 17:13:07 zfs1.ldas.cit kernel: RDX: 0000000000000041 RSI: 00007ffe115a1280 RDI: 00000000ffffff9c
Feb 22 17:13:07 zfs1.ldas.cit kernel: RBP: 00007ffe115a1280 R08: 0000000000000001 R09: 0000000000000000
Feb 22 17:13:07 zfs1.ldas.cit kernel: R10: 0000000000000180 R11: 0000000000000246 R12: 00000000ffffffff
Feb 22 17:13:07 zfs1.ldas.cit kernel: R13: 00007ffe115a1280 R14: 0000000000000000 R15: 000000000004aa13
Feb 22 17:13:07 zfs1.ldas.cit kernel: INFO: task duc:3999381 blocked for more than 120 seconds.
Feb 22 17:13:07 zfs1.ldas.cit kernel:      Tainted: P           OE    --------- -t - 4.18.0-240.10.1.el8_3.x86_64 #1
Feb 22 17:13:07 zfs1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 17:13:07 zfs1.ldas.cit kernel: duc             D    0 3999381 3999354 0x00000080
Feb 22 17:13:07 zfs1.ldas.cit kernel: Call Trace:
Feb 22 17:13:07 zfs1.ldas.cit kernel: __schedule+0x2a6/0x700
Feb 22 17:13:07 zfs1.ldas.cit kernel: schedule+0x38/0xa0
Feb 22 17:13:07 zfs1.ldas.cit kernel: cv_wait_common+0xfb/0x130 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? finish_wait+0x80/0x80
Feb 22 17:13:07 zfs1.ldas.cit kernel: spa_config_enter+0xed/0x100 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfs_blkptr_verify+0x3dc/0x440 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zio_read+0x42/0xc0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? arc_read+0x1200/0x1200 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: arc_read+0xb9e/0x1200 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? dbuf_rele_and_unlock+0x660/0x660 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_read_impl.constprop.29+0x29f/0x6b0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? spl_kmem_cache_alloc+0x11f/0x160 [spl]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_read+0x1b2/0x520 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_hold_impl+0x454/0x600 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dbuf_hold+0x2c/0x60 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dmu_buf_hold_noread+0x84/0x100 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: dmu_buf_hold+0x37/0x80 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zap_lockdir+0x4e/0xb0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zap_cursor_retrieve+0x17e/0x2e0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? dmu_prefetch+0xc4/0x1f0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfs_readdir+0x134/0x440 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? dput.part.31+0x29/0x110
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? walk_component+0x12a/0x2f0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? terminate_walk+0x7a/0xe0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? path_lookupat.isra.48+0xa7/0x200
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? _cond_resched+0x15/0x30
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? filename_lookup.part.64+0xe0/0x170
Feb 22 17:13:07 zfs1.ldas.cit kernel: zpl_iterate+0x4c/0x70 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: iterate_dir+0x13c/0x190
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? dput.part.31+0x29/0x110
Feb 22 17:13:07 zfs1.ldas.cit kernel: ksys_getdents64+0x9c/0x130
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? iterate_dir+0x190/0x190
Feb 22 17:13:07 zfs1.ldas.cit kernel: __x64_sys_getdents64+0x16/0x20
Feb 22 17:13:07 zfs1.ldas.cit kernel: do_syscall_64+0x5b/0x1a0
Feb 22 17:13:07 zfs1.ldas.cit kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca
Feb 22 17:13:07 zfs1.ldas.cit kernel: RIP: 0033:0x7fa79bdad5ab
Feb 22 17:13:07 zfs1.ldas.cit kernel: Code: Bad RIP value.
Feb 22 17:13:07 zfs1.ldas.cit kernel: RSP: 002b:00007fff4813af88 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
Feb 22 17:13:07 zfs1.ldas.cit kernel: RAX: ffffffffffffffda RBX: 000055a7a377df70 RCX: 00007fa79bdad5ab
Feb 22 17:13:07 zfs1.ldas.cit kernel: RDX: 0000000000008000 RSI: 000055a7a377dfa0 RDI: 0000000000000008
Feb 22 17:13:07 zfs1.ldas.cit kernel: RBP: 000055a7a377dfa0 R08: 0000000000000080 R09: 00007fa79c0a6c00
Feb 22 17:13:07 zfs1.ldas.cit kernel: R10: 0000000000000004 R11: 0000000000000246 R12: ffffffffffffff80
Feb 22 17:13:07 zfs1.ldas.cit kernel: R13: 0000000000000002 R14: 000055a7a1fb7130 R15: 000055a7a151ebb0
Feb 22 17:13:07 zfs1.ldas.cit kernel: INFO: task zpool:687005 blocked for more than 120 seconds.
Feb 22 17:13:07 zfs1.ldas.cit kernel:      Tainted: P           OE    --------- -t - 4.18.0-240.10.1.el8_3.x86_64 #1
Feb 22 17:13:07 zfs1.ldas.cit kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 17:13:07 zfs1.ldas.cit kernel: zpool           D    0 687005 370172 0x00004080
Feb 22 17:13:07 zfs1.ldas.cit kernel: Call Trace:
Feb 22 17:13:07 zfs1.ldas.cit kernel: __schedule+0x2a6/0x700
Feb 22 17:13:07 zfs1.ldas.cit kernel: schedule+0x38/0xa0
Feb 22 17:13:07 zfs1.ldas.cit kernel: schedule_preempt_disabled+0xa/0x10
Feb 22 17:13:07 zfs1.ldas.cit kernel: __mutex_lock.isra.5+0x2d0/0x4a0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? kmem_cache_free+0x197/0x1c0
Feb 22 17:13:07 zfs1.ldas.cit kernel: l2arc_evict+0x4c1/0x560 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: l2arc_remove_vdev+0x11a/0x240 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: spa_load_l2cache+0x3b1/0x4e0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: spa_vdev_remove+0x621/0x830 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __kmalloc_node+0x1d9/0x2b0
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfs_ioc_vdev_remove+0x4f/0x90 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? strlcpy+0x2d/0x40
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfsdev_ioctl_common+0x5b5/0x830 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? kmalloc_large_node+0x37/0x60
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? __kmalloc_node+0x201/0x2b0
Feb 22 17:13:07 zfs1.ldas.cit kernel: zfsdev_ioctl+0x4f/0xe0 [zfs]
Feb 22 17:13:07 zfs1.ldas.cit kernel: do_vfs_ioctl+0xa4/0x640
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? handle_mm_fault+0xc2/0x1d0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ? syscall_trace_enter+0x1d3/0x2c0
Feb 22 17:13:07 zfs1.ldas.cit kernel: ksys_ioctl+0x60/0x90
Feb 22 17:13:07 zfs1.ldas.cit kernel: __x64_sys_ioctl+0x16/0x20
Feb 22 17:13:07 zfs1.ldas.cit kernel: do_syscall_64+0x5b/0x1a0
Feb 22 17:13:07 zfs1.ldas.cit kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca
Feb 22 17:13:07 zfs1.ldas.cit kernel: RIP: 0033:0x7f9ddb5d488b
Feb 22 17:13:07 zfs1.ldas.cit kernel: Code: Bad RIP value.
Feb 22 17:13:07 zfs1.ldas.cit kernel: RSP: 002b:00007ffd818ac488 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Feb 22 17:13:07 zfs1.ldas.cit kernel: RAX: ffffffffffffffda RBX: 000055c632be4ae0 RCX: 00007f9ddb5d488b
Feb 22 17:13:07 zfs1.ldas.cit kernel: RDX: 00007ffd818ac4a0 RSI: 0000000000005a0c RDI: 0000000000000003
Feb 22 17:13:07 zfs1.ldas.cit kernel: RBP: 00007ffd818afe90 R08: 0000000000000100 R09: 0000000000000000
Feb 22 17:13:07 zfs1.ldas.cit kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 00007ffd818ac4a0
Feb 22 17:13:07 zfs1.ldas.cit kernel: R13: 000055c632be0b20 R14: 00007ffd818afa50 R15: 000055c632c96878

@stuartthebruce
Author

We were able to get the original command in this issue, zpool remove home1 nvme9n1p1, to work by rebooting with that device physically removed. Note: I am not sure whether it was necessary, but I first ran zpool clear home1 nvme9n1p1 to clear the earlier manual zpool offline -f. The same approach worked for removing 3 additional cache devices, and the pool now comes up clean (the exact sequence is sketched after the zpool status output below).

[root@cascade2 ~]# zpool status
  pool: home1
 state: ONLINE
  scan: resilvered 0B in 0 days 03:24:45 with 0 errors on Sun Feb  7 18:49:50 2021
config:

	NAME                      STATE     READ WRITE CKSUM
	home1                     ONLINE       0     0     0
	  raidz3-0                ONLINE       0     0     0
	    35000cca253134c28     ONLINE       0     0     0
	    35000cca253146b40     ONLINE       0     0     0
	    35000cca253155e44     ONLINE       0     0     0
	    35000cca25319f9ac     ONLINE       0     0     0
	    35000cca2531a6ba0     ONLINE       0     0     0
	    35000cca2531b8108     ONLINE       0     0     0
	    35000cca2531bcadc     ONLINE       0     0     0
	    35000cca2531d41f4     ONLINE       0     0     0
	    35000cca2531d46c8     ONLINE       0     0     0
	    35000cca2531d4cac     ONLINE       0     0     0
	    35000cca2531da728     ONLINE       0     0     0
	    35000cca2531da880     ONLINE       0     0     0
	    35000cca2531dad74     ONLINE       0     0     0
	    35000cca2531ff2bc     ONLINE       0     0     0
	    35000cca2531ffff8     ONLINE       0     0     0
	  raidz3-1                ONLINE       0     0     0
	    35000cca253204e9c     ONLINE       0     0     0
	    35000cca253205ffc     ONLINE       0     0     0
	    35000cca2532067e0     ONLINE       0     0     0
	    35000cca253207fdc     ONLINE       0     0     0
	    35000cca253207ff4     ONLINE       0     0     0
	    35000cca2532081b0     ONLINE       0     0     0
	    35000cca25320d79c     ONLINE       0     0     0
	    35000cca25320dad0     ONLINE       0     0     0
	    35000cca25320e460     ONLINE       0     0     0
	    35000cca2532105a8     ONLINE       0     0     0
	    35000cca253217370     ONLINE       0     0     0
	    35000cca2532176f4     ONLINE       0     0     0
	    35000cca2532178d8     ONLINE       0     0     0
	    35000cca25321b168     ONLINE       0     0     0
	    35000cca25321b5f8     ONLINE       0     0     0
	  raidz3-2                ONLINE       0     0     0
	    35000cca25321b774     ONLINE       0     0     0
	    35000cca25321c2e0     ONLINE       0     0     0
	    35000cca25321c61c     ONLINE       0     0     0
	    35000cca25321c804     ONLINE       0     0     0
	    35000cca25321c870     ONLINE       0     0     0
	    35000cca25321c898     ONLINE       0     0     0
	    35000cca25321c910     ONLINE       0     0     0
	    35000cca25321c938     ONLINE       0     0     0
	    35000cca25321ca74     ONLINE       0     0     0
	    35000cca25323980c     ONLINE       0     0     0
	    35000cca253241428     ONLINE       0     0     0
	    35000cca253241574     ONLINE       0     0     0
	    35000cca253246560     ONLINE       0     0     0
	    35000cca2532479a4     ONLINE       0     0     0
	    35000cca253247c68     ONLINE       0     0     0
	  raidz3-3                ONLINE       0     0     0
	    35000cca25324a360     ONLINE       0     0     0
	    35000cca25324b7c0     ONLINE       0     0     0
	    35000cca25324d8e8     ONLINE       0     0     0
	    35000cca25324dc4c     ONLINE       0     0     0
	    35000cca253251828     ONLINE       0     0     0
	    35000cca253256f0c     ONLINE       0     0     0
	    35000cca253257210     ONLINE       0     0     0
	    35000cca2532572e4     ONLINE       0     0     0
	    35000cca2532586ec     ONLINE       0     0     0
	    35000cca25325c5f4     ONLINE       0     0     0
	    35000cca25325c610     ONLINE       0     0     0
	    35000cca25325c76c     ONLINE       0     0     0
	    35000cca25325fb38     ONLINE       0     0     0
	    35000cca25325fb5c     ONLINE       0     0     0
	    35000cca25325fb78     ONLINE       0     0     0
	special
	  mirror-6                ONLINE       0     0     0
	    zfs-64e840b01f4e178c  ONLINE       0     0     0
	    zfs-2901cad643f112c3  ONLINE       0     0     0
	    zfs-8fa1031490ad0ab2  ONLINE       0     0     0
	    zfs-3cf59d1d145ab04b  ONLINE       0     0     0
	  mirror-7                ONLINE       0     0     0
	    zfs-cc86d88f575882ba  ONLINE       0     0     0
	    zfs-039251109fb434af  ONLINE       0     0     0
	    zfs-e7c241bb7dcd4fb8  ONLINE       0     0     0
	    zfs-b13cf73ec91bb5d2  ONLINE       0     0     0
	logs
	  mirror-4                ONLINE       0     0     0
	    zfs-b30c96b1eb59e20f  ONLINE       0     0     0
	    zfs-0f8de84266666364  ONLINE       0     0     0
	  mirror-5                ONLINE       0     0     0
	    zfs-72606fae4de92cf2  ONLINE       0     0     0
	    zfs-0978488aae2a7c56  ONLINE       0     0     0
	cache
	  nvme4n1p1               ONLINE       0     0     0
	  nvme5n1p1               ONLINE       0     0     0

errors: No known data errors
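
For reference, a minimal sketch of the sequence that worked here, assuming the pool/device names from this issue (the zpool clear step may well be unnecessary):

# fault the cache device ahead of pulling it
[root@cascade2 ~]# zpool offline -f home1 nvme9n1p1
# ...power down, physically remove the NVMe device, boot back up...
# clear the earlier manual fault (possibly not required)
[root@cascade2 ~]# zpool clear home1 nvme9n1p1
# with the device physically gone, the remove completes instead of hanging the pool
[root@cascade2 ~]# zpool remove home1 nvme9n1p1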

@AmkG

AmkG commented Mar 5, 2021

I think I have a similar issue. Here is another data point:

Mar  5 15:41:32 localhost vmunix: [255684.786561] INFO: task txg_sync:638 blocked for more than 120 seconds.
Mar  5 15:41:32 localhost vmunix: [255684.786829]       Tainted: P           O      5.10.19-gnu #1
Mar  5 15:41:32 localhost vmunix: [255684.787175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  5 15:41:32 localhost vmunix: [255684.787734] task:txg_sync        state:D stack:    0 pid:  638 ppid:     2 flags:0x00004000
Mar  5 15:41:32 localhost vmunix: [255684.787737] Call Trace:
Mar  5 15:41:32 localhost vmunix: [255684.787743]  __schedule+0x3b7/0x830
Mar  5 15:41:32 localhost vmunix: [255684.787745]  schedule+0x40/0xb0
Mar  5 15:41:32 localhost vmunix: [255684.787756]  cv_wait_common+0x11e/0x140 [spl]
Mar  5 15:41:32 localhost vmunix: [255684.787759]  ? wait_woken+0x80/0x80
Mar  5 15:41:32 localhost vmunix: [255684.787764]  __cv_wait+0x15/0x20 [spl]
Mar  5 15:41:32 localhost vmunix: [255684.787860]  spa_config_enter+0xfb/0x110 [zfs]
Mar  5 15:41:32 localhost vmunix: [255684.787915]  spa_txg_history_init_io+0x6e/0x110 [zfs]
Mar  5 15:41:32 localhost vmunix: [255684.787970]  txg_sync_thread+0x2a9/0x490 [zfs]
Mar  5 15:41:32 localhost vmunix: [255684.788026]  ? txg_wait_open+0xe0/0xe0 [zfs]
Mar  5 15:41:32 localhost vmunix: [255684.788031]  ? __thread_exit+0x20/0x20 [spl]
Mar  5 15:41:32 localhost vmunix: [255684.788037]  thread_generic_wrapper+0x74/0x90 [spl]
Mar  5 15:41:32 localhost vmunix: [255684.788039]  kthread+0x126/0x140
Mar  5 15:41:32 localhost vmunix: [255684.788040]  ? kthread_park+0x90/0x90
Mar  5 15:41:32 localhost vmunix: [255684.788044]  ret_from_fork+0x22/0x30
Mar  5 15:41:32 localhost vmunix: [255684.793884] INFO: task zpool:28591 blocked for more than 120 seconds.
Mar  5 15:41:32 localhost vmunix: [255684.794128]       Tainted: P           O      5.10.19-gnu #1
Mar  5 15:41:32 localhost vmunix: [255684.794418] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar  5 15:41:32 localhost vmunix: [255684.794953] task:zpool           state:D stack:    0 pid:28591 ppid: 28590 flags:0x00004004
Mar  5 15:41:32 localhost vmunix: [255684.794956] Call Trace:
Mar  5 15:41:32 localhost vmunix: [255684.794959]  __schedule+0x3b7/0x830
Mar  5 15:41:32 localhost vmunix: [255684.794961]  schedule+0x40/0xb0
Mar  5 15:41:32 localhost vmunix: [255684.794963]  schedule_preempt_disabled+0xe/0x10
Mar  5 15:41:32 localhost vmunix: [255684.794966]  __mutex_lock.isra.10+0x277/0x4f0
Mar  5 15:41:32 localhost vmunix: [255684.794970]  __mutex_lock_slowpath+0x13/0x20
Mar  5 15:41:32 localhost vmunix: [255684.794971]  ? __mutex_lock_slowpath+0x13/0x20
Mar  5 15:41:32 localhost vmunix: [255684.794973]  mutex_lock+0x2f/0x40
Mar  5 15:41:32 localhost vmunix: [255684.795012]  l2arc_evict+0x414/0x570 [zfs]
Mar  5 15:41:32 localhost vmunix: [255684.795053]  l2arc_remove_vdev+0x119/0x210 [zfs]
Mar  5 15:41:32 localhost vmunix: [255684.795108]  spa_load_l2cache+0x33c/0x4c0 [zfs]
Mar  5 15:41:32 localhost vmunix: [255684.795114]  ? spl_kmem_alloc+0xc1/0x120 [spl]
Mar  5 15:41:32 localhost vmunix: [255684.795172]  spa_vdev_remove+0x595/0x840 [zfs]
Mar  5 15:41:32 localhost vmunix: [255684.795228]  zfs_ioc_vdev_remove+0x50/0x90 [zfs]
Mar  5 15:41:32 localhost vmunix: [255684.795231]  ? strlcpy+0x32/0x50
Mar  5 15:41:32 localhost vmunix: [255684.795291]  zfsdev_ioctl_common+0x405/0x810 [zfs]
Mar  5 15:41:32 localhost vmunix: [255684.795293]  ? __kmalloc_node+0x28d/0x340
Mar  5 15:41:32 localhost vmunix: [255684.795296]  ? __do_munmap+0x373/0x520
Mar  5 15:41:32 localhost vmunix: [255684.795352]  zfsdev_ioctl+0x54/0xe0 [zfs]
Mar  5 15:41:32 localhost vmunix: [255684.795355]  __x64_sys_ioctl+0x96/0xd0
Mar  5 15:41:32 localhost vmunix: [255684.795356]  do_syscall_64+0x37/0x80
Mar  5 15:41:32 localhost vmunix: [255684.795360]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Mar  5 15:41:32 localhost vmunix: [255684.795362] RIP: 0033:0x7f051266dfd7
Mar  5 15:41:32 localhost vmunix: [255684.795363] RSP: 002b:00007ffd167708e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar  5 15:41:32 localhost vmunix: [255684.795365] RAX: ffffffffffffffda RBX: 0000000000f9b990 RCX: 00007f051266dfd7
Mar  5 15:41:32 localhost vmunix: [255684.795366] RDX: 00007ffd16770d00 RSI: 0000000000005a0c RDI: 0000000000000003
Mar  5 15:41:32 localhost vmunix: [255684.795369] RBP: 00007ffd167742e0 R08: 0000000000000100 R09: 0000000000000000
Mar  5 15:41:32 localhost vmunix: [255684.795370] R10: 00007f0512eb34d7 R11: 0000000000000246 R12: 00007ffd16770d00
Mar  5 15:41:32 localhost vmunix: [255684.795371] R13: 0000000000fb76b8 R14: 00007ffd16770900 R15: 0000000000f979f0

Looking at just the traces, it looks like some form of deadlock? The zpool task is blocked on a mutex in l2arc_evict while txg_sync is stuck waiting in spa_config_enter.

@Gelma
Contributor

Gelma commented Mar 24, 2021

Thanks a lot for your work!

Same problem here, on different servers, with different kernels/distros and versions of ZFS.
Often, repeating the remove before anything starts accessing the pool again lets the removal complete (a sketch of that sequence is at the end of this comment).

Last time - on Saturday - I had it on a PowerEdge R730xd with Ubuntu 20.10 and git ZFS (commit: 891568c).

No errors in dmesg; everything else keeps working, but the ZFS pool gets stuck.

Funny thing:
zpool status before:

    NAME                                            STATE     READ WRITE CKSUM
    bidone                                          ONLINE       0     0     0
      sdb                                           ONLINE       0     0     0
    logs
      wwn-0x6d09466051f2380023eef4d6134ef24d-part3  ONLINE       0     0     0
    cache
      wwn-0x6d09466051f2380023eef4d6134ef24d-part4  ONLINE       0     0     0

and after:

    NAME                                            STATE     READ WRITE CKSUM
    bidone                                          ONLINE       0     0     0
      sdb                                           ONLINE       0     0     0
    logs
      wwn-0x6d09466051f2380023eef4d6134ef24d-part3  ONLINE       0     0     0
    cache
      sda4                                          ONLINE       0     0     0
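
A sketch of the retry workaround mentioned above, using the pool/device names from this report; importing with -N (no mounts) is one way to make sure nothing touches the pool before the remove is retried:

# import without mounting any datasets so the pool stays idle
zpool import -N bidone
# retry the cache-device removal immediately
zpool remove bidone wwn-0x6d09466051f2380023eef4d6134ef24d-part4
# mount the datasets once the removal has completed
zfs mount -a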

@AmkG

AmkG commented Mar 24, 2021

To add to my previous report, I also saw the same phenomenon as @Gelma where the cache device was initially installed via a /dev/disk/by-id/ symlink, but after the remove command ZFS refers to it by its /dev/ filename. This has persisted across reboots since then. In case it's relevant, my cache is a small partition on my SSD boot device, and I do not use a zpool.cache file; I use zpool import -a in my init script.
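
As a general note, a common way to get ZFS to show by-id names again is to export and re-import the pool with an explicit device directory; sketched below with <pool> as a placeholder (untested against this particular renaming):

# export the pool, then re-import scanning only the by-id paths
zpool export <pool>
zpool import -d /dev/disk/by-id <pool>
# or, for init scripts that import everything:
zpool import -d /dev/disk/by-id -a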

@stuartthebruce
Author

I successfully removed several 100% used cache devices from a zpool under heavy load after upgrading to zfs 2.0.5 without any problem. Many thanks for fixing this.
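
For anyone retrying after an upgrade, the running userland and kernel-module versions can be confirmed first; zfs version prints both:

# prints the zfs userland version and the loaded zfs-kmod version
zfs version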
