
l2arc_mfuonly_pos: VERIFY3(tx->tx_txg <= spa_final_dirty_txg(os->os_spa)) failed (64 <= 63) #16714

Closed
tonyhutter opened this issue Nov 1, 2024 · 2 comments · Fixed by #16782
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@tonyhutter
Contributor

System information

Type Version/Name
Distribution Name Almalinux
Distribution Version 9
Kernel Version 5.14
Architecture x86-64
OpenZFS Version master

Describe the problem you're observing

QEMU Almalinux 9 runner reported l2arc_mfuonly_pos as KILLED with this panic:

  [ 3323.831973] ZTS run /usr/share/zfs/zfs-tests/tests/functional/l2arc/l2arc_arcstats_pos
  [ 3353.615697] ZTS run /usr/share/zfs/zfs-tests/tests/functional/l2arc/l2arc_mfuonly_pos
  [ 3367.508166] VERIFY3(tx->tx_txg <= spa_final_dirty_txg(os->os_spa)) failed (64 <= 63)
  [ 3367.510359] PANIC at dbuf.c:2324:dbuf_dirty()
  [ 3367.511813] Showing stack for process 195727
  [ 3367.513288] CPU: 1 PID: 195727 Comm: txg_sync Tainted: P           OE     -------  ---  5.14.0-427.40.1.el9_4.x86_64 #1
  [ 3367.516585] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  [ 3367.519458] Call Trace:
  [ 3367.520379]  <TASK>
  [ 3367.521181]  dump_stack_lvl+0x34/0x48
  [ 3367.522539]  spl_panic+0xd1/0xe9 [spl]
  [ 3367.524011]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 3367.525672]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 3367.527227]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 3367.528820]  ? dbuf_rele_and_unlock+0x267/0x500 [zfs]
  [ 3367.533862]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 3367.535528]  dbuf_dirty+0xd68/0x12f0 [zfs]
  [ 3367.537248]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 3367.538917]  spa_history_log_sync+0x100/0x5d0 [zfs]
  [ 3367.540816]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 3367.542488]  ? list_head+0x9/0x30 [zfs]
  [ 3367.544072]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 3367.545802]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 3367.547495]  ? cv_destroy_wakeup+0x4a/0x90 [spl]
  [ 3367.549141]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 3367.550850]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 3367.552511]  ? kmem_cache_free+0x15/0x360
  [ 3367.554043]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 3367.555756]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 3367.557446]  dsl_sync_task_sync+0xb5/0x120 [zfs]
  [ 3367.559274]  dsl_pool_sync+0x4ba/0x690 [zfs]
  [ 3367.560973]  spa_sync_iterate_to_convergence+0xd7/0x320 [zfs]
  [ 3367.563611]  spa_sync+0x319/0x920 [zfs]
  [ 3367.565699]  txg_sync_thread+0x27b/0x3d0 [zfs]
  [ 3367.567201]  ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
  [ 3367.568956]  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
  [ 3367.570910]  thread_generic_wrapper+0x62/0x90 [spl]
  [ 3367.572615]  kthread+0xe0/0x100
  [ 3367.573889]  ? __pfx_kthread+0x10/0x10
  [ 3367.575225]  ret_from_fork+0x2c/0x50
Test: /usr/share/zfs/zfs-tests/tests/functional/l2arc/l2arc_mfuonly_pos (run as root) [10:00] [KILLED]
18:31:46.29 ASSERTION: l2arc_mfuonly does not cache MRU buffers.
18:31:46.29 SUCCESS: set_tunable32 L2ARC_NOPREFETCH 1
18:31:46.30 SUCCESS: set_tunable32 L2ARC_MFUONLY 1
18:31:46.31 SUCCESS: set_tunable32 PREFETCH_DISABLE 1
18:31:46.31 SUCCESS: truncate -s 1120M /var/tmp/testdir/disk.l2arc/b
18:31:46.44 SUCCESS: zpool create -f testpool /var/tmp/testdir/disk.l2arc/a cache /var/tmp/testdir/disk.l2arc/b
18:31:46.53 job: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=psync, iodepth=1
18:31:46.53 ...
18:31:46.53 fio-3.35
18:31:46.53 Starting 4 threads
18:31:46.53 job: Laying out IO file (1 file / 200MiB)
18:31:46.53 job: Laying out IO file (1 file / 200MiB)
18:31:46.53 job: Laying out IO file (1 file / 200MiB)
18:31:46.53 job: Laying out IO file (1 file / 200MiB)
18:31:49.26 
18:31:49.26 job: (groupid=0, jobs=4): err= 0: pid=195631: Wed Oct 30 18:31:49 2024
18:31:49.26   write: IOPS=307, BW=308MiB/s (323MB/s)(800MiB/2600msec); 0 zone resets
18:31:49.26     clat (usec): min=337, max=435441, avg=11256.04, stdev=33683.62
18:31:49.26      lat (usec): min=338, max=435441, avg=11258.68, stdev=33683.34
18:31:49.26     clat percentiles (usec):
18:31:49.26      |  1.00th=[   343],  5.00th=[   379], 10.00th=[   396], 20.00th=[   424],
18:31:49.26      | 30.00th=[   457], 40.00th=[   482], 50.00th=[   529], 60.00th=[   627],
18:31:49.26      | 70.00th=[   873], 80.00th=[  2212], 90.00th=[ 47449], 95.00th=[ 77071],
18:31:49.26      | 99.00th=[127402], 99.50th=[170918], 99.90th=[434111], 99.95th=[434111],
18:31:49.26      | 99.99th=[434111]
18:31:49.26    bw (  KiB/s): min=167598, max=547588, per=100.00%, avg=355918.33, stdev=37485.32, samples=17
18:31:49.26    iops        : min=  160, max=  532, avg=345.07, stdev=36.68, samples=17
18:31:49.26   lat (usec)   : 500=44.62%, 750=22.00%, 1000=4.88%
18:31:49.26   lat (msec)   : 2=8.12%, 4=2.88%, 10=3.12%, 20=1.62%, 50=3.12%
18:31:49.26   lat (msec)   : 100=7.00%, 250=2.25%, 500=0.38%
18:31:49.26   cpu          : usr=1.33%, sys=7.42%, ctx=608, majf=0, minf=0
18:31:49.26   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
18:31:49.26      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
18:31:49.26      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
18:31:49.26      issued rwts: total=0,800,0,0 short=0,0,0,0 dropped=0,0,0,0
18:31:49.26      latency   : target=0, window=0, percentile=100.00%, depth=1
18:31:49.26 
18:31:49.26 Run status group 0 (all jobs):
18:31:49.26   WRITE: bw=308MiB/s (323MB/s), 308MiB/s-308MiB/s (323MB/s-323MB/s), io=800MiB (839MB), run=2600-2600msec
18:31:49.27 SUCCESS: fio /usr/share/zfs/zfs-tests/tests/perf/fio/mkfiles.fio
18:31:49.35 job: (g=0): rw=randread, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=1
18:31:49.35 ...
18:31:49.35 fio-3.35
18:31:49.35 Starting 4 threads
18:31:59.51 
18:31:59.51 job: (groupid=0, jobs=4): err= 0: pid=195640: Wed Oct 30 18:31:59 2024
18:31:59.51   read: IOPS=20.0k, BW=2500MiB/s (2621MB/s)(24.4GiB/10001msec)
18:31:59.51     clat (usec): min=13, max=597270, avg=195.23, stdev=2346.76
18:31:59.51      lat (usec): min=13, max=597270, avg=195.57, stdev=2347.39
18:31:59.51     clat percentiles (usec):
18:31:59.51      |  1.00th=[   40],  5.00th=[   48], 10.00th=[   56], 20.00th=[   59],
18:31:59.51      | 30.00th=[   61], 40.00th=[   63], 50.00th=[   64], 60.00th=[   65],
18:31:59.51      | 70.00th=[   67], 80.00th=[  105], 90.00th=[  133], 95.00th=[  161],
18:31:59.51      | 99.00th=[ 2409], 99.50th=[ 7111], 99.90th=[15270], 99.95th=[27395],
18:31:59.51      | 99.99th=[60556]
18:31:59.51    bw (  MiB/s): min=  195, max= 4063, per=100.00%, avg=2502.80, stdev=273.59, samples=76
18:31:59.51    iops        : min= 1565, max=32504, avg=20021.89, stdev=2188.90, samples=76
18:31:59.51   lat (usec)   : 20=0.51%, 50=5.79%, 100=73.51%, 250=17.34%, 500=0.45%
18:31:59.51   lat (usec)   : 750=0.13%, 1000=0.06%
18:31:59.51   lat (msec)   : 2=1.04%, 4=0.31%, 10=0.59%, 20=0.19%, 50=0.05%
18:31:59.51   lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%
18:31:59.51   cpu          : usr=1.33%, sys=42.31%, ctx=5993, majf=0, minf=124
18:31:59.51   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
18:31:59.51      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
18:31:59.51      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
18:31:59.51      issued rwts: total=199981,0,0,0 short=0,0,0,0 dropped=0,0,0,0
18:31:59.51      latency   : target=0, window=0, percentile=100.00%, depth=1
18:31:59.51 
18:31:59.51 Run status group 0 (all jobs):
18:31:59.51    READ: bw=2500MiB/s (2621MB/s), 2500MiB/s-2500MiB/s (2621MB/s-2621MB/s), io=24.4GiB (26.2GB), run=10001-10001msec
18:31:59.51 SUCCESS: fio /usr/share/zfs/zfs-tests/tests/perf/fio/random_reads.fio
18:31:59.64 SUCCESS: zpool export testpool
18:32:00.08 SUCCESS: zpool import -N -d /var/tmp/testdir/disk.l2arc testpool
18:32:00.09 SUCCESS: test 0 -eq 0
Test: /usr/share/zfs/zfs-tests/tests/functional/l2arc/l2arc_l2miss_pos (run as root) [10:00] [KILLED]

Describe how to reproduce the problem

Saw this as a test result while testing an unrelated PR:
https://github.com/tonyhutter/zfs/actions/runs/11598165017/job/32293419930?pr=3

Include any warning/errors/backtraces from the system logs

@tonyhutter tonyhutter added the Type: Defect Incorrect behavior (e.g. crash, hang) label Nov 1, 2024
@amotin
Member

amotin commented Nov 18, 2024

I think this problem comes from the L2ARC rebuild not being stopped by spa_export_common() before it sets spa_final_txg; with some luck the rebuild may log its completion between that point and the spa_l2cache_drop() call in spa_unload().

@amotin
Member

amotin commented Nov 19, 2024

I've intercepted the log message that causes it, confirming my guess:

VERIFY3(tx->tx_txg <= spa_final_dirty_txg(spa)) failed (64 <= 63) L2ARC rebuild successful, restored %llu blocks

amotin added a commit to amotin/zfs that referenced this issue Nov 19, 2024
Without doing that there is a race window on export when history
log write by completed rebuild dirties transaction beyond final,
triggering assertion.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes openzfs#16714
gamanakis pushed a commit to gamanakis/zfs that referenced this issue Nov 20, 2024
gamanakis pushed a commit to gamanakis/zfs that referenced this issue Nov 20, 2024
gamanakis pushed a commit to gamanakis/zfs that referenced this issue Nov 20, 2024
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by: iXsystems, Inc.
Closes openzfs#16714
Closes openzfs#16782
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Nov 21, 2024
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Nov 21, 2024
amotin pushed a commit that referenced this issue Nov 25, 2024
This made it possible to debug #16714, fixed in #16782.  Without the
assertions added here it is difficult to figure out which log entries
cause the problem, since the assertion fires in sync thread context.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Pavel Snajdr <snajpa@snajpa.net>
Co-authored-by: Alexander Motin <mav@FreeBSD.org>
Closes #16795
ixhamza pushed a commit to truenas/zfs that referenced this issue Dec 2, 2024
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Dec 3, 2024
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Jan 26, 2025
lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Jan 26, 2025