[ARM] Kernel NULL pointer dereference in arc_shrink #3517

Closed
jameslikeslinux opened this issue Jun 23, 2015 · 11 comments
Labels
Component: Memory Management kernel memory management Type: Architecture Indicates an issue is specific to a single processor architecture

Comments

@jameslikeslinux
Contributor

As in #3516, I am experimenting with ZFS on a BeagleBone Black board. With ZFS/SPL 0.6.4, things were mostly stable; with ZFS/SPL HEAD, arc_shrink crashes fairly consistently. The easiest way to trigger the crash is echo 3 > /proc/sys/vm/drop_caches, which always produces:

Jun 15 22:54:37 beaglebone1 kernel: [  214.437309] Unable to handle kernel NULL pointer dereference at virtual address 0000004c
Jun 15 22:54:37 beaglebone1 kernel: [  214.446830] pgd = d382c000
Jun 15 22:54:37 beaglebone1 kernel: [  214.449566] [0000004c] *pgd=93811831, *pte=00000000, *ppte=00000000
Jun 15 22:54:37 beaglebone1 kernel: [  214.460692] Internal error: Oops: 17 [#4] PREEMPT SMP ARM
Jun 15 22:54:37 beaglebone1 kernel: [  214.466121] Modules linked in: tun cfg80211 rfkill nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_LOG xt_limit xt_pkttype xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_comment xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables dm_mod c_can_platform c_can can_dev pruss_remoteproc uio_pdrv_genirq uio zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO)
Jun 15 22:54:37 beaglebone1 kernel: [  214.500404] CPU: 0 PID: 2165 Comm: zsh Tainted: P      D    O 3.14.43-beagleboard-r67 #1
Jun 15 22:54:37 beaglebone1 kernel: [  214.508533] task: d573ee80 ti: d3824000 task.ti: d3824000
Jun 15 22:54:37 beaglebone1 kernel: [  214.514095] PC is at arc_shrink+0x2ac/0x53c [zfs]
Jun 15 22:54:37 beaglebone1 kernel: [  214.518840] LR is at wake_up_bit+0x2c/0x30
Jun 15 22:54:37 beaglebone1 kernel: [  214.522954] pc : [<bf0971c4>]    lr : [<c008b3f8>]    psr: 00000013
Jun 15 22:54:37 beaglebone1 kernel: [  214.522954] sp : d3825da0  ip : df9edb54  fp : d3825dcc
Jun 15 22:54:37 beaglebone1 kernel: [  214.534480] r10: c0c98f40  r9 : bf242d08  r8 : bf262d08
Jun 15 22:54:37 beaglebone1 kernel: [  214.539726] r7 : bf1ff308  r6 : dc6be640  r5 : 00000000  r4 : 00007000
Jun 15 22:54:37 beaglebone1 kernel: [  214.546280] r3 : dc6be740  r2 : 00000000  r1 : dc6be674  r0 : 00000000
Jun 15 22:54:37 beaglebone1 kernel: [  214.552835] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Jun 15 22:54:37 beaglebone1 kernel: [  214.559999] Control: 10c5387d  Table: 9382c019  DAC: 00000015
Jun 15 22:54:37 beaglebone1 kernel: [  214.565767] Process zsh (pid: 2165, stack limit = 0xd3824240)
Jun 15 22:54:37 beaglebone1 kernel: [  214.571536] Stack: (0xd3825da0 to 0xd3826000)
Jun 15 22:54:37 beaglebone1 kernel: [  214.575914] 5da0: 00000000 c08ca1b8 0000a933 00000000 bf1ff308 d3825ec4 00000000 d3824010
Jun 15 22:54:37 beaglebone1 kernel: [  214.584129] 5dc0: d3825df4 d3825dd0 bf097340 bf097138 00000000 bf17e414 d3825ec0 00000080
Jun 15 22:54:37 beaglebone1 kernel: [  214.592344] 5de0: 00000080 00015266 d3825e04 d3825df8 bf097470 bf097268 d3825e84 d3825e08
Jun 15 22:54:37 beaglebone1 kernel: [  214.600558] 5e00: c012ccdc bf097460 dfbe1b40 dfa0afa0 dfa0af80 dfa07320 dfa07300 dfa0a0e0
Jun 15 22:54:37 beaglebone1 kernel: [  214.608772] 5e20: dfa0a0c0 c08ca220 d3825e4c c08ca220 d3825e54 c08ca524 00000000 00000000
Jun 15 22:54:37 beaglebone1 kernel: [  214.616987] 5e40: c08ca524 000003e8 000003e8 000002c9 0000a933 00000000 00000001 d3825ec0
Jun 15 22:54:37 beaglebone1 kernel: [  214.625202] 5e60: 0000080c 000003e8 000003e8 c0cd2b18 00000000 bf17e414 d3825eb4 d3825e88
Jun 15 22:54:37 beaglebone1 kernel: [  214.633416] 5e80: c012d7cc c012cb54 00000001 c0dd4c64 c0cc4fa0 00000001 00000002 0009f848
Jun 15 22:54:37 beaglebone1 kernel: [  214.641631] 5ea0: d3825f78 00000002 d3825ee4 d3825eb8 c01c6600 c012d708 d3825f78 c08cd1fc
Jun 15 22:54:37 beaglebone1 kernel: [  214.649845] 5ec0: 000000d0 00000080 00000001 00000000 dd825400 dd825400 d3825f24 d3825ee8
Jun 15 22:54:37 beaglebone1 kernel: [  214.658060] 5ee0: c01da9c0 c01c657c d3825f78 d3825ef8 c01712e4 00000002 00000020 0009f848
Jun 15 22:54:37 beaglebone1 kernel: [  214.666275] 5f00: d3825f78 d500bc80 00000002 d3824000 0009f848 00000000 d3825f3c d3825f28
Jun 15 22:54:37 beaglebone1 kernel: [  214.674489] 5f20: c01da9fc c01da900 00000001 d3824000 d3825f74 d3825f40 c016f260 c01da9e4
Jun 15 22:54:37 beaglebone1 kernel: [  214.682703] 5f40: 00000000 c018be04 00000000 00000000 00000000 d500bc80 d500bc80 00000002
Jun 15 22:54:37 beaglebone1 kernel: [  214.690917] 5f60: 0009f848 00000000 d3825fa4 d3825f78 c016f97c c016f1b4 00000002 00000000
Jun 15 22:54:37 beaglebone1 kernel: [  214.699131] 5f80: 00000002 0009f848 b6efa698 00000004 c000fb24 d3824000 00000000 d3825fa8
Jun 15 22:54:37 beaglebone1 kernel: [  214.707345] 5fa0: c000f900 c016f93c 00000002 0009f848 00000001 0009f848 00000002 00000000
Jun 15 22:54:37 beaglebone1 kernel: [  214.715560] 5fc0: 00000002 0009f848 b6efa698 00000004 00000002 0009f848 00000002 00000000
Jun 15 22:54:37 beaglebone1 kernel: [  214.723775] 5fe0: beeb7f2c beeb7c50 b6e385e0 b6e89d6c 60000010 00000001 00000000 00000000
Jun 15 22:54:37 beaglebone1 kernel: [  214.731983] Backtrace: 
Jun 15 22:54:37 beaglebone1 kernel: [  214.734520] [<bf09712c>] (arc_shrink [zfs]) from [<bf097340>] (arc_shrink+0x428/0x53c [zfs])
Jun 15 22:54:37 beaglebone1 kernel: [  214.742993]  r9:d3824010 r8:00000000 r7:d3825ec4 r6:bf1ff308 r5:00000000 r4:0000a933
Jun 15 22:54:37 beaglebone1 kernel: [  214.750864] [<bf09725c>] (arc_shrink [zfs]) from [<bf097470>] (arc_shrinker_func_scan_objects+0x1c/0x24 [zfs])
Jun 15 22:54:37 beaglebone1 kernel: [  214.760906]  r9:00015266 r8:00000080 r7:00000080 r6:d3825ec0 r5:bf17e414 r4:00000000
Jun 15 22:54:37 beaglebone1 kernel: [  214.768764] [<bf097454>] (arc_shrinker_func_scan_objects [zfs]) from [<c012ccdc>] (shrink_slab_node+0x194/0x360)
Jun 15 22:54:37 beaglebone1 kernel: [  214.778987] [<c012cb48>] (shrink_slab_node) from [<c012d7cc>] (shrink_slab+0xd0/0xfc)
Jun 15 22:54:37 beaglebone1 kernel: [  214.786846]  r10:bf17e414 r9:00000000 r8:c0cd2b18 r7:000003e8 r6:000003e8 r5:0000080c
Jun 15 22:54:37 beaglebone1 kernel: [  214.794743]  r4:d3825ec0
Jun 15 22:54:37 beaglebone1 kernel: [  214.797305] [<c012d6fc>] (shrink_slab) from [<c01c6600>] (drop_caches_sysctl_handler+0x90/0xb0)
Jun 15 22:54:37 beaglebone1 kernel: [  214.806039]  r10:00000002 r9:d3825f78 r8:0009f848 r7:00000002 r6:00000001 r5:c0cc4fa0
Jun 15 22:54:37 beaglebone1 kernel: [  214.813934]  r4:c0dd4c64 r3:00000001
Jun 15 22:54:37 beaglebone1 kernel: [  214.817544] [<c01c6570>] (drop_caches_sysctl_handler) from [<c01da9c0>] (proc_sys_call_handler+0xcc/0xe4)
Jun 15 22:54:37 beaglebone1 kernel: [  214.827148]  r4:dd825400
Jun 15 22:54:37 beaglebone1 kernel: [  214.829699] [<c01da8f4>] (proc_sys_call_handler) from [<c01da9fc>] (proc_sys_write+0x24/0x2c)
Jun 15 22:54:37 beaglebone1 kernel: [  214.838257]  r10:00000000 r9:0009f848 r8:d3824000 r7:00000002 r6:d500bc80 r5:d3825f78
Jun 15 22:54:37 beaglebone1 kernel: [  214.846154]  r4:0009f848
Jun 15 22:54:37 beaglebone1 kernel: [  214.848712] [<c01da9d8>] (proc_sys_write) from [<c016f260>] (vfs_write+0xb8/0x1cc)
Jun 15 22:54:37 beaglebone1 kernel: [  214.856318] [<c016f1a8>] (vfs_write) from [<c016f97c>] (SyS_write+0x4c/0xa0)
Jun 15 22:54:37 beaglebone1 kernel: [  214.863394]  r10:00000000 r9:0009f848 r8:00000002 r7:d500bc80 r6:d500bc80 r5:00000000
Jun 15 22:54:37 beaglebone1 kernel: [  214.871290]  r4:00000000
Jun 15 22:54:37 beaglebone1 kernel: [  214.873848] [<c016f930>] (SyS_write) from [<c000f900>] (ret_fast_syscall+0x0/0x30)
Jun 15 22:54:37 beaglebone1 kernel: [  214.881447]  r9:d3824000 r8:c000fb24 r7:00000004 r6:b6efa698 r5:0009f848 r4:00000002
Jun 15 22:54:37 beaglebone1 kernel: [  214.889263] Code: e7995004 e1550003 e1a00005 0a000001 (e595104c) 
Jun 15 22:54:37 beaglebone1 kernel: [  214.910393] ---[ end trace a5b4ded18df58216 ]---
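On the `Code:` line, the bracketed word `e595104c` is the faulting instruction. It decodes to the ARM load `ldr r1, [r5, #0x4c]`, and the register dump shows `r5 : 00000000`: the code read a field at byte offset 0x4c through a NULL structure pointer, which is why the reported fault address is `0000004c`. The sketch below only illustrates that address arithmetic; `struct hypothetical_obj` and `member_load_address` are invented names, and the padding does not reproduce the layout of any real ZFS or SPL type.

```c
#include <stddef.h>

/*
 * Hypothetical structure: the padding only places `member` at byte
 * offset 0x4c. It is NOT the layout of any real ZFS or SPL type.
 */
struct hypothetical_obj {
	unsigned char earlier_fields[0x4c];
	unsigned int member;
};

/* Compile-time check that the member really sits at offset 0x4c. */
_Static_assert(offsetof(struct hypothetical_obj, member) == 0x4c,
	       "member must sit at offset 0x4c");

/*
 * A load of p->member reads from (address held in p) + offsetof(member).
 * This returns that effective address for a given base address, so a
 * NULL base (0) yields 0x4c, the address reported in the oops.
 */
static unsigned long member_load_address(unsigned long base)
{
	return base + offsetof(struct hypothetical_obj, member);
}
```

The same reasoning applies to any small fault address: it is usually a member offset added to a NULL base pointer.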

The same bug can also be triggered by running memory-intensive applications such as git, which make the kernel fail with:

Jun 15 22:52:56 beaglebone1 kernel: [  113.720923] Unable to handle kernel NULL pointer dereference at virtual address 0000004c
Jun 15 22:52:56 beaglebone1 kernel: [  113.729076] pgd = cf7c8000
Jun 15 22:52:57 beaglebone1 kernel: [  113.732775] [0000004c] *pgd=8f7bc831, *pte=00000000, *ppte=00000000
Jun 15 22:52:57 beaglebone1 kernel: [  113.739116] Internal error: Oops: 17 [#2] PREEMPT SMP ARM
Jun 15 22:52:57 beaglebone1 kernel: [  113.744540] Modules linked in: tun cfg80211 rfkill nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables xt_LOG xt_limit xt_pkttype xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_comment xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables dm_mod c_can_platform c_can can_dev pruss_remoteproc uio_pdrv_genirq uio zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO)
Jun 15 22:52:57 beaglebone1 kernel: [  113.778814] CPU: 0 PID: 2244 Comm: git Tainted: P      D    O 3.14.43-beagleboard-r67 #1
Jun 15 22:52:57 beaglebone1 kernel: [  113.786940] task: d1a36f00 ti: cf7c0000 task.ti: cf7c0000
Jun 15 22:52:57 beaglebone1 kernel: [  113.792499] PC is at arc_shrink+0x2ac/0x53c [zfs]
Jun 15 22:52:57 beaglebone1 kernel: [  113.797242] LR is at wake_up_bit+0x2c/0x30
Jun 15 22:52:57 beaglebone1 kernel: [  113.801357] pc : [<bf0971c4>]    lr : [<c008b3f8>]    psr: 00000113
Jun 15 22:52:57 beaglebone1 kernel: [  113.801357] sp : cf7c1ad0  ip : df9edb54  fp : cf7c1afc
Jun 15 22:52:57 beaglebone1 kernel: [  113.812884] r10: c0c98f40  r9 : bf242d08  r8 : bf262d08
Jun 15 22:52:57 beaglebone1 kernel: [  113.818129] r7 : bf1ff308  r6 : dc6be640  r5 : 00000000  r4 : 00007000
Jun 15 22:52:57 beaglebone1 kernel: [  113.824684] r3 : dc6be740  r2 : 00000000  r1 : dc6be674  r0 : 00000000
Jun 15 22:52:57 beaglebone1 kernel: [  113.831240] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Jun 15 22:52:57 beaglebone1 kernel: [  113.838406] Control: 10c5387d  Table: 8f7c8019  DAC: 00000015
Jun 15 22:52:57 beaglebone1 kernel: [  113.844175] Process git (pid: 2244, stack limit = 0xcf7c0240)
Jun 15 22:52:57 beaglebone1 kernel: [  113.849945] Stack: (0xcf7c1ad0 to 0xcf7c2000)
Jun 15 22:52:57 beaglebone1 kernel: [  113.854322] 1ac0:                                     bf01055c c08ca1b8 0000a885 00000000
Jun 15 22:52:57 beaglebone1 kernel: [  113.862537] 1ae0: bf1ff308 cf7c1c6c 00000000 cf7c0000 cf7c1b24 cf7c1b00 bf097340 bf097138
Jun 15 22:52:57 beaglebone1 kernel: [  113.870751] 1b00: 00000000 bf17e414 cf7c1c68 00000080 00000080 00000085 cf7c1b34 cf7c1b28
Jun 15 22:52:57 beaglebone1 kernel: [  113.878966] 1b20: bf097470 bf097268 cf7c1bb4 cf7c1b38 c012ccdc bf097460 cf7c1b5c cf7c1b48
Jun 15 22:52:57 beaglebone1 kernel: [  113.887181] 1b40: c00c92b4 c0164be8 00000000 dd81f200 cf7c1b9c c08ca1b8 cf7c1b7c c08ca220
Jun 15 22:52:57 beaglebone1 kernel: [  113.895395] 1b60: cf7c1b84 c08ca524 00000000 00000000 c08ca524 00000008 0000a581 00000075
Jun 15 22:52:57 beaglebone1 kernel: [  113.903610] 1b80: 0000a885 00000000 00000000 cf7c1c68 00000000 00000008 0000a581 c0cd2b18
Jun 15 22:52:57 beaglebone1 kernel: [  113.911824] 1ba0: 00000000 bf17e414 cf7c1be4 cf7c1bb8 c012d7cc c012cb54 00000000 cf7c1c78
Jun 15 22:52:57 beaglebone1 kernel: [  113.920039] 1bc0: c0d2fa0c 0000a581 00200010 00000000 c0ca4e90 c0dd393c cf7c1c64 cf7c1be8
Jun 15 22:52:57 beaglebone1 kernel: [  113.928254] 1be0: c01301e8 c012d708 cf7c1c1c cf7c1bf8 bf14d9a8 00000000 00000000 dfbe3840
Jun 15 22:52:57 beaglebone1 kernel: [  113.936468] 1c00: c011b8e8 c0482ce4 cf7c0028 cf7c1d74 cf7c1c68 51eb851f 00000000 c0dcb8fc
Jun 15 22:52:57 beaglebone1 kernel: [  113.944681] 1c20: c0d2fa04 00000000 c0d2ee00 00000000 00000000 00000000 00000020 00000001
Jun 15 22:52:57 beaglebone1 kernel: [  113.952895] 1c40: 002084d0 cf7c0028 00000000 c0d2fa00 00000001 002084d0 cf7c1ccc cf7c1c68
Jun 15 22:52:57 beaglebone1 kernel: [  113.961109] 1c60: c013048c c012fdb0 002084d0 00000080 00000001 00000000 00000008 00000008
Jun 15 22:52:57 beaglebone1 kernel: [  113.969323] 1c80: 00000020 00000000 002084d0 00000001 00000001 00000001 00000000 0000000c
Jun 15 22:52:57 beaglebone1 kernel: [  113.977537] 1ca0: 00000000 00000000 cf7c0000 00000000 00000040 c0d2fa00 00000000 00000000
Jun 15 22:52:57 beaglebone1 kernel: [  113.985751] 1cc0: cf7c1da4 cf7c1cd0 c0123c38 c01303a0 00000000 00000040 c0d2ee00 00000000
Jun 15 22:52:57 beaglebone1 kernel: [  113.993965] 1ce0: 00000000 00000000 cf7c1d67 cf7c1d66 cf7c1d70 d1d5bf6c cf7c1d3c cf7c1d08
Jun 15 22:52:57 beaglebone1 kernel: [  114.002179] 1d00: 00000000 00000000 00000010 00000000 00000010 c0d2ee00 00000050 00000000
Jun 15 22:52:57 beaglebone1 kernel: [  114.010393] 1d20: 00000000 00000000 00000040 00000010 00000040 00000000 c0ca56f8 00000000
Jun 15 22:52:57 beaglebone1 kernel: [  114.018607] 1d40: 00000001 c0d2fa00 c0d2fa04 00000141 00000000 c0ca4e9c cf7c0010 00000000
Jun 15 22:52:57 beaglebone1 kernel: [  114.026821] 1d60: dfac7ec0 00000000 c0d2ee00 c0d2ee00 c011ba68 00000000 00000000 d5715040
Jun 15 22:52:57 beaglebone1 kernel: [  114.035037] 1d80: cf7cab20 c0ca5970 000000a8 ac800000 d18c8b58 cf7c8000 cf7c1dc4 cf7c1da8
Jun 15 22:52:57 beaglebone1 kernel: [  114.043252] 1da0: c0142d84 c012359c cf7c0008 cf7c0000 c0ca5970 000000a8 cf7c1e3c cf7c1dc8
Jun 15 22:52:57 beaglebone1 kernel: [  114.051466] 1dc0: c0146100 c0142d5c 000000a8 b4497000 d383e160 cf7c8000 cf7c1e5c cf7c1de8
Jun 15 22:52:57 beaglebone1 kernel: [  114.059680] 1de0: 00000564 c08ca220 cf7c1e0c c08ca524 cf7c1e0c cf7cab20 c08ca524 00000080
Jun 15 22:52:57 beaglebone1 kernel: [  114.067894] 1e00: cf7c1e2c d5715040 c009407c c0148a7c 00000000 cf7c1fb0 cf7c0000 00000005
Jun 15 22:52:57 beaglebone1 kernel: [  114.076108] 1e20: ac8005c2 d5715040 d1a36f00 d5715078 cf7c1edc cf7c1e40 c08ccd40 c0145760
Jun 15 22:52:57 beaglebone1 kernel: [  114.084323] 1e40: cf7c1e5c cf7c1e50 c0092eec c00940e0 cf7c1efc cf7c1e60 c08cccb8 00000000
Jun 15 22:52:57 beaglebone1 kernel: [  114.092537] 1e60: cf7c1e7c 000000a8 c0092f08 c0094140 cf7c1edc cf7c1e80 00000000 00000000
Jun 15 22:52:57 beaglebone1 kernel: [  114.100751] 1e80: 00000200 000000a8 00000001 00000000 00001162 01162000 00000000 00000000
Jun 15 22:52:57 beaglebone1 kernel: [  114.108966] 1ea0: 01209000 00000000 d5715040 00000000 00000021 00000005 c08cceb8 c0caa928
Jun 15 22:52:57 beaglebone1 kernel: [  114.117180] 1ec0: ac8005c2 cf7c1fb0 00000000 011633d0 cf7c1efc cf7c1ee0 c08ccf74 c08cca94
Jun 15 22:52:57 beaglebone1 kernel: [  114.125395] 1ee0: 00000005 c08cceb8 c0caa928 ac8005c2 cf7c1fac cf7c1f00 c00083d4 c08ccec4
Jun 15 22:52:57 beaglebone1 kernel: [  114.133609] 1f00: 011e8000 c08ca220 cf7c1f2c c08ca524 cf7c1f2c cf7c1f20 c08ca524 c08cd1fc
Jun 15 22:52:57 beaglebone1 kernel: [  114.141824] 1f20: cf7c1f6c cf7c1f30 c0094244 c08ca4f8 00000000 40000013 d571507c d5715078
Jun 15 22:52:57 beaglebone1 kernel: [  114.150037] 1f40: d5715040 d5715040 01209000 d5715078 011e8000 01209000 cf7c0000 00000000
Jun 15 22:52:57 beaglebone1 kernel: [  114.158252] 1f60: cf7c1f7c cf7c1f70 c0092f08 c0094140 cf7c1fa4 cf7c1f80 c014ab68 c0092efc
Jun 15 22:52:57 beaglebone1 kernel: [  114.166466] 1f80: b6ee2000 011e8000 00021000 000f325c 60000010 ffffffff 00000000 00000040
Jun 15 22:52:57 beaglebone1 kernel: [  114.174681] 1fa0: 00000000 cf7c1fb0 c08cb1b8 c0008398 ac8005c2 01828a3e bec2c90c bec2c96c
Jun 15 22:52:57 beaglebone1 kernel: [  114.182894] 1fc0: bec2c96c 00000000 bec2c96c 00000000 00000040 00000000 011633d0 00000001
Jun 15 22:52:57 beaglebone1 kernel: [  114.191108] 1fe0: 00000000 bec2c8f0 000f36c0 000f325c 60000010 ffffffff bf1f45ef 7bf25c77
Jun 15 22:52:57 beaglebone1 kernel: [  114.199317] Backtrace: 
Jun 15 22:52:57 beaglebone1 kernel: [  114.201853] [<bf09712c>] (arc_shrink [zfs]) from [<bf097340>] (arc_shrink+0x428/0x53c [zfs])
Jun 15 22:52:57 beaglebone1 kernel: [  114.210326]  r9:cf7c0000 r8:00000000 r7:cf7c1c6c r6:bf1ff308 r5:00000000 r4:0000a885
Jun 15 22:52:57 beaglebone1 kernel: [  114.218199] [<bf09725c>] (arc_shrink [zfs]) from [<bf097470>] (arc_shrinker_func_scan_objects+0x1c/0x24 [zfs])
Jun 15 22:52:57 beaglebone1 kernel: [  114.228241]  r9:00000085 r8:00000080 r7:00000080 r6:cf7c1c68 r5:bf17e414 r4:00000000
Jun 15 22:52:57 beaglebone1 kernel: [  114.236096] [<bf097454>] (arc_shrinker_func_scan_objects [zfs]) from [<c012ccdc>] (shrink_slab_node+0x194/0x360)
Jun 15 22:52:57 beaglebone1 kernel: [  114.246320] [<c012cb48>] (shrink_slab_node) from [<c012d7cc>] (shrink_slab+0xd0/0xfc)
Jun 15 22:52:57 beaglebone1 kernel: [  114.254181]  r10:bf17e414 r9:00000000 r8:c0cd2b18 r7:0000a581 r6:00000008 r5:00000000
Jun 15 22:52:57 beaglebone1 kernel: [  114.262078]  r4:cf7c1c68
Jun 15 22:52:57 beaglebone1 kernel: [  114.264632] [<c012d6fc>] (shrink_slab) from [<c01301e8>] (do_try_to_free_pages+0x444/0x5f0)
Jun 15 22:52:57 beaglebone1 kernel: [  114.273016]  r10:c0dd393c r9:c0ca4e90 r8:00000000 r7:00200010 r6:0000a581 r5:c0d2fa0c
Jun 15 22:52:57 beaglebone1 kernel: [  114.280913]  r4:cf7c1c78 r3:00000000
Jun 15 22:52:57 beaglebone1 kernel: [  114.284519] [<c012fda4>] (do_try_to_free_pages) from [<c013048c>] (try_to_free_pages+0xf8/0x1f0)
Jun 15 22:52:57 beaglebone1 kernel: [  114.293339]  r10:002084d0 r9:00000001 r8:c0d2fa00 r7:00000000 r6:cf7c0028 r5:002084d0
Jun 15 22:52:57 beaglebone1 kernel: [  114.301233]  r4:00000001
Jun 15 22:52:57 beaglebone1 kernel: [  114.303797] [<c0130394>] (try_to_free_pages) from [<c0123c38>] (__alloc_pages_nodemask+0x6a8/0xa84)
Jun 15 22:52:57 beaglebone1 kernel: [  114.312879]  r9:00000000 r8:00000000 r7:c0d2fa00 r6:00000040 r5:00000000 r4:cf7c0000
Jun 15 22:52:57 beaglebone1 kernel: [  114.320706] [<c0123590>] (__alloc_pages_nodemask) from [<c0142d84>] (__pte_alloc+0x34/0x154)
Jun 15 22:52:57 beaglebone1 kernel: [  114.329178]  r10:cf7c8000 r9:d18c8b58 r8:ac800000 r7:000000a8 r6:c0ca5970 r5:cf7cab20
Jun 15 22:52:57 beaglebone1 kernel: [  114.337074]  r4:d5715040
Jun 15 22:52:57 beaglebone1 kernel: [  114.339626] [<c0142d50>] (__pte_alloc) from [<c0146100>] (handle_mm_fault+0x9ac/0xa58)
Jun 15 22:52:57 beaglebone1 kernel: [  114.347574]  r7:000000a8 r6:c0ca5970 r5:cf7c0000 r4:cf7c0008
Jun 15 22:52:57 beaglebone1 kernel: [  114.353296] [<c0145754>] (handle_mm_fault) from [<c08ccd40>] (do_page_fault+0x2b8/0x430)
Jun 15 22:52:57 beaglebone1 kernel: [  114.361418]  r10:d5715078 r9:d1a36f00 r8:d5715040 r7:ac8005c2 r6:00000005 r5:cf7c0000
Jun 15 22:52:57 beaglebone1 kernel: [  114.369316]  r4:cf7c1fb0
Jun 15 22:52:57 beaglebone1 kernel: [  114.371869] [<c08cca88>] (do_page_fault) from [<c08ccf74>] (do_translation_fault+0xbc/0xc0)
Jun 15 22:52:57 beaglebone1 kernel: [  114.380253]  r10:011633d0 r9:00000000 r8:cf7c1fb0 r7:ac8005c2 r6:c0caa928 r5:c08cceb8
Jun 15 22:52:57 beaglebone1 kernel: [  114.388150]  r4:00000005
Jun 15 22:52:57 beaglebone1 kernel: [  114.390703] [<c08cceb8>] (do_translation_fault) from [<c00083d4>] (do_DataAbort+0x48/0xa8)
Jun 15 22:52:57 beaglebone1 kernel: [  114.399000]  r7:ac8005c2 r6:c0caa928 r5:c08cceb8 r4:00000005
Jun 15 22:52:57 beaglebone1 kernel: [  114.404714] [<c000838c>] (do_DataAbort) from [<c08cb1b8>] (__dabt_usr+0x38/0x40)
Jun 15 22:52:57 beaglebone1 kernel: [  114.412140] Exception stack(0xcf7c1fb0 to 0xcf7c1ff8)
Jun 15 22:52:57 beaglebone1 kernel: [  114.417213] 1fa0:                                     ac8005c2 01828a3e bec2c90c bec2c96c
Jun 15 22:52:57 beaglebone1 kernel: [  114.425427] 1fc0: bec2c96c 00000000 bec2c96c 00000000 00000040 00000000 011633d0 00000001
Jun 15 22:52:57 beaglebone1 kernel: [  114.433641] 1fe0: 00000000 bec2c8f0 000f36c0 000f325c 60000010 ffffffff
Jun 15 22:52:57 beaglebone1 kernel: [  114.440280]  r8:00000040 r7:00000000 r6:ffffffff r5:60000010 r4:000f325c
Jun 15 22:52:57 beaglebone1 kernel: [  114.447048] Code: e7995004 e1550003 e1a00005 0a000001 (e595104c) 
Jun 15 22:52:57 beaglebone1 kernel: [  114.460749] ---[ end trace a5b4ded18df58214 ]---
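Both backtraces reach ZFS through the same path: drop_caches_sysctl_handler (first trace) or do_try_to_free_pages (second trace) calls shrink_slab, which in turn calls the ARC's registered callback arc_shrinker_func_scan_objects. That is why a manual drop_caches and ordinary memory pressure from git end in the same crash. The user-space model below sketches the shape of that dispatch under simplifying assumptions; `model_shrinker`, `model_shrink_slab` and the toy cache are hypothetical names (the kernel's real interface is `struct shrinker` with `count_objects`/`scan_objects` callbacks), and locking, NUMA nodes and batching are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a shrinker: a pair of callbacks plus a context. */
struct model_shrinker {
	unsigned long (*count_objects)(void *ctx);
	unsigned long (*scan_objects)(void *ctx, unsigned long nr_to_scan);
	void *ctx;
};

/*
 * Model of the shared reclaim path: ask every registered shrinker how
 * many objects it holds and, if any, ask it to free up to nr_to_scan of
 * them. Returns the total number of objects reported as freed.
 */
static unsigned long model_shrink_slab(const struct model_shrinker *list,
				       size_t n, unsigned long nr_to_scan)
{
	unsigned long freed = 0;

	for (size_t i = 0; i < n; i++) {
		assert(list[i].count_objects != NULL);
		assert(list[i].scan_objects != NULL);
		if (list[i].count_objects(list[i].ctx) == 0)
			continue;	/* nothing cached: skip the scan */
		freed += list[i].scan_objects(list[i].ctx, nr_to_scan);
	}
	return freed;
}

/* A toy cache standing in for a cache such as the ARC. */
struct toy_cache {
	unsigned long objects;
};

static unsigned long toy_count(void *ctx)
{
	return ((const struct toy_cache *)ctx)->objects;
}

/* Free up to nr_to_scan objects and report how many were freed. */
static unsigned long toy_scan(void *ctx, unsigned long nr_to_scan)
{
	struct toy_cache *c = ctx;
	unsigned long n = nr_to_scan < c->objects ? nr_to_scan : c->objects;

	c->objects -= n;
	return n;
}
```

In this model, a fault inside one callback would surface no matter which entry point called model_shrink_slab, mirroring the two traces.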

Note, however, that the system still has plenty of free memory (100-200 MB) when these crashes occur.

As I said, neither of these bugs can be triggered with ZFS 0.6.4. When I get a chance, I will try a git bisect to find exactly where the bug was introduced, unless someone already knows.

The environment is the same as described in #3516.

@dweeezil
Contributor

@MrStaticVoid This is likely triggered in some way by the recent ARC changes. Could you please run gdb on your kernel module and post the output of list *(arc_shrink+0x2ac)?

@jameslikeslinux
Contributor Author

(gdb) list *(arc_shrink+0x2ac)
0x91c4 is in arc_kmem_reap_now (/var/tmp/portage/sys-fs/zfs-kmod-9999/work/zfs-kmod-9999/module/zfs/../../module/zfs/arc.c:2991).
2986    in /var/tmp/portage/sys-fs/zfs-kmod-9999/work/zfs-kmod-9999/module/zfs/../../module/zfs/arc.c

https://github.com/zfsonlinux/zfs/blob/72540ea3148a2bc03860d7d59b2b5fdc9a5cdee7/module/zfs/arc.c#L2986

@dweeezil
Contributor

@MrStaticVoid OK, thanks. That's where I was originally looking. I suspect it is caused by the large block support committed in f1512ee, which requires many more different block sizes to be supported. Assuming you're not using large blocks, it would be interesting to try:

diff --git a/include/sys/spa.h b/include/sys/spa.h
index 5dc9084..34a4bd2 100644
--- a/include/sys/spa.h
+++ b/include/sys/spa.h
@@ -113,7 +113,7 @@ _NOTE(CONSTCOND) } while (0)
  */
 #define        SPA_MINBLOCKSHIFT       9
 #define        SPA_OLD_MAXBLOCKSHIFT   17
-#define        SPA_MAXBLOCKSHIFT       24
+#define        SPA_MAXBLOCKSHIFT       17
 #define        SPA_MINBLOCKSIZE        (1ULL << SPA_MINBLOCKSHIFT)
 #define        SPA_OLD_MAXBLOCKSIZE    (1ULL << SPA_OLD_MAXBLOCKSHIFT)
 #define        SPA_MAXBLOCKSIZE        (1ULL << SPA_MAXBLOCKSHIFT)

and see what happens.
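The hunk lowers SPA_MAXBLOCKSHIFT from 24 back to the value of SPA_OLD_MAXBLOCKSHIFT (17), so the largest supported block shrinks from 16 MiB to 128 KiB, while the smallest stays at 512 bytes (shift 9). The sketch below just does that arithmetic and, under the illustrative assumption that sizes are tracked in 512-byte (SPA_MINBLOCKSIZE) steps, counts how many distinct sizes each limit implies: 256 with shift 17 versus 32768 with shift 24. `HEAD_MAXBLOCKSHIFT`, `PATCHED_MAXBLOCKSHIFT`, `block_size` and `size_steps` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Shift values from the include/sys/spa.h hunk. */
#define SPA_MINBLOCKSHIFT	9
#define HEAD_MAXBLOCKSHIFT	24	/* value at HEAD (large blocks) */
#define PATCHED_MAXBLOCKSHIFT	17	/* value after the proposed patch */

/* Block size in bytes for a shift, e.g. 17 -> 131072 (128 KiB). */
static uint64_t block_size(unsigned int shift)
{
	assert(shift < 64);
	return UINT64_C(1) << shift;
}

/*
 * Count of distinct sizes up to the maximum block size when sizes are
 * tracked in SPA_MINBLOCKSIZE (512-byte) steps; that granularity is an
 * assumption made for illustration.
 */
static uint64_t size_steps(unsigned int max_shift)
{
	assert(max_shift >= SPA_MINBLOCKSHIFT);
	return block_size(max_shift) >> SPA_MINBLOCKSHIFT;
}
```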

@dweeezil
Contributor

@MrStaticVoid I realized that you'll probably also want to either set zfs_max_recordsize=131072 as a module parameter (during modprobe, as a kernel command-line argument, etc.) or add:

diff --git a/module/zfs/dsl_dataset.c b/module/zfs/dsl_dataset.c
index dacb667..8dfd5d4 100644
--- a/module/zfs/dsl_dataset.c
+++ b/module/zfs/dsl_dataset.c
@@ -60,7 +60,7 @@
  * and pools with larger blocks can always be imported and used, regardless
  * of this setting.
  */
-int zfs_max_recordsize = 1 * 1024 * 1024;
+int zfs_max_recordsize = 131072;

 #define        SWITCH64(x, y) \
        { \

for consistency.
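The consistency in question comes down to a simple bound: the default zfs_max_recordsize of 1 * 1024 * 1024 bytes (1 MiB) fits under the 16 MiB maximum block size implied by SPA_MAXBLOCKSHIFT 24, but exceeds the 128 KiB implied by SPA_MAXBLOCKSHIFT 17, whereas 131072 fits exactly. A minimal sketch of that check follows; `recordsize_fits` is a hypothetical helper, not a ZFS function, and it models only the upper bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: does a maximum record size fit under the largest
 * block size allowed by a given SPA_MAXBLOCKSHIFT? Only the upper bound
 * is modeled; other constraints on record sizes are ignored.
 */
static bool recordsize_fits(uint64_t max_recordsize,
			    unsigned int max_block_shift)
{
	assert(max_block_shift < 64);
	return max_recordsize <= (UINT64_C(1) << max_block_shift);
}
```

For example, recordsize_fits(1 * 1024 * 1024, 17) is false, which is why the default has to come down along with the shift.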

@jameslikeslinux
Contributor Author

I recompiled with SPA_MAXBLOCKSHIFT set to 17 and set zfs_max_recordsize so that:

> cat /sys/module/zfs/parameters/zfs_max_recordsize  
131072

With these changes, I can no longer trigger the issue with either drop_caches or a large git clone, both of which triggered it before.
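The cat check of /sys/module/zfs/parameters/zfs_max_recordsize can also be done from a program, for example in a test harness. The sketch below assumes the sysfs file holds a single decimal number followed by a newline, as the cat output suggests; `read_module_param` is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read an unsigned decimal module parameter from a
 * sysfs file such as /sys/module/zfs/parameters/zfs_max_recordsize.
 * Returns 0 and stores the value in *out on success; returns -1 if the
 * file cannot be read or does not start with a decimal digit.
 */
static int read_module_param(const char *path, unsigned long long *out)
{
	char buf[64];
	unsigned long long value;
	FILE *f;

	assert(path != NULL && out != NULL);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);

	if (buf[0] < '0' || buf[0] > '9')
		return -1;	/* reject empty, signed or non-numeric input */
	errno = 0;
	value = strtoull(buf, NULL, 10);
	if (errno == ERANGE)
		return -1;	/* value does not fit */
	*out = value;
	return 0;
}
```

On the patched system, reading that path would be expected to store 131072, matching the cat output.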

@dweeezil
Contributor

@MrStaticVoid The large block support apparently introduced an incompatibility for 32-bit systems. Someone will need to audit all the relevant code for overflows, array sizes, etc. in the context of a 32-bit system. I'm not likely to have the time to do so in the near future.

@behlendorf
Contributor

@MrStaticVoid @dweeezil Nice job quickly running this down to the large block patches. This is exactly why the plan for the next tag involves merging both the large block patches and the ABD patches (#3441). The ABD patches should relieve the additional stress that supporting large blocks places on the virtual memory subsystem. In fact, they should allow us to properly support all 32-bit architectures.
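One general technique for relieving that kind of virtual memory pressure is to back a large logical buffer with many small, separately allocated chunks, so that no single large, virtually contiguous allocation is needed; this matters most on 32-bit kernels, where virtual address space is scarce. The user-space sketch below illustrates only that general idea. `chunked_buf` and its functions are hypothetical names, the 4 KiB chunk size is an assumption, and the sketch makes no claim about how #3441 is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096	/* illustrative chunk size: one 4 KiB page */

/* Hypothetical logical buffer backed by separately allocated chunks. */
struct chunked_buf {
	size_t size;		/* logical size in bytes */
	size_t nchunks;
	unsigned char **chunks;
};

/* Allocate one small chunk at a time instead of one large block. */
static int chunked_buf_init(struct chunked_buf *b, size_t size)
{
	b->size = size;
	b->nchunks = (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
	b->chunks = calloc(b->nchunks ? b->nchunks : 1, sizeof(*b->chunks));
	if (b->chunks == NULL) {
		b->nchunks = 0;
		return -1;
	}
	for (size_t i = 0; i < b->nchunks; i++) {
		b->chunks[i] = malloc(CHUNK_SIZE);
		if (b->chunks[i] == NULL)
			return -1;	/* caller releases with chunked_buf_free() */
	}
	return 0;
}

static void chunked_buf_free(struct chunked_buf *b)
{
	for (size_t i = 0; i < b->nchunks; i++)
		free(b->chunks[i]);	/* unallocated slots are NULL from calloc */
	free(b->chunks);
	b->chunks = NULL;
	b->nchunks = 0;
}

/* Copy len bytes from src into the logical buffer at offset off. */
static void chunked_buf_write(struct chunked_buf *b, size_t off,
			      const unsigned char *src, size_t len)
{
	assert(off <= b->size && len <= b->size - off);
	while (len > 0) {
		size_t idx = off / CHUNK_SIZE, in = off % CHUNK_SIZE;
		size_t n = CHUNK_SIZE - in < len ? CHUNK_SIZE - in : len;

		memcpy(b->chunks[idx] + in, src, n);
		src += n;
		off += n;
		len -= n;
	}
}

/* Copy len bytes from the logical buffer at offset off into dst. */
static void chunked_buf_read(const struct chunked_buf *b, size_t off,
			     unsigned char *dst, size_t len)
{
	assert(off <= b->size && len <= b->size - off);
	while (len > 0) {
		size_t idx = off / CHUNK_SIZE, in = off % CHUNK_SIZE;
		size_t n = CHUNK_SIZE - in < len ? CHUNK_SIZE - in : len;

		memcpy(dst, b->chunks[idx] + in, n);
		dst += n;
		off += n;
		len -= n;
	}
}
```

The cost of this design is that every access must translate a logical offset into a (chunk, offset-within-chunk) pair, as the copy loops do.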

@MrStaticVoid It would be great if you could help us shake out and test the ABD patches on ARM. They should be rebased again fairly soon, and I'll be carefully reviewing and testing the code myself.

@behlendorf behlendorf added Type: Architecture Indicates an issue is specific to a single processor architecture Component: Memory Management kernel memory management labels Jun 23, 2015
@loli10K
Contributor

loli10K commented Nov 1, 2015

I may be hitting the same problem on my ARM board (BananaPi). I just upgraded today from 0.6.3 to 0.6.5, and this is the error triggered by echo 3 > /proc/sys/vm/drop_caches:

[  138.427152] Unable to handle kernel NULL pointer dereference at virtual address 00000048
[  138.427248] pgd = cfc94000
[  138.427279] [00000048] *pgd=778c3831
[  138.427322] Internal error: Oops: 17 [#1] SMP ARM
[  138.427349] Modules linked in: zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O)
[  138.427453] CPU: 1 PID: 4196 Comm: bash Tainted: P        W  O   3.18.0-rc2-zol #3
[  138.427481] task: cdb66180 ti: ce586000 task.ti: ce586000
[  138.427729] PC is at arc_kmem_reap_now+0x70/0xf8 [zfs]
[  138.427790] LR is at 0xde808974
[  138.427819] pc : [<bf0af3bc>]    lr : [<de808974>]    psr: 00000013
[  138.427819] sp : ce587db8  ip : 00000000  fp : ce587ddc
[  138.427859] r10: 00000000  r9 : ce586020  r8 : bf291184
[  138.427883] r7 : bf298184  r6 : cd253e00  r5 : bf278188  r4 : 00000000
[  138.427908] r3 : cd253d00  r2 : 00000000  r1 : cd253e30  r0 : 00000000
[  138.427934] Flags: nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  138.427961] Control: 10c5387d  Table: 4fc9406a  DAC: 00000015
[  138.427987] Process bash (pid: 4196, stack limit = 0xce586240)
[  138.428011] Stack: (0xce587db8 to 0xce588000)
[  138.428036] 7da0:                                                       00004cc6 00000000
[  138.428070] 7dc0: bf22d700 ce587ed4 bf22e240 ce586020 ce587e0c ce587de0 bf0b670c bf0af358
[  138.428104] 7de0: ce587ed0 00000000 bf1ac818 00000080 00000080 ce587ed0 00009964 c0d54c34
[  138.428138] 7e00: ce587e1c ce587e10 bf0b67f8 bf0b6620 ce587e94 ce587e20 c0108f80 bf0b67e0
[  138.428171] 7e20: dea426a0 dea420c0 de9d7800 de9d7820 dd9b4ab8 dd9b4b08 c0d58380 dd9b4ab8
[  138.428205] 7e40: ce587e6c ce587e50 000003e8 00000000 00000000 c0dd84c0 000003e8 00000000
[  138.428238] 7e60: 00004cc6 00000000 c01a128c ce587ed0 000007c0 c0d78514 000003e8 000003e8
[  138.428271] 7e80: 00000000 bf1ac818 ce587ec4 ce587e98 c01098e8 c0108df0 00000000 c0e2060c
[  138.428304] 7ea0: de00d800 00000002 000b2c08 ce587f80 ce586000 00000000 ce587ef4 ce587ec8
[  138.428340] 7ec0: c01a137c c0109824 ce587f80 c01ad780 000000d0 00000080 00000001 00000000
[  138.428374] 7ee0: c0d6a318 de00d800 ce587f2c ce587ef8 c01ae628 c01a129c ce587f80 c08eb32c
[  138.428407] 7f00: 00000000 00000002 ce587f34 d13be000 00000002 ce586008 000b2c08 ce587f80
[  138.428441] 7f20: ce587f44 ce587f30 c01ae65c c01ae584 00000001 00000000 ce587f7c ce587f48
[  138.428497] 7f40: c014cd1c c01ae644 00000000 d13be000 ce587f7c d13be000 d13be000 000b2c08
[  138.428531] 7f60: 00000002 c000fb24 ce586000 00000000 ce587fa4 ce587f80 c014d340 c014cc78
[  138.428563] 7f80: 00000002 00000000 b6f435e0 00000002 000b2c08 00000004 00000000 ce587fa8
[  138.428597] 7fa0: c000f8a0 c014d2f8 b6f435e0 00000002 00000001 000b2c08 00000002 00000000
[  138.428629] 7fc0: b6f435e0 00000002 000b2c08 00000004 beee8a1c 000ad06c 00000000 001bf978
[  138.428662] 7fe0: 00000002 beee89a0 b6eb1905 b6eeb06c 40000010 00000001 00000000 00000000
[  138.429071] [<bf0af3bc>] (arc_kmem_reap_now [zfs]) from [<bf0b670c>] (__arc_shrinker_func.isra.23+0xf8/0x1c0 [zfs])
[  138.429480] [<bf0b670c>] (__arc_shrinker_func.isra.23 [zfs]) from [<bf0b67f8>] (arc_shrinker_func_scan_objects+0x24/0x2c [zfs])
[  138.429713] [<bf0b67f8>] (arc_shrinker_func_scan_objects [zfs]) from [<c0108f80>] (shrink_slab_node+0x19c/0x30c)
[  138.429769] [<c0108f80>] (shrink_slab_node) from [<c01098e8>] (shrink_slab+0xd0/0xfc)
[  138.429819] [<c01098e8>] (shrink_slab) from [<c01a137c>] (drop_caches_sysctl_handler+0xec/0x154)
[  138.429864] [<c01a137c>] (drop_caches_sysctl_handler) from [<c01ae628>] (proc_sys_call_handler+0xb0/0xc0)
[  138.429907] [<c01ae628>] (proc_sys_call_handler) from [<c01ae65c>] (proc_sys_write+0x24/0x2c)
[  138.429954] [<c01ae65c>] (proc_sys_write) from [<c014cd1c>] (vfs_write+0xb0/0x1e0)
[  138.429993] [<c014cd1c>] (vfs_write) from [<c014d340>] (SyS_write+0x54/0xb0)
[  138.430029] [<c014d340>] (SyS_write) from [<c000f8a0>] (ret_fast_syscall+0x0/0x48)
[  138.430082] Code: e5b54004 e1540003 e1a00004 0a000001 (e5941048) 
[  138.430116] ---[ end trace cb88537fdc8fa202 ]---

Fortunately, applying the proposed patch to include/sys/spa.h seems to solve the problem. Thank you, @dweeezil.

That said, I have a couple of other ARM boards (RPi, Odroid) lying around and would like to help test things, but I may need some guidance (I don't even know what ABD means at this point).

@behlendorf
Contributor

Can I safely close out this issue? Are things now stable with the latest master source?

@jameslikeslinux
Contributor Author

I will give it a try.

@jameslikeslinux
Contributor Author

I am unable to trigger this bug anymore and my BeagleBone Black has been rock solid.
