Enabling more TCP congestion control #85

arter97 · 2015-02-25T16:37:05Z

Some Linux users prefer other TCP congestion control than the default "cubic".

Please enable other TCP congestion controls.

Patch-set : http://arter97.com/0001-defconfig-enable-more-TCP-congestion-controls.patch

commit aa0ab02 upstream. On the 8xx, the page size is set in the PMD entry and applies to all pages of the page table pointed by the said PMD entry. When an app has some regular pages allocated (e.g. see below) and tries to mmap() a huge page at a hint address covered by the same PMD entry, the kernel accepts the hint allthough the 8xx cannot handle different page sizes in the same PMD entry. 10000000-10001000 r-xp 00000000 00:0f 2597 /root/malloc 10010000-10011000 rwxp 00000000 00:0f 2597 /root/malloc mmap(0x10080000, 524288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|0x40000, -1, 0) = 0x10080000 This results the app remaining forever in do_page_fault()/hugetlb_fault() and when interrupting that app, we get the following warning: [162980.035629] WARNING: CPU: 0 PID: 2777 at arch/powerpc/mm/hugetlbpage.c:354 hugetlb_free_pgd_range+0xc8/0x1e4 [162980.035699] CPU: 0 PID: 2777 Comm: malloc Tainted: G W 4.14.6 #85 [162980.035744] task: c67e2c00 task.stack: c668e000 [162980.035783] NIP: c000fe18 LR: c00e1eec CTR: c00f90c0 [162980.035830] REGS: c668fc20 TRAP: 0700 Tainted: G W (4.14.6) [162980.035854] MSR: 00029032 <EE,ME,IR,DR,RI> CR: 24044224 XER: 20000000 [162980.036003] [162980.036003] GPR00: c00e1eec c668fcd0 c67e2c00 00000010 c6869410 10080000 00000000 77fb4000 [162980.036003] GPR08: ffff0001 0683c001 00000000 ffffff80 44028228 10018a34 00004008 418004fc [162980.036003] GPR16: c668e000 00040100 c668e000 c06c0000 c668fe78 c668e000 c6835ba0 c668fd48 [162980.036003] GPR24: 00000000 73ffffff 74000000 00000001 77fb4000 100fffff 10100000 10100000 [162980.036743] NIP [c000fe18] hugetlb_free_pgd_range+0xc8/0x1e4 [162980.036839] LR [c00e1eec] free_pgtables+0x12c/0x150 [162980.036861] Call Trace: [162980.036939] [c668fcd0] [c00f0774] unlink_anon_vmas+0x1c4/0x214 (unreliable) [162980.037040] [c668fd10] [c00e1eec] free_pgtables+0x12c/0x150 [162980.037118] [c668fd40] [c00eabac] exit_mmap+0xe8/0x1b4 [162980.037210] [c668fda0] [c0019710] mmput.part.9+0x20/0xd8 [162980.037301] [c668fdb0] [c001ecb0] do_exit+0x1f0/0x93c [162980.037386] [c668fe00] [c001f478] do_group_exit+0x40/0xcc [162980.037479] [c668fe10] [c002a76c] get_signal+0x47c/0x614 [162980.037570] [c668fe70] [c0007840] do_signal+0x54/0x244 [162980.037654] [c668ff30] [c0007ae8] do_notify_resume+0x34/0x88 [162980.037744] [c668ff40] [c000dae8] do_user_signal+0x74/0xc4 [162980.037781] Instruction dump: [162980.037821] 7fdff378 81370000 54a3463a 80890020 7d24182e 7c841a14 712a0004 4082ff94 [162980.038014] 2f890000 419e0010 712a0ff0 408200e0 <0fe00000> 54a9000a 7f984840 419d0094 [162980.038216] ---[ end trace c0ceeca8e7a5800a ]--- [162980.038754] BUG: non-zero nr_ptes on freeing mm: 1 [162985.363322] BUG: non-zero nr_ptes on freeing mm: -1 In order to fix this, this patch uses the address space "slices" implemented for BOOK3S/64 and enhanced to support PPC32 by the preceding patch. This patch modifies the context.id on the 8xx to be in the range [1:16] instead of [0:15] in order to identify context.id == 0 as not initialised contexts as done on BOOK3S This patch activates CONFIG_PPC_MM_SLICES when CONFIG_HUGETLB_PAGE is selected for the 8xx Alltough we could in theory have as many slices as PMD entries, the current slices implementation limits the number of low slices to 16. This limitation is not preventing us to fix the initial issue allthough it is suboptimal. It will be cured in a subsequent patch. Fixes: 4b91428 ("powerpc/8xx: Implement support of hugepages") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a349b95 upstream. This patch fixes an issue that the following error is possible to happen when ohci hardware causes an interruption and the system is shutting down at the same time. [ 34.851754] usb 2-1: USB disconnect, device number 2 [ 35.166658] irq 156: nobody cared (try booting with the "irqpoll" option) [ 35.173445] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 5.3.0-rc5 #85 [ 35.179964] Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT) [ 35.187886] Workqueue: usb_hub_wq hub_event [ 35.192063] Call trace: [ 35.194509] dump_backtrace+0x0/0x150 [ 35.198165] show_stack+0x14/0x20 [ 35.201475] dump_stack+0xa0/0xc4 [ 35.204785] __report_bad_irq+0x34/0xe8 [ 35.208614] note_interrupt+0x2cc/0x318 [ 35.212446] handle_irq_event_percpu+0x5c/0x88 [ 35.216883] handle_irq_event+0x48/0x78 [ 35.220712] handle_fasteoi_irq+0xb4/0x188 [ 35.224802] generic_handle_irq+0x24/0x38 [ 35.228804] __handle_domain_irq+0x5c/0xb0 [ 35.232893] gic_handle_irq+0x58/0xa8 [ 35.236548] el1_irq+0xb8/0x180 [ 35.239681] __do_softirq+0x94/0x23c [ 35.243253] irq_exit+0xd0/0xd8 [ 35.246387] __handle_domain_irq+0x60/0xb0 [ 35.250475] gic_handle_irq+0x58/0xa8 [ 35.254130] el1_irq+0xb8/0x180 [ 35.257268] kernfs_find_ns+0x5c/0x120 [ 35.261010] kernfs_find_and_get_ns+0x3c/0x60 [ 35.265361] sysfs_unmerge_group+0x20/0x68 [ 35.269454] dpm_sysfs_remove+0x2c/0x68 [ 35.273284] device_del+0x80/0x370 [ 35.276683] hid_destroy_device+0x28/0x60 [ 35.280686] usbhid_disconnect+0x4c/0x80 [ 35.284602] usb_unbind_interface+0x6c/0x268 [ 35.288867] device_release_driver_internal+0xe4/0x1b0 [ 35.293998] device_release_driver+0x14/0x20 [ 35.298261] bus_remove_device+0x110/0x128 [ 35.302350] device_del+0x148/0x370 [ 35.305832] usb_disable_device+0x8c/0x1d0 [ 35.309921] usb_disconnect+0xc8/0x2d0 [ 35.313663] hub_event+0x6e0/0x1128 [ 35.317146] process_one_work+0x1e0/0x320 [ 35.321148] worker_thread+0x40/0x450 [ 35.324805] kthread+0x124/0x128 [ 35.328027] ret_from_fork+0x10/0x18 [ 35.331594] handlers: [ 35.333862] [<0000000079300c1d>] usb_hcd_irq [ 35.338126] [<0000000079300c1d>] usb_hcd_irq [ 35.342389] Disabling IRQ #156 ohci_shutdown() disables all the interrupt and rh_state is set to OHCI_RH_HALTED. In other hand, ohci_irq() is possible to enable OHCI_INTR_SF and OHCI_INTR_MIE on ohci_irq(). Note that OHCI_INTR_SF is possible to be set by start_ed_unlink() which is called: ohci_irq() -> process_done_list() -> takeback_td() -> start_ed_unlink() So, ohci_irq() has the following condition, the issue happens by &ohci->regs->intrenable = OHCI_INTR_MIE | OHCI_INTR_SF and ohci->rh_state = OHCI_RH_HALTED: /* interrupt for some other device? */ if (ints == 0 || unlikely(ohci->rh_state == OHCI_RH_HALTED)) return IRQ_NOTMINE; To fix the issue, ohci_shutdown() holds the spin lock while disabling the interruption and changing the rh_state flag to prevent reenable the OHCI_INTR_MIE unexpectedly. Note that io_watchdog_func() also calls the ohci_shutdown() and it already held the spin lock, so that the patch makes a new function as _ohci_shutdown(). This patch is inspired by a Renesas R-Car Gen3 BSP patch from Tho Vu. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Cc: stable <stable@vger.kernel.org> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/1566877910-6020-1-git-send-email-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a349b95 upstream. This patch fixes an issue that the following error is possible to happen when ohci hardware causes an interruption and the system is shutting down at the same time. [ 34.851754] usb 2-1: USB disconnect, device number 2 [ 35.166658] irq 156: nobody cared (try booting with the "irqpoll" option) [ 35.173445] CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 5.3.0-rc5 #85 [ 35.179964] Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT) [ 35.187886] Workqueue: usb_hub_wq hub_event [ 35.192063] Call trace: [ 35.194509] dump_backtrace+0x0/0x150 [ 35.198165] show_stack+0x14/0x20 [ 35.201475] dump_stack+0xa0/0xc4 [ 35.204785] __report_bad_irq+0x34/0xe8 [ 35.208614] note_interrupt+0x2cc/0x318 [ 35.212446] handle_irq_event_percpu+0x5c/0x88 [ 35.216883] handle_irq_event+0x48/0x78 [ 35.220712] handle_fasteoi_irq+0xb4/0x188 [ 35.224802] generic_handle_irq+0x24/0x38 [ 35.228804] __handle_domain_irq+0x5c/0xb0 [ 35.232893] gic_handle_irq+0x58/0xa8 [ 35.236548] el1_irq+0xb8/0x180 [ 35.239681] __do_softirq+0x94/0x23c [ 35.243253] irq_exit+0xd0/0xd8 [ 35.246387] __handle_domain_irq+0x60/0xb0 [ 35.250475] gic_handle_irq+0x58/0xa8 [ 35.254130] el1_irq+0xb8/0x180 [ 35.257268] kernfs_find_ns+0x5c/0x120 [ 35.261010] kernfs_find_and_get_ns+0x3c/0x60 [ 35.265361] sysfs_unmerge_group+0x20/0x68 [ 35.269454] dpm_sysfs_remove+0x2c/0x68 [ 35.273284] device_del+0x80/0x370 [ 35.276683] hid_destroy_device+0x28/0x60 [ 35.280686] usbhid_disconnect+0x4c/0x80 [ 35.284602] usb_unbind_interface+0x6c/0x268 [ 35.288867] device_release_driver_internal+0xe4/0x1b0 [ 35.293998] device_release_driver+0x14/0x20 [ 35.298261] bus_remove_device+0x110/0x128 [ 35.302350] device_del+0x148/0x370 [ 35.305832] usb_disable_device+0x8c/0x1d0 [ 35.309921] usb_disconnect+0xc8/0x2d0 [ 35.313663] hub_event+0x6e0/0x1128 [ 35.317146] process_one_work+0x1e0/0x320 [ 35.321148] worker_thread+0x40/0x450 [ 35.324805] kthread+0x124/0x128 [ 35.328027] ret_from_fork+0x10/0x18 [ 35.331594] handlers: [ 35.333862] [<0000000079300c1d>] usb_hcd_irq [ 35.338126] [<0000000079300c1d>] usb_hcd_irq [ 35.342389] Disabling IRQ #156 ohci_shutdown() disables all the interrupt and rh_state is set to OHCI_RH_HALTED. In other hand, ohci_irq() is possible to enable OHCI_INTR_SF and OHCI_INTR_MIE on ohci_irq(). Note that OHCI_INTR_SF is possible to be set by start_ed_unlink() which is called: ohci_irq() -> process_done_list() -> takeback_td() -> start_ed_unlink() So, ohci_irq() has the following condition, the issue happens by &ohci->regs->intrenable = OHCI_INTR_MIE | OHCI_INTR_SF and ohci->rh_state = OHCI_RH_HALTED: /* interrupt for some other device? */ if (ints == 0 || unlikely(ohci->rh_state == OHCI_RH_HALTED)) return IRQ_NOTMINE; To fix the issue, ohci_shutdown() holds the spin lock while disabling the interruption and changing the rh_state flag to prevent reenable the OHCI_INTR_MIE unexpectedly. Note that io_watchdog_func() also calls the ohci_shutdown() and it already held the spin lock, so that the patch makes a new function as _ohci_shutdown(). This patch is inspired by a Renesas R-Car Gen3 BSP patch from Tho Vu. Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/1566877910-6020-1-git-send-email-yoshihiro.shimoda.uh@renesas.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 3.16: - Drop change in io_watchdog_func() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>

commit 7febbcb upstream. The commit 54e53b2 ("tty: serial: 8250: pass IRQ shared flag to UART ports") nicely explained the problem: ---8<---8<--- On some systems IRQ lines between multiple UARTs might be shared. If so, the irqflags have to be configured accordingly. The reason is: The 8250 port startup code performs IRQ tests *before* the IRQ handler for that particular port is registered. This is performed in serial8250_do_startup(). This function checks whether IRQF_SHARED is configured and only then disables the IRQ line while testing. This test is performed upon each open() of the UART device. Imagine two UARTs share the same IRQ line: On is already opened and the IRQ is active. When the second UART is opened, the IRQ line has to be disabled while performing IRQ tests. Otherwise an IRQ might handler might be invoked, but the IRQ itself cannot be handled, because the corresponding handler isn't registered, yet. That's because the 8250 code uses a chain-handler and invokes the corresponding port's IRQ handling routines himself. Unfortunately this IRQF_SHARED flag isn't configured for UARTs probed via device tree even if the IRQs are shared. This way, the actual and shared IRQ line isn't disabled while performing tests and the kernel correctly detects a spurious IRQ. So, adding this flag to the DT probe solves the issue. Note: The UPF_SHARE_IRQ flag is configured unconditionally. Therefore, the IRQF_SHARED flag can be set unconditionally as well. Example stack trace by performing `echo 1 > /dev/ttyS2` on a non-patched system: |irq 85: nobody cared (try booting with the "irqpoll" option) | [...] |handlers: |[<ffff0000080fc628>] irq_default_primary_handler threaded [<ffff00000855fbb8>] serial8250_interrupt |Disabling IRQ #85 ---8<---8<--- But unfortunately didn't fix the root cause. Let's try again here by moving IRQ flag assignment from serial_link_irq_chain() to serial8250_do_startup(). This should fix the similar issue reported for 8250_pnp case. Since this change we don't need to have custom solutions in 8250_aspeed_vuart and 8250_of drivers, thus, drop them. Fixes: 1c2f049 ("serial: 8250: add IRQ trigger support") Reported-by: Li RongQing <lirongqing@baidu.com> Cc: Kurt Kanzenbach <kurt@linutronix.de> Cc: Vikram Pandita <vikram.pandita@ti.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable <stable@vger.kernel.org> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20200211135559.85960-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7febbcb upstream. The commit 54e53b2 ("tty: serial: 8250: pass IRQ shared flag to UART ports") nicely explained the problem: ---8<---8<--- On some systems IRQ lines between multiple UARTs might be shared. If so, the irqflags have to be configured accordingly. The reason is: The 8250 port startup code performs IRQ tests *before* the IRQ handler for that particular port is registered. This is performed in serial8250_do_startup(). This function checks whether IRQF_SHARED is configured and only then disables the IRQ line while testing. This test is performed upon each open() of the UART device. Imagine two UARTs share the same IRQ line: On is already opened and the IRQ is active. When the second UART is opened, the IRQ line has to be disabled while performing IRQ tests. Otherwise an IRQ might handler might be invoked, but the IRQ itself cannot be handled, because the corresponding handler isn't registered, yet. That's because the 8250 code uses a chain-handler and invokes the corresponding port's IRQ handling routines himself. Unfortunately this IRQF_SHARED flag isn't configured for UARTs probed via device tree even if the IRQs are shared. This way, the actual and shared IRQ line isn't disabled while performing tests and the kernel correctly detects a spurious IRQ. So, adding this flag to the DT probe solves the issue. Note: The UPF_SHARE_IRQ flag is configured unconditionally. Therefore, the IRQF_SHARED flag can be set unconditionally as well. Example stack trace by performing `echo 1 > /dev/ttyS2` on a non-patched system: |irq 85: nobody cared (try booting with the "irqpoll" option) | [...] |handlers: |[<ffff0000080fc628>] irq_default_primary_handler threaded [<ffff00000855fbb8>] serial8250_interrupt |Disabling IRQ #85 ---8<---8<--- But unfortunately didn't fix the root cause. Let's try again here by moving IRQ flag assignment from serial_link_irq_chain() to serial8250_do_startup(). This should fix the similar issue reported for 8250_pnp case. Since this change we don't need to have custom solutions in 8250_aspeed_vuart and 8250_of drivers, thus, drop them. Fixes: 1c2f049 ("serial: 8250: add IRQ trigger support") Reported-by: Li RongQing <lirongqing@baidu.com> Cc: Kurt Kanzenbach <kurt@linutronix.de> Cc: Vikram Pandita <vikram.pandita@ti.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable <stable@vger.kernel.org> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20200211135559.85960-1-andriy.shevchenko@linux.intel.com [Kurt: Backport to v4.9] Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit 7febbcb ] The commit 54e53b2 ("tty: serial: 8250: pass IRQ shared flag to UART ports") nicely explained the problem: ---8<---8<--- On some systems IRQ lines between multiple UARTs might be shared. If so, the irqflags have to be configured accordingly. The reason is: The 8250 port startup code performs IRQ tests *before* the IRQ handler for that particular port is registered. This is performed in serial8250_do_startup(). This function checks whether IRQF_SHARED is configured and only then disables the IRQ line while testing. This test is performed upon each open() of the UART device. Imagine two UARTs share the same IRQ line: On is already opened and the IRQ is active. When the second UART is opened, the IRQ line has to be disabled while performing IRQ tests. Otherwise an IRQ might handler might be invoked, but the IRQ itself cannot be handled, because the corresponding handler isn't registered, yet. That's because the 8250 code uses a chain-handler and invokes the corresponding port's IRQ handling routines himself. Unfortunately this IRQF_SHARED flag isn't configured for UARTs probed via device tree even if the IRQs are shared. This way, the actual and shared IRQ line isn't disabled while performing tests and the kernel correctly detects a spurious IRQ. So, adding this flag to the DT probe solves the issue. Note: The UPF_SHARE_IRQ flag is configured unconditionally. Therefore, the IRQF_SHARED flag can be set unconditionally as well. Example stack trace by performing `echo 1 > /dev/ttyS2` on a non-patched system: |irq 85: nobody cared (try booting with the "irqpoll" option) | [...] |handlers: |[<ffff0000080fc628>] irq_default_primary_handler threaded [<ffff00000855fbb8>] serial8250_interrupt |Disabling IRQ #85 ---8<---8<--- But unfortunately didn't fix the root cause. Let's try again here by moving IRQ flag assignment from serial_link_irq_chain() to serial8250_do_startup(). This should fix the similar issue reported for 8250_pnp case. Since this change we don't need to have custom solutions in 8250_aspeed_vuart and 8250_of drivers, thus, drop them. Fixes: 1c2f049 ("serial: 8250: add IRQ trigger support") Reported-by: Li RongQing <lirongqing@baidu.com> Cc: Kurt Kanzenbach <kurt@linutronix.de> Cc: Vikram Pandita <vikram.pandita@ti.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable <stable@vger.kernel.org> Acked-by: Kurt Kanzenbach <kurt@linutronix.de> Link: https://lore.kernel.org/r/20200211135559.85960-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 1a14150 ] Reading the dispatch trace log from /sys/kernel/debug/powerpc/dtl/cpu-* results in a BUG() when the config CONFIG_HARDENED_USERCOPY is enabled as shown below. kernel BUG at mm/usercopy.c:102! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: xfs libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth pseries_wdt dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse CPU: 27 PID: 1815 Comm: python3 Not tainted 6.10.0-rc3 #85 Hardware name: IBM,9040-MRX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_042) hv:phyp pSeries NIP: c0000000005d23d4 LR: c0000000005d23d0 CTR: 00000000006ee6f8 REGS: c000000120c078c0 TRAP: 0700 Not tainted (6.10.0-rc3) MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 2828220f XER: 0000000e CFAR: c0000000001fdc80 IRQMASK: 0 [ ... GPRs omitted ... ] NIP [c0000000005d23d4] usercopy_abort+0x78/0xb0 LR [c0000000005d23d0] usercopy_abort+0x74/0xb0 Call Trace: usercopy_abort+0x74/0xb0 (unreliable) __check_heap_object+0xf8/0x120 check_heap_object+0x218/0x240 __check_object_size+0x84/0x1a4 dtl_file_read+0x17c/0x2c4 full_proxy_read+0x8c/0x110 vfs_read+0xdc/0x3a0 ksys_read+0x84/0x144 system_call_exception+0x124/0x330 system_call_vectored_common+0x15c/0x2ec --- interrupt: 3000 at 0x7fff81f3ab34 Commit 6d07d1c ("usercopy: Restrict non-usercopy caches to size 0") requires that only whitelisted areas in slab/slub objects can be copied to userspace when usercopy hardening is enabled using CONFIG_HARDENED_USERCOPY. Dtl contains hypervisor dispatch events which are expected to be read by privileged users. Hence mark this safe for user access. Specify useroffset=0 and usersize=DISPATCH_LOG_BYTES to whitelist the entire object. Co-developed-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Anjali K <anjalik@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240614173844.746818-1-anjalik@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org>

[ Upstream commit 7b0033d ] In my test case, concurrent calls to f2fs shutdown report the following stack trace: Oops: general protection fault, probably for non-canonical address 0xc6cfff63bb5513fc: 0000 [#1] PREEMPT SMP PTI CPU: 0 UID: 0 PID: 678 Comm: f2fs_rep_shutdo Not tainted 6.12.0-rc5-next-20241029-g6fb2fa9805c5-dirty #85 Call Trace: <TASK> ? show_regs+0x8b/0xa0 ? __die_body+0x26/0xa0 ? die_addr+0x54/0x90 ? exc_general_protection+0x24b/0x5c0 ? asm_exc_general_protection+0x26/0x30 ? kthread_stop+0x46/0x390 f2fs_stop_gc_thread+0x6c/0x110 f2fs_do_shutdown+0x309/0x3a0 f2fs_ioc_shutdown+0x150/0x1c0 __f2fs_ioctl+0xffd/0x2ac0 f2fs_ioctl+0x76/0xe0 vfs_ioctl+0x23/0x60 __x64_sys_ioctl+0xce/0xf0 x64_sys_call+0x2b1b/0x4540 do_syscall_64+0xa7/0x240 entry_SYSCALL_64_after_hwframe+0x76/0x7e The root cause is a race condition in f2fs_stop_gc_thread() called from different f2fs shutdown paths: [CPU0] [CPU1] ---------------------- ----------------------- f2fs_stop_gc_thread f2fs_stop_gc_thread gc_th = sbi->gc_thread gc_th = sbi->gc_thread kfree(gc_th) sbi->gc_thread = NULL < gc_th != NULL > kthread_stop(gc_th->f2fs_gc_task) //UAF The commit c7f114d ("f2fs: fix to avoid use-after-free in f2fs_stop_gc_thread()") attempted to fix this issue by using a read semaphore to prevent races between shutdown and remount threads, but it fails to prevent all race conditions. Fix it by converting to write lock of s_umount in f2fs_do_shutdown(). Fixes: 7950e9a ("f2fs: stop gc/discard thread after fs shutdown") Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>

mdrjr closed this as completed May 9, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enabling more TCP congestion control #85

Enabling more TCP congestion control #85

arter97 commented Feb 25, 2015

Enabling more TCP congestion control #85

Enabling more TCP congestion control #85

Comments

arter97 commented Feb 25, 2015