Full DATA_FIN support #58

matttbe · 2020-07-16T14:52:22Z

Work in progress by @mjmartineau

(Feature from the initial roadmap)

mjmartineau · 2020-07-30T15:01:53Z

Merged in net-next.

fast_second_level_miss handler for the TLBTEMP area has an assumption that page table directory entry for the TLBTEMP address range is 0. For it to be true the TLBTEMP area must be aligned to 4MB boundary and not share its 4MB region with anything that may use a page table. This is not true currently: TLBTEMP shares space with vmalloc space which results in the following kinds of runtime errors when fast_second_level_miss loads page table directory entry for the vmalloc space instead of fixing up the TLBTEMP area: Unable to handle kernel paging request at virtual address c7ff0e00 pc = d0009275, ra = 90009478 Oops: sig: 9 [#1] PREEMPT CPU: 1 PID: 61 Comm: kworker/u9:2 Not tainted 5.10.0-rc3-next-20201110-00007-g1fe4962fa983-dirty #58 Workqueue: xprtiod xs_stream_data_receive_workfn a00: 90009478 d11e1dc0 c7ff0e00 00000020 c7ff0000 00000001 7f8b8107 00000000 a08: 900c5992 d11e1d90 d0cc88b8 5506e97c 00000000 5506e97c d06c8074 d11e1d90 pc: d0009275, ps: 00060310, depc: 00000014, excvaddr: c7ff0e00 lbeg: d0009275, lend: d0009287 lcount: 00000003, sar: 00000010 Call Trace: xs_stream_data_receive_workfn+0x43c/0x770 process_one_work+0x1a1/0x324 worker_thread+0x1cc/0x3c0 kthread+0x10d/0x124 ret_from_kernel_thread+0xc/0x18 Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

ksmbd-fixes

The rtla osnoise tool is an interface for the osnoise tracer. The osnoise tracer dispatches a kernel thread per-cpu. These threads read the time in a loop while with preemption, softirqs and IRQs enabled, thus allowing all the sources of osnoise during its execution. The osnoise threads take note of the entry and exit point of any source of interferences, increasing a per-cpu interference counter. The osnoise tracer also saves an interference counter for each source of interference. The rtla osnoise top mode displays information about the periodic summary from the osnoise tracer. One example of rtla osnoise top output is: [root@alien ~]# rtla osnoise top -c 0-3 -d 1m -q -r 900000 -P F:1 Operating System Noise duration: 0 00:01:00 | time is in us CPU Period Runtime Noise % CPU Aval Max Noise Max Single HW NMI IRQ Softirq Thread 0 #58 52200000 1031 99.99802 91 60 0 0 52285 0 101 1 #59 53100000 5 99.99999 5 5 0 9 53122 0 18 2 #59 53100000 7 99.99998 7 7 0 8 53115 0 18 3 #59 53100000 8274 99.98441 277 23 0 9 53778 0 660 "rtla osnoise top --help" works and provide information about the available options. Link: https://lkml.kernel.org/r/0d796993abf587ae5a170bb8415c49368d4999e1.1639158831.git.bristot@kernel.org Cc: Tao Zhou <tao.zhou@linux.dev> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: linux-rt-users@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

GCC12 appears to be much smarter about its dependency tracking and is aware that the relaxed variants are just normal loads and stores and this is causing problems like: [ 210.074549] ------------[ cut here ]------------ [ 210.079223] NETDEV WATCHDOG: enabcm6e4ei0 (bcmgenet): transmit queue 1 timed out [ 210.086717] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:529 dev_watchdog+0x234/0x240 [ 210.095044] Modules linked in: genet(E) nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat] [ 210.146561] ACPI CPPC: PCC check channel failed for ss: 0. ret=-110 [ 210.146927] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G E 5.17.0-rc7G12+ #58 [ 210.153226] CPPC Cpufreq:cppc_scale_freq_workfn: failed to read perf counters [ 210.161349] Hardware name: Raspberry Pi Foundation Raspberry Pi 4 Model B/Raspberry Pi 4 Model B, BIOS EDK2-DEV 02/08/2022 [ 210.161353] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 210.161358] pc : dev_watchdog+0x234/0x240 [ 210.161364] lr : dev_watchdog+0x234/0x240 [ 210.161368] sp : ffff8000080a3a40 [ 210.161370] x29: ffff8000080a3a40 x28: ffffcd425af87000 x27: ffff8000080a3b20 [ 210.205150] x26: ffffcd425aa00000 x25: 0000000000000001 x24: ffffcd425af8ec08 [ 210.212321] x23: 0000000000000100 x22: ffffcd425af87000 x21: ffff55b142688000 [ 210.219491] x20: 0000000000000001 x19: ffff55b1426884c8 x18: ffffffffffffffff [ 210.226661] x17: 64656d6974203120 x16: 0000000000000001 x15: 6d736e617274203a [ 210.233831] x14: 2974656e65676d63 x13: ffffcd4259c300d8 x12: ffffcd425b07d5f0 [ 210.241001] x11: 00000000ffffffff x10: ffffcd425b07d5f0 x9 : ffffcd4258bdad9c [ 210.248171] x8 : 00000000ffffdfff x7 : 000000000000003f x6 : 0000000000000000 [ 210.255341] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000001000 [ 210.262511] x2 : 0000000000001000 x1 : 0000000000000005 x0 : 0000000000000044 [ 210.269682] Call trace: [ 210.272133] dev_watchdog+0x234/0x240 [ 210.275811] call_timer_fn+0x3c/0x15c [ 210.279489] __run_timers.part.0+0x288/0x310 [ 210.283777] run_timer_softirq+0x48/0x80 [ 210.287716] __do_softirq+0x128/0x360 [ 210.291392] __irq_exit_rcu+0x138/0x140 [ 210.295243] irq_exit_rcu+0x1c/0x30 [ 210.298745] el1_interrupt+0x38/0x54 [ 210.302334] el1h_64_irq_handler+0x18/0x24 [ 210.306445] el1h_64_irq+0x7c/0x80 [ 210.309857] arch_cpu_idle+0x18/0x2c [ 210.313445] default_idle_call+0x4c/0x140 [ 210.317470] cpuidle_idle_call+0x14c/0x1a0 [ 210.321584] do_idle+0xb0/0x100 [ 210.324737] cpu_startup_entry+0x30/0x8c [ 210.328675] secondary_start_kernel+0xe4/0x110 [ 210.333138] __secondary_switched+0x94/0x98 The assumption when these were relaxed seems to be that device memory would be mapped non reordering, and that other constructs (spinlocks/etc) would provide the barriers to assure that packet data and in memory rings/queues were ordered with respect to device register reads/writes. This itself seems a bit sketchy, but the real problem with GCC12 is that it is moving the actual reads/writes around at will as though they were independent operations when in truth they are not, but the compiler can't know that. When looking at the assembly dumps for many of these routines its possible to see very clean, but not strictly in program order operations occurring as the compiler would be free to do if these weren't actually register reads/write operations. Its possible to suppress the timeout with a liberal bit of dma_mb()'s sprinkled around but the device still seems unable to reliably send/receive data. A better plan is to use the safer readl/writel everywhere. Since this partially reverts an older commit, which notes the use of the relaxed variants for performance reasons. I would suggest that any performance problems with this commit are targeted at relaxing only the performance critical code paths after assuring proper barriers. Fixes: 69d2ea9 ("net: bcmgenet: Use correct I/O accessors") Reported-by: Peter Robinson <pbrobinson@gmail.com> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Acked-by: Peter Robinson <pbrobinson@gmail.com> Tested-by: Peter Robinson <pbrobinson@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220310045358.224350-1-jeremy.linton@arm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Since commit 262ca38 ("clk: Stop forwarding clk_rate_requests to the parent"), the clk_rate_request is .. as the title says, not forwarded anymore to the parent: this produces an issue with the MediaTek clock MUX driver during GPU DVFS on MT8195, but not on MT8192 or others. This is because, differently from others, like MT8192 where all of the clocks in the MFG parents tree are of mtk_mux type, but in the parent tree of MT8195's MFG clock, we have one mtk_mux clock and one (clk framework generic) mux clock, like so: names: mfg_bg3d -> mfg_ck_fast_ref -> top_mfg_core_tmp (or) mfgpll types: mtk_gate -> mux -> mtk_mux (or) mtk_pll To solve this issue and also keep the GPU DVFS clocks code working as expected, wire up a .determine_rate() callback for the mtk_mux ops; for that, the standard clk_mux_determine_rate_flags() was used as it was possible to. This commit was successfully tested on MT6795 Xperia M5, MT8173 Elm, MT8192 Spherion and MT8195 Tomato; no regressions were seen. For the sake of some more documentation about this issue here's the trace of it: [ 12.211587] ------------[ cut here ]------------ [ 12.211589] WARNING: CPU: 6 PID: 78 at drivers/clk/clk.c:1462 clk_core_init_rate_req+0x84/0x90 [ 12.211593] Modules linked in: stp crct10dif_ce mtk_adsp_common llc rfkill snd_sof_xtensa_dsp panfrost(+) sbs_battery cros_ec_lid_angle cros_ec_sensors snd_sof_of cros_ec_sensors_core hid_multitouch cros_usbpd_logger snd_sof gpu_sched snd_sof_utils fuse ipv6 [ 12.211614] CPU: 6 PID: 78 Comm: kworker/u16:2 Tainted: G W 6.0.0-next-20221011+ #58 [ 12.211616] Hardware name: Acer Tomato (rev2) board (DT) [ 12.211617] Workqueue: devfreq_wq devfreq_monitor [ 12.211620] pstate: 40400009 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 12.211622] pc : clk_core_init_rate_req+0x84/0x90 [ 12.211625] lr : clk_core_forward_rate_req+0xa4/0xe4 [ 12.211627] sp : ffff80000893b8e0 [ 12.211628] x29: ffff80000893b8e0 x28: ffffdddf92f9b000 x27: ffff46a2c0e8bc05 [ 12.211632] x26: ffff46a2c1041200 x25: 0000000000000000 x24: 00000000173eed80 [ 12.211636] x23: ffff80000893b9c0 x22: ffff80000893b940 x21: 0000000000000000 [ 12.211641] x20: ffff46a2c1039f00 x19: ffff46a2c1039f00 x18: 0000000000000000 [ 12.211645] x17: 0000000000000038 x16: 000000000000d904 x15: 0000000000000003 [ 12.211649] x14: ffffdddf9357ce48 x13: ffffdddf935e71c8 x12: 000000000004803c [ 12.211653] x11: 00000000a867d7ad x10: 00000000a867d7ad x9 : ffffdddf90c28df4 [ 12.211657] x8 : ffffdddf9357a980 x7 : 0000000000000000 x6 : 0000000000000004 [ 12.211661] x5 : ffffffffffffffc8 x4 : 00000000173eed80 x3 : ffff80000893b940 [ 12.211665] x2 : 00000000173eed80 x1 : ffff80000893b940 x0 : 0000000000000000 [ 12.211669] Call trace: [ 12.211670] clk_core_init_rate_req+0x84/0x90 [ 12.211673] clk_core_round_rate_nolock+0xe8/0x10c [ 12.211675] clk_mux_determine_rate_flags+0x174/0x1f0 [ 12.211677] clk_mux_determine_rate+0x1c/0x30 [ 12.211680] clk_core_determine_round_nolock+0x74/0x130 [ 12.211682] clk_core_round_rate_nolock+0x58/0x10c [ 12.211684] clk_core_round_rate_nolock+0xf4/0x10c [ 12.211686] clk_core_set_rate_nolock+0x194/0x2ac [ 12.211688] clk_set_rate+0x40/0x94 [ 12.211691] _opp_config_clk_single+0x38/0xa0 [ 12.211693] _set_opp+0x1b0/0x500 [ 12.211695] dev_pm_opp_set_rate+0x120/0x290 [ 12.211697] panfrost_devfreq_target+0x3c/0x50 [panfrost] [ 12.211705] devfreq_set_target+0x8c/0x2d0 [ 12.211707] devfreq_update_target+0xcc/0xf4 [ 12.211708] devfreq_monitor+0x40/0x1d0 [ 12.211710] process_one_work+0x294/0x664 [ 12.211712] worker_thread+0x7c/0x45c [ 12.211713] kthread+0x104/0x110 [ 12.211716] ret_from_fork+0x10/0x20 [ 12.211718] irq event stamp: 7102 [ 12.211719] hardirqs last enabled at (7101): [<ffffdddf904ea5a0>] finish_task_switch.isra.0+0xec/0x2f0 [ 12.211723] hardirqs last disabled at (7102): [<ffffdddf91794b74>] el1_dbg+0x24/0x90 [ 12.211726] softirqs last enabled at (6716): [<ffffdddf90410be4>] __do_softirq+0x414/0x588 [ 12.211728] softirqs last disabled at (6507): [<ffffdddf904171d8>] ____do_softirq+0x18/0x24 [ 12.211730] ---[ end trace 0000000000000000 ]--- Fixes: 262ca38 ("clk: Stop forwarding clk_rate_requests to the parent") Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20221011135548.318323-1-angelogioacchino.delregno@collabora.com Signed-off-by: Stephen Boyd <sboyd@kernel.org>

When trying to use copy_from_kernel_nofault() to read vsyscall page through a bpf program, the following oops was reported: BUG: unable to handle page fault for address: ffffffffff600000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 3231067 P4D 3231067 PUD 3233067 PMD 3235067 PTE 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 20390 Comm: test_progs ...... 6.7.0+ #58 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...... RIP: 0010:copy_from_kernel_nofault+0x6f/0x110 ...... Call Trace: <TASK> ? copy_from_kernel_nofault+0x6f/0x110 bpf_probe_read_kernel+0x1d/0x50 bpf_prog_2061065e56845f08_do_probe_read+0x51/0x8d trace_call_bpf+0xc5/0x1c0 perf_call_bpf_enter.isra.0+0x69/0xb0 perf_syscall_enter+0x13e/0x200 syscall_trace_enter+0x188/0x1c0 do_syscall_64+0xb5/0xe0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 </TASK> ...... ---[ end trace 0000000000000000 ]--- The oops is triggered when: 1) A bpf program uses bpf_probe_read_kernel() to read from the vsyscall page and invokes copy_from_kernel_nofault() which in turn calls __get_user_asm(). 2) Because the vsyscall page address is not readable from kernel space, a page fault exception is triggered accordingly. 3) handle_page_fault() considers the vsyscall page address as a user space address instead of a kernel space address. This results in the fix-up setup by bpf not being applied and a page_fault_oops() is invoked due to SMAP. Considering handle_page_fault() has already considered the vsyscall page address as a userspace address, fix the problem by disallowing vsyscall page read for copy_from_kernel_nofault(). Originally-by: Thomas Gleixner <tglx@linutronix.de> Reported-by: syzbot+72aa0161922eba61b50e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/CAG48ez06TZft=ATH1qh2c5mpS5BT8UakwNkzi6nvK5_djC-4Nw@mail.gmail.com Reported-by: xingwei lee <xrivendell7@gmail.com> Closes: https://lore.kernel.org/bpf/CABOYnLynjBoFZOf3Z4BhaZkc5hx_kHfsjiW+UWLoB=w33LvScw@mail.gmail.com Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240202103935.3154011-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>

matttbe added the enhancement label Jul 16, 2020

matttbe added this to the v5.9 milestone Jul 16, 2020

matttbe assigned mjmartineau Jul 16, 2020

mjmartineau closed this as completed Jul 30, 2020

dcaratti pushed a commit to dcaratti/mptcp_net-next that referenced this issue Sep 2, 2021

Merge pull request multipath-tcp#58 from namjaejeon/cifsd-for-next

db0e04a

ksmbd-fixes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Full DATA_FIN support #58

Full DATA_FIN support #58

matttbe commented Jul 16, 2020 •

edited

Loading

mjmartineau commented Jul 30, 2020

Full DATA_FIN support #58

Full DATA_FIN support #58

Comments

matttbe commented Jul 16, 2020 • edited Loading

mjmartineau commented Jul 30, 2020

matttbe commented Jul 16, 2020 •

edited

Loading