commit 3c730ee65d574cbf2d05559cda2cb07d8f3f8b7a Author: Greg Kroah-Hartman Date: Wed Aug 31 17:18:21 2022 +0200 Linux 5.19.6 Link: https://lore.kernel.org/r/20220829105808.828227973@linuxfoundation.org Tested-by: Florian Fainelli Tested-by: Ron Economos Tested-by: Shuah Khan Tested-by: Zan Aziz Tested-by: Guenter Roeck Tested-by: Ronald Warsow Tested-by: Linux Kernel Functional Testing Tested-by: Sudip Mukherjee Tested-by: Bagas Sanjaya Tested-by: Fenil Jain Tested-by: Rudi Heitbaum Tested-by: Justin M. Forbes Tested-by: Jiri Slaby Signed-off-by: Greg Kroah-Hartman commit a36df92c7ff7ecde2fb362241d0ab024dddd0597 Author: Daniel Borkmann Date: Thu Aug 25 23:26:47 2022 +0200 bpf: Don't use tnum_range on array range checking for poke descriptors commit a657182a5c5150cdfacb6640aad1d2712571a409 upstream. Hsin-Wei reported a KASAN splat triggered by their BPF runtime fuzzer which is based on a customized syzkaller: BUG: KASAN: slab-out-of-bounds in bpf_int_jit_compile+0x1257/0x13f0 Read of size 8 at addr ffff888004e90b58 by task syz-executor.0/1489 CPU: 1 PID: 1489 Comm: syz-executor.0 Not tainted 5.19.0 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: dump_stack_lvl+0x9c/0xc9 print_address_description.constprop.0+0x1f/0x1f0 ? bpf_int_jit_compile+0x1257/0x13f0 kasan_report.cold+0xeb/0x197 ? kvmalloc_node+0x170/0x200 ? bpf_int_jit_compile+0x1257/0x13f0 bpf_int_jit_compile+0x1257/0x13f0 ? arch_prepare_bpf_dispatcher+0xd0/0xd0 ? rcu_read_lock_sched_held+0x43/0x70 bpf_prog_select_runtime+0x3e8/0x640 ? bpf_obj_name_cpy+0x149/0x1b0 bpf_prog_load+0x102f/0x2220 ? __bpf_prog_put.constprop.0+0x220/0x220 ? find_held_lock+0x2c/0x110 ? __might_fault+0xd6/0x180 ? lock_downgrade+0x6e0/0x6e0 ? lock_is_held_type+0xa6/0x120 ? __might_fault+0x147/0x180 __sys_bpf+0x137b/0x6070 ? bpf_perf_link_attach+0x530/0x530 ? new_sync_read+0x600/0x600 ? __fget_files+0x255/0x450 ? lock_downgrade+0x6e0/0x6e0 ? fput+0x30/0x1a0 ? 
ksys_write+0x1a8/0x260 __x64_sys_bpf+0x7a/0xc0 ? syscall_enter_from_user_mode+0x21/0x70 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f917c4e2c2d The problem here is that a range of tnum_range(0, map->max_entries - 1) has limited ability to represent the concrete tight range with the tnum as the set of resulting states from value + mask can result in a superset of the actual intended range, and as such a tnum_in(range, reg->var_off) check may yield true when it shouldn't, for example tnum_range(0, 2) would result in 00XX -> v = 0000, m = 0011 such that the intended set of {0, 1, 2} is here represented by a less precise superset of {0, 1, 2, 3}. As the register is known const scalar, really just use the concrete reg->var_off.value for the upper index check. Fixes: d2e4c1e6c294 ("bpf: Constant map key tracking for prog array pokes") Reported-by: Hsin-Wei Hung Signed-off-by: Daniel Borkmann Cc: Shung-Hsi Yu Acked-by: John Fastabend Link: https://lore.kernel.org/r/984b37f9fdf7ac36831d2137415a4a915744c1b6.1661462653.git.daniel@iogearbox.net Signed-off-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit f0e5ce88e1cf2734afbb5ad6377c7bd7ad0992d3 Author: Conor Dooley Date: Sat Aug 20 00:14:16 2022 +0100 riscv: dts: microchip: mpfs: remove pci axi address translation property commit e4009c5fa77b4356aa37ce002e9f9952dfd7a615 upstream. An AXI master address translation table property was inadvertently added to the device tree & this was not caught by dtbs_check at the time. Remove the property - it should not be in mpfs.dtsi anyway as it would be more suitable in -fabric.dtsi nor does it actually apply to the version of the reference design we are using for upstream. 
Link: https://www.microsemi.com/document-portal/doc_download/1245812-polarfire-fpga-and-polarfire-soc-fpga-pci-express-user-guide # Section 1.3.3 Fixes: 528a5b1f2556 ("riscv: dts: microchip: add new peripherals to icicle kit device tree") Signed-off-by: Conor Dooley Signed-off-by: Greg Kroah-Hartman commit 14f158b9770fd55bacb5087588d8038aa9b80f67 Author: Conor Dooley Date: Sat Aug 20 00:14:15 2022 +0100 riscv: dts: microchip: mpfs: remove bogus card-detect-delay commit 2b55915d27dcaa35f54bad7925af0a76001079bc upstream. Recent versions of dt-schema warn about a previously undetected undocumented property: arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dtb: mmc@20008000: Unevaluated properties are not allowed ('card-detect-delay' was unexpected) From schema: Documentation/devicetree/bindings/mmc/cdns,sdhci.yaml There are no GPIOs connected to MSSIO6B4 pin K3 so adding the common cd-debounce-delay-ms property makes no sense. The Cadence IP has a register that sets the card detect delay as "DP * tclk". On MPFS, this clock frequency is not configurable (it must be 200 MHz) & the FPGA comes out of reset with this register already set. Fixes: bc47b2217f24 ("riscv: dts: microchip: add the sundance polarberry") Fixes: 0fa6107eca41 ("RISC-V: Initial DTS for Microchip ICICLE board") Signed-off-by: Conor Dooley Signed-off-by: Greg Kroah-Hartman commit a8604d23a8122df7ff929ce4c5d2be1b4be9bb6e Author: Conor Dooley Date: Sat Aug 20 00:14:14 2022 +0100 riscv: dts: microchip: mpfs: remove ti,fifo-depth property commit 72a05748cbd285567d69f173f8694e3471b79f20 upstream. 
Recent versions of dt-schema warn about a previously undetected undocumented property on the icicle & polarberry devicetrees: arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dtb: ethernet@20112000: ethernet-phy@8: Unevaluated properties are not allowed ('ti,fifo-depth' was unexpected) From schema: Documentation/devicetree/bindings/net/cdns,macb.yaml I know what you're thinking, the binding doesn't look to be the problem and I agree. I am not sure why a TI vendor property was ever actually added since it has no meaning... just get rid of it. Fixes: bc47b2217f24 ("riscv: dts: microchip: add the sundance polarberry") Fixes: 0fa6107eca41 ("RISC-V: Initial DTS for Microchip ICICLE board") Signed-off-by: Conor Dooley Signed-off-by: Greg Kroah-Hartman commit 5977375a7dba19bc882faeeabac9bd271e78b4f6 Author: Conor Dooley Date: Sat Aug 20 00:14:13 2022 +0100 riscv: dts: microchip: mpfs: fix incorrect pcie child node name commit 3f67e69976035352db110443916bcce32c7f64ac upstream. Recent versions of dt-schema complain about the PCIe controller's child node name: arch/riscv/boot/dts/microchip/mpfs-icicle-kit.dtb: pcie@2000000000: Unevaluated properties are not allowed ('clock-names', 'clocks', 'legacy-interrupt-controller', 'microchip,axi-m-atr0' were unexpected) From schema: Documentation/devicetree/bindings/pci/microchip,pcie-host.yaml Make the dts match the correct property name in the dts. Fixes: 528a5b1f2556 ("riscv: dts: microchip: add new peripherals to icicle kit device tree") Signed-off-by: Conor Dooley Signed-off-by: Greg Kroah-Hartman commit f24ee7391a75b9577fdf40b16039e0b6a97abae3 Author: Mike Christie Date: Thu Aug 11 20:12:06 2022 -0500 scsi: core: Fix passthrough retry counter handling commit fac8e558da9485e13a0ae0488aa0b8a8c307cd34 upstream. Passthrough users will set the scsi_cmnd->allowed value and were expecting up to $allowed retries.
The problem is that before: commit 6aded12b10e0 ("scsi: core: Remove struct scsi_request") we used to set the retries on the scsi_request then copy them over to scsi_cmnd->allowed in scsi_setup_scsi_cmnd. With that patch we now set scsi_cmnd->allowed to 0 in scsi_prepare_cmd and overwrite what the passthrough user set. This moves the allowed initialization to after the blk_rq_is_passthrough() check so it's only done for the non-passthrough path where the ULD init_command will normally set an allowed value it prefers. Link: https://lore.kernel.org/r/20220812011206.9157-1-michael.christie@oracle.com Fixes: 6aded12b10e0 ("scsi: core: Remove struct scsi_request") Reviewed-by: Christoph Hellwig Signed-off-by: Mike Christie Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman commit 828f57ac75eaccd6607ee4d1468d34e983e32c68 Author: Saurabh Sengar Date: Thu Aug 4 08:55:34 2022 -0700 scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wq commit d957e7ffb2c72410bcc1a514153a46719255a5da upstream. storvsc_error_wq workqueue should not be marked as WQ_MEM_RECLAIM as it doesn't need to make forward progress under memory pressure. Marking this workqueue as WQ_MEM_RECLAIM may cause deadlock while flushing a non-WQ_MEM_RECLAIM workqueue. 
In the current state it causes the following warning: [ 14.506347] ------------[ cut here ]------------ [ 14.506354] workqueue: WQ_MEM_RECLAIM storvsc_error_wq_0:storvsc_remove_lun is flushing !WQ_MEM_RECLAIM events_freezable_power_:disk_events_workfn [ 14.506360] WARNING: CPU: 0 PID: 8 at <-snip->kernel/workqueue.c:2623 check_flush_dependency+0xb5/0x130 [ 14.506390] CPU: 0 PID: 8 Comm: kworker/u4:0 Not tainted 5.4.0-1086-azure #91~18.04.1-Ubuntu [ 14.506391] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022 [ 14.506393] Workqueue: storvsc_error_wq_0 storvsc_remove_lun [ 14.506395] RIP: 0010:check_flush_dependency+0xb5/0x130 <-snip-> [ 14.506408] Call Trace: [ 14.506412] __flush_work+0xf1/0x1c0 [ 14.506414] __cancel_work_timer+0x12f/0x1b0 [ 14.506417] ? kernfs_put+0xf0/0x190 [ 14.506418] cancel_delayed_work_sync+0x13/0x20 [ 14.506420] disk_block_events+0x78/0x80 [ 14.506421] del_gendisk+0x3d/0x2f0 [ 14.506423] sr_remove+0x28/0x70 [ 14.506427] device_release_driver_internal+0xef/0x1c0 [ 14.506428] device_release_driver+0x12/0x20 [ 14.506429] bus_remove_device+0xe1/0x150 [ 14.506431] device_del+0x167/0x380 [ 14.506432] __scsi_remove_device+0x11d/0x150 [ 14.506433] scsi_remove_device+0x26/0x40 [ 14.506434] storvsc_remove_lun+0x40/0x60 [ 14.506436] process_one_work+0x209/0x400 [ 14.506437] worker_thread+0x34/0x400 [ 14.506439] kthread+0x121/0x140 [ 14.506440] ? process_one_work+0x400/0x400 [ 14.506441] ? kthread_park+0x90/0x90 [ 14.506443] ret_from_fork+0x35/0x40 [ 14.506445] ---[ end trace 2d9633159fdc6ee7 ]--- Link: https://lore.kernel.org/r/1659628534-17539-1-git-send-email-ssengar@linux.microsoft.com Fixes: 436ad9413353 ("scsi: storvsc: Allow only one remove lun work item to be issued per lun") Reviewed-by: Michael Kelley Signed-off-by: Saurabh Sengar Signed-off-by: Martin K. 
Petersen Signed-off-by: Greg Kroah-Hartman commit a292244e5bfa8800bd2f9d42c1878b30cb728181 Author: Kiwoong Kim Date: Tue Aug 2 10:42:31 2022 +0900 scsi: ufs: core: Enable link lost interrupt commit 6d17a112e9a63ff6a5edffd1676b99e0ffbcd269 upstream. Link lost is treated as fatal error with commit c99b9b230149 ("scsi: ufs: Treat link loss as fatal error"), but the event isn't registered as interrupt source. Enable it. Link: https://lore.kernel.org/r/1659404551-160958-1-git-send-email-kwmad.kim@samsung.com Fixes: c99b9b230149 ("scsi: ufs: Treat link loss as fatal error") Reviewed-by: Bart Van Assche Signed-off-by: Kiwoong Kim Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman commit 0761b0e818c7b41c0a2c61477a944314150c0ccc Author: Mark Brown Date: Wed Aug 17 19:23:24 2022 +0100 arm64/sme: Don't flush SVE register state when handling SME traps commit 714f3cbd70a4db9f9b7fe5b8a032896ed33fb824 upstream. Currently as part of handling a SME access trap we flush the SVE register state. This is not needed and would corrupt register state if the task has access to the SVE registers already. For non-streaming mode accesses the required flushing will be done in the SVE access trap. For streaming mode SVE register accesses the architecture guarantees that the register state will be flushed when streaming mode is entered or exited so there is no need for us to do so. Simply remove the register initialisation. Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME") Signed-off-by: Mark Brown Reviewed-by: Catalin Marinas Link: https://lore.kernel.org/r/20220817182324.638214-5-broonie@kernel.org Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit a8d79f9d1a4d90b7b4eb8bf7aa61995359aeb02e Author: Mark Brown Date: Wed Aug 17 19:23:23 2022 +0100 arm64/sme: Don't flush SVE register state when allocating SME storage commit 826a4fdd2ada9e5923c58bdd168f31a42e958ffc upstream. 
Currently when taking a SME access trap we allocate storage for the SVE register state in order to be able to handle storage of streaming mode SVE. Because it was originally used in a purely SVE context, the SVE register state allocation also flushes the SVE register state if storage was already allocated, but in the SME context this is not desirable. For a SME access trap to be taken the task must not be in streaming mode, so either there already is SVE register state present for regular SVE mode, which would be corrupted, or the task does not have TIF_SVE and the flush is redundant. Fix this by adding a flag to sve_alloc() indicating if we are in a SVE context and need to flush the state. Freshly allocated storage is always zeroed either way. Fixes: 8bd7f91c03d8 ("arm64/sme: Implement traps and syscall handling for SME") Signed-off-by: Mark Brown Reviewed-by: Catalin Marinas Link: https://lore.kernel.org/r/20220817182324.638214-4-broonie@kernel.org Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit 913fe86ae9038cb450c573ea991499c4f32d1264 Author: Mark Brown Date: Wed Aug 17 19:23:22 2022 +0100 arm64/signal: Flush FPSIMD register state when disabling streaming mode commit ea64baacbc36a0d552aec0d87107182f40211131 upstream. When handling a signal delivered to a context with streaming mode enabled we will disable streaming mode for the signal handler. When doing so we should also flush the saved FPSIMD register state, as exiting streaming mode in the hardware would, so that if that state is reloaded we get the same behaviour. Without this we will reload whatever FPSIMD state was last saved for the task.
Fixes: 40a8e87bb328 ("arm64/sme: Disable ZA and streaming mode when handling signals") Signed-off-by: Mark Brown Reviewed-by: Catalin Marinas Link: https://lore.kernel.org/r/20220817182324.638214-3-broonie@kernel.org Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit f83cbd14c79459b03f1d0235c76533c5628b7263 Author: Mark Rutland Date: Wed Aug 17 16:40:22 2022 +0100 arm64: fix rodata=full commit 2e8cff0a0eee87b27f0cf87ad8310eb41b5886ab upstream. On arm64, "rodata=full" has been supported (but not documented) since commit: c55191e96caa9d78 ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well") As it's necessary to determine the rodata configuration early during boot, arm64 has an early_param() handler for this, whereas init/main.c has a __setup() handler which is run later. Unfortunately, this split meant that since commit: f9a40b0890658330 ("init/main.c: return 1 from handled __setup() functions") ... passing "rodata=full" would result in a spurious warning from the __setup() handler (though RO permissions would be configured appropriately). Further, "rodata=full" has been broken since commit: 0d6ea3ac94ca77c5 ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()") ... which caused strtobool() to parse "full" as false (in addition to many other values not documented for the "rodata=" kernel parameter). This patch fixes this breakage by:
* Moving the core parameter parser to an __early_param(), such that it is available early.
* Adding an (optional) arch hook which arm64 can use to parse "full".
* Updating the documentation to mention that "full" is valid for arm64.
* Having the core parameter parser handle "on" and "off" explicitly, such that any undocumented values (e.g. typos such as "ful") are reported as errors rather than being silently accepted.
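The parsing approach listed above can be sketched in plain userspace C. This is only an illustration, not the kernel code: every name ending in _sketch is hypothetical, the two flags stand in for whatever state the real parser updates, and the return value follows the early_param() convention in which 0 means the parameter was handled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's rodata state. */
static bool rodata_enabled_sketch = true;
static bool rodata_full_sketch;

/*
 * Hypothetical optional arch hook: returns true if the architecture
 * recognised the value. This one models an arch that accepts "full".
 */
static bool arch_parse_rodata_sketch(const char *arg)
{
	if (strcmp(arg, "full") == 0) {
		rodata_enabled_sketch = true;
		rodata_full_sketch = true;
		return true;
	}
	return false;
}

/*
 * Core parser: "on" and "off" are matched explicitly, so anything else
 * (for example the typo "ful") is reported as an error instead of being
 * passed to a permissive boolean parser. Returns 0 when handled.
 */
static int parse_rodata_sketch(const char *arg)
{
	if (arg == NULL)
		return -EINVAL;

	if (arch_parse_rodata_sketch(arg))
		return 0;

	if (strcmp(arg, "on") == 0) {
		rodata_enabled_sketch = true;
		return 0;
	}

	if (strcmp(arg, "off") == 0) {
		rodata_enabled_sketch = false;
		return 0;
	}

	return -EINVAL;
}
```

With this shape, parse_rodata_sketch("ful") returns a non-zero error, whereas a boolean parser that only inspects the leading characters could have read it as false.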
Note that __setup() and early_param() have opposite conventions for their return values, where __setup() uses 1 to indicate a parameter was handled and early_param() uses 0 to indicate a parameter was handled. Fixes: f9a40b089065 ("init/main.c: return 1 from handled __setup() functions") Fixes: 0d6ea3ac94ca ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()") Signed-off-by: Mark Rutland Cc: Andy Shevchenko Cc: Ard Biesheuvel Cc: Catalin Marinas Cc: Jagdish Gediya Cc: Matthew Wilcox Cc: Randy Dunlap Cc: Will Deacon Reviewed-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20220817154022.3974645-1-mark.rutland@arm.com Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit ec76a1de1d65cdca53918f7b3258b1938a147ed1 Author: Ian Rogers Date: Mon Aug 22 14:33:51 2022 -0700 perf stat: Clear evsel->reset_group for each stat run commit bf515f024e4c0ca46a1b08c4f31860c01781d8a5 upstream. If a weak group is broken then the reset_group flag remains set for the next run. Having reset_group set means the counter isn't created and ultimately a segfault. A simple reproduction of this is: # perf stat -r2 -e '{cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles}:W which will be added as a test in the next patch. Fixes: 4804e0111662d7d8 ("perf stat: Use affinity for opening events") Reviewed-by: Andi Kleen Signed-off-by: Ian Rogers Tested-by: Arnaldo Carvalho de Melo Tested-by: Xing Zhengjun Cc: Alexander Shishkin Cc: Andi Kleen Cc: Ingo Molnar Cc: Jiri Olsa Cc: Kan Liang Cc: Mark Rutland Cc: Namhyung Kim Cc: Peter Zijlstra Cc: Stephane Eranian Link: https://lore.kernel.org/r/20220822213352.75721-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit 6d7a4a140cfcea05278217dd21e86835e2dc6087 Author: Stephane Eranian Date: Wed Aug 17 22:46:13 2022 -0700 perf/x86/intel/ds: Fix precise store latency handling commit d4bdb0bebc5ba3299d74f123c782d99cd4e25c49 upstream. 
With the existing code in store_latency_data(), the memory operation (mem_op) returned to the user is always OP_LOAD where in fact, it should be OP_STORE. This comes from the fact that the function is simply grabbing the information from a data source map which covers only load accesses. Intel 12th gen CPU offers precise store sampling that captures both the data source and latency. Therefore it can use the data source mapping table but must override the memory operation to reflect stores instead of loads. Fixes: 61b985e3e775 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: Stephane Eranian Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20220818054613.1548130-1-eranian@google.com Signed-off-by: Greg Kroah-Hartman commit 291f8baead174e17654465dcccc47e87530f8896 Author: Stephane Eranian Date: Wed Aug 3 09:00:31 2022 -0700 perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU commit 11745ecfe8fea4b4a4c322967a7605d2ecbd5080 upstream. Existing code was generating bogus counts for the SNB IMC bandwidth counters:
$ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
1.000327813 1,024.03 MiB uncore_imc/data_reads/
1.000327813 20.73 MiB uncore_imc/data_writes/
2.000580153 261,120.00 MiB uncore_imc/data_reads/
2.000580153 23.28 MiB uncore_imc/data_writes/
The problem was introduced by commit: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC") Where the read_counter callback was replaced to point to the generic uncore_mmio_read_counter() function. The SNB IMC counters are free-running 32-bit counters laid out contiguously in MMIO. But uncore_mmio_read_counter() is using a readq() call to read from MMIO therefore reading 64-bit from MMIO.
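The effect of a 64-bit read on contiguous 32-bit counters can be modelled in userspace. This is a rough sketch under stated assumptions: ordinary memory stands in for MMIO, and fake_imc_regs, read32_sketch() and read64_sketch() are hypothetical names, not the driver's readl()/readq() accessors.

```c
#include <stdint.h>
#include <string.h>

/* Two adjacent 32-bit free-running counters, modelling a contiguous layout. */
struct fake_imc_regs {
	uint32_t data_reads;
	uint32_t data_writes;
};

/* Stand-in for a 32-bit read: returns exactly one counter. */
static uint32_t read32_sketch(const void *addr)
{
	uint32_t v;

	memcpy(&v, addr, sizeof(v));
	return v;
}

/*
 * Stand-in for a 64-bit read starting at the same address: it also
 * pulls in the four bytes of the neighbouring counter.
 */
static uint64_t read64_sketch(const void *addr)
{
	uint64_t v;

	memcpy(&v, addr, sizeof(v));
	return v;
}
```

If data_reads holds 0x10 and data_writes holds 0x20, read32_sketch() on the first counter returns 0x10, while read64_sketch() on the same address returns a value that also contains 0x20 (in the upper half on a little-endian machine), so using it unmasked as a starting count gives a wrong baseline.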
Although this is okay for the uncore_perf_event_update() function because it is shifting the value based on the actual counter width to compute a delta, it is not okay for the uncore_pmu_event_start() which is simply reading the counter and therefore priming the event->prev_count with a bogus value which is responsible for causing bogus deltas in the perf stat command above. The fix is to reintroduce the custom callback for read_counter for the SNB IMC PMU and use readl() instead of readq(). With the change the output of perf stat is back to normal:
$ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/
1.000120987 296.94 MiB uncore_imc/data_reads/
1.000120987 138.42 MiB uncore_imc/data_writes/
2.000403144 175.91 MiB uncore_imc/data_reads/
2.000403144 68.50 MiB uncore_imc/data_writes/
Fixes: 07ce734dd8ad ("perf/x86/intel/uncore: Clean up client IMC") Signed-off-by: Stephane Eranian Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Kan Liang Link: https://lore.kernel.org/r/20220803160031.1379788-1-eranian@google.com Signed-off-by: Greg Kroah-Hartman commit a9271d39d6dc8a9b2fba6ed9312f8d77ba9f5379 Author: James Clark Date: Thu Jul 28 10:39:46 2022 +0100 perf python: Fix build when PYTHON_CONFIG is user supplied commit bc9e7fe313d5e56d4d5f34bcc04d1165f94f86fb upstream. The previous change to Python autodetection had a small mistake where the auto value was used to determine the Python binary, rather than the user supplied value. The Python binary is only used for one part of the build process, rather than the final linking, so it was producing correct builds in most scenarios, especially when the auto detected value matched what the user wanted, or the system only had a valid set of Pythons. Change it so that the Python binary path is derived from either the PYTHON_CONFIG value or PYTHON value, depending on what is specified by the user. This was the original intention.
This error was spotted in a build failure in an odd cross compilation environment after commit 4c41cb46a732fe82 ("perf python: Prefer python3") was merged. Fixes: 630af16eee495f58 ("perf tools: Use Python devtools for version autodetection rather than runtime") Signed-off-by: James Clark Acked-by: Ian Rogers Cc: Alexander Shishkin Cc: Ingo Molnar Cc: James Clark Cc: Jiri Olsa Cc: Mark Rutland Cc: Namhyung Kim Cc: Peter Zijlstra Link: https://lore.kernel.org/r/20220728093946.1337642-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit b2f10baf4d67e1a8c0ec52643c20d1895b0f749a Author: Yu Kuai Date: Tue Jul 26 20:22:24 2022 +0800 blk-mq: fix io hung due to missing commit_rqs commit 65fac0d54f374625b43a9d6ad1f2c212bd41f518 upstream. Currently, in virtio_scsi, if 'bd->last' is not set to true while dispatching request, such io will stay in driver's queue, and driver will wait for block layer to dispatch more rqs. However, if block layer failed to dispatch more rq, it should trigger commit_rqs to inform driver. There is a problem in blk_mq_try_issue_list_directly() that commit_rqs won't be called:
// assume that queue_depth is set to 1, list contains two rq
blk_mq_try_issue_list_directly
 blk_mq_request_issue_directly
 // dispatch first rq
 // last is false
  __blk_mq_try_issue_directly
   blk_mq_get_dispatch_budget
   // succeed to get first budget
   __blk_mq_issue_directly
    scsi_queue_rq
     cmd->flags |= SCMD_LAST
      virtscsi_queuecommand
       kick = (sc->flags & SCMD_LAST) != 0
       // kick is false, first rq won't issue to disk
 queued++
 blk_mq_request_issue_directly
 // dispatch second rq
  __blk_mq_try_issue_directly
   blk_mq_get_dispatch_budget
   // failed to get second budget
 ret == BLK_STS_RESOURCE
  blk_mq_request_bypass_insert
 // errors is still 0
 if (!list_empty(list) || errors && ...)
  // won't pass, commit_rqs won't be called
In this situation, first rq relied on second rq to dispatch, while second rq relied on first rq to complete, thus they will both hang. Fix the problem by also treating 'BLK_STS_*RESOURCE' as 'errors' since it means that request is not queued successfully. Same problem exists in blk_mq_dispatch_rq_list(), 'BLK_STS_*RESOURCE' can't be treated as 'errors' here, fix the problem by calling commit_rqs if queue_rq return 'BLK_STS_*RESOURCE'. Fixes: d666ba98f849 ("blk-mq: add mq_ops->commit_rqs()") Signed-off-by: Yu Kuai Reviewed-by: Ming Lei Link: https://lore.kernel.org/r/20220726122224.1790882-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit ca949183c3407a790100ac1d9fc10821a5fd887f Author: Salvatore Bonaccorso Date: Mon Aug 1 11:15:30 2022 +0200 Documentation/ABI: Mention retbleed vulnerability info file for sysfs commit 00da0cb385d05a89226e150a102eb49d8abb0359 upstream. While reporting for the AMD retbleed vulnerability was added in 6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability") the new sysfs file was not mentioned so far in the ABI documentation for sysfs-devices-system-cpu. Fix that. Fixes: 6b80b59b3555 ("x86/bugs: Report AMD retbleed vulnerability") Signed-off-by: Salvatore Bonaccorso Signed-off-by: Borislav Petkov Link: https://lore.kernel.org/r/20220801091529.325327-1-carnil@debian.org Signed-off-by: Greg Kroah-Hartman commit 43365c8fbb3ca6d60ecb32b5c0f91e1563dd0ac1 Author: Prike Liang Date: Wed Aug 24 11:16:51 2022 +0800 drm/amdkfd: Fix isa version for the GC 10.3.7 commit ee8086dbc1585d9f4020a19447388246a5cff5c8 upstream. Correct the isa version for handling KFD test.
Fixes: 7c4f4f197e0c ("drm/amdkfd: Add GC 10.3.6 and 10.3.7 KFD definitions") Signed-off-by: Prike Liang Reviewed-by: Aaron Liu Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit b864bc2ad49f413d670888abd737b2b5da3e5310 Author: Peter Zijlstra Date: Fri Aug 19 13:01:35 2022 +0200 x86/nospec: Fix i386 RSB stuffing commit 332924973725e8cdcc783c175f68cf7e162cb9e5 upstream. Turns out that i386 doesn't unconditionally have LFENCE, as such the loop in __FILL_RETURN_BUFFER isn't actually speculation safe on such chips. Fixes: ba6e31af2be9 ("x86/speculation: Add LFENCE to RSB fill sequence") Reported-by: Ben Hutchings Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/Yv9tj9vbQ9nNlXoY@worktop.programming.kicks-ass.net Signed-off-by: Greg Kroah-Hartman commit 7b0163c1b07b7ff1717aa975821c40df98786ddc Author: Liam Howlett Date: Wed Aug 10 16:02:25 2022 +0000 binder_alloc: add missing mmap_lock calls when using the VMA commit 44e602b4e52f70f04620bbbf4fe46ecb40170bde upstream. Take the mmap_read_lock() when using the VMA in binder_alloc_print_pages() and when checking for a VMA in binder_alloc_new_buf_locked(). It is worth noting binder_alloc_new_buf_locked() drops the VMA read lock after it verifies a VMA exists, but may be taken again deeper in the call stack, if necessary. Link: https://lkml.kernel.org/r/20220810160209.1630707-1-Liam.Howlett@oracle.com Fixes: a43cfc87caaf (android: binder: stop saving a pointer to the VMA) Signed-off-by: Liam R. 
Howlett Reported-by: Ondrej Mosnacek Reported-by: Acked-by: Carlos Llamas Tested-by: Ondrej Mosnacek Cc: Minchan Kim Cc: Christian Brauner (Microsoft) Cc: Greg Kroah-Hartman Cc: Hridya Valsaraju Cc: Joel Fernandes Cc: Martijn Coenen Cc: Suren Baghdasaryan Cc: Todd Kjos Cc: Matthew Wilcox (Oracle) Cc: "Arve Hjønnevåg" Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit b887868c4e6b9e8094909f3874444048345fce8a Author: Zenghui Yu Date: Tue Aug 9 12:38:48 2022 +0800 arm64: Fix match_list for erratum 1286807 on Arm Cortex-A76 commit 5e1e087457c94ad7fafbe1cf6f774c6999ee29d4 upstream. Since commit 51f559d66527 ("arm64: Enable repeat tlbi workaround on KRYO4XX gold CPUs"), we failed to detect erratum 1286807 on Cortex-A76 because its entry in arm64_repeat_tlbi_list[] was accidentally corrupted by this commit. Fix this issue by creating a separate entry for Kryo4xx Gold. Fixes: 51f559d66527 ("arm64: Enable repeat tlbi workaround on KRYO4XX gold CPUs") Cc: Shreyas K K Signed-off-by: Zenghui Yu Acked-by: Marc Zyngier Link: https://lore.kernel.org/r/20220809043848.969-1-yuzenghui@huawei.com Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman commit f42a9819ba84bed2e609a4dff56af37063dcabdc Author: Guoqing Jiang Date: Wed Aug 17 20:05:14 2022 +0800 md: call __md_stop_writes in md_stop commit 0dd84b319352bb8ba64752d4e45396d8b13e6018 upstream. From the link [1], we can see raid1d was running even after the path raid_dtr -> md_stop -> __md_stop. Let's stop write first in destructor to align with normal md-raid to fix the KASAN issue. [1].
https://lore.kernel.org/linux-raid/CAPhsuW5gc4AakdGNdF8ubpezAuDLFOYUO_sfMZcec6hQFm8nhg@mail.gmail.com/T/#m7f12bf90481c02c6d2da68c64aeed4779b7df74a Fixes: 48df498daf62 ("md: move bitmap_destroy to the beginning of __md_stop") Reported-by: Mikulas Patocka Signed-off-by: Guoqing Jiang Signed-off-by: Song Liu Signed-off-by: Greg Kroah-Hartman commit 4d83d9b7d5ddbbabfd62af393a02c40ddd2a03db Author: Guoqing Jiang Date: Wed Aug 17 20:05:13 2022 +0800 Revert "md-raid: destroy the bitmap after destroying the thread" commit 1d258758cf06a0734482989911d184dd5837ed4e upstream. This reverts commit e151db8ecfb019b7da31d076130a794574c89f6f. Because it obviously breaks clustered raid as noticed by Neil though it fixed KASAN issue for dm-raid, let's revert it and fix KASAN issue in next commit. [1]. https://lore.kernel.org/linux-raid/a6657e08-b6a7-358b-2d2a-0ac37d49d23a@linux.dev/T/#m95ac225cab7409f66c295772483d091084a6d470 Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread") Signed-off-by: Guoqing Jiang Signed-off-by: Song Liu Signed-off-by: Greg Kroah-Hartman commit ba8da1806c4f24be1a0c5ab645b5c92864eab919 Author: David Hildenbrand Date: Thu Aug 11 12:34:34 2022 +0200 mm/hugetlb: fix hugetlb not supporting softdirty tracking commit f96f7a40874d7c746680c0b9f57cef2262ae551f upstream. Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2. I observed that hugetlb does not support/expect write-faults in shared mappings that would have to map the R/O-mapped page writable -- and I found two cases where we could currently get such faults and would erroneously map an anon page into a shared mapping. Reproducers are part of the patches. I propose to backport both fixes to stable trees. The first fix needs a small adjustment. This patch (of 2): Staring at hugetlb_wp(), one might wonder where all the logic for shared mappings is when stumbling over a write-protected page in a shared mapping.
In fact, there is none, and so far we thought we could get away with that because e.g., mprotect() should always do the right thing and map all pages directly writable. Looks like we were wrong:
--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>

#define HUGETLB_SIZE (2 * 1024 * 1024u)

/* Writing "4" to clear_refs clears the soft-dirty bits of the process. */
static void clear_softdirty(void)
{
	int fd = open("/proc/self/clear_refs", O_WRONLY);
	const char *ctrl = "4";
	int ret;

	if (fd < 0) {
		fprintf(stderr, "open(clear_refs) failed\n");
		exit(1);
	}
	ret = write(fd, ctrl, strlen(ctrl));
	if (ret != strlen(ctrl)) {
		fprintf(stderr, "write(clear_refs) failed\n");
		exit(1);
	}
	close(fd);
}

int main(int argc, char **argv)
{
	char *map;
	int fd;

	fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
	if (fd < 0) {
		fprintf(stderr, "open() failed\n");
		return -errno;
	}
	if (ftruncate(fd, HUGETLB_SIZE)) {
		fprintf(stderr, "ftruncate() failed\n");
		return -errno;
	}

	map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		fprintf(stderr, "mmap() failed\n");
		return -errno;
	}

	/* Populate the page, then write-protect it. */
	*map = 0;

	if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
		fprintf(stderr, "mprotect() failed\n");
		return -errno;
	}

	clear_softdirty();

	if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
		fprintf(stderr, "mprotect() failed\n");
		return -errno;
	}

	/* This write triggers the unexpected write fault. */
	*map = 0;

	return 0;
}
--------------------------------------------------------------------------
Above test fails with SIGBUS when there is only a single free hugetlb page.
# echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test Bus error (core dumped) And worse, with sufficient free hugetlb pages it will map an anonymous page into a shared mapping, for example, messing up accounting during unmap and breaking MAP_SHARED semantics: # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test # cat /proc/meminfo | grep HugePages_ HugePages_Total: 2 HugePages_Free: 1 HugePages_Rsvd: 18446744073709551615 HugePages_Surp: 0 Reason in this particular case is that vma_wants_writenotify() will return "true", removing VM_SHARED in vma_set_page_prot() to map pages write-protected. Let's teach vma_wants_writenotify() that hugetlb does not support softdirty tracking. Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared") Signed-off-by: David Hildenbrand Reviewed-by: Mike Kravetz Cc: Peter Feiner Cc: Kirill A. Shutemov Cc: Cyrill Gorcunov Cc: Pavel Emelyanov Cc: Jamie Liu Cc: Hugh Dickins Cc: Naoya Horiguchi Cc: Bjorn Helgaas Cc: Muchun Song Cc: Peter Xu Cc: [3.18+] Signed-off-by: Andrew Morton Signed-off-by: David Hildenbrand Signed-off-by: Greg Kroah-Hartman commit 5192d4ae17a563039876faae8a66e99a04bc1c34 Author: Jens Axboe Date: Thu Aug 25 10:17:25 2022 -0600 io_uring: fix issue with io_write() not always undoing sb_start_write() commit e053aaf4da56cbf0afb33a0fda4a62188e2c0637 upstream. This is actually an older issue, but we never used to hit the -EAGAIN path before having done sb_start_write(). Make sure that we always call kiocb_end_write() if we need to retry the write, so that we keep the calls to sb_start_write() etc balanced. 
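The balancing requirement can be illustrated with a small userspace model. This is a sketch only and does not reflect the io_uring code paths: the counter stands in for the sb_start_write() nesting, and start_write_sketch(), end_write_sketch(), try_write_sketch() and write_with_retry_sketch() are hypothetical helpers.

```c
#include <errno.h>

/* Models how many "start write" calls are currently outstanding. */
static int write_depth_sketch;

static void start_write_sketch(void)
{
	write_depth_sketch++;
}

static void end_write_sketch(void)
{
	write_depth_sketch--;
}

/* Hypothetical issue step: fails with -EAGAIN until attempt >= succeed_on. */
static int try_write_sketch(int attempt, int succeed_on)
{
	return attempt >= succeed_on ? 0 : -EAGAIN;
}

/*
 * Every attempt pairs its start with an end, including the attempts
 * that bail out with -EAGAIN and will be retried.
 */
static int write_with_retry_sketch(int succeed_on, int max_attempts)
{
	int ret = -EAGAIN;

	for (int attempt = 0; attempt < max_attempts; attempt++) {
		start_write_sketch();
		ret = try_write_sketch(attempt, succeed_on);
		/* Undo the start on both the retry and the completion path. */
		end_write_sketch();
		if (ret != -EAGAIN)
			break;
	}
	return ret;
}
```

Whether the write eventually succeeds or the retries run out, write_depth_sketch returns to zero; skipping the end call on the -EAGAIN branch would leave it elevated by one per retry.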
Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit e8f1d2fd811384b2f91043580b9b3c1c6eaef73d Author: Jiri Slaby Date: Wed Aug 10 09:06:09 2022 +0200 Revert "zram: remove double compression logic" commit 37887783b3fef877bf34b8992c9199864da4afcb upstream. This reverts commit e7be8d1dd983156b ("zram: remove double compression logic") as it causes zram failures. It does not revert cleanly, PTR_ERR handling was introduced in the meantime. This is handled by appropriate IS_ERR. When under memory pressure, zs_malloc() can fail. Before the above commit, the allocation was retried with direct reclaim enabled (GFP_NOIO). After the commit, it is not -- only __GFP_KSWAPD_RECLAIM is tried. So when the failure occurs under memory pressure, the overlaying filesystem such as ext2 (mounted by ext4 module in this case) can emit failures, making the (file)system unusable: EXT4-fs warning (device zram0): ext4_end_bio:343: I/O error 10 writing to inode 16386 starting block 159744) Buffer I/O error on device zram0, logical block 159744 With direct reclaim, memory is really reclaimed and allocation succeeds, eventually. In the worst case, the oom killer is invoked, which is proper outcome if user sets up zram too large (in comparison to available RAM). This very diff doesn't apply to 5.19 (stable) cleanly (see PTR_ERR note above). Use revert of e7be8d1dd983 directly. 
Link: https://bugzilla.suse.com/show_bug.cgi?id=1202203
Link: https://lkml.kernel.org/r/20220810070609.14402-1-jslaby@suse.cz
Fixes: e7be8d1dd983 ("zram: remove double compression logic")
Signed-off-by: Jiri Slaby
Reviewed-by: Sergey Senozhatsky
Cc: Minchan Kim
Cc: Nitin Gupta
Cc: Alexey Romanov
Cc: Dmitry Rokosov
Cc: Lukas Czerner
Cc: [5.19]
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman

commit c4ce7913dfd2a9cf567dbf70246e6b71934c3e5e
Author: Heinrich Schuchardt
Date: Wed Aug 17 15:25:21 2022 +0200

riscv: dts: microchip: correct L2 cache interrupts

commit 34fc9cc3aebe8b9e27d3bc821543dd482dc686ca upstream.

The "PolarFire SoC MSS Technical Reference Manual" documents the following
PLIC interrupts:

 1 - L2 Cache Controller Signals when a metadata correction event occurs
 2 - L2 Cache Controller Signals when an uncorrectable metadata event occurs
 3 - L2 Cache Controller Signals when a data correction event occurs
 4 - L2 Cache Controller Signals when an uncorrectable data event occurs

This differs from the SiFive FU540 which only has three L2 cache related
interrupts.

The sequence in the device tree is defined by an enum:

 enum {
 	DIR_CORR = 0,
 	DATA_CORR,
 	DATA_UNCORR,
 	DIR_UNCORR,
 };

So the correct sequence of the L2 cache interrupts is

 interrupts = <1>, <3>, <4>, <2>;

[Conor]
This manifests as an unusable system if the l2-cache driver is enabled, as
the wrong interrupt gets cleared & the handler prints errors to the console
ad infinitum.

Fixes: 0fa6107eca41 ("RISC-V: Initial DTS for Microchip ICICLE board")
CC: stable@vger.kernel.org # 5.15: e35b07a7df9b: riscv: dts: microchip: mpfs: Group tuples in interrupt properties
Signed-off-by: Heinrich Schuchardt
Signed-off-by: Conor Dooley
Signed-off-by: Greg Kroah-Hartman

commit b8e86aef0a601bc9731c38d4a5b3f0ee5aa99b2d
Author: Conor Dooley
Date: Sun Aug 14 15:12:38 2022 +0100

riscv: traps: add missing prototype

commit d951b20b9def73dcc39a5379831525d0d2a537e9 upstream.
Sparse complains: arch/riscv/kernel/traps.c:213:6: warning: symbol 'shadow_stack' was not declared. Should it be static? The variable is used in entry.S, so declare shadow_stack there alongside SHADOW_OVERFLOW_STACK_SIZE. Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection") Signed-off-by: Conor Dooley Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220814141237.493457-5-mail@conchuod.ie Signed-off-by: Palmer Dabbelt Signed-off-by: Greg Kroah-Hartman commit f80d72069ede35765d4eb738c855d2cfed734f9a Author: Conor Dooley Date: Sun Aug 14 15:12:37 2022 +0100 riscv: signal: fix missing prototype warning commit b5c3aca86d2698c4850b6ee8b341938025d2780c upstream. Fix the warning: arch/riscv/kernel/signal.c:316:27: warning: no previous prototype for function 'do_notify_resume' [-Wmissing-prototypes] asmlinkage __visible void do_notify_resume(struct pt_regs *regs, All other functions in the file are static & none of the existing headers stood out as an obvious location. Create signal.h to hold the declaration. Fixes: e2c0cdfba7f6 ("RISC-V: User-facing API") Signed-off-by: Conor Dooley Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220814141237.493457-4-mail@conchuod.ie Signed-off-by: Palmer Dabbelt Signed-off-by: Greg Kroah-Hartman commit 45d47bd9b96e7874b98dbcc7602fe2826c5d62a6 Author: Juergen Gross Date: Thu Aug 25 16:19:18 2022 +0200 xen/privcmd: fix error exit of privcmd_ioctl_dm_op() commit c5deb27895e017a0267de0a20d140ad5fcc55a54 upstream. The error exit of privcmd_ioctl_dm_op() is calling unlock_pages() potentially with pages being NULL, leading to a NULL dereference. Additionally lock_pages() doesn't check for pin_user_pages_fast() having been completely successful, resulting in potentially not locking all pages into memory. This could result in sporadic failures when using the related memory in user mode. 
Fix all of that by calling unlock_pages() always with the real number of pinned pages, which will be zero in case pages being NULL, and by checking the number of pages pinned by pin_user_pages_fast() matching the expected number of pages. Cc: Fixes: ab520be8cd5d ("xen/privcmd: Add IOCTL_PRIVCMD_DM_OP") Reported-by: Rustam Subkhankulov Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Reviewed-by: Oleksandr Tyshchenko Link: https://lore.kernel.org/r/20220825141918.3581-1-jgross@suse.com Signed-off-by: Juergen Gross Signed-off-by: Greg Kroah-Hartman commit f377ac7597ba6a631ed98888e8027f9a7b2dbe7e Author: Heming Zhao Date: Mon Aug 15 16:57:54 2022 +0800 ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown commit 550842cc60987b269e31b222283ade3e1b6c7fc8 upstream. After commit 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before return error"), any procedure after ocfs2_dlm_init() fails will trigger crash when calling ocfs2_dlm_shutdown(). ie: On local mount mode, no dlm resource is initialized. If ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling will call ocfs2_dlm_shutdown(), then does dlm resource cleanup job, which will trigger kernel crash. This solution should bypass uninitialized resources in ocfs2_dlm_shutdown(). Link: https://lkml.kernel.org/r/20220815085754.20417-1-heming.zhao@suse.com Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before return error") Signed-off-by: Heming Zhao Reviewed-by: Joseph Qi Cc: Mark Fasheh Cc: Joel Becker Cc: Junxiao Bi Cc: Changwei Ge Cc: Gang He Cc: Jun Piao Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit a25f09216071fe49cf453746f04785be538d1234 Author: David Howells Date: Tue Aug 23 02:10:56 2022 -0500 smb3: missing inode locks in punch hole commit ba0803050d610d5072666be727bca5e03e55b242 upstream. smb3 fallocate punch hole was not grabbing the inode or filemap_invalidate locks so could have race with pagemap reinstantiating the page. 
Cc: stable@vger.kernel.org
Signed-off-by: David Howells
Signed-off-by: Steve French
Signed-off-by: Greg Kroah-Hartman

commit 8e3ba23a67de984f4156f0663f1f603ff6c15815
Author: Karol Herbst
Date: Fri Aug 19 22:09:28 2022 +0200

nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf

commit 6b04ce966a738ecdd9294c9593e48513c0dc90aa upstream.

It is a bit unclear to us why that's helping, but it does and unbreaks
suspend/resume on a lot of GPUs without any known drawbacks.

Cc: stable@vger.kernel.org # v5.15+
Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
Signed-off-by: Karol Herbst
Reviewed-by: Lyude Paul
Link: https://patchwork.freedesktop.org/patch/msgid/20220819200928.401416-1-kherbst@redhat.com
Signed-off-by: Greg Kroah-Hartman

commit f1a7466258b7fbb171728e0efabaef038ed1e1e6
Author: Riwen Lu
Date: Tue Aug 23 15:43:42 2022 +0800

ACPI: processor: Remove freq Qos request for all CPUs

commit 36527b9d882362567ceb4eea8666813280f30e6f upstream.

The freq Qos request would be removed repeatedly if the cpufreq policy
relates to more than one CPU. Then, it would cause the "called for unknown
object" warning.

Remove the freq Qos request for each CPU related to the cpufreq policy,
instead of removing it repeatedly for the last CPU of the policy.

Fixes: a1bb46c36ce3 ("ACPI: processor: Add QoS requests for all CPUs")
Reported-by: Jeremy Linton
Tested-by: Jeremy Linton
Signed-off-by: Riwen Lu
Cc: 5.4+ # 5.4+
Signed-off-by: Rafael J. Wysocki
Signed-off-by: Greg Kroah-Hartman

commit c061d697a304cc652a21eae4c252299de7e28cc5
Author: Matthew Wilcox (Oracle)
Date: Sat Jul 30 05:25:18 2022 +0100

shmem: update folio if shmem_replace_page() updates the page

commit 9dfb3b8d655022760ca68af11821f1c63aa547c3 upstream.

If we allocate a new page, we need to make sure that our folio matches that
new page.

If we do end up in this code path, we store the wrong page in the shmem
inode's page cache, and I would rather imagine that data corruption ensues.
This will be solved by changing shmem_replace_page() to
shmem_replace_folio(), but this is the minimal fix.

Link: https://lkml.kernel.org/r/20220730042518.1264767-1-willy@infradead.org
Fixes: da08e9b79323 ("mm/shmem: convert shmem_swapin_page() to shmem_swapin_folio()")
Signed-off-by: Matthew Wilcox (Oracle)
Reviewed-by: William Kucharski
Cc: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman

commit 5f4d2b0caf2063e8b2560bf39be9c39443b3e91e
Author: Shakeel Butt
Date: Wed Aug 17 17:21:39 2022 +0000

Revert "memcg: cleanup racy sum avoidance code"

commit dbb16df6443c59e8a1ef21c2272fcf387d600ddf upstream.

This reverts commit 96e51ccf1af33e82f429a0d6baebba29c6448d0f.

Recently we started running the kernel with rstat infrastructure on
production traffic and began to see negative memcg stats values.
In particular, the 'sock' stat is the one which we observed having a
negative value.

 $ grep "sock " /mnt/memory/job/memory.stat
 sock 253952
 total_sock 18446744073708724224

Re-run after a couple of seconds:

 $ grep "sock " /mnt/memory/job/memory.stat
 sock 253952
 total_sock 53248

For now we are only seeing this issue on large machines (256 CPUs) and only
with the 'sock' stat. I think the networking stack increases the stat on one
cpu and decreases it on another cpu much more often. So, this negative sock
is due to the rstat flusher flushing the stats on the CPU that has seen the
decrement of sock but missing the CPU that has the increments. A typical
race condition.

For easy stable backport, a revert is the simplest solution. For a long
term solution, I am thinking of two directions. The first is to just reduce
the race window by optimizing the rstat flusher. The second is, if the
reader sees a negative stat value, to force a flush and restart the stat
collection. Basically retry but limited.
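As an illustration only (not part of the patch): the huge total_sock value above is what a small negative 64-bit sum looks like when printed as unsigned. The sketch below assumes that the "racy sum avoidance" being restored amounts to clamping a negative sum to zero before reporting it; the helper names report_stat_unclamped() and report_stat_clamped() are hypothetical and do not appear in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: report a signed per-cgroup sum as an unsigned
 * value without any guard. A transiently negative sum wraps around.
 */
static uint64_t report_stat_unclamped(int64_t racy_sum)
{
	return (uint64_t)racy_sum;
}

/*
 * Hypothetical helper: the same report, but with a negative sum
 * clamped to zero first (the assumed "racy sum avoidance").
 */
static uint64_t report_stat_clamped(int64_t racy_sum)
{
	if (racy_sum < 0)
		racy_sum = 0;
	return (uint64_t)racy_sum;
}
```

Under this reading, 18446744073708724224 equals 2^64 - 827392, i.e. a sum of -827392 bytes (202 pages of 4096 bytes), which the clamped variant would report as 0.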
Link: https://lkml.kernel.org/r/20220817172139.3141101-1-shakeelb@google.com
Fixes: 96e51ccf1af33e8 ("memcg: cleanup racy sum avoidance code")
Signed-off-by: Shakeel Butt
Cc: "Michal Koutný"
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Roman Gushchin
Cc: Muchun Song
Cc: David Hildenbrand
Cc: Yosry Ahmed
Cc: Greg Thelen
Cc: [5.15]
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman

commit f08ccb792d3eaf1dc62d8cbf6a30d6522329f660
Author: Shigeru Yoshida
Date: Fri Aug 19 03:13:36 2022 +0900

fbdev: fbcon: Properly revert changes when vc_resize() failed

commit a5a923038d70d2d4a86cb4e3f32625a5ee6e7e24 upstream.

fbcon_do_set_font() calls vc_resize() when the font size is changed.
However, if vc_resize() fails, the current implementation doesn't revert
the changes to the font size, and this causes an inconsistent state.

syzbot reported an "unable to handle page fault" due to this issue [1].
syzbot's repro uses fault injection which causes a memory allocation
failure, so vc_resize() failed.

This patch fixes the issue by properly reverting the changes to font
related data when vc_resize() fails.

Link: https://syzkaller.appspot.com/bug?id=3443d3a1fa6d964dd7310a0cb1696d165a3e07c4 [1]
Reported-by: syzbot+a168dbeaaa7778273c1b@syzkaller.appspotmail.com
Signed-off-by: Shigeru Yoshida
Signed-off-by: Helge Deller
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Greg Kroah-Hartman

commit fbdc482d43eda40a70de4b0155843d5472f6de62
Author: Brian Foster
Date: Tue Aug 16 11:54:07 2022 -0400

s390: fix double free of GS and RI CBs on fork() failure

commit 13cccafe0edcd03bf1c841de8ab8a1c8e34f77d9 upstream.

The pointers for guarded storage and runtime instrumentation control blocks
are stored in the thread_struct of the associated task. These pointers are
initially copied on fork() via arch_dup_task_struct() and then cleared via
copy_thread() before fork() returns.
If fork() happens to fail after the initial task dup and before copy_thread(), the newly allocated task and associated thread_struct memory are freed via free_task() -> arch_release_task_struct(). This results in a double free of the guarded storage and runtime info structs because the fields in the failed task still refer to memory associated with the source task. This problem can manifest as a BUG_ON() in set_freepointer() (with CONFIG_SLAB_FREELIST_HARDENED enabled) or KASAN splat (if enabled) when running trinity syscall fuzz tests on s390x. To avoid this problem, clear the associated pointer fields in arch_dup_task_struct() immediately after the new task is copied. Note that the RI flag is still cleared in copy_thread() because it resides in thread stack memory and that is where stack info is copied. Signed-off-by: Brian Foster Fixes: 8d9047f8b967c ("s390/runtime instrumentation: simplify task exit handling") Fixes: 7b83c6297d2fc ("s390/guarded storage: simplify task exit handling") Cc: # 4.15 Reviewed-by: Gerald Schaefer Reviewed-by: Heiko Carstens Link: https://lore.kernel.org/r/20220816155407.537372-1-bfoster@redhat.com Signed-off-by: Vasily Gorbik Signed-off-by: Greg Kroah-Hartman commit bb125123f60ea05211d4b3e5ff9dfa7e9ddd43ab Author: Paulo Alcantara Date: Fri Aug 19 17:00:19 2022 -0300 cifs: skip extra NULL byte in filenames commit a1d2eb51f0a33c28f5399a1610e66b3fbd24e884 upstream. Since commit: cifs: alloc_path_with_tree_prefix: do not append sep. if the path is empty alloc_path_with_tree_prefix() function was no longer including the trailing separator when @path is empty, although @out_len was still assuming a path separator thus adding an extra byte to the final filename. This has caused mount issues in some Synology servers due to the extra NULL byte in filenames when sending SMB2_CREATE requests with SMB2_FLAGS_DFS_OPERATIONS set. Fix this by checking if @path is not empty and then add extra byte for separator. 
Also, do not include any trailing NULL bytes in filename as MS-SMB2 requires it to be 8-byte aligned and not NULL terminated. Cc: stable@vger.kernel.org Fixes: 7eacba3b00a3 ("cifs: alloc_path_with_tree_prefix: do not append sep. if the path is empty") Signed-off-by: Paulo Alcantara (SUSE) Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman commit 5fcf81e308d1f4ae95f31690d2a80b7061385ff9 Author: Peter Xu Date: Tue Aug 23 18:11:38 2022 -0400 mm/mprotect: only reference swap pfn page if type match commit 3d2f78f08cd8388035ac375e731ec1ac1b79b09d upstream. Yu Zhao reported a bug after the commit "mm/swap: Add swp_offset_pfn() to fetch PFN from swap entry" added a check in swp_offset_pfn() for swap type [1]: kernel BUG at include/linux/swapops.h:117! CPU: 46 PID: 5245 Comm: EventManager_De Tainted: G S O L 6.0.0-dbg-DEV #2 RIP: 0010:pfn_swap_entry_to_page+0x72/0xf0 Code: c6 48 8b 36 48 83 fe ff 74 53 48 01 d1 48 83 c1 08 48 8b 09 f6 c1 01 75 7b 66 90 48 89 c1 48 8b 09 f6 c1 01 74 74 5d c3 eb 9e <0f> 0b 48 ba ff ff ff ff 03 00 00 00 eb ae a9 ff 0f 00 00 75 13 48 RSP: 0018:ffffa59e73fabb80 EFLAGS: 00010282 RAX: 00000000ffffffe8 RBX: 0c00000000000000 RCX: ffffcd5440000000 RDX: 1ffffffffff7a80a RSI: 0000000000000000 RDI: 0c0000000000042b RBP: ffffa59e73fabb80 R08: ffff9965ca6e8bb8 R09: 0000000000000000 R10: ffffffffa5a2f62d R11: 0000030b372e9fff R12: ffff997b79db5738 R13: 000000000000042b R14: 0c0000000000042b R15: 1ffffffffff7a80a FS: 00007f549d1bb700(0000) GS:ffff99d3cf680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000440d035b3180 CR3: 0000002243176004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: change_pte_range+0x36e/0x880 change_p4d_range+0x2e8/0x670 change_protection_range+0x14e/0x2c0 mprotect_fixup+0x1ee/0x330 do_mprotect_pkey+0x34c/0x440 __x64_sys_mprotect+0x1d/0x30 It triggers because 
pfn_swap_entry_to_page() could be called upon e.g. a genuine swap entry. Fix it by only calling it when it's a write migration entry where the page* is used. [1] https://lore.kernel.org/lkml/CAOUHufaVC2Za-p8m0aiHw6YkheDcrO-C3wRGixwDS32VTS+k1w@mail.gmail.com/ Link: https://lkml.kernel.org/r/20220823221138.45602-1-peterx@redhat.com Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive") Signed-off-by: Peter Xu Reported-by: Yu Zhao Tested-by: Yu Zhao Reviewed-by: David Hildenbrand Cc: "Huang, Ying" Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit 3ada1b3e58db255a14ec73a59d7913e84dc5a8a4 Author: Miaohe Lin Date: Tue Jul 12 21:05:42 2022 +0800 mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte commit ab74ef708dc51df7cf2b8a890b9c6990fac5c0c6 upstream. In MCOPY_ATOMIC_CONTINUE case with a non-shared VMA, pages in the page cache are installed in the ptes. But hugepage_add_new_anon_rmap is called for them mistakenly because they're not vm_shared. This will corrupt the page->mapping used by page cache code. Link: https://lkml.kernel.org/r/20220712130542.18836-1-linmiaohe@huawei.com Fixes: f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl") Signed-off-by: Miaohe Lin Reviewed-by: Mike Kravetz Cc: Axel Rasmussen Cc: Peter Xu Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit 9ae15c4ba2be1e5a62503b6d873e84beb5fcbb5a Author: Liu Shixin Date: Fri Aug 19 17:40:05 2022 +0800 bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem commit dd0ff4d12dd284c334f7e9b07f8f335af856ac78 upstream. The vmemmap pages is marked by kmemleak when allocated from memblock. Remove it from kmemleak when freeing the page. Otherwise, when we reuse the page, kmemleak may report such an error and then stop working. 
 kmemleak: Cannot insert 0xffff98fb6eab3d40 into the object search tree (overlaps existing)
 kmemleak: Kernel memory leak detector disabled
 kmemleak: Object 0xffff98fb6be00000 (size 335544320):
 kmemleak: comm "swapper", pid 0, jiffies 4294892296
 kmemleak: min_count = 0
 kmemleak: count = 0
 kmemleak: flags = 0x1
 kmemleak: checksum = 0
 kmemleak: backtrace:

Link: https://lkml.kernel.org/r/20220819094005.2928241-1-liushixin2@huawei.com
Fixes: f41f2ed43ca5 (mm: hugetlb: free the vmemmap pages associated with each HugeTLB page)
Signed-off-by: Liu Shixin
Reviewed-by: Muchun Song
Cc: Matthew Wilcox
Cc: Mike Kravetz
Cc: Oscar Salvador
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman

commit 8eaa24d57ab6a3f95be50c947a885f983869e8cb
Author: Gerald Schaefer
Date: Wed Aug 17 15:26:03 2022 +0200

s390/mm: do not trigger write fault when vma does not allow VM_WRITE

commit 41ac42f137080bc230b5882e3c88c392ab7f2d32 upstream.

For non-protection pXd_none() page faults in do_dat_exception(), we call
do_exception() with access == (VM_READ | VM_WRITE | VM_EXEC). In
do_exception(), vma->vm_flags is checked against that before calling
handle_mm_fault().

Since commit 92f842eac7ee3 ("[S390] store indication fault optimization"),
we call handle_mm_fault() with FAULT_FLAG_WRITE, when recognizing that it
was a write access. However, the vma flags check is still only checking
against (VM_READ | VM_WRITE | VM_EXEC), and therefore also calling
handle_mm_fault() with FAULT_FLAG_WRITE in cases where the vma does not
allow VM_WRITE.

Fix this by changing access check in do_exception() to VM_WRITE only, when
recognizing write access.
Link: https://lkml.kernel.org/r/20220811103435.188481-3-david@redhat.com
Fixes: 92f842eac7ee3 ("[S390] store indication fault optimization")
Cc:
Reported-by: David Hildenbrand
Reviewed-by: Heiko Carstens
Signed-off-by: Gerald Schaefer
Signed-off-by: Vasily Gorbik
Signed-off-by: Greg Kroah-Hartman

commit c035edae0dad1aff599ce2d3ecb8d91d90ec5da0
Author: Badari Pulavarty
Date: Sun Aug 21 18:08:53 2022 +0000

mm/damon/dbgfs: avoid duplicate context directory creation

commit d26f60703606ab425eee9882b32a1781a8bed74d upstream.

When user tries to create a DAMON context via the DAMON debugfs interface
with a name of an already existing context, the context directory creation
fails but a new context is created and added in the internal data
structure, due to absence of the directory creation success check. As a
result, memory could leak and DAMON cannot be turned on. An example test
case is as below:

 # cd /sys/kernel/debug/damon/
 # echo "off" > monitor_on
 # echo paddr > target_ids
 # echo "abc" > mk_context
 # echo "abc" > mk_context
 # echo $$ > abc/target_ids
 # echo "on" > monitor_on <<< fails

Return value of 'debugfs_create_dir()' is expected to be ignored in
general, but this is an exceptional case as DAMON feature is depending on
the debugfs functionality and it has the potential duplicate name issue.
This commit therefore fixes the issue by checking the directory creation
failure and immediately return the error in the case.

Link: https://lkml.kernel.org/r/20220821180853.2400-1-sj@kernel.org
Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts")
Signed-off-by: Badari Pulavarty
Signed-off-by: SeongJae Park
Cc: [ 5.15.x]
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman

commit fe64e17d9b0120c6b1b02ec72ca5fc2d08cf1fcd
Author: Quanyang Wang
Date: Fri Aug 19 16:11:45 2022 +0800

asm-generic: sections: refactor memory_intersects

commit 0c7d7cc2b4fe2e74ef8728f030f0f1674f9f6aee upstream.
There are two problems with the current code of memory_intersects:

First, it doesn't check whether the region (begin, end) falls inside the
region (virt, vend), that is (virt < begin && vend > end).

The second problem is if vend is equal to begin, it will return true but
this is wrong since vend (virt + size) is not the last address of the
memory region but (virt + size -1) is. The wrong determination will trigger
the misreporting when the function check_for_illegal_area calls
memory_intersects to check if the dma region intersects with stext region.

The misreporting is as below (stext is at 0x80100000):

 WARNING: CPU: 0 PID: 77 at kernel/dma/debug.c:1073 check_for_illegal_area+0x130/0x168
 DMA-API: chipidea-usb2 e0002000.usb: device driver maps memory from kernel text or rodata [addr=800f0000] [len=65536]
 Modules linked in:
 CPU: 1 PID: 77 Comm: usb-storage Not tainted 5.19.0-yocto-standard #5
 Hardware name: Xilinx Zynq Platform
  unwind_backtrace from show_stack+0x18/0x1c
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from __warn+0xb0/0x198
  __warn from warn_slowpath_fmt+0x80/0xb4
  warn_slowpath_fmt from check_for_illegal_area+0x130/0x168
  check_for_illegal_area from debug_dma_map_sg+0x94/0x368
  debug_dma_map_sg from __dma_map_sg_attrs+0x114/0x128
  __dma_map_sg_attrs from dma_map_sg_attrs+0x18/0x24
  dma_map_sg_attrs from usb_hcd_map_urb_for_dma+0x250/0x3b4
  usb_hcd_map_urb_for_dma from usb_hcd_submit_urb+0x194/0x214
  usb_hcd_submit_urb from usb_sg_wait+0xa4/0x118
  usb_sg_wait from usb_stor_bulk_transfer_sglist+0xa0/0xec
  usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x38/0x70
  usb_stor_bulk_srb from usb_stor_Bulk_transport+0x150/0x360
  usb_stor_Bulk_transport from usb_stor_invoke_transport+0x38/0x440
  usb_stor_invoke_transport from usb_stor_control_thread+0x1e0/0x238
  usb_stor_control_thread from kthread+0xf8/0x104
  kthread from ret_from_fork+0x14/0x2c

Refactor memory_intersects to fix the two problems above.
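For illustration only (this is not the patch itself): the two problems can be reproduced with a small user-space sketch. The helper names intersects_old() and intersects_new(), the use of uintptr_t instead of pointers, and the section end address 0x80200000 used in the note below are assumptions made for the example. intersects_old() is a reconstruction of a check that exhibits both problems described above, not a quote of the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Reconstructed "problematic" check: only tests whether either endpoint
 * of [virt, vend) lies in [begin, end), and treats vend as if it were
 * the last byte of the region.
 */
static bool intersects_old(uintptr_t begin, uintptr_t end,
			   uintptr_t virt, size_t size)
{
	uintptr_t vend = virt + size;

	return (virt >= begin && virt < end) ||
	       (vend >= begin && vend < end);
}

/*
 * Half-open interval overlap test: [virt, vend) and [begin, end)
 * intersect iff each one starts before the other one ends.
 */
static bool intersects_new(uintptr_t begin, uintptr_t end,
			   uintptr_t virt, size_t size)
{
	uintptr_t vend = virt + size;

	return virt < end && vend > begin;
}
```

With stext at 0x80100000 and the DMA buffer from the report (addr=800f0000, len=65536), vend equals begin, so the reconstructed old check reports an overlap while the half-open test does not. Conversely, a region that fully contains (begin, end), such as one starting at 0x80000000 with size 0x400000 against an assumed section end of 0x80200000, is missed by the old check and caught by the new one.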
Before the 1d7db834a027e ("dma-debug: use memory_intersects() directly"), memory_intersects is called only by printk_late_init: printk_late_init -> init_section_intersects ->memory_intersects. There were few places where memory_intersects was called. When commit 1d7db834a027e ("dma-debug: use memory_intersects() directly") was merged and CONFIG_DMA_API_DEBUG is enabled, the DMA subsystem uses it to check for an illegal area and the calltrace above is triggered. [akpm@linux-foundation.org: fix nearby comment typo] Link: https://lkml.kernel.org/r/20220819081145.948016-1-quanyang.wang@windriver.com Fixes: 979559362516 ("asm/sections: add helpers to check for section data") Signed-off-by: Quanyang Wang Cc: Ard Biesheuvel Cc: Arnd Bergmann Cc: Thierry Reding Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman commit 22ebb780d54ef1e9a1fdba41696ebf48ef99b96d Author: Richard Guy Briggs Date: Thu Aug 25 15:32:40 2022 -0400 audit: move audit_return_fixup before the filters commit d4fefa4801a1c2f9c0c7a48fbb0fdf384e89a4ab upstream. The success and return_code are needed by the filters. Move audit_return_fixup() before the filters. This was causing syscall auditing events to be missed. Link: https://github.com/linux-audit/audit-kernel/issues/138 Cc: stable@vger.kernel.org Fixes: 12c5e81d3fd0 ("audit: prepare audit_context for use in calling contexts beyond syscalls") Signed-off-by: Richard Guy Briggs [PM: manual merge required] Signed-off-by: Paul Moore Signed-off-by: Greg Kroah-Hartman commit 9a6c710f3bc10bc9cc23e1c080b53245b7f9d5b7 Author: Khazhismel Kumykov Date: Mon Aug 1 08:50:34 2022 -0700 writeback: avoid use-after-free after removing device commit f87904c075515f3e1d8f4a7115869d3b914674fd upstream. When a disk is removed, bdi_unregister gets called to stop further writeback and wait for associated delayed work to complete. 
However, wb_inode_writeback_end() may schedule bandwidth estimation dwork
after this has completed, which can result in the timer attempting to
access the just freed bdi_writeback.

Fix this by checking if the bdi_writeback is alive, similar to when
scheduling writeback work.

Since this requires wb->work_lock, and wb_inode_writeback_end() may get
called from interrupt, switch wb->work_lock to an irqsafe lock.

Link: https://lkml.kernel.org/r/20220801155034.3772543-1-khazhy@google.com
Fixes: 45a2966fd641 ("writeback: fix bandwidth estimate for spiky workload")
Signed-off-by: Khazhismel Kumykov
Reviewed-by: Jan Kara
Cc: Michael Stapelberg
Cc: Wu Fengguang
Cc: Alexander Viro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Greg Kroah-Hartman

commit 9be7fa7ead18a48940df7b59d993bbc8b9055c15
Author: Siddh Raman Pant
Date: Tue Aug 23 21:38:10 2022 +0530

loop: Check for overflow while configuring loop

commit c490a0b5a4f36da3918181a8acdc6991d967c5f3 upstream.

The userspace can configure a loop using an ioctl call, wherein a
configuration of type loop_config is passed (see lo_ioctl()'s case on line
1550 of drivers/block/loop.c). This proceeds to call loop_configure() which
in turn calls loop_set_status_from_info() (see line 1050 of loop.c),
passing &config->info which is of type loop_info64*. This function then
sets the appropriate values, like the offset.

loop_device has lo_offset of type loff_t (see line 52 of loop.c), which is
typedef-chained to long long, whereas loop_info64 has lo_offset of type
__u64 (see line 56 of include/uapi/linux/loop.h).

The function directly copies offset from info to the device as follows
(see line 980 of loop.c):

	lo->lo_offset = info->lo_offset;

This results in an overflow, which triggers a warning in iomap_iter() due
to a call to iomap_iter_done() which has:

	WARN_ON_ONCE(iter->iomap.offset > iter->pos);

Thus, check for negative value during loop_set_status_from_info().
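For illustration only (a user-space sketch, not the kernel change): the check boils down to rejecting any unsigned 64-bit offset that would become negative once stored in a signed long long. The helper name set_offset_checked() and the choice of -EOVERFLOW as the error code are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy an unsigned 64-bit offset supplied by user
 * space into a signed long long field. Values above LLONG_MAX would be
 * negative after the conversion, so they are rejected and the
 * destination is left untouched.
 */
static int set_offset_checked(uint64_t user_offset, long long *dev_offset)
{
	if (user_offset > (uint64_t)LLONG_MAX)
		return -EOVERFLOW;

	*dev_offset = (long long)user_offset;
	return 0;
}
```

An offset of 4096 is accepted unchanged, while 1ULL << 63 (the smallest value with the sign bit set) is refused instead of being stored as a negative offset.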
Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Reviewed-by: Matthew Wilcox (Oracle) Signed-off-by: Siddh Raman Pant Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit a210408b902465c20970c2abc1ef4391d1769cf6 Author: Jan Beulich Date: Thu Apr 28 16:50:29 2022 +0200 x86/PAT: Have pat_enabled() properly reflect state when running on Xen commit 72cbc8f04fe2fa93443c0fcccb7ad91dfea3d9ce upstream. After commit ID in the Fixes: tag, pat_enabled() returns false (because of PAT initialization being suppressed in the absence of MTRRs being announced to be available). This has become a problem: the i915 driver now fails to initialize when running PV on Xen (i915_gem_object_pin_map() is where I located the induced failure), and its error handling is flaky enough to (at least sometimes) result in a hung system. Yet even beyond that problem the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses would have been quite a bit less optimal than possible. Arrange for the function to return true in such environments, without undermining the rest of PAT MSR management logic considering PAT to be disabled: specifically, no writes to the PAT MSR should occur. For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before). [ bp: This is the "small fix" variant for stable. It'll get replaced with a proper PAT and MTRR detection split upstream but that is too involved for a stable backport. - additional touchups to commit msg. Use cpu_feature_enabled(). 
] Fixes: bdd8b6c98239 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()") Signed-off-by: Jan Beulich Signed-off-by: Borislav Petkov Acked-by: Ingo Molnar Cc: Cc: Juergen Gross Cc: Lucas De Marchi Link: https://lore.kernel.org/r/9385fa60-fa5d-f559-a137-6608408f88b0@suse.com Signed-off-by: Greg Kroah-Hartman commit d9975eea5e6add825b18dadc8c13b0424f48ba4b Author: Peter Zijlstra Date: Tue Aug 16 14:28:36 2022 +0200 x86/nospec: Unwreck the RSB stuffing commit 4e3aa9238277597c6c7624f302d81a7b568b6f2d upstream. Commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") made a right mess of the RSB stuffing, rewrite the whole thing to not suck. Thanks to Andrew for the enlightening comment about Post-Barrier RSB things so we can make this code less magical. Cc: stable@vger.kernel.org Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/YvuNdDWoUZSBjYcm@worktop.programming.kicks-ass.net Signed-off-by: Greg Kroah-Hartman commit 9d0a21053cf3a3c229e56e96464048aa3b9f657e Author: Pawan Gupta Date: Wed Aug 3 14:41:32 2022 -0700 x86/bugs: Add "unknown" reporting for MMIO Stale Data commit 7df548840c496b0141fb2404b889c346380c2b22 upstream. Older Intel CPUs that are not in the affected processor list for MMIO Stale Data vulnerabilities currently report "Not affected" in sysfs, which may not be correct. Vulnerability status for these older CPUs is unknown. Add known-not-affected CPUs to the whitelist. Report "unknown" mitigation status for CPUs that are not in blacklist, whitelist and also don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware immunity to MMIO Stale Data vulnerabilities. Mitigation is not deployed when the status is unknown. [ bp: Massage, fixup. 
] Fixes: 8d50cdf8b834 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data") Suggested-by: Andrew Cooper Suggested-by: Tony Luck Signed-off-by: Pawan Gupta Signed-off-by: Borislav Petkov Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit 0666703c4be88fb576dab5bb109aa4f06c9ca073 Author: Tom Lendacky Date: Tue Aug 23 16:55:51 2022 -0500 x86/sev: Don't use cc_platform_has() for early SEV-SNP calls commit cdaa0a407f1acd3a44861e3aea6e3c7349e668f1 upstream. When running identity-mapped and depending on the kernel configuration, it is possible that the compiler uses jump tables when generating code for cc_platform_has(). This causes a boot failure because the jump table uses un-mapped kernel virtual addresses, not identity-mapped addresses. This has been seen with CONFIG_RETPOLINE=n. Similar to sme_encrypt_kernel(), use an open-coded direct check for the status of SNP rather than trying to eliminate the jump table. This preserves any code optimization in cc_platform_has() that can be useful post boot. It also limits the changes to SEV-specific files so that future compiler features won't necessarily require possible build changes just because they are not compatible with running identity-mapped. [ bp: Massage commit message. ] Fixes: 5e5ccff60a29 ("x86/sev: Add helper for validating pages in early enc attribute changes") Reported-by: Sean Christopherson Suggested-by: Sean Christopherson Signed-off-by: Tom Lendacky Signed-off-by: Borislav Petkov Cc: # 5.19.x Link: https://lore.kernel.org/all/YqfabnTRxFSM+LoX@google.com/ Signed-off-by: Greg Kroah-Hartman commit a10290756e4fc89c1f2a9f39f5d27ed58dc895b5 Author: Chen Zhongjin Date: Fri Aug 19 16:43:34 2022 +0800 x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry commit fc2e426b1161761561624ebd43ce8c8d2fa058da upstream. 
When meeting ftrace trampolines during ORC unwinding, the unwinder uses the address of ftrace_{regs_}call to find the ORC entry, which places the next frame at sp+176. If an IRQ hits at sub $0xa8,%rsp, the next frame should be at sp+8 instead of sp+176. This makes the unwinder skip the correct frame and throw warnings such as "wrong direction" or "can't access registers", depending on the content of the incorrect frame address. By adding the offset ip - ops->trampoline to the base address ftrace_{regs_}caller, we get the correct address for finding the ORC entry. Also rename "caller" to "tramp_addr" so that the variable name matches its content. [ mingo: Clarified the changelog a bit. ] Fixes: 6be7fa3c74d1 ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines") Signed-off-by: Chen Zhongjin Signed-off-by: Ingo Molnar Reviewed-by: Steven Rostedt (Google) Cc: Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com Signed-off-by: Greg Kroah-Hartman commit d1a6d0a9631fe60bf113fa44a2074e577cb9a35e Author: Juergen Gross Date: Tue Aug 16 09:11:37 2022 +0200 x86/entry: Fix entry_INT80_compat for Xen PV guests commit 5b9f0c4df1c1152403c738373fb063e9ffdac0a1 upstream. Commit c89191ce67ef ("x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS") missed one use case of SWAPGS in entry_INT80_compat(). Removing the SWAPGS macro led to the assembler just using "swapgs", as it accepts instructions in capital letters, too. This in turn leads to splats in Xen PV guests like: [ 36.145223] general protection fault, maybe for address 0x2d: 0000 [#1] PREEMPT SMP NOPTI [ 36.145794] CPU: 2 PID: 1847 Comm: ld-linux.so.2 Not tainted 5.19.1-1-default #1 \ openSUSE Tumbleweed f3b44bfb672cdb9f235aff53b57724eba8b9411b [ 36.146608] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 11/14/2013 [ 36.148126] RIP: e030:entry_INT80_compat+0x3/0xa3 Fix that by open coding this single instance of the SWAPGS macro.
Fixes: c89191ce67ef ("x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS") Signed-off-by: Juergen Gross Signed-off-by: Borislav Petkov Reviewed-by: Jan Beulich Cc: # 5.19 Link: https://lore.kernel.org/r/20220816071137.4893-1-jgross@suse.com Signed-off-by: Greg Kroah-Hartman commit 66f2f9f2772639e07b465d05f7e2a89eb6d66813 Author: Kan Liang Date: Tue Aug 16 05:56:11 2022 -0700 perf/x86/lbr: Enable the branch type for the Arch LBR by default commit 32ba156df1b1c8804a4e5be5339616945eafea22 upstream. On platforms with Arch LBR, the HW raw branch type encoding may leak to the perf tool when the SAVE_TYPE option is not set. In intel_pmu_store_lbr(), the HW raw branch type is stored in lbr_entries[].type. If the SAVE_TYPE option is set, lbr_entries[].type is converted into the generic PERF_BR_* type in intel_pmu_lbr_filter() and exposed to the user tools. But if the SAVE_TYPE option is NOT set by the user, the current perf kernel doesn't clear the field, so the HW raw branch type leaks. There are two ways to fix the issue for the Arch LBR. One is to clear the field if the SAVE_TYPE option is NOT set. The other is to unconditionally convert the branch type and expose the generic type to the user tools. The latter is implemented here, because - The branch type is valuable information. I don't see a case where you would not benefit from the branch type. (Stephane Eranian) - Not having the branch type DOES NOT save any space in the branch record (Stephane Eranian) - The Arch LBR HW can retrieve the common branch types from the LBR_INFO. It doesn't require the high-overhead SW disassembly.
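The two candidate fixes described above can be contrasted in a small user-space model. This is only a sketch under stated assumptions, not the kernel's code: the hw_br_type codes, the gen_br_type enum (a stand-in for the generic PERF_BR_* values), struct lbr_entry, and the helpers to_generic(), filter_clear_if_unset() and filter_always_convert() are all hypothetical names invented for illustration and do not reflect the real Arch LBR encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw hardware branch-type codes (NOT the real Arch LBR encoding). */
enum hw_br_type {
	HW_BR_COND      = 0x0,
	HW_BR_NEAR_JMP  = 0x1,
	HW_BR_NEAR_CALL = 0x2,
	HW_BR_NEAR_RET  = 0x3,
};

/* Hypothetical stand-ins for the generic PERF_BR_* types exposed to user tools. */
enum gen_br_type {
	GEN_BR_UNKNOWN = 0,
	GEN_BR_COND,
	GEN_BR_UNCOND,
	GEN_BR_CALL,
	GEN_BR_RET,
};

/* Simplified model of one branch record; the real structure has more fields. */
struct lbr_entry {
	uint64_t from;
	uint64_t to;
	unsigned int type; /* holds a raw HW code until it is converted */
};

/* Map a raw HW code to a generic type; unknown codes become GEN_BR_UNKNOWN. */
static unsigned int to_generic(unsigned int raw)
{
	switch (raw) {
	case HW_BR_COND:      return GEN_BR_COND;
	case HW_BR_NEAR_JMP:  return GEN_BR_UNCOND;
	case HW_BR_NEAR_CALL: return GEN_BR_CALL;
	case HW_BR_NEAR_RET:  return GEN_BR_RET;
	default:              return GEN_BR_UNKNOWN;
	}
}

/* First option from the text: clear the field when SAVE_TYPE is not set. */
static void filter_clear_if_unset(struct lbr_entry *e, size_t n, int save_type)
{
	for (size_t i = 0; i < n; i++)
		e[i].type = save_type ? to_generic(e[i].type) : GEN_BR_UNKNOWN;
}

/* Second option (the one the commit message says was chosen): always convert. */
static void filter_always_convert(struct lbr_entry *e, size_t n)
{
	for (size_t i = 0; i < n; i++)
		e[i].type = to_generic(e[i].type);
}
```

In both variants the raw code never reaches the consumer; the second one additionally keeps the branch type available regardless of the option, which matches the reasoning in the bullet points above.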
Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR") Reported-by: Stephane Eranian Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit e31430b23603661b7c4751c212eef23b5b0be03e Author: Kan Liang Date: Thu Aug 18 11:44:29 2022 -0700 perf/x86/intel: Fix pebs event constraints for ADL commit cde643ff75bc20c538dfae787ca3b587bab16b50 upstream. According to the latest event list, the LOAD_LATENCY PEBS event only works on GP counters 0 and 1 for ADL and RPL. Update the PEBS event constraints table. Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support") Reported-by: Ammy Yi Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220818184429.2355857-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit ffbf5efde85e3fff2daeed3c9855b2861f932783 Author: Michael Roth Date: Tue Aug 23 11:07:34 2022 -0500 x86/boot: Don't propagate uninitialized boot_params->cc_blob_address commit 4b1c742407571eff58b6de9881889f7ca7c4b4dc upstream. In some cases, bootloaders will leave boot_params->cc_blob_address uninitialized rather than zeroing it out. This field is only meant to be set by the boot/compressed kernel in order to pass information to the uncompressed kernel when SEV-SNP support is enabled. Therefore, there are no cases where the bootloader-provided values should be treated as anything other than garbage. If the field is left uninitialized, the uncompressed kernel may attempt to access this bogus address, leading to a crash during early boot. Normally, sanitize_boot_params() would be used to clear out such fields, but that happens too late: sev_enable() may have already initialized it to a valid value that should not be zeroed out. Instead, have sev_enable() zero it out unconditionally beforehand.
Also ensure this happens for !CONFIG_AMD_MEM_ENCRYPT as well by also including this handling in the sev_enable() stub function. [ bp: Massage commit message and comments. ] Fixes: b190a043c49a ("x86/sev: Add SEV-SNP feature detection/setup") Reported-by: Jeremi Piotrowski Reported-by: watnuss@gmx.de Signed-off-by: Michael Roth