Merge pull request #1 from torvalds/master #492

ghost · 2017-11-25T12:10:21Z

Merge

Recent refactoring of tc block offloads infrastructure introduced new flow_block_cb_setup_simple() method intended to be used as unified way for all drivers to register offload callbacks. However, commit that actually extended all users (drivers) with block cb list and provided it to flow_block infra missed mlx5 en_rep. This leads to following NULL-pointer dereference when creating Qdisc: [ 278.385175] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 278.393233] #PF: supervisor read access in kernel mode [ 278.399446] #PF: error_code(0x0000) - not-present page [ 278.405847] PGD 8000000850e73067 P4D 8000000850e73067 PUD 8620cd067 PMD 0 [ 278.414141] Oops: 0000 [#1] SMP PTI [ 278.419019] CPU: 7 PID: 3369 Comm: tc Not tainted 5.2.0-rc6+ torvalds#492 [ 278.426580] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 278.435853] RIP: 0010:flow_block_cb_setup_simple+0xc4/0x190 [ 278.442953] Code: 10 48 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 49 89 00 48 05 00 01 00 00 49 89 40 08 31 c0 c3 b8 a1 ff ff ff c3 f3 c3 <48> 8b 06 48 39 c6 75 0a eb 1a 48 8b 00 48 39 c6 74 12 48 3b 50 28 [ 278.464829] RSP: 0018:ffffaf07c3f97990 EFLAGS: 00010246 [ 278.471648] RAX: 0000000000000000 RBX: ffff9b43ed4c7680 RCX: ffff9b43d5f80840 [ 278.480408] RDX: ffffffffc0491650 RSI: 0000000000000000 RDI: ffffaf07c3f97998 [ 278.489110] RBP: ffff9b43ddff9000 R08: ffff9b43d5f80840 R09: 0000000000000001 [ 278.497838] R10: 0000000000000009 R11: 00000000000003ad R12: ffffaf07c3f97c08 [ 278.506595] R13: ffff9b43d5f80000 R14: ffff9b43ed4c7680 R15: ffff9b43dfa20b40 [ 278.515374] FS: 00007f796be1b400(0000) GS:ffff9b43ef840000(0000) knlGS:0000000000000000 [ 278.525099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 278.532453] CR2: 0000000000000000 CR3: 0000000840398002 CR4: 00000000001606e0 [ 278.541197] Call Trace: [ 278.545252] tcf_block_offload_cmd.isra.52+0x7e/0xb0 [ 278.551871] tcf_block_get_ext+0x365/0x3e0 [ 278.557569] qdisc_create+0x15c/0x4e0 [ 278.562859] ? kmem_cache_alloc_trace+0x1a2/0x1c0 [ 278.569235] tc_modify_qdisc+0x1c8/0x780 [ 278.574761] rtnetlink_rcv_msg+0x291/0x340 [ 278.580518] ? _cond_resched+0x15/0x40 [ 278.585856] ? rtnl_calcit.isra.29+0x120/0x120 [ 278.591868] netlink_rcv_skb+0x4a/0x110 [ 278.597198] netlink_unicast+0x1a0/0x250 [ 278.602601] netlink_sendmsg+0x2c1/0x3c0 [ 278.608022] sock_sendmsg+0x5b/0x60 [ 278.612969] ___sys_sendmsg+0x289/0x310 [ 278.618231] ? do_wp_page+0x99/0x730 [ 278.623216] ? page_add_new_anon_rmap+0xbe/0x140 [ 278.629298] ? __handle_mm_fault+0xc84/0x1360 [ 278.635113] ? __sys_sendmsg+0x5e/0xa0 [ 278.640285] __sys_sendmsg+0x5e/0xa0 [ 278.645239] do_syscall_64+0x5b/0x1b0 [ 278.650274] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 278.656697] RIP: 0033:0x7f796abdeb87 [ 278.661628] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48 [ 278.683248] RSP: 002b:00007ffde213ba48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 278.692245] RAX: ffffffffffffffda RBX: 000000005d261e6f RCX: 00007f796abdeb87 [ 278.700862] RDX: 0000000000000000 RSI: 00007ffde213bab0 RDI: 0000000000000003 [ 278.709527] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006 [ 278.718167] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001 [ 278.726743] R13: 000000000067b580 R14: 0000000000000000 R15: 0000000000000000 [ 278.735302] Modules linked in: dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc mlx5_ib ib_uverbs intel_rapl ib_core sb_edac x86_pkg_temp_ thermal intel_powerclamp coretemp kvm_intel kvm mlx5_core irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel ses mei_me enclosure mlxfw ipmi_ssif intel_cstate iTCO_wdt ptp mei pps_core iTCO_vendor_support pcspkr joydev intel_uncore i2c_i801 ipmi_si lpc_ich intel_rapl_perf ioatdma wmi dca pcc_cpufreq ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_k ms_helper ttm drm mpt3sas raid_class scsi_transport_sas [ 278.802263] CR2: 0000000000000000 [ 278.807170] ---[ end trace b1f0a442a279e66f ]--- Extend en_rep with new static mlx5e_rep_block_cb_list list and pass it to flow_block_cb_setup_simple() function instead of hardcoded NULL pointer. Fixes: 955bcb6 ("drivers: net: use flow block API") Signed-off-by: Vlad Buslov <[email protected]>

Recent refactoring of tc block offloads infrastructure introduced new flow_block_cb_setup_simple() method intended to be used as unified way for all drivers to register offload callbacks. However, commit that actually extended all users (drivers) with block cb list and provided it to flow_block infra missed mlx5 en_rep. This leads to following NULL-pointer dereference when creating Qdisc: [ 278.385175] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 278.393233] #PF: supervisor read access in kernel mode [ 278.399446] #PF: error_code(0x0000) - not-present page [ 278.405847] PGD 8000000850e73067 P4D 8000000850e73067 PUD 8620cd067 PMD 0 [ 278.414141] Oops: 0000 [#1] SMP PTI [ 278.419019] CPU: 7 PID: 3369 Comm: tc Not tainted 5.2.0-rc6+ torvalds#492 [ 278.426580] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 278.435853] RIP: 0010:flow_block_cb_setup_simple+0xc4/0x190 [ 278.442953] Code: 10 48 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 49 89 00 48 05 00 01 00 00 49 89 40 08 31 c0 c3 b8 a1 ff ff ff c3 f3 c3 <48> 8b 06 48 39 c6 75 0a eb 1a 48 8b 00 48 39 c6 74 12 48 3b 50 28 [ 278.464829] RSP: 0018:ffffaf07c3f97990 EFLAGS: 00010246 [ 278.471648] RAX: 0000000000000000 RBX: ffff9b43ed4c7680 RCX: ffff9b43d5f80840 [ 278.480408] RDX: ffffffffc0491650 RSI: 0000000000000000 RDI: ffffaf07c3f97998 [ 278.489110] RBP: ffff9b43ddff9000 R08: ffff9b43d5f80840 R09: 0000000000000001 [ 278.497838] R10: 0000000000000009 R11: 00000000000003ad R12: ffffaf07c3f97c08 [ 278.506595] R13: ffff9b43d5f80000 R14: ffff9b43ed4c7680 R15: ffff9b43dfa20b40 [ 278.515374] FS: 00007f796be1b400(0000) GS:ffff9b43ef840000(0000) knlGS:0000000000000000 [ 278.525099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 278.532453] CR2: 0000000000000000 CR3: 0000000840398002 CR4: 00000000001606e0 [ 278.541197] Call Trace: [ 278.545252] tcf_block_offload_cmd.isra.52+0x7e/0xb0 [ 278.551871] tcf_block_get_ext+0x365/0x3e0 [ 278.557569] qdisc_create+0x15c/0x4e0 [ 278.562859] ? kmem_cache_alloc_trace+0x1a2/0x1c0 [ 278.569235] tc_modify_qdisc+0x1c8/0x780 [ 278.574761] rtnetlink_rcv_msg+0x291/0x340 [ 278.580518] ? _cond_resched+0x15/0x40 [ 278.585856] ? rtnl_calcit.isra.29+0x120/0x120 [ 278.591868] netlink_rcv_skb+0x4a/0x110 [ 278.597198] netlink_unicast+0x1a0/0x250 [ 278.602601] netlink_sendmsg+0x2c1/0x3c0 [ 278.608022] sock_sendmsg+0x5b/0x60 [ 278.612969] ___sys_sendmsg+0x289/0x310 [ 278.618231] ? do_wp_page+0x99/0x730 [ 278.623216] ? page_add_new_anon_rmap+0xbe/0x140 [ 278.629298] ? __handle_mm_fault+0xc84/0x1360 [ 278.635113] ? __sys_sendmsg+0x5e/0xa0 [ 278.640285] __sys_sendmsg+0x5e/0xa0 [ 278.645239] do_syscall_64+0x5b/0x1b0 [ 278.650274] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 278.656697] RIP: 0033:0x7f796abdeb87 [ 278.661628] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48 [ 278.683248] RSP: 002b:00007ffde213ba48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 278.692245] RAX: ffffffffffffffda RBX: 000000005d261e6f RCX: 00007f796abdeb87 [ 278.700862] RDX: 0000000000000000 RSI: 00007ffde213bab0 RDI: 0000000000000003 [ 278.709527] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000006 [ 278.718167] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001 [ 278.726743] R13: 000000000067b580 R14: 0000000000000000 R15: 0000000000000000 [ 278.735302] Modules linked in: dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc mlx5_ib ib_uverbs intel_rapl ib_core sb_edac x86_pkg_temp_ thermal intel_powerclamp coretemp kvm_intel kvm mlx5_core irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel ses mei_me enclosure mlxfw ipmi_ssif intel_cstate iTCO_wdt ptp mei pps_core iTCO_vendor_support pcspkr joydev intel_uncore i2c_i801 ipmi_si lpc_ich intel_rapl_perf ioatdma wmi dca pcc_cpufreq ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_k ms_helper ttm drm mpt3sas raid_class scsi_transport_sas [ 278.802263] CR2: 0000000000000000 [ 278.807170] ---[ end trace b1f0a442a279e66f ]--- Extend en_rep with new static mlx5e_rep_block_cb_list list and pass it to flow_block_cb_setup_simple() function instead of hardcoded NULL pointer. Fixes: 955bcb6 ("drivers: net: use flow block API") Signed-off-by: Vlad Buslov <[email protected]> Acked-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>

When interface's namespace is being changed, dev_change_net_namespace() is called. This removes and re-allocates many resources that include sysfs files. The "/net/class/net/<interface name>" is one of them. If the sysfs creation routine(device_rename()) found duplicate sysfs file name, it warns about it and fails. But unfortunately, at that point, dev_change_net_namespace() doesn't return fail because rollback cost is too high. So, the interface can't have a sysfs file. The approach of this patch is to find the duplicate sysfs file as fast as possible. If it found that, dev_change_net_namespace() returns fail immediately with zero rollback cost. This patch includes two other things. a) Acquire rtnl_lock() in both bond_create_sysfs() and bond_destroy_sysfs() to avoid race condition. b) Do not remove "/sys/class/net/bonding_masters" sysfs file by bond_destroy_sysfs() if the file wasn't created by bond_create_sysfs(). Test commands: ip netns add nst ip link add bonding_masters type dummy modprobe bonding ip link set bonding_masters netns nst Splat looks like: [ 6396.214052] WARNING: CPU: 2 PID: 1167 at net/core/dev.c:10119 dev_change_net_namespace+0xa13/0xcb0 [ 6396.215087] Modules linked in: bonding dummy bareudp openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 af_packet e1000 sch_fq_codel nf_tables nfnetlink ip_tables x_tables unix [ 6396.217064] CPU: 2 PID: 1167 Comm: ip Not tainted 5.6.0+ torvalds#492 [ 6396.217885] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 6396.219046] RIP: 0010:dev_change_net_namespace+0xa13/0xcb0 [ 6396.219845] Code: e9 bc f6 ff ff 4c 8b 64 24 28 e9 a5 fc ff ff b8 ea ff ff ff e9 65 fb ff ff e8 f9 00 27 ff e9 f5 f6 ff ff 0f 0b e9 0b fb ff ff <0f> 0b e9 ec fa ff ff b8 ef ff ff ff e9 43 fb ff ff 4c 89 ff 48 89 [ 6396.222336] RSP: 0018:ffff8880518a6dd0 EFLAGS: 00010282 [ 6396.223114] RAX: 00000000ffffffef RBX: ffff888057049000 RCX: ffff888051a8c000 [ 6396.224128] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffb8dc0074 [ 6396.225265] RBP: ffff8880570490a0 R08: fffffbfff7166089 R09: fffffbfff7166089 [ 6396.226307] R10: 0000000000000001 R11: fffffbfff7166088 R12: ffff8880570490b8 [ 6396.227333] R13: ffff888057398040 R14: ffff888057398040 R15: ffff888057049090 [ 6396.228350] FS: 00007fdb2d3710c0(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000 [ 6396.229514] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6396.230403] CR2: 000055ec001c4910 CR3: 000000006591c001 CR4: 00000000000606e0 [ 6396.231414] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 6396.232420] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 6396.233463] Call Trace: [ 6396.233907] ? do_dup2+0x450/0x450 [ 6396.234452] ? deref_stack_reg+0x9c/0xd0 [ 6396.235078] ? dev_get_valid_name+0xc0/0xc0 [ 6396.235718] ? ns_capable_common+0x5c/0xd0 [ 6396.236342] ? __netlink_ns_capable+0xc3/0xf0 [ 6396.237022] do_setlink+0x171/0x2f70 [ 6396.237663] ? check_flags.part.41+0x450/0x450 [ 6396.238393] ? unwind_next_frame+0xb77/0x1c30 [ 6396.239113] ? rtnl_getlink+0x8a0/0x8a0 [ 6396.239751] ? is_bpf_text_address+0x81/0xe0 [ 6396.240447] ? kernel_text_address+0x111/0x120 [ 6396.241177] ? __kernel_text_address+0xe/0x30 [ 6396.241860] ? unwind_get_return_address+0x5f/0xa0 [ 6396.242571] ? create_prof_cpu_mask+0x20/0x20 [ 6396.243239] ? arch_stack_walk+0x83/0xb0 [ 6396.243842] ? pskb_expand_head+0x25f/0xe10 [ 6396.244479] ? stack_trace_save+0x82/0xb0 [ 6396.245103] ? memset+0x1f/0x40 [ 6396.245616] ? __nla_validate_parse+0x98/0x1ab0 [ 6396.246297] ? register_lock_class+0x19e0/0x19e0 [ 6396.246986] ? __kasan_slab_free+0x111/0x150 [ 6396.247629] ? kfree+0xce/0x2f0 [ 6396.248129] ? netlink_trim+0x196/0x1f0 [ 6396.248715] ? nla_memcpy+0x90/0x90 [ 6396.249258] ? register_lock_class+0x19e0/0x19e0 [ 6396.249958] ? deref_stack_reg+0x9c/0xd0 [ 6396.250556] ? __read_once_size_nocheck.constprop.6+0x10/0x10 [ 6396.251387] ? entry_SYSCALL_64_after_hwframe+0x49/0xb3 [ 6396.252154] __rtnl_newlink+0x9c5/0x1270 [ ... ] Reported-by: [email protected] Fixes: b76cdba ("[PATCH] bonding: add sysfs functionality to bonding (large)") Signed-off-by: Taehee Yoo <[email protected]>

When starting at 744MHz, the Mali 450 core crashes on S805X based boards: lima d00c0000.gpu: IRQ ppmmu3 not found lima d00c0000.gpu: IRQ ppmmu4 not found lima d00c0000.gpu: IRQ ppmmu5 not found lima d00c0000.gpu: IRQ ppmmu6 not found lima d00c0000.gpu: IRQ ppmmu7 not found Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.2+ torvalds#492 Hardware name: Libre Computer AML-S805X-AC (DT) pstate: 40000005 (nZcv daif -PAN -UAO) pc : lima_gp_init+0x28/0x188 ... Call trace: lima_gp_init+0x28/0x188 lima_device_init+0x334/0x534 lima_pdev_probe+0xa4/0xe4 ... Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Reverting to a safer 666Mhz frequency on the S805X that doesn't use the GP0 PLL makes it more stable. Fixes: fd47716 ("ARM64: dts: add S805X based P241 board") Fixes: 0449b8e ("arm64: dts: meson: add libretech aml-s805x-ac board") Signed-off-by: Neil Armstrong <[email protected]>

When starting at 744MHz, the Mali 450 core crashes on S805X based boards: lima d00c0000.gpu: IRQ ppmmu3 not found lima d00c0000.gpu: IRQ ppmmu4 not found lima d00c0000.gpu: IRQ ppmmu5 not found lima d00c0000.gpu: IRQ ppmmu6 not found lima d00c0000.gpu: IRQ ppmmu7 not found Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.2+ torvalds#492 Hardware name: Libre Computer AML-S805X-AC (DT) pstate: 40000005 (nZcv daif -PAN -UAO) pc : lima_gp_init+0x28/0x188 ... Call trace: lima_gp_init+0x28/0x188 lima_device_init+0x334/0x534 lima_pdev_probe+0xa4/0xe4 ... Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Reverting to a safer 666Mhz frequency on the S805X that doesn't use the GP0 PLL makes it more stable. Fixes: fd47716 ("ARM64: dts: add S805X based P241 board") Fixes: 0449b8e ("arm64: dts: meson: add libretech aml-s805x-ac board") Signed-off-by: Neil Armstrong <[email protected]> Signed-off-by: Kevin Hilman <[email protected]> Link: https://lore.kernel.org/r/[email protected]

[ Upstream commit b2037da ] When starting at 744MHz, the Mali 450 core crashes on S805X based boards: lima d00c0000.gpu: IRQ ppmmu3 not found lima d00c0000.gpu: IRQ ppmmu4 not found lima d00c0000.gpu: IRQ ppmmu5 not found lima d00c0000.gpu: IRQ ppmmu6 not found lima d00c0000.gpu: IRQ ppmmu7 not found Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.2+ torvalds#492 Hardware name: Libre Computer AML-S805X-AC (DT) pstate: 40000005 (nZcv daif -PAN -UAO) pc : lima_gp_init+0x28/0x188 ... Call trace: lima_gp_init+0x28/0x188 lima_device_init+0x334/0x534 lima_pdev_probe+0xa4/0xe4 ... Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Reverting to a safer 666Mhz frequency on the S805X that doesn't use the GP0 PLL makes it more stable. Fixes: fd47716 ("ARM64: dts: add S805X based P241 board") Fixes: 0449b8e ("arm64: dts: meson: add libretech aml-s805x-ac board") Signed-off-by: Neil Armstrong <[email protected]> Signed-off-by: Kevin Hilman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>

When starting at 744MHz, the Mali 450 core crashes on S805X based boards: lima d00c0000.gpu: IRQ ppmmu3 not found lima d00c0000.gpu: IRQ ppmmu4 not found lima d00c0000.gpu: IRQ ppmmu5 not found lima d00c0000.gpu: IRQ ppmmu6 not found lima d00c0000.gpu: IRQ ppmmu7 not found Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.2+ torvalds#492 Hardware name: Libre Computer AML-S805X-AC (DT) pstate: 40000005 (nZcv daif -PAN -UAO) pc : lima_gp_init+0x28/0x188 ... Call trace: lima_gp_init+0x28/0x188 lima_device_init+0x334/0x534 lima_pdev_probe+0xa4/0xe4 ... Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b Reverting to a safer 666Mhz frequency on the S805X that doesn't use the GP0 PLL makes it more stable. Fixes: fd47716 ("ARM64: dts: add S805X based P241 board") Fixes: 0449b8e ("arm64: dts: meson: add libretech aml-s805x-ac board") Signed-off-by: Neil Armstrong <[email protected]> Signed-off-by: Kevin Hilman <[email protected]> Link: https://lore.kernel.org/r/[email protected]

rust: remove usage of `const_raw_ptr_deref`

If the starting position of our insert range happens to be in the hole between the two ext4_extent_idx, because the lblk of the ext4_extent in the previous ext4_extent_idx is always less than the start, which leads to the "extent" variable access across the boundary, the following UAF is triggered: ================================================================== BUG: KASAN: use-after-free in ext4_ext_shift_extents+0x257/0x790 Read of size 4 at addr ffff88819807a008 by task fallocate/8010 CPU: 3 PID: 8010 Comm: fallocate Tainted: G E 5.10.0+ torvalds#492 Call Trace: dump_stack+0x7d/0xa3 print_address_description.constprop.0+0x1e/0x220 kasan_report.cold+0x67/0x7f ext4_ext_shift_extents+0x257/0x790 ext4_insert_range+0x5b6/0x700 ext4_fallocate+0x39e/0x3d0 vfs_fallocate+0x26f/0x470 ksys_fallocate+0x3a/0x70 __x64_sys_fallocate+0x4f/0x60 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ================================================================== To solve this issue, when the ee_block of the last extent is less than the start, exit the loop in advance to avoid UAF. Fixes: 331573f ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate") Signed-off-by: Baokun Li <[email protected]> Signed-off-by: Zhihao Cheng <[email protected]>

If the starting position of our insert range happens to be in the hole between the two ext4_extent_idx, because the lblk of the ext4_extent in the previous ext4_extent_idx is always less than the start, which leads to the "extent" variable access across the boundary, the following UAF is triggered: ================================================================== BUG: KASAN: use-after-free in ext4_ext_shift_extents+0x257/0x790 Read of size 4 at addr ffff88819807a008 by task fallocate/8010 CPU: 3 PID: 8010 Comm: fallocate Tainted: G E 5.10.0+ torvalds#492 Call Trace: dump_stack+0x7d/0xa3 print_address_description.constprop.0+0x1e/0x220 kasan_report.cold+0x67/0x7f ext4_ext_shift_extents+0x257/0x790 ext4_insert_range+0x5b6/0x700 ext4_fallocate+0x39e/0x3d0 vfs_fallocate+0x26f/0x470 ksys_fallocate+0x3a/0x70 __x64_sys_fallocate+0x4f/0x60 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ================================================================== For right shifts, we can divide them into the following situations： 1. When the first ee_block of ext4_extent_idx is greater than or equal to start, make right shifts directly from the first ee_block. 1) If it is greater than start, we need to continue searching in the previous ext4_extent_idx. 2) If it is equal to start, we can exit the loop (iterator=NULL). 2. When the first ee_block of ext4_extent_idx is less than start, then traverse from the last extent to find the first extent whose ee_block is less than start. 1) If extent is still the last extent after traversal, it means that the last ee_block of ext4_extent_idx is less than start, that is, start is located in the hole between idx and (idx+1), so we can exit the loop directly (break) without right shifts. 2) Otherwise, make right shifts at the corresponding position of the found extent, and then exit the loop (iterator=NULL). Fixes: 331573f ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate") Cc: [email protected] # v4.2+ Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Baokun Li <[email protected]>

If the starting position of our insert range happens to be in the hole between the two ext4_extent_idx, because the lblk of the ext4_extent in the previous ext4_extent_idx is always less than the start, which leads to the "extent" variable access across the boundary, the following UAF is triggered: ================================================================== BUG: KASAN: use-after-free in ext4_ext_shift_extents+0x257/0x790 Read of size 4 at addr ffff88819807a008 by task fallocate/8010 CPU: 3 PID: 8010 Comm: fallocate Tainted: G E 5.10.0+ torvalds#492 Call Trace: dump_stack+0x7d/0xa3 print_address_description.constprop.0+0x1e/0x220 kasan_report.cold+0x67/0x7f ext4_ext_shift_extents+0x257/0x790 ext4_insert_range+0x5b6/0x700 ext4_fallocate+0x39e/0x3d0 vfs_fallocate+0x26f/0x470 ksys_fallocate+0x3a/0x70 __x64_sys_fallocate+0x4f/0x60 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ================================================================== For right shifts, we can divide them into the following situations： 1. When the first ee_block of ext4_extent_idx is greater than or equal to start, make right shifts directly from the first ee_block. 1) If it is greater than start, we need to continue searching in the previous ext4_extent_idx. 2) If it is equal to start, we can exit the loop (iterator=NULL). 2. When the first ee_block of ext4_extent_idx is less than start, then traverse from the last extent to find the first extent whose ee_block is less than start. 1) If extent is still the last extent after traversal, it means that the last ee_block of ext4_extent_idx is less than start, that is, start is located in the hole between idx and (idx+1), so we can exit the loop directly (break) without right shifts. 2) Otherwise, make right shifts at the corresponding position of the found extent, and then exit the loop (iterator=NULL). Fixes: 331573f ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate") Cc: [email protected] # v4.2+ Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Baokun Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]>

commit f6b1a1c upstream. If the starting position of our insert range happens to be in the hole between the two ext4_extent_idx, because the lblk of the ext4_extent in the previous ext4_extent_idx is always less than the start, which leads to the "extent" variable access across the boundary, the following UAF is triggered: ================================================================== BUG: KASAN: use-after-free in ext4_ext_shift_extents+0x257/0x790 Read of size 4 at addr ffff88819807a008 by task fallocate/8010 CPU: 3 PID: 8010 Comm: fallocate Tainted: G E 5.10.0+ torvalds#492 Call Trace: dump_stack+0x7d/0xa3 print_address_description.constprop.0+0x1e/0x220 kasan_report.cold+0x67/0x7f ext4_ext_shift_extents+0x257/0x790 ext4_insert_range+0x5b6/0x700 ext4_fallocate+0x39e/0x3d0 vfs_fallocate+0x26f/0x470 ksys_fallocate+0x3a/0x70 __x64_sys_fallocate+0x4f/0x60 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ================================================================== For right shifts, we can divide them into the following situations： 1. When the first ee_block of ext4_extent_idx is greater than or equal to start, make right shifts directly from the first ee_block. 1) If it is greater than start, we need to continue searching in the previous ext4_extent_idx. 2) If it is equal to start, we can exit the loop (iterator=NULL). 2. When the first ee_block of ext4_extent_idx is less than start, then traverse from the last extent to find the first extent whose ee_block is less than start. 1) If extent is still the last extent after traversal, it means that the last ee_block of ext4_extent_idx is less than start, that is, start is located in the hole between idx and (idx+1), so we can exit the loop directly (break) without right shifts. 2) Otherwise, make right shifts at the corresponding position of the found extent, and then exit the loop (iterator=NULL). Fixes: 331573f ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate") Cc: [email protected] # v4.2+ Signed-off-by: Zhihao Cheng <[email protected]> Signed-off-by: Baokun Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Theodore Ts'o <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Add a test case for the jmp32/k fix to ensure selftests have coverage. Before fix: # ./vmtest.sh -- ./test_progs -t verifier_or_jmp32_k [...] ./test_progs -t verifier_or_jmp32_k tester_init:PASS:tester_log_buf 0 nsec process_subtest:PASS:obj_open_mem 0 nsec process_subtest:PASS:specs_alloc 0 nsec run_subtest:PASS:obj_open_mem 0 nsec run_subtest:FAIL:unexpected_load_success unexpected success: 0 torvalds#492/1 verifier_or_jmp32_k/or_jmp32_k: bit ops + branch on unknown value:FAIL torvalds#492 verifier_or_jmp32_k:FAIL Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED After fix: # ./vmtest.sh -- ./test_progs -t verifier_or_jmp32_k [...] ./test_progs -t verifier_or_jmp32_k torvalds#492/1 verifier_or_jmp32_k/or_jmp32_k: bit ops + branch on unknown value:OK torvalds#492 verifier_or_jmp32_k:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <[email protected]>

Add a test case for the jmp32/k fix to ensure selftests have coverage. Before fix: # ./vmtest.sh -- ./test_progs -t verifier_or_jmp32_k [...] ./test_progs -t verifier_or_jmp32_k tester_init:PASS:tester_log_buf 0 nsec process_subtest:PASS:obj_open_mem 0 nsec process_subtest:PASS:specs_alloc 0 nsec run_subtest:PASS:obj_open_mem 0 nsec run_subtest:FAIL:unexpected_load_success unexpected success: 0 torvalds#492/1 verifier_or_jmp32_k/or_jmp32_k: bit ops + branch on unknown value:FAIL torvalds#492 verifier_or_jmp32_k:FAIL Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED After fix: # ./vmtest.sh -- ./test_progs -t verifier_or_jmp32_k [...] ./test_progs -t verifier_or_jmp32_k torvalds#492/1 verifier_or_jmp32_k/or_jmp32_k: bit ops + branch on unknown value:OK torvalds#492 verifier_or_jmp32_k:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]>

Add a test case for the jmp32/k fix to ensure selftests have coverage. Before fix: # ./vmtest.sh -- ./test_progs -t verifier_or_jmp32_k [...] ./test_progs -t verifier_or_jmp32_k tester_init:PASS:tester_log_buf 0 nsec process_subtest:PASS:obj_open_mem 0 nsec process_subtest:PASS:specs_alloc 0 nsec run_subtest:PASS:obj_open_mem 0 nsec run_subtest:FAIL:unexpected_load_success unexpected success: 0 torvalds#492/1 verifier_or_jmp32_k/or_jmp32_k: bit ops + branch on unknown value:FAIL torvalds#492 verifier_or_jmp32_k:FAIL Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED After fix: # ./vmtest.sh -- ./test_progs -t verifier_or_jmp32_k [...] ./test_progs -t verifier_or_jmp32_k torvalds#492/1 verifier_or_jmp32_k/or_jmp32_k: bit ops + branch on unknown value:OK torvalds#492 verifier_or_jmp32_k:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>

It's observed that a crash occurs during hot-remove a memory device, in which user is accessing the hugetlb. See calltrace as following: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 14045 at arch/x86/mm/fault.c:1278 do_user_addr_fault+0x2a0/0x790 Modules linked in: kmem device_dax cxl_mem cxl_pmem cxl_port cxl_pci dax_hmem dax_pmem nd_pmem cxl_acpi nd_btt cxl_core crc32c_intel nvme virtiofs fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc s mirror dm_region_hash dm_log dm_mod CPU: 1 PID: 14045 Comm: daxctl Not tainted 6.10.0-rc2-lizhijian+ torvalds#492 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:do_user_addr_fault+0x2a0/0x790 Code: 48 8b 00 a8 04 0f 84 b5 fe ff ff e9 1c ff ff ff 4c 89 e9 4c 89 e2 be 01 00 00 00 bf 02 00 00 00 e8 b5 ef 24 00 e9 42 fe ff ff <0f> 0b 48 83 c4 08 4c 89 ea 48 89 ee 4c 89 e7 5b 5d 41 5c 41 5d 41 RSP: 0000:ffffc90000a575f0 EFLAGS: 00010046 RAX: ffff88800c303600 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000001000 RSI: ffffffff82504162 RDI: ffffffff824b2c36 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a57658 R13: 0000000000001000 R14: ffff88800bc2e040 R15: 0000000000000000 FS: 00007f51cb57d880(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000001000 CR3: 00000000072e2004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x8d/0x190 ? do_user_addr_fault+0x2a0/0x790 ? report_bug+0x1c3/0x1d0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? do_user_addr_fault+0x2a0/0x790 ? exc_page_fault+0x31/0x200 exc_page_fault+0x68/0x200 <...snip...> BUG: unable to handle page fault for address: 0000000000001000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI ---[ end trace 0000000000000000 ]--- BUG: unable to handle page fault for address: 0000000000001000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 14045 Comm: daxctl Kdump: loaded Tainted: G W 6.10.0-rc2-lizhijian+ torvalds#492 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:dentry_name+0x1f4/0x440 <...snip...> ? dentry_name+0x2fa/0x440 vsnprintf+0x1f3/0x4f0 vprintk_store+0x23a/0x540 vprintk_emit+0x6d/0x330 _printk+0x58/0x80 dump_mapping+0x10b/0x1a0 ? __pfx_free_object_rcu+0x10/0x10 __dump_page+0x26b/0x3e0 ? vprintk_emit+0xe0/0x330 ? _printk+0x58/0x80 ? dump_page+0x17/0x50 dump_page+0x17/0x50 do_migrate_range+0x2f7/0x7f0 ? do_migrate_range+0x42/0x7f0 ? offline_pages+0x2f4/0x8c0 offline_pages+0x60a/0x8c0 memory_subsys_offline+0x9f/0x1c0 ? lockdep_hardirqs_on+0x77/0x100 ? _raw_spin_unlock_irqrestore+0x38/0x60 device_offline+0xe3/0x110 state_store+0x6e/0xc0 kernfs_fop_write_iter+0x143/0x200 vfs_write+0x39f/0x560 ksys_write+0x65/0xf0 do_syscall_64+0x62/0x130 Previously, some sanity check have been done in dump_mapping() before the print facility parsing '%pd' though, it's still possible to run into an invalid dentry.d_name.name. Since dump_mapping() only needs to dump the filename only, retrieve it by itself in a safer way to prevent an unnecessary crash. Note that either retrieving the filename with '%pd' or strncpy_from_kernel_nofault(), the filename could be unreliable. Signed-off-by: Li Zhijian <[email protected]>

It's observed that a crash occurs during hot-remove a memory device, in which user is accessing the hugetlb. See calltrace as following: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 14045 at arch/x86/mm/fault.c:1278 do_user_addr_fault+0x2a0/0x790 Modules linked in: kmem device_dax cxl_mem cxl_pmem cxl_port cxl_pci dax_hmem dax_pmem nd_pmem cxl_acpi nd_btt cxl_core crc32c_intel nvme virtiofs fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc s mirror dm_region_hash dm_log dm_mod CPU: 1 PID: 14045 Comm: daxctl Not tainted 6.10.0-rc2-lizhijian+ torvalds#492 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:do_user_addr_fault+0x2a0/0x790 Code: 48 8b 00 a8 04 0f 84 b5 fe ff ff e9 1c ff ff ff 4c 89 e9 4c 89 e2 be 01 00 00 00 bf 02 00 00 00 e8 b5 ef 24 00 e9 42 fe ff ff <0f> 0b 48 83 c4 08 4c 89 ea 48 89 ee 4c 89 e7 5b 5d 41 5c 41 5d 41 RSP: 0000:ffffc90000a575f0 EFLAGS: 00010046 RAX: ffff88800c303600 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000001000 RSI: ffffffff82504162 RDI: ffffffff824b2c36 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a57658 R13: 0000000000001000 R14: ffff88800bc2e040 R15: 0000000000000000 FS: 00007f51cb57d880(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000001000 CR3: 00000000072e2004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x8d/0x190 ? do_user_addr_fault+0x2a0/0x790 ? report_bug+0x1c3/0x1d0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? do_user_addr_fault+0x2a0/0x790 ? exc_page_fault+0x31/0x200 exc_page_fault+0x68/0x200 <...snip...> BUG: unable to handle page fault for address: 0000000000001000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI ---[ end trace 0000000000000000 ]--- BUG: unable to handle page fault for address: 0000000000001000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 14045 Comm: daxctl Kdump: loaded Tainted: G W 6.10.0-rc2-lizhijian+ torvalds#492 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:dentry_name+0x1f4/0x440 <...snip...> ? dentry_name+0x2fa/0x440 vsnprintf+0x1f3/0x4f0 vprintk_store+0x23a/0x540 vprintk_emit+0x6d/0x330 _printk+0x58/0x80 dump_mapping+0x10b/0x1a0 ? __pfx_free_object_rcu+0x10/0x10 __dump_page+0x26b/0x3e0 ? vprintk_emit+0xe0/0x330 ? _printk+0x58/0x80 ? dump_page+0x17/0x50 dump_page+0x17/0x50 do_migrate_range+0x2f7/0x7f0 ? do_migrate_range+0x42/0x7f0 ? offline_pages+0x2f4/0x8c0 offline_pages+0x60a/0x8c0 memory_subsys_offline+0x9f/0x1c0 ? lockdep_hardirqs_on+0x77/0x100 ? _raw_spin_unlock_irqrestore+0x38/0x60 device_offline+0xe3/0x110 state_store+0x6e/0xc0 kernfs_fop_write_iter+0x143/0x200 vfs_write+0x39f/0x560 ksys_write+0x65/0xf0 do_syscall_64+0x62/0x130 Previously, some sanity check have been done in dump_mapping() before the print facility parsing '%pd' though, it's still possible to run into an invalid dentry.d_name.name. Since dump_mapping() only needs to dump the filename only, retrieve it by itself in a safer way to prevent an unnecessary crash. Note that either retrieving the filename with '%pd' or strncpy_from_kernel_nofault(), the filename could be unreliable. Signed-off-by: Li Zhijian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]>

[ Upstream commit 7f7b850 ] It's observed that a crash occurs during hot-remove a memory device, in which user is accessing the hugetlb. See calltrace as following: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 14045 at arch/x86/mm/fault.c:1278 do_user_addr_fault+0x2a0/0x790 Modules linked in: kmem device_dax cxl_mem cxl_pmem cxl_port cxl_pci dax_hmem dax_pmem nd_pmem cxl_acpi nd_btt cxl_core crc32c_intel nvme virtiofs fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc s mirror dm_region_hash dm_log dm_mod CPU: 1 PID: 14045 Comm: daxctl Not tainted 6.10.0-rc2-lizhijian+ torvalds#492 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:do_user_addr_fault+0x2a0/0x790 Code: 48 8b 00 a8 04 0f 84 b5 fe ff ff e9 1c ff ff ff 4c 89 e9 4c 89 e2 be 01 00 00 00 bf 02 00 00 00 e8 b5 ef 24 00 e9 42 fe ff ff <0f> 0b 48 83 c4 08 4c 89 ea 48 89 ee 4c 89 e7 5b 5d 41 5c 41 5d 41 RSP: 0000:ffffc90000a575f0 EFLAGS: 00010046 RAX: ffff88800c303600 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000001000 RSI: ffffffff82504162 RDI: ffffffff824b2c36 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a57658 R13: 0000000000001000 R14: ffff88800bc2e040 R15: 0000000000000000 FS: 00007f51cb57d880(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000001000 CR3: 00000000072e2004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x8d/0x190 ? do_user_addr_fault+0x2a0/0x790 ? report_bug+0x1c3/0x1d0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? do_user_addr_fault+0x2a0/0x790 ? exc_page_fault+0x31/0x200 exc_page_fault+0x68/0x200 <...snip...> BUG: unable to handle page fault for address: 0000000000001000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI ---[ end trace 0000000000000000 ]--- BUG: unable to handle page fault for address: 0000000000001000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 14045 Comm: daxctl Kdump: loaded Tainted: G W 6.10.0-rc2-lizhijian+ torvalds#492 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:dentry_name+0x1f4/0x440 <...snip...> ? dentry_name+0x2fa/0x440 vsnprintf+0x1f3/0x4f0 vprintk_store+0x23a/0x540 vprintk_emit+0x6d/0x330 _printk+0x58/0x80 dump_mapping+0x10b/0x1a0 ? __pfx_free_object_rcu+0x10/0x10 __dump_page+0x26b/0x3e0 ? vprintk_emit+0xe0/0x330 ? _printk+0x58/0x80 ? dump_page+0x17/0x50 dump_page+0x17/0x50 do_migrate_range+0x2f7/0x7f0 ? do_migrate_range+0x42/0x7f0 ? offline_pages+0x2f4/0x8c0 offline_pages+0x60a/0x8c0 memory_subsys_offline+0x9f/0x1c0 ? lockdep_hardirqs_on+0x77/0x100 ? _raw_spin_unlock_irqrestore+0x38/0x60 device_offline+0xe3/0x110 state_store+0x6e/0xc0 kernfs_fop_write_iter+0x143/0x200 vfs_write+0x39f/0x560 ksys_write+0x65/0xf0 do_syscall_64+0x62/0x130 Previously, some sanity check have been done in dump_mapping() before the print facility parsing '%pd' though, it's still possible to run into an invalid dentry.d_name.name. Since dump_mapping() only needs to dump the filename only, retrieve it by itself in a safer way to prevent an unnecessary crash. Note that either retrieving the filename with '%pd' or strncpy_from_kernel_nofault(), the filename could be unreliable. Signed-off-by: Li Zhijian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

The driver needs to set the correct max_segment_size; otherwise debug_dma_map_sg() will complain about the over-mapping of the AMDGPU sg length as following: WARNING: CPU: 6 PID: 1964 at kernel/dma/debug.c:1178 debug_dma_map_sg+0x2dc/0x370 [ 364.049444] Modules linked in: veth amdgpu(OE) amdxcp drm_exec gpu_sched drm_buddy drm_ttm_helper ttm(OE) drm_suballoc_helper drm_display_helper drm_kms_helper i2c_algo_bit rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter br_netfilter nvme_fabrics overlay nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bridge stp llc amd_atl intel_rapl_msr intel_rapl_common sunrpc sch_fq_codel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg edac_mce_amd binfmt_misc snd_hda_codec snd_pci_acp6x snd_hda_core snd_acp_config snd_hwdep snd_soc_acpi kvm_amd snd_pcm kvm snd_seq_midi snd_seq_midi_event crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 snd_rawmidi sha256_ssse3 sha1_ssse3 aesni_intel snd_seq nls_iso8859_1 crypto_simd snd_seq_device cryptd snd_timer rapl input_leds snd [ 364.049532] ipmi_devintf wmi_bmof ccp serio_raw k10temp sp5100_tco soundcore ipmi_msghandler cm32181 industrialio mac_hid msr parport_pc ppdev lp parport drm efi_pstore ip_tables x_tables pci_stub crc32_pclmul nvme ahci libahci i2c_piix4 r8169 nvme_core i2c_designware_pci realtek i2c_ccgx_ucsi video wmi hid_generic cdc_ether usbnet usbhid hid r8152 mii [ 364.049576] CPU: 6 PID: 1964 Comm: rocminfo Tainted: G OE 6.10.0-custom torvalds#492 [ 364.049579] Hardware name: AMD Majolica-RN/Majolica-RN, BIOS RMJ1009A 06/13/2021 [ 364.049582] RIP: 0010:debug_dma_map_sg+0x2dc/0x370 [ 364.049585] Code: 89 4d b8 e8 36 b1 86 00 8b 4d b8 48 8b 55 b0 44 8b 45 a8 4c 8b 4d a0 48 89 c6 48 c7 c7 00 4b 74 bc 4c 89 4d b8 e8 b4 73 f3 ff <0f> 0b 4c 8b 4d b8 8b 15 c8 2c b8 01 85 d2 0f 85 ee fd ff ff 8b 05 [ 364.049588] RSP: 0018:ffff9ca600b57ac0 EFLAGS: 00010286 [ 364.049590] RAX: 0000000000000000 RBX: ffff88b7c132b0c8 RCX: 0000000000000027 [ 364.049592] RDX: ffff88bb0f521688 RSI: 0000000000000001 RDI: ffff88bb0f521680 [ 364.049594] RBP: ffff9ca600b57b20 R08: 000000000000006f R09: ffff9ca600b57930 [ 364.049596] R10: ffff9ca600b57928 R11: ffffffffbcb46328 R12: 0000000000000000 [ 364.049597] R13: 0000000000000001 R14: ffff88b7c19c0700 R15: ffff88b7c9059800 [ 364.049599] FS: 00007fb2d3516e80(0000) GS:ffff88bb0f500000(0000) knlGS:0000000000000000 [ 364.049601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 364.049603] CR2: 000055610bd03598 CR3: 00000001049f6000 CR4: 0000000000350ef0 [ 364.049605] Call Trace: [ 364.049607] <TASK> [ 364.049609] ? show_regs+0x6d/0x80 [ 364.049614] ? __warn+0x8c/0x140 [ 364.049618] ? debug_dma_map_sg+0x2dc/0x370 [ 364.049621] ? report_bug+0x193/0x1a0 [ 364.049627] ? handle_bug+0x46/0x80 [ 364.049631] ? exc_invalid_op+0x1d/0x80 [ 364.049635] ? asm_exc_invalid_op+0x1f/0x30 [ 364.049642] ? debug_dma_map_sg+0x2dc/0x370 [ 364.049647] __dma_map_sg_attrs+0x90/0xe0 [ 364.049651] dma_map_sgtable+0x25/0x40 [ 364.049654] amdgpu_bo_move+0x59a/0x850 [amdgpu] [ 364.049935] ? srso_return_thunk+0x5/0x5f [ 364.049939] ? amdgpu_ttm_tt_populate+0x5d/0xc0 [amdgpu] [ 364.050095] ttm_bo_handle_move_mem+0xc3/0x180 [ttm] [ 364.050103] ttm_bo_validate+0xc1/0x160 [ttm] [ 364.050108] ? amdgpu_ttm_tt_get_user_pages+0xe5/0x1b0 [amdgpu] [ 364.050263] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0xa12/0xc90 [amdgpu] [ 364.050473] kfd_ioctl_alloc_memory_of_gpu+0x16b/0x3b0 [amdgpu] [ 364.050680] kfd_ioctl+0x3c2/0x530 [amdgpu] [ 364.050866] ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu] [ 364.051054] ? srso_return_thunk+0x5/0x5f [ 364.051057] ? tomoyo_file_ioctl+0x20/0x30 [ 364.051063] __x64_sys_ioctl+0x9c/0xd0 [ 364.051068] x64_sys_call+0x1219/0x20d0 [ 364.051073] do_syscall_64+0x51/0x120 [ 364.051077] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 364.051081] RIP: 0033:0x7fb2d2f1a94f Signed-off-by: Prike Liang <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]>

[ Upstream commit 7f7b850 ] It's observed that a crash occurs during hot-remove a memory device, in which user is accessing the hugetlb. See calltrace as following: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 14045 at arch/x86/mm/fault.c:1278 do_user_addr_fault+0x2a0/0x790 Modules linked in: kmem device_dax cxl_mem cxl_pmem cxl_port cxl_pci dax_hmem dax_pmem nd_pmem cxl_acpi nd_btt cxl_core crc32c_intel nvme virtiofs fuse nvme_core nfit libnvdimm dm_multipath scsi_dh_rdac scsi_dh_emc s mirror dm_region_hash dm_log dm_mod CPU: 1 PID: 14045 Comm: daxctl Not tainted 6.10.0-rc2-lizhijian+ torvalds#492 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:do_user_addr_fault+0x2a0/0x790 Code: 48 8b 00 a8 04 0f 84 b5 fe ff ff e9 1c ff ff ff 4c 89 e9 4c 89 e2 be 01 00 00 00 bf 02 00 00 00 e8 b5 ef 24 00 e9 42 fe ff ff <0f> 0b 48 83 c4 08 4c 89 ea 48 89 ee 4c 89 e7 5b 5d 41 5c 41 5d 41 RSP: 0000:ffffc90000a575f0 EFLAGS: 00010046 RAX: ffff88800c303600 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000001000 RSI: ffffffff82504162 RDI: ffffffff824b2c36 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a57658 R13: 0000000000001000 R14: ffff88800bc2e040 R15: 0000000000000000 FS: 00007f51cb57d880(0000) GS:ffff88807fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000001000 CR3: 00000000072e2004 CR4: 00000000001706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __warn+0x8d/0x190 ? do_user_addr_fault+0x2a0/0x790 ? report_bug+0x1c3/0x1d0 ? handle_bug+0x3c/0x70 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? do_user_addr_fault+0x2a0/0x790 ? exc_page_fault+0x31/0x200 exc_page_fault+0x68/0x200 <...snip...> BUG: unable to handle page fault for address: 0000000000001000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI ---[ end trace 0000000000000000 ]--- BUG: unable to handle page fault for address: 0000000000001000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000000ad92067 P4D 800000000ad92067 PUD 7677067 PMD 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 14045 Comm: daxctl Kdump: loaded Tainted: G W 6.10.0-rc2-lizhijian+ torvalds#492 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:dentry_name+0x1f4/0x440 <...snip...> ? dentry_name+0x2fa/0x440 vsnprintf+0x1f3/0x4f0 vprintk_store+0x23a/0x540 vprintk_emit+0x6d/0x330 _printk+0x58/0x80 dump_mapping+0x10b/0x1a0 ? __pfx_free_object_rcu+0x10/0x10 __dump_page+0x26b/0x3e0 ? vprintk_emit+0xe0/0x330 ? _printk+0x58/0x80 ? dump_page+0x17/0x50 dump_page+0x17/0x50 do_migrate_range+0x2f7/0x7f0 ? do_migrate_range+0x42/0x7f0 ? offline_pages+0x2f4/0x8c0 offline_pages+0x60a/0x8c0 memory_subsys_offline+0x9f/0x1c0 ? lockdep_hardirqs_on+0x77/0x100 ? _raw_spin_unlock_irqrestore+0x38/0x60 device_offline+0xe3/0x110 state_store+0x6e/0xc0 kernfs_fop_write_iter+0x143/0x200 vfs_write+0x39f/0x560 ksys_write+0x65/0xf0 do_syscall_64+0x62/0x130 Previously, some sanity check have been done in dump_mapping() before the print facility parsing '%pd' though, it's still possible to run into an invalid dentry.d_name.name. Since dump_mapping() only needs to dump the filename only, retrieve it by itself in a safer way to prevent an unnecessary crash. Note that either retrieving the filename with '%pd' or strncpy_from_kernel_nofault(), the filename could be unreliable. Signed-off-by: Li Zhijian <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]> Signed-off-by: Sasha Levin <[email protected]> [Xiangyu: Bp to fix CVE: CVE-2024-49934, modified strscpy step due to 6.1/6.6 need pass the max len to strscpy] Signed-off-by: Xiangyu Chen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

The SVM DMA device map direction should be set the same as the DMA unmap setting, otherwise the DMA core will report the following warning. Before finialize this solution, there're some discussion on the DMA mapping type(stream-based or coherent) in this KFD migration case, followed by https://lore.kernel.org/all/04d4ab32 [email protected]/T/. As there's no dma_sync_single_for_*() in the DMA buffer accessed that because this migration operation should be sync properly and automatically. Give that there's might not be a performance problem in various cache sync policy of DMA sync. Therefore, in order to simplify the DMA direction setting alignment, let's set the DMA map direction as BIDIRECTIONAL. [ 150.834218] WARNING: CPU: 8 PID: 1812 at kernel/dma/debug.c:1028 check_unmap+0x1cc/0x930 [ 150.834225] Modules linked in: amdgpu(OE) amdxcp drm_exec(OE) gpu_sched drm_buddy(OE) drm_ttm_helper(OE) ttm(OE) drm_suballoc_helper(OE) drm_display_helper(OE) drm_kms_helper(OE) i2c_algo_bit rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter br_netfilter nvme_fabrics overlay nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bridge stp llc sch_fq_codel intel_rapl_msr amd_atl intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg edac_mce_amd snd_pci_acp6x snd_hda_codec snd_acp_config snd_hda_core snd_hwdep snd_soc_acpi kvm_amd sunrpc snd_pcm kvm binfmt_misc snd_seq_midi crct10dif_pclmul snd_seq_midi_event ghash_clmulni_intel sha512_ssse3 snd_rawmidi nls_iso8859_1 sha256_ssse3 sha1_ssse3 snd_seq aesni_intel snd_seq_device crypto_simd snd_timer cryptd input_leds [ 150.834310] wmi_bmof serio_raw k10temp rapl snd sp5100_tco ipmi_devintf soundcore ccp ipmi_msghandler cm32181 industrialio mac_hid msr parport_pc ppdev lp parport efi_pstore drm(OE) ip_tables x_tables pci_stub crc32_pclmul nvme ahci libahci i2c_piix4 r8169 nvme_core i2c_designware_pci realtek i2c_ccgx_ucsi video wmi hid_generic cdc_ether usbnet usbhid hid r8152 mii [ 150.834354] CPU: 8 PID: 1812 Comm: rocrtst64 Tainted: G OE 6.10.0-custom torvalds#492 [ 150.834358] Hardware name: AMD Majolica-RN/Majolica-RN, BIOS RMJ1009A 06/13/2021 [ 150.834360] RIP: 0010:check_unmap+0x1cc/0x930 [ 150.834363] Code: c0 4c 89 4d c8 e8 34 bf 86 00 4c 8b 4d c8 4c 8b 45 c0 48 8b 4d b8 48 89 c6 41 57 4c 89 ea 48 c7 c7 80 49 b4 84 e8 b4 81 f3 ff <0f> 0b 48 c7 c7 04 83 ac 84 e8 76 ba fc ff 41 8b 76 4c 49 8d 7e 50 [ 150.834365] RSP: 0018:ffffaac5023739e0 EFLAGS: 00010086 [ 150.834368] RAX: 0000000000000000 RBX: ffffffff8566a2e0 RCX: 0000000000000027 [ 150.834370] RDX: ffff8f6a8f621688 RSI: 0000000000000001 RDI: ffff8f6a8f621680 [ 150.834372] RBP: ffffaac502373a30 R08: 00000000000000c9 R09: ffffaac502373850 [ 150.834373] R10: ffffaac502373848 R11: ffffffff84f46328 R12: ffffaac502373a40 [ 150.834375] R13: ffff8f6741045330 R14: ffff8f6741a77700 R15: ffffffff84ac831b [ 150.834377] FS: 00007faf0fc94c00(0000) GS:ffff8f6a8f600000(0000) knlGS:0000000000000000 [ 150.834379] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 150.834381] CR2: 00007faf0b600020 CR3: 000000010a52e000 CR4: 0000000000350ef0 [ 150.834383] Call Trace: [ 150.834385] <TASK> [ 150.834387] ? show_regs+0x6d/0x80 [ 150.834393] ? __warn+0x8c/0x140 [ 150.834397] ? check_unmap+0x1cc/0x930 [ 150.834400] ? report_bug+0x193/0x1a0 [ 150.834406] ? handle_bug+0x46/0x80 [ 150.834410] ? exc_invalid_op+0x1d/0x80 [ 150.834413] ? asm_exc_invalid_op+0x1f/0x30 [ 150.834420] ? check_unmap+0x1cc/0x930 [ 150.834425] debug_dma_unmap_page+0x86/0x90 [ 150.834431] ? srso_return_thunk+0x5/0x5f [ 150.834435] ? rmap_walk+0x28/0x50 [ 150.834438] ? srso_return_thunk+0x5/0x5f [ 150.834441] ? remove_migration_ptes+0x79/0x80 [ 150.834445] ? srso_return_thunk+0x5/0x5f [ 150.834448] dma_unmap_page_attrs+0xfa/0x1d0 [ 150.834453] svm_range_dma_unmap_dev+0x8a/0xf0 [amdgpu] [ 150.834710] svm_migrate_ram_to_vram+0x361/0x740 [amdgpu] [ 150.834914] svm_migrate_to_vram+0xa8/0xe0 [amdgpu] [ 150.835111] svm_range_set_attr+0xff2/0x1450 [amdgpu] [ 150.835311] svm_ioctl+0x4a/0x50 [amdgpu] [ 150.835510] kfd_ioctl_svm+0x54/0x90 [amdgpu] [ 150.835701] kfd_ioctl+0x3c2/0x530 [amdgpu] [ 150.835888] ? __pfx_kfd_ioctl_svm+0x10/0x10 [amdgpu] [ 150.836075] ? srso_return_thunk+0x5/0x5f [ 150.836080] ? tomoyo_file_ioctl+0x20/0x30 [ 150.836086] __x64_sys_ioctl+0x9c/0xd0 [ 150.836091] x64_sys_call+0x1219/0x20d0 [ 150.836095] do_syscall_64+0x51/0x120 [ 150.836098] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 150.836102] RIP: 0033:0x7faf0f11a94f [ 150.836105] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ 150.836107] RSP: 002b:00007ffeced26bc0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 150.836110] RAX: ffffffffffffffda RBX: 000055c683528fb0 RCX: 00007faf0f11a94f [ 150.836112] RDX: 00007ffeced26c60 RSI: 00000000c0484b20 RDI: 0000000000000003 [ 150.836114] RBP: 00007ffeced26c50 R08: 0000000000000000 R09: 0000000000000001 [ 150.836115] R10: 0000000000000032 R11: 0000000000000246 R12: 000055c683528bd0 [ 150.836117] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000000 [ 150.836122] </TASK> [ 150.836124] ---[ end trace 0000000000000000 ]--- Signed-off-by: Prike Liang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>

Merge pull request #1 from torvalds/master

b679ccc

Merge

ghost closed this Nov 25, 2017

ojeda added a commit to ojeda/linux that referenced this pull request Sep 27, 2021

Merge pull request torvalds#492 from nbdd0121/rust

3976bd9

rust: remove usage of `const_raw_ptr_deref`

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge pull request #1 from torvalds/master #492

Merge pull request #1 from torvalds/master #492

ghost commented Nov 25, 2017

Merge pull request #1 from torvalds/master #492

Merge pull request #1 from torvalds/master #492

Conversation

ghost commented Nov 25, 2017