forked from raspberrypi/linux
-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch 8812AU driver to the one from https://github.com/aircrack-ng/rtl8812au #1
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
spockfish
pushed a commit
that referenced
this pull request
Sep 28, 2020
[ Upstream commit 467bf30 ] Move another allocation out of regulator_list_mutex-protected region, as reclaim might want to take the same lock. WARNING: possible circular locking dependency detected 5.7.13+ raspberrypi#534 Not tainted ------------------------------------------------------ kswapd0/383 is trying to acquire lock: c0e5d920 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x54/0x2c0 but task is already holding lock: c0e38518 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x0/0x50 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: fs_reclaim_acquire.part.11+0x40/0x50 fs_reclaim_acquire+0x24/0x28 kmem_cache_alloc_trace+0x40/0x1e8 regulator_register+0x384/0x1630 devm_regulator_register+0x50/0x84 reg_fixed_voltage_probe+0x248/0x35c [...] other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(regulator_list_mutex); lock(fs_reclaim); lock(regulator_list_mutex); *** DEADLOCK *** [...] 2 locks held by kswapd0/383: #0: c0e38518 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x0/0x50 #1: cb70e5e0 (hctx->srcu){....}-{0:0}, at: hctx_lock+0x60/0xb8 [...] Fixes: 541d052 ("regulator: core: Only support passing enable GPIO descriptors") [this commit only changes context] Fixes: f8702f9 ("regulator: core: Use ww_mutex for regulators locking") [this is when the regulator_list_mutex was introduced in reclaim locking path] Signed-off-by: Michał Mirosław <[email protected]> Link: https://lore.kernel.org/r/41fe6a9670335721b48e8f5195038c3d67a3bf92.1597195321.git.mirq-linux@rere.qmqm.pl Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Sep 28, 2020
[ Upstream commit d862060 ] To avoid the following kernel panic when calling kmem_cache_create() with a NULL pointer from pool_cache(), Block the rxe_param_set_add() from running if the rdma_rxe module is not initialized. BUG: unable to handle kernel NULL pointer dereference at 000000000000000b PGD 0 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 4 PID: 8512 Comm: modprobe Kdump: loaded Not tainted 4.18.0-231.el8.x86_64 #1 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 10/02/2018 RIP: 0010:kmem_cache_alloc+0xd1/0x1b0 Code: 8b 57 18 45 8b 77 1c 48 8b 5c 24 30 0f 1f 44 00 00 5b 48 89 e8 5d 41 5c 41 5d 41 5e 41 5f c3 81 e3 00 00 10 00 75 0e 4d 89 fe <41> f6 47 0b 04 0f 84 6c ff ff ff 4c 89 ff e8 cc da 01 00 49 89 c6 RSP: 0018:ffffa2b8c773f9d0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000005 RDX: 0000000000000004 RSI: 00000000006080c0 RDI: 0000000000000000 RBP: ffff8ea0a8634fd0 R08: ffffa2b8c773f988 R09: 00000000006000c0 R10: 0000000000000000 R11: 0000000000000230 R12: 00000000006080c0 R13: ffffffffc0a97fc8 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f9138ed9740(0000) GS:ffff8ea4ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000000b CR3: 000000046d59a000 CR4: 00000000003406e0 Call Trace: rxe_alloc+0xc8/0x160 [rdma_rxe] rxe_get_dma_mr+0x25/0xb0 [rdma_rxe] __ib_alloc_pd+0xcb/0x160 [ib_core] ib_mad_init_device+0x296/0x8b0 [ib_core] add_client_context+0x11a/0x160 [ib_core] enable_device_and_get+0xdc/0x1d0 [ib_core] ib_register_device+0x572/0x6b0 [ib_core] ? crypto_create_tfm+0x32/0xe0 ? crypto_create_tfm+0x7a/0xe0 ? crypto_alloc_tfm+0x58/0xf0 rxe_register_device+0x19d/0x1c0 [rdma_rxe] rxe_net_add+0x3d/0x70 [rdma_rxe] ? dev_get_by_name_rcu+0x73/0x90 rxe_param_set_add+0xaf/0xc0 [rdma_rxe] parse_args+0x179/0x370 ? ref_module+0x1b0/0x1b0 load_module+0x135e/0x17e0 ? ref_module+0x1b0/0x1b0 ? __do_sys_init_module+0x13b/0x180 __do_sys_init_module+0x13b/0x180 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca RIP: 0033:0x7f9137ed296e This can be triggered if a user tries to use the 'module option' which is not actually a real module option but some idiotic (and thankfully no obsolete) sysfs interface. Fixes: 8700e3e ("Soft RoCE driver") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kamal Heib <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Sep 28, 2020
[ Upstream commit 7ad92f6 ] This patch addresses an irq free warning and null pointer dereference error problem when nvme devices got timeout error during initialization. This problem happens when nvme_timeout() function is called while nvme_reset_work() is still in execution. This patch fixed the problem by setting flag of the problematic request to NVME_REQ_CANCELLED before calling nvme_dev_disable() to make sure __nvme_submit_sync_cmd() returns an error code and let nvme_submit_sync_cmd() fail gracefully. The following is console output. [ 62.472097] nvme nvme0: I/O 13 QID 0 timeout, disable controller [ 62.488796] nvme nvme0: could not set timestamp (881) [ 62.494888] ------------[ cut here ]------------ [ 62.495142] Trying to free already-free IRQ 11 [ 62.495366] WARNING: CPU: 0 PID: 7 at kernel/irq/manage.c:1751 free_irq+0x1f7/0x370 [ 62.495742] Modules linked in: [ 62.495902] CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 5.8.0+ #8 [ 62.496206] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4 [ 62.496772] Workqueue: nvme-reset-wq nvme_reset_work [ 62.497019] RIP: 0010:free_irq+0x1f7/0x370 [ 62.497223] Code: e8 ce 49 11 00 48 83 c4 08 4c 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 44 89 f6 48 c70 [ 62.498133] RSP: 0000:ffffa96800043d40 EFLAGS: 00010086 [ 62.498391] RAX: 0000000000000000 RBX: ffff9b87fc458400 RCX: 0000000000000000 [ 62.498741] RDX: 0000000000000001 RSI: 0000000000000096 RDI: ffffffff9693d72c [ 62.499091] RBP: ffff9b87fd4c8f60 R08: ffffa96800043bfd R09: 0000000000000163 [ 62.499440] R10: ffffa96800043bf8 R11: ffffa96800043bfd R12: ffff9b87fd4c8e00 [ 62.499790] R13: ffff9b87fd4c8ea4 R14: 000000000000000b R15: ffff9b87fd76b000 [ 62.500140] FS: 0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000 [ 62.500534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 62.500816] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0 [ 62.501165] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 62.501515] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 62.501864] Call Trace: [ 62.501993] pci_free_irq+0x13/0x20 [ 62.502167] nvme_reset_work+0x5d0/0x12a0 [ 62.502369] ? update_load_avg+0x59/0x580 [ 62.502569] ? ttwu_queue_wakelist+0xa8/0xc0 [ 62.502780] ? try_to_wake_up+0x1a2/0x450 [ 62.502979] process_one_work+0x1d2/0x390 [ 62.503179] worker_thread+0x45/0x3b0 [ 62.503361] ? process_one_work+0x390/0x390 [ 62.503568] kthread+0xf9/0x130 [ 62.503726] ? kthread_park+0x80/0x80 [ 62.503911] ret_from_fork+0x22/0x30 [ 62.504090] ---[ end trace de9ed4a70f8d71e2 ]--- [ 123.912275] nvme nvme0: I/O 12 QID 0 timeout, disable controller [ 123.914670] nvme nvme0: 1/0/0 default/read/poll queues [ 123.916310] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 123.917469] #PF: supervisor write access in kernel mode [ 123.917725] #PF: error_code(0x0002) - not-present page [ 123.917976] PGD 0 P4D 0 [ 123.918109] Oops: 0002 [#1] SMP PTI [ 123.918283] CPU: 0 PID: 7 Comm: kworker/u4:0 Tainted: G W 5.8.0+ #8 [ 123.918650] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812dda519-p4 [ 123.919219] Workqueue: nvme-reset-wq nvme_reset_work [ 123.919469] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80 [ 123.919757] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4 [ 123.920657] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286 [ 123.920912] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000 [ 123.921258] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000 [ 123.921602] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000 [ 123.921949] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000 [ 123.922295] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000 [ 123.922641] FS: 0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000 [ 123.923032] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 123.923312] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0 [ 123.923660] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 123.924007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 123.924353] Call Trace: [ 123.924479] blk_mq_alloc_tag_set+0x137/0x2a0 [ 123.924694] nvme_reset_work+0xed6/0x12a0 [ 123.924898] process_one_work+0x1d2/0x390 [ 123.925099] worker_thread+0x45/0x3b0 [ 123.925280] ? process_one_work+0x390/0x390 [ 123.925486] kthread+0xf9/0x130 [ 123.925642] ? kthread_park+0x80/0x80 [ 123.925825] ret_from_fork+0x22/0x30 [ 123.926004] Modules linked in: [ 123.926158] CR2: 0000000000000000 [ 123.926322] ---[ end trace de9ed4a70f8d71e3 ]--- [ 123.926549] RIP: 0010:__blk_mq_alloc_map_and_request+0x21/0x80 [ 123.926832] Code: 66 0f 1f 84 00 00 00 00 00 41 55 41 54 55 48 63 ee 53 48 8b 47 68 89 ee 48 89 fb 8b4 [ 123.927734] RSP: 0000:ffffa96800043d40 EFLAGS: 00010286 [ 123.927989] RAX: ffff9b87fc4fee40 RBX: ffff9b87fc8cb008 RCX: 0000000000000000 [ 123.928336] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9b87fc618000 [ 123.928679] RBP: 0000000000000000 R08: ffff9b87fdc2c4a0 R09: ffff9b87fc616000 [ 123.929025] R10: 0000000000000000 R11: ffff9b87fffd1500 R12: 0000000000000000 [ 123.929370] R13: 0000000000000000 R14: ffff9b87fc8cb200 R15: ffff9b87fc8cb000 [ 123.929715] FS: 0000000000000000(0000) GS:ffff9b87fdc00000(0000) knlGS:0000000000000000 [ 123.930106] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 123.930384] CR2: 0000000000000000 CR3: 000000003aa0a000 CR4: 00000000000006f0 [ 123.930731] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 123.931077] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Co-developed-by: Keith Busch <[email protected]> Signed-off-by: Tong Zhang <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Sep 28, 2020
commit fccc000 upstream. Nikolay reported a lockdep splat in generic/476 that I could reproduce with btrfs/187. ====================================================== WARNING: possible circular locking dependency detected 5.9.0-rc2+ #1 Tainted: G W ------------------------------------------------------ kswapd0/100 is trying to acquire lock: ffff9e8ef38b6268 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x330 but task is already holding lock: ffffffffa9d74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (fs_reclaim){+.+.}-{0:0}: fs_reclaim_acquire+0x65/0x80 slab_pre_alloc_hook.constprop.0+0x20/0x200 kmem_cache_alloc_trace+0x3a/0x1a0 btrfs_alloc_device+0x43/0x210 add_missing_dev+0x20/0x90 read_one_chunk+0x301/0x430 btrfs_read_sys_array+0x17b/0x1b0 open_ctree+0xa62/0x1896 btrfs_mount_root.cold+0x12/0xea legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x379 legacy_get_tree+0x30/0x50 vfs_get_tree+0x28/0xc0 path_mount+0x434/0xc00 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}: __mutex_lock+0x7e/0x7e0 btrfs_chunk_alloc+0x125/0x3a0 find_free_extent+0xdf6/0x1210 btrfs_reserve_extent+0xb3/0x1b0 btrfs_alloc_tree_block+0xb0/0x310 alloc_tree_block_no_bg_flush+0x4a/0x60 __btrfs_cow_block+0x11a/0x530 btrfs_cow_block+0x104/0x220 btrfs_search_slot+0x52e/0x9d0 btrfs_lookup_inode+0x2a/0x8f __btrfs_update_delayed_inode+0x80/0x240 btrfs_commit_inode_delayed_inode+0x119/0x120 btrfs_evict_inode+0x357/0x500 evict+0xcf/0x1f0 vfs_rmdir.part.0+0x149/0x160 do_rmdir+0x136/0x1a0 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xa9 -> #0 (&delayed_node->mutex){+.+.}-{3:3}: __lock_acquire+0x1184/0x1fa0 lock_acquire+0xa4/0x3d0 __mutex_lock+0x7e/0x7e0 __btrfs_release_delayed_node.part.0+0x3f/0x330 btrfs_evict_inode+0x24c/0x500 evict+0xcf/0x1f0 dispose_list+0x48/0x70 prune_icache_sb+0x44/0x50 super_cache_scan+0x161/0x1e0 do_shrink_slab+0x178/0x3c0 shrink_slab+0x17c/0x290 shrink_node+0x2b2/0x6d0 balance_pgdat+0x30a/0x670 kswapd+0x213/0x4c0 kthread+0x138/0x160 ret_from_fork+0x1f/0x30 other info that might help us debug this: Chain exists of: &delayed_node->mutex --> &fs_info->chunk_mutex --> fs_reclaim Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(&fs_info->chunk_mutex); lock(fs_reclaim); lock(&delayed_node->mutex); *** DEADLOCK *** 3 locks held by kswapd0/100: #0: ffffffffa9d74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 #1: ffffffffa9d65c50 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x115/0x290 #2: ffff9e8e9da260e0 (&type->s_umount_key#48){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0 stack backtrace: CPU: 1 PID: 100 Comm: kswapd0 Tainted: G W 5.9.0-rc2+ #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack+0x92/0xc8 check_noncircular+0x12d/0x150 __lock_acquire+0x1184/0x1fa0 lock_acquire+0xa4/0x3d0 ? __btrfs_release_delayed_node.part.0+0x3f/0x330 __mutex_lock+0x7e/0x7e0 ? __btrfs_release_delayed_node.part.0+0x3f/0x330 ? __btrfs_release_delayed_node.part.0+0x3f/0x330 ? lock_acquire+0xa4/0x3d0 ? btrfs_evict_inode+0x11e/0x500 ? find_held_lock+0x2b/0x80 __btrfs_release_delayed_node.part.0+0x3f/0x330 btrfs_evict_inode+0x24c/0x500 evict+0xcf/0x1f0 dispose_list+0x48/0x70 prune_icache_sb+0x44/0x50 super_cache_scan+0x161/0x1e0 do_shrink_slab+0x178/0x3c0 shrink_slab+0x17c/0x290 shrink_node+0x2b2/0x6d0 balance_pgdat+0x30a/0x670 kswapd+0x213/0x4c0 ? _raw_spin_unlock_irqrestore+0x46/0x60 ? add_wait_queue_exclusive+0x70/0x70 ? balance_pgdat+0x670/0x670 kthread+0x138/0x160 ? kthread_create_worker_on_cpu+0x40/0x40 ret_from_fork+0x1f/0x30 This is because we are holding the chunk_mutex when we call btrfs_alloc_device, which does a GFP_KERNEL allocation. We don't want to switch that to a GFP_NOFS lock because this is the only place where it matters. So instead use memalloc_nofs_save() around the allocation in order to avoid the lockdep splat. Reported-by: Nikolay Borisov <[email protected]> CC: [email protected] # 4.4+ Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Sep 28, 2020
[ Upstream commit 3c27ea2 ] On Linux 5.9-rc1 I get the following warning with apq8016-sbc: WARNING: CPU: 2 PID: 69 at sound/core/init.c:207 snd_card_new+0x36c/0x3b0 [snd] CPU: 2 PID: 69 Comm: kworker/2:1 Not tainted 5.9.0-rc1 #1 Workqueue: events deferred_probe_work_func pc : snd_card_new+0x36c/0x3b0 [snd] lr : snd_card_new+0xf4/0x3b0 [snd] Call trace: snd_card_new+0x36c/0x3b0 [snd] snd_soc_bind_card+0x340/0x9a0 [snd_soc_core] snd_soc_register_card+0xf4/0x110 [snd_soc_core] devm_snd_soc_register_card+0x44/0xa0 [snd_soc_core] apq8016_sbc_platform_probe+0x11c/0x140 [snd_soc_apq8016_sbc] This warning was introduced in commit 81033c6 ("ALSA: core: Warn on empty module"). It looks like we are supposed to set card->owner to THIS_MODULE. Fix this for all the qcom ASoC drivers. Cc: Srinivas Kandagatla <[email protected]> Fixes: 79119c7 ("ASoC: qcom: Add Storm machine driver") Fixes: bdb052e ("ASoC: qcom: add apq8016 sound card support") Fixes: a6f933f ("ASoC: qcom: apq8096: Add db820c machine driver") Fixes: 6b1687b ("ASoC: qcom: add sdm845 sound card support") Signed-off-by: Stephan Gerhold <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Sep 28, 2020
[ Upstream commit 8a39e8c ] When compiling with DEBUG=1 on Fedora 32 I'm getting crash for 'perf test signal': Program received signal SIGSEGV, Segmentation fault. 0x0000000000c68548 in __test_function () (gdb) bt #0 0x0000000000c68548 in __test_function () #1 0x00000000004d62e9 in test_function () at tests/bp_signal.c:61 #2 0x00000000004d689a in test__bp_signal (test=0xa8e280 <generic_ ... #3 0x00000000004b7d49 in run_test (test=0xa8e280 <generic_tests+1 ... #4 0x00000000004b7e7f in test_and_print (t=0xa8e280 <generic_test ... #5 0x00000000004b8927 in __cmd_test (argc=1, argv=0x7fffffffdce0, ... ... It's caused by the symbol __test_function being in the ".bss" section: $ readelf -a ./perf | less [Nr] Name Type Address Offset Size EntSize Flags Link Info Align ... [28] .bss NOBITS 0000000000c356a0 008346a0 00000000000511f8 0000000000000000 WA 0 0 32 $ nm perf | grep __test_function 0000000000c68548 B __test_function I guess most of the time we're just lucky the inline asm ended up in the ".text" section, so making it specific explicit with push and pop section clauses. $ readelf -a ./perf | less [Nr] Name Type Address Offset Size EntSize Flags Link Info Align ... [13] .text PROGBITS 0000000000431240 00031240 0000000000306faa 0000000000000000 AX 0 0 16 $ nm perf | grep __test_function 00000000004d62c8 T __test_function Committer testing: $ readelf -wi ~/bin/perf | grep producer -m1 <c> DW_AT_producer : (indirect string, offset: 0x254a): GNU C99 10.2.1 20200723 (Red Hat 10.2.1-1) -mtune=generic -march=x86-64 -ggdb3 -std=gnu99 -fno-omit-frame-pointer -funwind-tables -fstack-protector-all ^^^^^ ^^^^^ ^^^^^ $ Before: $ perf test signal 20: Breakpoint overflow signal handler : FAILED! $ After: $ perf test signal 20: Breakpoint overflow signal handler : Ok $ Fixes: 8fd34e1 ("perf test: Improve bp_signal") Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Sep 28, 2020
[ Upstream commit b12eea5 ] The evsel->unit borrows a pointer of pmu event or alias instead of owns a string. But tool event (duration_time) passes a result of strdup() caused a leak. It was found by ASAN during metric test: Direct leak of 210 byte(s) in 70 object(s) allocated from: #0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5) #1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414 #2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414 #3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439 #4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096 #5 0x559fbbcc95da in __parse_events util/parse-events.c:2141 #6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406 #7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393 #8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415 raspberrypi#9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498 raspberrypi#10 0x559fbbc0109b in run_test tests/builtin-test.c:410 raspberrypi#11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440 raspberrypi#12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695 raspberrypi#13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807 raspberrypi#14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 raspberrypi#15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 raspberrypi#16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 raspberrypi#17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 raspberrypi#18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: f0fbb11 ("perf stat: Implement duration_time as a proper event") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Sep 28, 2020
[ Upstream commit d26383d ] The following leaks were detected by ASAN: Indirect leak of 360 byte(s) in 9 object(s) allocated from: #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e) #1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333 #2 0x560578f752fc in perf_pmu_parse util/pmu.y:59 #3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73 #4 0x560578e07045 in test__pmu tests/pmu.c:155 #5 0x560578de109b in run_test tests/builtin-test.c:410 #6 0x560578de109b in test_and_print tests/builtin-test.c:440 #7 0x560578de401a in __cmd_test tests/builtin-test.c:661 #8 0x560578de401a in cmd_test tests/builtin-test.c:807 raspberrypi#9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 raspberrypi#10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 raspberrypi#11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 raspberrypi#12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 raspberrypi#13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: cff7f95 ("perf tests: Move pmu tests into separate object") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Sep 28, 2020
[ Upstream commit 32f6865 ] Running the eBPF test_verifier leads to random errors looking like this: [ 6525.735488] Unexpected kernel BRK exception at EL1 [ 6525.735502] Internal error: ptrace BRK handler: f2000100 [#1] SMP [ 6525.741609] Modules linked in: nls_utf8 cifs libdes libarc4 dns_resolver fscache binfmt_misc nls_ascii nls_cp437 vfat fat aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce gf128mul efi_pstore sha2_ce sha256_arm64 sha1_ce evdev efivars efivarfs ip_tables x_tables autofs4 btrfs blake2b_generic xor xor_neon zstd_compress raid6_pq libcrc32c crc32c_generic ahci xhci_pci libahci xhci_hcd igb libata i2c_algo_bit nvme realtek usbcore nvme_core scsi_mod t10_pi netsec mdio_devres of_mdio gpio_keys fixed_phy libphy gpio_mb86s7x [ 6525.787760] CPU: 3 PID: 7881 Comm: test_verifier Tainted: G W 5.9.0-rc1+ raspberrypi#47 [ 6525.796111] Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS build #1 Jun 6 2020 [ 6525.804812] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--) [ 6525.810390] pc : bpf_prog_c3d01833289b6311_F+0xc8/0x9f4 [ 6525.815613] lr : bpf_prog_d53bb52e3f4483f9_F+0x38/0xc8c [ 6525.820832] sp : ffff8000130cbb80 [ 6525.824141] x29: ffff8000130cbbb0 x28: 0000000000000000 [ 6525.829451] x27: 000005ef6fcbf39b x26: 0000000000000000 [ 6525.834759] x25: ffff8000130cbb80 x24: ffff800011dc7038 [ 6525.840067] x23: ffff8000130cbd00 x22: ffff0008f624d080 [ 6525.845375] x21: 0000000000000001 x20: ffff800011dc7000 [ 6525.850682] x19: 0000000000000000 x18: 0000000000000000 [ 6525.855990] x17: 0000000000000000 x16: 0000000000000000 [ 6525.861298] x15: 0000000000000000 x14: 0000000000000000 [ 6525.866606] x13: 0000000000000000 x12: 0000000000000000 [ 6525.871913] x11: 0000000000000001 x10: ffff8000000a660c [ 6525.877220] x9 : ffff800010951810 x8 : ffff8000130cbc38 [ 6525.882528] x7 : 0000000000000000 x6 : 0000009864cfa881 [ 6525.887836] x5 : 00ffffffffffffff x4 : 002880ba1a0b3e9f [ 6525.893144] x3 : 0000000000000018 x2 : ffff8000000a4374 [ 6525.898452] x1 : 000000000000000a x0 : 0000000000000009 [ 6525.903760] Call trace: [ 6525.906202] bpf_prog_c3d01833289b6311_F+0xc8/0x9f4 [ 6525.911076] bpf_prog_d53bb52e3f4483f9_F+0x38/0xc8c [ 6525.915957] bpf_dispatcher_xdp_func+0x14/0x20 [ 6525.920398] bpf_test_run+0x70/0x1b0 [ 6525.923969] bpf_prog_test_run_xdp+0xec/0x190 [ 6525.928326] __do_sys_bpf+0xc88/0x1b28 [ 6525.932072] __arm64_sys_bpf+0x24/0x30 [ 6525.935820] el0_svc_common.constprop.0+0x70/0x168 [ 6525.940607] do_el0_svc+0x28/0x88 [ 6525.943920] el0_sync_handler+0x88/0x190 [ 6525.947838] el0_sync+0x140/0x180 [ 6525.951154] Code: d4202000 d4202000 d4202000 d4202000 (d4202000) [ 6525.957249] ---[ end trace cecc3f93b14927e2 ]--- The reason is the offset[] creation and later usage, while building the eBPF body. The code currently omits the first instruction, since build_insn() will increase our ctx->idx before saving it. That was fine up until bounded eBPF loops were introduced. After that introduction, offset[0] must be the offset of the end of prologue which is the start of the 1st insn while, offset[n] holds the offset of the end of n-th insn. When "taken loop with back jump to 1st insn" test runs, it will eventually call bpf2a64_offset(-1, 2, ctx). Since negative indexing is permitted, the current outcome depends on the value stored in ctx->offset[-1], which has nothing to do with our array. If the value happens to be 0 the tests will work. If not this error triggers. commit 7c2e988 ("bpf: fix x64 JIT code generation for jmp to 1st insn") fixed an indentical bug on x86 when eBPF bounded loops were introduced. So let's fix it by creating the ctx->offset[] differently. Track the beginning of instruction and account for the extra instruction while calculating the arm instruction offsets. Fixes: 2589726 ("bpf: introduce bounded loops") Reported-by: Naresh Kamboju <[email protected]> Reported-by: Jiri Olsa <[email protected]> Co-developed-by: Jean-Philippe Brucker <[email protected]> Co-developed-by: Yauheni Kaliuta <[email protected]> Signed-off-by: Jean-Philippe Brucker <[email protected]> Signed-off-by: Yauheni Kaliuta <[email protected]> Signed-off-by: Ilias Apalodimas <[email protected]> Acked-by: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Sep 28, 2020
[ Upstream commit 843d926 ] syzbot reported twice a lockdep issue in fib6_del() [1] which I think is caused by net->ipv6.fib6_null_entry having a NULL fib6_table pointer. fib6_del() already checks for fib6_null_entry special case, we only need to return earlier. Bug seems to occur very rarely, I have thus chosen a 'bug origin' that makes backports not too complex. [1] WARNING: suspicious RCU usage 5.9.0-rc4-syzkaller #0 Not tainted ----------------------------- net/ipv6/ip6_fib.c:1996 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 4 locks held by syz-executor.5/8095: #0: ffffffff8a7ea708 (rtnl_mutex){+.+.}-{3:3}, at: ppp_release+0x178/0x240 drivers/net/ppp/ppp_generic.c:401 #1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: spin_trylock_bh include/linux/spinlock.h:414 [inline] #1: ffff88804c422dd8 (&net->ipv6.fib6_gc_lock){+.-.}-{2:2}, at: fib6_run_gc+0x21b/0x2d0 net/ipv6/ip6_fib.c:2312 #2: ffffffff89bd6a40 (rcu_read_lock){....}-{1:2}, at: __fib6_clean_all+0x0/0x290 net/ipv6/ip6_fib.c:2613 #3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:359 [inline] #3: ffff8880a82e6430 (&tb->tb6_lock){+.-.}-{2:2}, at: __fib6_clean_all+0x107/0x290 net/ipv6/ip6_fib.c:2245 stack backtrace: CPU: 1 PID: 8095 Comm: syz-executor.5 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x198/0x1fd lib/dump_stack.c:118 fib6_del+0x12b4/0x1630 net/ipv6/ip6_fib.c:1996 fib6_clean_node+0x39b/0x570 net/ipv6/ip6_fib.c:2180 fib6_walk_continue+0x4aa/0x8e0 net/ipv6/ip6_fib.c:2102 fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2150 fib6_clean_tree+0xdb/0x120 net/ipv6/ip6_fib.c:2230 __fib6_clean_all+0x120/0x290 net/ipv6/ip6_fib.c:2246 fib6_clean_all net/ipv6/ip6_fib.c:2257 [inline] fib6_run_gc+0x113/0x2d0 net/ipv6/ip6_fib.c:2320 ndisc_netdev_event+0x217/0x350 net/ipv6/ndisc.c:1805 notifier_call_chain+0xb5/0x200 kernel/notifier.c:83 call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:2033 call_netdevice_notifiers_extack net/core/dev.c:2045 [inline] call_netdevice_notifiers net/core/dev.c:2059 [inline] dev_close_many+0x30b/0x650 net/core/dev.c:1634 rollback_registered_many+0x3a8/0x1210 net/core/dev.c:9261 rollback_registered net/core/dev.c:9329 [inline] unregister_netdevice_queue+0x2dd/0x570 net/core/dev.c:10410 unregister_netdevice include/linux/netdevice.h:2774 [inline] ppp_release+0x216/0x240 drivers/net/ppp/ppp_generic.c:403 __fput+0x285/0x920 fs/file_table.c:281 task_work_run+0xdd/0x190 kernel/task_work.c:141 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:163 [inline] exit_to_user_mode_prepare+0x1e1/0x200 kernel/entry/common.c:190 syscall_exit_to_user_mode+0x7e/0x2e0 kernel/entry/common.c:265 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 421842e ("net/ipv6: Add fib6_null_entry") Signed-off-by: Eric Dumazet <[email protected]> Cc: David Ahern <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
[ Upstream commit a0e4001 ] Raven provides retimer feature support that requires i2c interaction in order to make it work well, all settings required for this configuration are loaded from the Atom bios which include the i2c address. If the retimer feature is not available, we should abort the attempt to set this feature, otherwise, it makes the following line return I2C_CHANNEL_OPERATION_NO_RESPONSE: i2c_success = i2c_write(pipe_ctx, slave_address, buffer, sizeof(buffer)); ... if (!i2c_success) ASSERT(i2c_success); This ends up causing problems with hotplugging HDMI displays on Raven, and causes retimer settings to warn like so: WARNING: CPU: 1 PID: 429 at drivers/gpu/drm/amd/amdgpu/../dal/dc/core/dc_link.c:1998 write_i2c_retimer_setting+0xc2/0x3c0 [amdgpu] Modules linked in: edac_mce_amd ccp kvm irqbypass binfmt_misc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel amdgpu(+) snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi aesni_intel snd_seq amd_iommu_v2 gpu_sched aes_x86_64 crypto_simd cryptd glue_helper snd_seq_device ttm drm_kms_helper snd_timer eeepc_wmi wmi_bmof asus_wmi sparse_keymap drm mxm_wmi snd k10temp fb_sys_fops syscopyarea sysfillrect sysimgblt soundcore joydev input_leds mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 igb i2c_algo_bit hid_generic usbhid i2c_piix4 dca ahci hid libahci video wmi gpio_amdpt gpio_generic CPU: 1 PID: 429 Comm: systemd-udevd Tainted: G W 5.2.0-rc1sept162019+ #1 Hardware name: System manufacturer System Product Name/ROG STRIX B450-F GAMING, BIOS 2605 08/06/2019 RIP: 0010:write_i2c_retimer_setting+0xc2/0x3c0 [amdgpu] Code: ff 0f b6 4d ce 44 0f b6 45 cf 44 0f b6 c8 45 89 cf 44 89 e2 48 c7 c6 f0 34 bc c0 bf 04 00 00 00 e8 63 b0 90 ff 45 84 ff 75 02 <0f> 0b 42 0f b6 04 73 8d 50 f6 80 fa 02 77 8c 3c 0a 0f 85 c8 00 00 RSP: 0018:ffffa99d02726fd0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffa99d02727035 RCX: 0000000000000006 RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff976acc857440 RBP: ffffa99d02727018 R08: 0000000000000002 R09: 000000000002a600 R10: ffffe90610193680 R11: 00000000000005e3 R12: 000000000000005d R13: ffff976ac4b201b8 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f14f99e1680(0000) GS:ffff976acc840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdf212843b8 CR3: 0000000408906000 CR4: 00000000003406e0 Call Trace: core_link_enable_stream+0x626/0x680 [amdgpu] dce110_apply_ctx_to_hw+0x414/0x4e0 [amdgpu] dc_commit_state+0x331/0x5e0 [amdgpu] ? drm_calc_timestamping_constants+0xf9/0x150 [drm] amdgpu_dm_atomic_commit_tail+0x395/0x1e00 [amdgpu] ? dm_plane_helper_prepare_fb+0x20c/0x280 [amdgpu] commit_tail+0x42/0x70 [drm_kms_helper] drm_atomic_helper_commit+0x10c/0x120 [drm_kms_helper] amdgpu_dm_atomic_commit+0x95/0xa0 [amdgpu] drm_atomic_commit+0x4a/0x50 [drm] restore_fbdev_mode_atomic+0x1c0/0x1e0 [drm_kms_helper] restore_fbdev_mode+0x4c/0x160 [drm_kms_helper] ? _cond_resched+0x19/0x40 drm_fb_helper_restore_fbdev_mode_unlocked+0x4e/0xa0 [drm_kms_helper] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper] fbcon_init+0x471/0x630 visual_init+0xd5/0x130 do_bind_con_driver+0x20a/0x430 do_take_over_console+0x7d/0x1b0 do_fbcon_takeover+0x5c/0xb0 fbcon_event_notify+0x6cd/0x8a0 notifier_call_chain+0x4c/0x70 blocking_notifier_call_chain+0x43/0x60 fb_notifier_call_chain+0x1b/0x20 register_framebuffer+0x254/0x360 __drm_fb_helper_initial_config_and_unlock+0x2c5/0x510 [drm_kms_helper] drm_fb_helper_initial_config+0x35/0x40 [drm_kms_helper] amdgpu_fbdev_init+0xcd/0x100 [amdgpu] amdgpu_device_init+0x1156/0x1930 [amdgpu] amdgpu_driver_load_kms+0x8d/0x2e0 [amdgpu] drm_dev_register+0x12b/0x1c0 [drm] amdgpu_pci_probe+0xd3/0x160 [amdgpu] local_pci_probe+0x47/0xa0 pci_device_probe+0x142/0x1b0 really_probe+0xf5/0x3d0 driver_probe_device+0x11b/0x130 device_driver_attach+0x58/0x60 __driver_attach+0xa3/0x140 ? device_driver_attach+0x60/0x60 ? device_driver_attach+0x60/0x60 bus_for_each_dev+0x74/0xb0 ? kmem_cache_alloc_trace+0x1a3/0x1c0 driver_attach+0x1e/0x20 bus_add_driver+0x147/0x220 ? 0xffffffffc0cb9000 driver_register+0x60/0x100 ? 0xffffffffc0cb9000 __pci_register_driver+0x5a/0x60 amdgpu_init+0x74/0x83 [amdgpu] do_one_initcall+0x4a/0x1fa ? _cond_resched+0x19/0x40 ? kmem_cache_alloc_trace+0x3f/0x1c0 ? __vunmap+0x1cc/0x200 do_init_module+0x5f/0x227 load_module+0x2330/0x2b40 __do_sys_finit_module+0xfc/0x120 ? __do_sys_finit_module+0xfc/0x120 __x64_sys_finit_module+0x1a/0x20 do_syscall_64+0x5a/0x130 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f14f9500839 Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007fff9bc4f5a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000055afb5abce30 RCX: 00007f14f9500839 RDX: 0000000000000000 RSI: 000055afb5ace0f0 RDI: 0000000000000017 RBP: 000055afb5ace0f0 R08: 0000000000000000 R09: 000000000000000a R10: 0000000000000017 R11: 0000000000000246 R12: 0000000000000000 R13: 000055afb5aad800 R14: 0000000000020000 R15: 0000000000000000 ---[ end trace c286e96563966f08 ]--- This commit reworks the way that we handle i2c write for retimer in the way that we abort this configuration if the feature is not available in the device. For debug sake, we kept a simple log message in case the retimer is not available. Signed-off-by: Rodrigo Siqueira <[email protected]> Reviewed-by: Hersen Wu <[email protected]> Acked-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
[ Upstream commit 402f299 ] When use command to read values, it crashed. command: dd if=/sys/kernel/debug/ieee80211/phy0/ath10k/mem_value count=1 bs=4 skip=$((0x100233)) It will call to ath10k_sdio_hif_diag_read with address = 0x4008cc and buf_len = 4. Then system crash: [ 1786.013258] Unable to handle kernel paging request at virtual address ffffffc00bd45000 [ 1786.013273] Mem abort info: [ 1786.013281] ESR = 0x96000045 [ 1786.013291] Exception class = DABT (current EL), IL = 32 bits [ 1786.013299] SET = 0, FnV = 0 [ 1786.013307] EA = 0, S1PTW = 0 [ 1786.013314] Data abort info: [ 1786.013322] ISV = 0, ISS = 0x00000045 [ 1786.013330] CM = 0, WnR = 1 [ 1786.013342] swapper pgtable: 4k pages, 39-bit VAs, pgdp = 000000008542a60e [ 1786.013350] [ffffffc00bd45000] pgd=0000000000000000, pud=0000000000000000 [ 1786.013368] Internal error: Oops: 96000045 [#1] PREEMPT SMP [ 1786.013609] Process swapper/0 (pid: 0, stack limit = 0x0000000084b153c6) [ 1786.013623] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.86 raspberrypi#137 [ 1786.013631] Hardware name: MediaTek krane sku176 board (DT) [ 1786.013643] pstate: 80000085 (Nzcv daIf -PAN -UAO) [ 1786.013662] pc : __memcpy+0x94/0x180 [ 1786.013678] lr : swiotlb_tbl_unmap_single+0x84/0x150 [ 1786.013686] sp : ffffff8008003c60 [ 1786.013694] x29: ffffff8008003c90 x28: ffffffae96411f80 [ 1786.013708] x27: ffffffae960d2018 x26: ffffff8019a4b9a8 [ 1786.013721] x25: 0000000000000000 x24: 0000000000000001 [ 1786.013734] x23: ffffffae96567000 x22: 00000000000051d4 [ 1786.013747] x21: 0000000000000000 x20: 00000000fe6e9000 [ 1786.013760] x19: 0000000000000004 x18: 0000000000000020 [ 1786.013773] x17: 0000000000000001 x16: 0000000000000000 [ 1786.013787] x15: 00000000ffffffff x14: 00000000000044c0 [ 1786.013800] x13: 0000000000365ba4 x12: 0000000000000000 [ 1786.013813] x11: 0000000000000001 x10: 00000037be6e9000 [ 1786.013826] x9 : ffffffc940000000 x8 : 000000000bd45000 [ 1786.013839] x7 : 0000000000000000 x6 : ffffffc00bd45000 [ 1786.013852] x5 : 0000000000000000 x4 : 0000000000000000 [ 1786.013865] x3 : 0000000000000c00 x2 : 0000000000000004 [ 1786.013878] x1 : fffffff7be6e9004 x0 : ffffffc00bd45000 [ 1786.013891] Call trace: [ 1786.013903] __memcpy+0x94/0x180 [ 1786.013914] unmap_single+0x6c/0x84 [ 1786.013925] swiotlb_unmap_sg_attrs+0x54/0x80 [ 1786.013938] __swiotlb_unmap_sg_attrs+0x8c/0xa4 [ 1786.013952] msdc_unprepare_data+0x6c/0x84 [ 1786.013963] msdc_request_done+0x58/0x84 [ 1786.013974] msdc_data_xfer_done+0x1a0/0x1c8 [ 1786.013985] msdc_irq+0x12c/0x17c [ 1786.013996] __handle_irq_event_percpu+0xe4/0x250 [ 1786.014006] handle_irq_event_percpu+0x28/0x68 [ 1786.014015] handle_irq_event+0x48/0x78 [ 1786.014026] handle_fasteoi_irq+0xd0/0x1a0 [ 1786.014039] __handle_domain_irq+0x84/0xc4 [ 1786.014050] gic_handle_irq+0x124/0x1a4 [ 1786.014059] el1_irq+0xb0/0x128 [ 1786.014072] cpuidle_enter_state+0x298/0x328 [ 1786.014082] cpuidle_enter+0x30/0x40 [ 1786.014094] do_idle+0x190/0x268 [ 1786.014104] cpu_startup_entry+0x24/0x28 [ 1786.014116] rest_init+0xd4/0xe0 [ 1786.014126] start_kernel+0x30c/0x38c [ 1786.014139] Code: f8408423 f80084c3 36100062 b8404423 (b80044c3) [ 1786.014150] ---[ end trace 3b02ddb698ea69ee ]--- [ 1786.015415] Kernel panic - not syncing: Fatal exception in interrupt [ 1786.015433] SMP: stopping secondary CPUs [ 1786.015447] Kernel Offset: 0x2e8d200000 from 0xffffff8008000000 [ 1786.015458] CPU features: 0x0,2188200c [ 1786.015466] Memory Limit: none For sdio chip, it need the memory which is kmalloc, if it is vmalloc from ath10k_mem_value_read, then it have a memory error. kzalloc of ath10k_sdio_hif_diag_read32 is the correct type, so add kzalloc in ath10k_sdio_hif_diag_read to replace the buffer which is vmalloc from ath10k_mem_value_read. This patch only effect sdio chip. Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-00029. Signed-off-by: Wen Gong <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
[ Upstream commit a451b12 ] In NFSv4, the lock stateids are tied to the lockowner, and the open stateid, so that the action of closing the file also results in either an automatic loss of the locks, or an error of the form NFS4ERR_LOCKS_HELD. In practice this means we must not add new locks to the open stateid after the close process has been invoked. In fact doing so, can result in the following panic: kernel BUG at lib/list_debug.c:51! invalid opcode: 0000 [#1] SMP NOPTI CPU: 2 PID: 1085 Comm: nfsd Not tainted 5.6.0-rc3+ #2 Hardware name: VMware, Inc. VMware7,1/440BX Desktop Reference Platform, BIOS VMW71.00V.14410784.B64.1908150010 08/15/2019 RIP: 0010:__list_del_entry_valid.cold+0x31/0x55 Code: 1a 3d 9b e8 74 10 c2 ff 0f 0b 48 c7 c7 f0 1a 3d 9b e8 66 10 c2 ff 0f 0b 48 89 f2 48 89 fe 48 c7 c7 b0 1a 3d 9b e8 52 10 c2 ff <0f> 0b 48 89 fe 4c 89 c2 48 c7 c7 78 1a 3d 9b e8 3e 10 c2 ff 0f 0b RSP: 0018:ffffb296c1d47d90 EFLAGS: 00010246 RAX: 0000000000000054 RBX: ffff8ba032456ec8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff8ba039e99cc8 RDI: ffff8ba039e99cc8 RBP: ffff8ba032456e60 R08: 0000000000000781 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ba009a4abe0 R13: ffff8ba032456e8c R14: 0000000000000000 R15: ffff8ba00adb01d8 FS: 0000000000000000(0000) GS:ffff8ba039e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb213f0b008 CR3: 00000001347de006 CR4: 00000000003606e0 Call Trace: release_lock_stateid+0x2b/0x80 [nfsd] nfsd4_free_stateid+0x1e9/0x210 [nfsd] nfsd4_proc_compound+0x414/0x700 [nfsd] ? nfs4svc_decode_compoundargs+0x407/0x4c0 [nfsd] nfsd_dispatch+0xc1/0x200 [nfsd] svc_process_common+0x476/0x6f0 [sunrpc] ? svc_sock_secure_port+0x12/0x30 [sunrpc] ? svc_recv+0x313/0x9c0 [sunrpc] ? nfsd_svc+0x2d0/0x2d0 [nfsd] svc_process+0xd4/0x110 [sunrpc] nfsd+0xe3/0x140 [nfsd] kthread+0xf9/0x130 ? nfsd_destroy+0x50/0x50 [nfsd] ? kthread_park+0x90/0x90 ret_from_fork+0x1f/0x40 The fix is to ensure that lock creation tests for whether or not the open stateid is unhashed, and to fail if that is the case. Fixes: 659aefb ("nfsd: Ensure we don't recognise lock stateids after freeing them") Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
… like the valid ones [ Upstream commit 1dff306 ] On P9 DD2.2 due to a CPU defect some TM instructions need to be emulated by KVM. This is handled at first by the hardware raising a softpatch interrupt when certain TM instructions that need KVM assistance are executed in the guest. Althought some TM instructions per Power ISA are invalid forms they can raise a softpatch interrupt too. For instance, 'tresume.' instruction as defined in the ISA must have bit 31 set (1), but an instruction that matches 'tresume.' PO and XO opcode fields but has bit 31 not set (0), like 0x7cfe9ddc, also raises a softpatch interrupt. Similarly for 'treclaim.' and 'trechkpt.' instructions with bit 31 = 0, i.e. 0x7c00075c and 0x7c0007dc, respectively. Hence, if a code like the following is executed in the guest it will raise a softpatch interrupt just like a 'tresume.' when the TM facility is enabled ('tabort. 0' in the example is used only to enable the TM facility): int main() { asm("tabort. 0; .long 0x7cfe9ddc;"); } Currently in such a case KVM throws a complete trace like: [345523.705984] WARNING: CPU: 24 PID: 64413 at arch/powerpc/kvm/book3s_hv_tm.c:211 kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv] [345523.705985] Modules linked in: kvm_hv(E) xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc sch_fq_codel ipmi_powernv at24 vmx_crypto ipmi_devintf ipmi_msghandler ibmpowernv uio_pdrv_genirq kvm opal_prd uio leds_powernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear tg3 crct10dif_vpmsum crc32c_vpmsum ipr [last unloaded: kvm_hv] [345523.706030] CPU: 24 PID: 64413 Comm: CPU 0/KVM Tainted: G W E 5.5.0+ #1 [345523.706031] NIP: c0080000072cb9c0 LR: c0080000072b5e80 CTR: c0080000085c7850 [345523.706034] REGS: c000000399467680 TRAP: 0700 Tainted: G W E (5.5.0+) [345523.706034] MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 24022428 XER: 00000000 [345523.706042] CFAR: c0080000072b5e7c IRQMASK: 0 GPR00: c0080000072b5e80 c000000399467910 c0080000072db500 c000000375ccc720 GPR04: c000000375ccc720 00000003fbec0000 0000a10395dda5a6 0000000000000000 GPR08: 000000007cfe9ddc 7cfe9ddc000005dc 7cfe9ddc7c0005dc c0080000072cd530 GPR12: c0080000085c7850 c0000003fffeb800 0000000000000001 00007dfb737f0000 GPR16: c0002001edcca558 0000000000000000 0000000000000000 0000000000000001 GPR20: c000000001b21258 c0002001edcca558 0000000000000018 0000000000000000 GPR24: 0000000001000000 ffffffffffffffff 0000000000000001 0000000000001500 GPR28: c0002001edcc4278 c00000037dd80000 800000050280f033 c000000375ccc720 [345523.706062] NIP [c0080000072cb9c0] kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv] [345523.706065] LR [c0080000072b5e80] kvmppc_handle_exit_hv.isra.53+0x3e8/0x798 [kvm_hv] [345523.706066] Call Trace: [345523.706069] [c000000399467910] [c000000399467940] 0xc000000399467940 (unreliable) [345523.706071] [c000000399467950] [c000000399467980] 0xc000000399467980 [345523.706075] [c0000003994679f0] [c0080000072bd1c4] kvmhv_run_single_vcpu+0xa1c/0xb80 [kvm_hv] [345523.706079] [c000000399467ac0] [c0080000072bd8e0] kvmppc_vcpu_run_hv+0x5b8/0xb00 [kvm_hv] [345523.706087] [c000000399467b90] [c0080000085c93cc] kvmppc_vcpu_run+0x34/0x48 [kvm] [345523.706095] [c000000399467bb0] [c0080000085c582c] kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm] [345523.706101] [c000000399467c40] [c0080000085b7498] kvm_vcpu_ioctl+0x3d0/0x7b0 [kvm] [345523.706105] [c000000399467db0] [c0000000004adf9c] ksys_ioctl+0x13c/0x170 [345523.706107] [c000000399467e00] [c0000000004adff8] sys_ioctl+0x28/0x80 [345523.706111] [c000000399467e20] [c00000000000b278] system_call+0x5c/0x68 [345523.706112] Instruction dump: [345523.706114] 419e039 7f8a4840 409d0048 6d497c00 2f89075d 419e021c 6d497c00 2f8907dd [345523.706119] 419e01c0 6d497c00 2f8905dd 419e00a4 <0fe00000> 38210040 38600000 ebc1fff0 and then treats the executed instruction as a 'nop'. However the POWER9 User's Manual, in section "4.6.10 Book II Invalid Forms", informs that for TM instructions bit 31 is in fact ignored, thus for the TM-related invalid forms ignoring bit 31 and handling them like the valid forms is an acceptable way to handle them. POWER8 behaves the same way too. This commit changes the handling of the cases here described by treating the TM-related invalid forms that can generate a softpatch interrupt just like their valid forms (w/ bit 31 = 1) instead of as a 'nop' and by gently reporting any other unrecognized case to the host and treating it as illegal instruction instead of throwing a trace and treating it as a 'nop'. Signed-off-by: Gustavo Romero <[email protected]> Reviewed-by: Segher Boessenkool <[email protected]> Acked-By: Michael Neuling <[email protected]> Reviewed-by: Leonardo Bras <[email protected]> Signed-off-by: Paul Mackerras <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
… interrupt context [ Upstream commit edec6e0 ] apic->lapic_timer.timer was initialized with HRTIMER_MODE_ABS_HARD but started later with HRTIMER_MODE_ABS, which may cause the following warning in PREEMPT_RT kernel. WARNING: CPU: 1 PID: 2957 at kernel/time/hrtimer.c:1129 hrtimer_start_range_ns+0x348/0x3f0 CPU: 1 PID: 2957 Comm: qemu-system-x86 Not tainted 5.4.23-rt11 #1 Hardware name: Supermicro SYS-E300-9A-8C/A2SDi-8C-HLN4F, BIOS 1.1a 09/18/2018 RIP: 0010:hrtimer_start_range_ns+0x348/0x3f0 Code: 4d b8 0f 94 c1 0f b6 c9 e8 35 f1 ff ff 4c 8b 45 b0 e9 3b fd ff ff e8 d7 3f fa ff 48 98 4c 03 34 c5 a0 26 bf 93 e9 a1 fd ff ff <0f> 0b e9 fd fc ff ff 65 8b 05 fa b7 90 6d 89 c0 48 0f a3 05 60 91 RSP: 0018:ffffbc60026ffaf8 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff9d81657d4110 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000006cc7987bcf RDI: ffff9d81657d4110 RBP: ffffbc60026ffb58 R08: 0000000000000001 R09: 0000000000000010 R10: 0000000000000000 R11: 0000000000000000 R12: 0000006cc7987bcf R13: 0000000000000000 R14: 0000006cc7987bcf R15: ffffbc60026d6a00 FS: 00007f401daed700(0000) GS:ffff9d81ffa40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000ffffffff CR3: 0000000fa7574000 CR4: 00000000003426e0 Call Trace: ? kvm_release_pfn_clean+0x22/0x60 [kvm] start_sw_timer+0x85/0x230 [kvm] ? vmx_vmexit+0x1b/0x30 [kvm_intel] kvm_lapic_switch_to_sw_timer+0x72/0x80 [kvm] vmx_pre_block+0x1cb/0x260 [kvm_intel] ? vmx_vmexit+0xf/0x30 [kvm_intel] ? vmx_vmexit+0x1b/0x30 [kvm_intel] ? vmx_vmexit+0xf/0x30 [kvm_intel] ? vmx_vmexit+0x1b/0x30 [kvm_intel] ? vmx_vmexit+0xf/0x30 [kvm_intel] ? vmx_vmexit+0x1b/0x30 [kvm_intel] ? vmx_vmexit+0xf/0x30 [kvm_intel] ? vmx_vmexit+0xf/0x30 [kvm_intel] ? vmx_vmexit+0x1b/0x30 [kvm_intel] ? vmx_vmexit+0xf/0x30 [kvm_intel] ? vmx_vmexit+0x1b/0x30 [kvm_intel] ? vmx_vmexit+0xf/0x30 [kvm_intel] ? vmx_vmexit+0x1b/0x30 [kvm_intel] ? vmx_vmexit+0xf/0x30 [kvm_intel] ? vmx_vmexit+0x1b/0x30 [kvm_intel] ? vmx_vmexit+0xf/0x30 [kvm_intel] ? vmx_sync_pir_to_irr+0x9e/0x100 [kvm_intel] ? kvm_apic_has_interrupt+0x46/0x80 [kvm] kvm_arch_vcpu_ioctl_run+0x85b/0x1fa0 [kvm] ? _raw_spin_unlock_irqrestore+0x18/0x50 ? _copy_to_user+0x2c/0x30 kvm_vcpu_ioctl+0x235/0x660 [kvm] ? rt_spin_unlock+0x2c/0x50 do_vfs_ioctl+0x3e4/0x650 ? __fget+0x7a/0xa0 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x4d/0x120 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f4027cc54a7 Code: 00 00 90 48 8b 05 e9 59 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b9 59 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007f401dae9858 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00005558bd029690 RCX: 00007f4027cc54a7 RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000d RBP: 00007f4028b72000 R08: 00005558bc829ad0 R09: 00000000ffffffff R10: 00005558bcf90ca0 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 00005558bce1c840 --[ end trace 0000000000000002 ]-- Signed-off-by: He Zhe <[email protected]> Message-Id: <[email protected]> Reviewed-by: Wanpeng Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
…during probe [ Upstream commit 4ce35a3 ] When booting j721e the following bug is printed: [ 1.154821] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99 [ 1.154827] in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 12, name: kworker/0:1 [ 1.154832] 3 locks held by kworker/0:1/12: [ 1.154836] #0: ffff000840030728 ((wq_completion)events){+.+.}, at: process_one_work+0x1d4/0x6e8 [ 1.154852] #1: ffff80001214fdd8 (deferred_probe_work){+.+.}, at: process_one_work+0x1d4/0x6e8 [ 1.154860] #2: ffff00084060b170 (&dev->mutex){....}, at: __device_attach+0x38/0x138 [ 1.154872] irq event stamp: 63096 [ 1.154881] hardirqs last enabled at (63095): [<ffff800010b74318>] _raw_spin_unlock_irqrestore+0x70/0x78 [ 1.154887] hardirqs last disabled at (63096): [<ffff800010b740d8>] _raw_spin_lock_irqsave+0x28/0x80 [ 1.154893] softirqs last enabled at (62254): [<ffff800010080c88>] _stext+0x488/0x564 [ 1.154899] softirqs last disabled at (62247): [<ffff8000100fdb3c>] irq_exit+0x114/0x140 [ 1.154906] CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.6.0-rc6-next-20200318-00094-g45e4089b0bd3 raspberrypi#221 [ 1.154911] Hardware name: Texas Instruments K3 J721E SoC (DT) [ 1.154917] Workqueue: events deferred_probe_work_func [ 1.154923] Call trace: [ 1.154928] dump_backtrace+0x0/0x190 [ 1.154933] show_stack+0x14/0x20 [ 1.154940] dump_stack+0xe0/0x148 [ 1.154946] ___might_sleep+0x150/0x1f0 [ 1.154952] __might_sleep+0x4c/0x80 [ 1.154957] wait_for_completion_timeout+0x40/0x140 [ 1.154964] ti_sci_set_device_state+0xa0/0x158 [ 1.154969] ti_sci_cmd_get_device_exclusive+0x14/0x20 [ 1.154977] ti_sci_dev_start+0x34/0x50 [ 1.154984] genpd_runtime_resume+0x78/0x1f8 [ 1.154991] __rpm_callback+0x3c/0x140 [ 1.154996] rpm_callback+0x20/0x80 [ 1.155001] rpm_resume+0x568/0x758 [ 1.155007] __pm_runtime_resume+0x44/0xb0 [ 1.155013] omap8250_probe+0x2b4/0x508 [ 1.155019] platform_drv_probe+0x50/0xa0 [ 1.155023] really_probe+0xd4/0x318 [ 1.155028] driver_probe_device+0x54/0xe8 [ 1.155033] __device_attach_driver+0x80/0xb8 [ 1.155039] bus_for_each_drv+0x74/0xc0 [ 1.155044] __device_attach+0xdc/0x138 [ 1.155049] device_initial_probe+0x10/0x18 [ 1.155053] bus_probe_device+0x98/0xa0 [ 1.155058] deferred_probe_work_func+0x74/0xb0 [ 1.155063] process_one_work+0x280/0x6e8 [ 1.155068] worker_thread+0x48/0x430 [ 1.155073] kthread+0x108/0x138 [ 1.155079] ret_from_fork+0x10/0x18 To fix the bug we need to first call pm_runtime_enable() prior to any pm_runtime calls. Reported-by: Tomi Valkeinen <[email protected]> Signed-off-by: Peter Ujfalusi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
[ Upstream commit 24201a6 ] The DMA error handler routine is currently a tasklet, scheduled to run after the DMA error IRQ was handled. However it needs to take the MDIO mutex, which is not allowed to do in a tasklet. A kernel (with debug options) complains consequently: [ 614.050361] net eth0: DMA Tx error 0x174019 [ 614.064002] net eth0: Current BD is at: 0x8f84aa0ce [ 614.080195] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935 [ 614.109484] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 40, name: kworker/u4:4 [ 614.135428] 3 locks held by kworker/u4:4/40: [ 614.149075] #0: ffff000879863328 ((wq_completion)rpciod){....}, at: process_one_work+0x1f0/0x6a8 [ 614.177528] #1: ffff80001251bdf8 ((work_completion)(&task->u.tk_work)){....}, at: process_one_work+0x1f0/0x6a8 [ 614.209033] #2: ffff0008784e0110 (sk_lock-AF_INET-RPC){....}, at: tcp_sendmsg+0x24/0x58 [ 614.235429] CPU: 0 PID: 40 Comm: kworker/u4:4 Not tainted 5.6.0-rc3-00926-g4a165a9d5921 raspberrypi#26 [ 614.260854] Hardware name: ARM Test FPGA (DT) [ 614.274734] Workqueue: rpciod rpc_async_schedule [ 614.289022] Call trace: [ 614.296871] dump_backtrace+0x0/0x1a0 [ 614.308311] show_stack+0x14/0x20 [ 614.318751] dump_stack+0xbc/0x100 [ 614.329403] ___might_sleep+0xf0/0x140 [ 614.341018] __might_sleep+0x4c/0x80 [ 614.352201] __mutex_lock+0x5c/0x8a8 [ 614.363348] mutex_lock_nested+0x1c/0x28 [ 614.375654] axienet_dma_err_handler+0x38/0x388 [ 614.389999] tasklet_action_common.isra.15+0x160/0x1a8 [ 614.405894] tasklet_action+0x24/0x30 [ 614.417297] efi_header_end+0xe0/0x494 [ 614.429020] irq_exit+0xd0/0xd8 [ 614.439047] __handle_domain_irq+0x60/0xb0 [ 614.451877] gic_handle_irq+0xdc/0x2d0 [ 614.463486] el1_irq+0xcc/0x180 [ 614.473451] __tcp_transmit_skb+0x41c/0xb58 [ 614.486513] tcp_write_xmit+0x224/0x10a0 [ 614.498792] __tcp_push_pending_frames+0x38/0xc8 [ 614.513126] tcp_rcv_established+0x41c/0x820 [ 614.526301] tcp_v4_do_rcv+0x8c/0x218 [ 614.537784] __release_sock+0x5c/0x108 [ 614.549466] release_sock+0x34/0xa0 [ 614.560318] tcp_sendmsg+0x40/0x58 [ 614.571053] inet_sendmsg+0x40/0x68 [ 614.582061] sock_sendmsg+0x18/0x30 [ 614.593074] xs_sendpages+0x218/0x328 [ 614.604506] xs_tcp_send_request+0xa0/0x1b8 [ 614.617461] xprt_transmit+0xc8/0x4f0 [ 614.628943] call_transmit+0x8c/0xa0 [ 614.640028] __rpc_execute+0xbc/0x6f8 [ 614.651380] rpc_async_schedule+0x28/0x48 [ 614.663846] process_one_work+0x298/0x6a8 [ 614.676299] worker_thread+0x40/0x490 [ 614.687687] kthread+0x134/0x138 [ 614.697804] ret_from_fork+0x10/0x18 [ 614.717319] xilinx_axienet 7fe00000.ethernet eth0: Link is Down [ 615.748343] xilinx_axienet 7fe00000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off Since tasklets are not really popular anymore anyway, lets convert this over to a work queue, which can sleep and thus can take the MDIO mutex. Signed-off-by: Andre Przywara <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
[ Upstream commit 72e0ef0 ] On some EFI systems, the video BIOS is provided by the EFI firmware. The boot stub code stores the physical address of the ROM image in pdev->rom. Currently we attempt to access this pointer using phys_to_virt(), which doesn't work with CONFIG_HIGHMEM. On these systems, attempting to load the radeon module on a x86_32 kernel can result in the following: BUG: unable to handle page fault for address: 3e8ed03c #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP CPU: 0 PID: 317 Comm: systemd-udevd Not tainted 5.6.0-rc3-next-20200228 #2 Hardware name: Apple Computer, Inc. MacPro1,1/Mac-F4208DC8, BIOS MP11.88Z.005C.B08.0707021221 07/02/07 EIP: radeon_get_bios+0x5ed/0xe50 [radeon] Code: 00 00 84 c0 0f 85 12 fd ff ff c7 87 64 01 00 00 00 00 00 00 8b 47 08 8b 55 b0 e8 1e 83 e1 d6 85 c0 74 1a 8b 55 c0 85 d2 74 13 <80> 38 55 75 0e 80 78 01 aa 0f 84 a4 03 00 00 8d 74 26 00 68 dc 06 EAX: 3e8ed03c EBX: 00000000 ECX: 3e8ed03c EDX: 00010000 ESI: 00040000 EDI: eec04000 EBP: eef3fc60 ESP: eef3fbe0 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010206 CR0: 80050033 CR2: 3e8ed03c CR3: 2ec77000 CR4: 000006d0 Call Trace: r520_init+0x26/0x240 [radeon] radeon_device_init+0x533/0xa50 [radeon] radeon_driver_load_kms+0x80/0x220 [radeon] drm_dev_register+0xa7/0x180 [drm] radeon_pci_probe+0x10f/0x1a0 [radeon] pci_device_probe+0xd4/0x140 Fix the issue by updating all drivers which can access a platform provided ROM. Instead of calling the helper function pci_platform_rom() which uses phys_to_virt(), call ioremap() directly on the pdev->rom. radeon_read_platform_bios() previously directly accessed an __iomem pointer. Avoid this by calling memcpy_fromio() instead of kmemdup(). pci_platform_rom() now has no remaining callers, so remove it. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mikel Rychliski <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
[ Upstream commit aec7db3 ] I made a mistake with my previous fix, I assumed that we didn't need to mess with the reloc roots once we were out of the part of relocation where we are actually moving the extents. The subtle thing that I missed is that btrfs_init_reloc_root() also updates the last_trans for the reloc root when we do btrfs_record_root_in_trans() for the corresponding fs_root. I've added a comment to make sure future me doesn't make this mistake again. This showed up as a WARN_ON() in btrfs_copy_root() because our last_trans didn't == the current transid. This could happen if we snapshotted a fs root with a reloc root after we set rc->create_reloc_tree = 0, but before we actually merge the reloc root. Worth mentioning that the regression produced the following warning when running snapshot creation and balance in parallel: BTRFS info (device sdc): relocating block group 30408704 flags metadata|dup ------------[ cut here ]------------ WARNING: CPU: 0 PID: 12823 at fs/btrfs/ctree.c:191 btrfs_copy_root+0x26f/0x430 [btrfs] CPU: 0 PID: 12823 Comm: btrfs Tainted: G W 5.6.0-rc7-btrfs-next-58 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_copy_root+0x26f/0x430 [btrfs] RSP: 0018:ffffb96e044279b8 EFLAGS: 00010202 RAX: 0000000000000009 RBX: ffff9da70bf61000 RCX: ffffb96e04427a48 RDX: ffff9da733a770c8 RSI: ffff9da70bf61000 RDI: ffff9da694163818 RBP: ffff9da733a770c8 R08: fffffffffffffff8 R09: 0000000000000002 R10: ffffb96e044279a0 R11: 0000000000000000 R12: ffff9da694163818 R13: fffffffffffffff8 R14: ffff9da6d2512000 R15: ffff9da714cdac00 FS: 00007fdeacf328c0(0000) GS:ffff9da735e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a2a5b8a118 CR3: 00000001eed78002 CR4: 00000000003606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? create_reloc_root+0x49/0x2b0 [btrfs] ? kmem_cache_alloc_trace+0xe5/0x200 create_reloc_root+0x8b/0x2b0 [btrfs] btrfs_reloc_post_snapshot+0x96/0x5b0 [btrfs] create_pending_snapshot+0x610/0x1010 [btrfs] create_pending_snapshots+0xa8/0xd0 [btrfs] btrfs_commit_transaction+0x4c7/0xc50 [btrfs] ? btrfs_mksubvol+0x3cd/0x560 [btrfs] btrfs_mksubvol+0x455/0x560 [btrfs] __btrfs_ioctl_snap_create+0x15f/0x190 [btrfs] btrfs_ioctl_snap_create_v2+0xa4/0xf0 [btrfs] ? mem_cgroup_commit_charge+0x6e/0x540 btrfs_ioctl+0x12d8/0x3760 [btrfs] ? do_raw_spin_unlock+0x49/0xc0 ? _raw_spin_unlock+0x29/0x40 ? __handle_mm_fault+0x11b3/0x14b0 ? ksys_ioctl+0x92/0xb0 ksys_ioctl+0x92/0xb0 ? trace_hardirqs_off_thunk+0x1a/0x1c __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5c/0x280 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fdeabd3bdd7 Fixes: 2abc726 ("btrfs: do not init a reloc root if we aren't relocating") Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
[ Upstream commit 266150c ] Realloc of size zero is a free not an error, avoid this causing a double free. Caught by clang's address sanitizer: ==2634==ERROR: AddressSanitizer: attempting double-free on 0x6020000015f0 in thread T0: #0 0x5649659297fd in free llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:123:3 #1 0x5649659e9251 in __zfree tools/lib/zalloc.c:13:2 #2 0x564965c0f92c in mem2node__exit tools/perf/util/mem2node.c:114:2 #3 0x564965a08b4c in perf_c2c__report tools/perf/builtin-c2c.c:2867:2 #4 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10 #5 0x564965944348 in run_builtin tools/perf/perf.c:312:11 #6 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8 #7 0x5649659440c4 in run_argv tools/perf/perf.c:408:2 #8 0x564965942e41 in main tools/perf/perf.c:538:3 0x6020000015f0 is located 0 bytes inside of 1-byte region [0x6020000015f0,0x6020000015f1) freed by thread T0 here: #0 0x564965929da3 in realloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:164:3 #1 0x564965c0f55e in mem2node__init tools/perf/util/mem2node.c:97:16 #2 0x564965a08956 in perf_c2c__report tools/perf/builtin-c2c.c:2803:8 #3 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10 #4 0x564965944348 in run_builtin tools/perf/perf.c:312:11 #5 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8 #6 0x5649659440c4 in run_argv tools/perf/perf.c:408:2 #7 0x564965942e41 in main tools/perf/perf.c:538:3 previously allocated by thread T0 here: #0 0x564965929c42 in calloc third_party/llvm/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:154:3 #1 0x5649659e9220 in zalloc tools/lib/zalloc.c:8:9 #2 0x564965c0f32d in mem2node__init tools/perf/util/mem2node.c:61:12 #3 0x564965a08956 in perf_c2c__report tools/perf/builtin-c2c.c:2803:8 #4 0x564965a0616a in cmd_c2c tools/perf/builtin-c2c.c:2989:10 #5 0x564965944348 in run_builtin tools/perf/perf.c:312:11 #6 0x564965943235 in handle_internal_command tools/perf/perf.c:364:8 #7 0x5649659440c4 in run_argv tools/perf/perf.c:408:2 #8 0x564965942e41 in main tools/perf/perf.c:538:3 v2: add a WARN_ON_ONCE when the free condition arises. Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
[ Upstream commit 95a3d8f ] When xfstests generic/451, there is an BUG at mm/memcontrol.c: page:ffffea000560f2c0 refcount:2 mapcount:0 mapping:000000008544e0ea index:0xf mapping->aops:cifs_addr_ops dentry name:"tst-aio-dio-cycle-write.451" flags: 0x2fffff80000001(locked) raw: 002fffff80000001 ffffc90002023c50 ffffea0005280088 ffff88815cda0210 raw: 000000000000000f 0000000000000000 00000002ffffffff ffff88817287d000 page dumped because: VM_BUG_ON_PAGE(page->mem_cgroup) page->mem_cgroup:ffff88817287d000 ------------[ cut here ]------------ kernel BUG at mm/memcontrol.c:2659! invalid opcode: 0000 [#1] SMP CPU: 2 PID: 2038 Comm: xfs_io Not tainted 5.8.0-rc1 raspberrypi#44 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_ 073836-buildvm-ppc64le-16.ppc.4 RIP: 0010:commit_charge+0x35/0x50 Code: 0d 48 83 05 54 b2 02 05 01 48 89 77 38 c3 48 c7 c6 78 4a ea ba 48 83 05 38 b2 02 05 01 e8 63 0d9 RSP: 0018:ffffc90002023a50 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88817287d000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88817ac97ea0 RDI: ffff88817ac97ea0 RBP: ffffea000560f2c0 R08: 0000000000000203 R09: 0000000000000005 R10: 0000000000000030 R11: ffffc900020237a8 R12: 0000000000000000 R13: 0000000000000001 R14: 0000000000000001 R15: ffff88815a1272c0 FS: 00007f5071ab0800(0000) GS:ffff88817ac80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055efcd5ca000 CR3: 000000015d312000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mem_cgroup_charge+0x166/0x4f0 __add_to_page_cache_locked+0x4a9/0x710 add_to_page_cache_locked+0x15/0x20 cifs_readpages+0x217/0x1270 read_pages+0x29a/0x670 page_cache_readahead_unbounded+0x24f/0x390 __do_page_cache_readahead+0x3f/0x60 ondemand_readahead+0x1f1/0x470 page_cache_async_readahead+0x14c/0x170 generic_file_buffered_read+0x5df/0x1100 generic_file_read_iter+0x10c/0x1d0 cifs_strict_readv+0x139/0x170 new_sync_read+0x164/0x250 __vfs_read+0x39/0x60 vfs_read+0xb5/0x1e0 ksys_pread64+0x85/0xf0 __x64_sys_pread64+0x22/0x30 do_syscall_64+0x69/0x150 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f5071fcb1af Code: Bad RIP value. RSP: 002b:00007ffde2cdb8e0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011 RAX: ffffffffffffffda RBX: 00007ffde2cdb990 RCX: 00007f5071fcb1af RDX: 0000000000001000 RSI: 000055efcd5ca000 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000001000 R11: 0000000000000293 R12: 0000000000000001 R13: 000000000009f000 R14: 0000000000000000 R15: 0000000000001000 Modules linked in: ---[ end trace 725fa14a3e1af65c ]--- Since commit 3fea5a4 ("mm: memcontrol: convert page cache to a new mem_cgroup_charge() API") not cancel the page charge, the pages maybe double add to pagecache: thread1 | thread2 cifs_readpages readpages_get_pages add_to_page_cache_locked(head,index=n)=0 | readpages_get_pages | add_to_page_cache_locked(head,index=n+1)=0 add_to_page_cache_locked(head, index=n+1)=-EEXIST then, will next loop with list head page's index=n+1 and the page->mapping not NULL readpages_get_pages add_to_page_cache_locked(head, index=n+1) commit_charge VM_BUG_ON_PAGE So, we should not do the next loop when any page add to page cache failed. Reported-by: Hulk Robot <[email protected]> Signed-off-by: Zhang Xiaoxu <[email protected]> Signed-off-by: Steve French <[email protected]> Acked-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
[ Upstream commit b872d06 ] The vfio_pci_release call will free and clear the error and request eventfd ctx while these ctx could be in use at the same time in the function like vfio_pci_request, and it's expected to protect them under the vdev->igate mutex, which is missing in vfio_pci_release. This issue is introduced since commit 1518ac2 ("vfio/pci: fix memory leaks of eventfd ctx"),and since commit 5c5866c ("vfio/pci: Clear error and request eventfd ctx after releasing"), it's very easily to trigger the kernel panic like this: [ 9513.904346] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [ 9513.913091] Mem abort info: [ 9513.915871] ESR = 0x96000006 [ 9513.918912] EC = 0x25: DABT (current EL), IL = 32 bits [ 9513.924198] SET = 0, FnV = 0 [ 9513.927238] EA = 0, S1PTW = 0 [ 9513.930364] Data abort info: [ 9513.933231] ISV = 0, ISS = 0x00000006 [ 9513.937048] CM = 0, WnR = 0 [ 9513.940003] user pgtable: 4k pages, 48-bit VAs, pgdp=0000007ec7d12000 [ 9513.946414] [0000000000000008] pgd=0000007ec7d13003, p4d=0000007ec7d13003, pud=0000007ec728c003, pmd=0000000000000000 [ 9513.956975] Internal error: Oops: 96000006 [#1] PREEMPT SMP [ 9513.962521] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio hclge hns3 hnae3 [last unloaded: vfio_pci] [ 9513.972998] CPU: 4 PID: 1327 Comm: bash Tainted: G W 5.8.0-rc4+ #3 [ 9513.980443] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B270.01 05/08/2020 [ 9513.989274] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--) [ 9513.994827] pc : _raw_spin_lock_irqsave+0x48/0x88 [ 9513.999515] lr : eventfd_signal+0x6c/0x1b0 [ 9514.003591] sp : ffff800038a0b960 [ 9514.006889] x29: ffff800038a0b960 x28: ffff007ef7f4da10 [ 9514.012175] x27: ffff207eefbbfc80 x26: ffffbb7903457000 [ 9514.017462] x25: ffffbb7912191000 x24: ffff007ef7f4d400 [ 9514.022747] x23: ffff20be6e0e4c00 x22: 0000000000000008 [ 9514.028033] x21: 0000000000000000 x20: 0000000000000000 [ 9514.033321] x19: 0000000000000008 x18: 0000000000000000 [ 9514.038606] x17: 0000000000000000 x16: ffffbb7910029328 [ 9514.043893] x15: 0000000000000000 x14: 0000000000000001 [ 9514.049179] x13: 0000000000000000 x12: 0000000000000002 [ 9514.054466] x11: 0000000000000000 x10: 0000000000000a00 [ 9514.059752] x9 : ffff800038a0b840 x8 : ffff007ef7f4de60 [ 9514.065038] x7 : ffff007fffc96690 x6 : fffffe01faffb748 [ 9514.070324] x5 : 0000000000000000 x4 : 0000000000000000 [ 9514.075609] x3 : 0000000000000000 x2 : 0000000000000001 [ 9514.080895] x1 : ffff007ef7f4d400 x0 : 0000000000000000 [ 9514.086181] Call trace: [ 9514.088618] _raw_spin_lock_irqsave+0x48/0x88 [ 9514.092954] eventfd_signal+0x6c/0x1b0 [ 9514.096691] vfio_pci_request+0x84/0xd0 [vfio_pci] [ 9514.101464] vfio_del_group_dev+0x150/0x290 [vfio] [ 9514.106234] vfio_pci_remove+0x30/0x128 [vfio_pci] [ 9514.111007] pci_device_remove+0x48/0x108 [ 9514.115001] device_release_driver_internal+0x100/0x1b8 [ 9514.120200] device_release_driver+0x28/0x38 [ 9514.124452] pci_stop_bus_device+0x68/0xa8 [ 9514.128528] pci_stop_and_remove_bus_device+0x20/0x38 [ 9514.133557] pci_iov_remove_virtfn+0xb4/0x128 [ 9514.137893] sriov_disable+0x3c/0x108 [ 9514.141538] pci_disable_sriov+0x28/0x38 [ 9514.145445] hns3_pci_sriov_configure+0x48/0xb8 [hns3] [ 9514.150558] sriov_numvfs_store+0x110/0x198 [ 9514.154724] dev_attr_store+0x44/0x60 [ 9514.158373] sysfs_kf_write+0x5c/0x78 [ 9514.162018] kernfs_fop_write+0x104/0x210 [ 9514.166010] __vfs_write+0x48/0x90 [ 9514.169395] vfs_write+0xbc/0x1c0 [ 9514.172694] ksys_write+0x74/0x100 [ 9514.176079] __arm64_sys_write+0x24/0x30 [ 9514.179987] el0_svc_common.constprop.4+0x110/0x200 [ 9514.184842] do_el0_svc+0x34/0x98 [ 9514.188144] el0_svc+0x14/0x40 [ 9514.191185] el0_sync_handler+0xb0/0x2d0 [ 9514.195088] el0_sync+0x140/0x180 [ 9514.198389] Code: b9001020 d2800000 52800022 f9800271 (885ffe61) [ 9514.204455] ---[ end trace 648de00c8406465f ]--- [ 9514.212308] note: bash[1327] exited with preempt_count 1 Cc: Qian Cai <[email protected]> Cc: Alex Williamson <[email protected]> Fixes: 1518ac2 ("vfio/pci: fix memory leaks of eventfd ctx") Signed-off-by: Zeng Tao <[email protected]> Signed-off-by: Alex Williamson <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
[ Upstream commit ce26842 ] syzbot reported the following KASAN splat: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] CPU: 1 PID: 6826 Comm: syz-executor142 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__lock_acquire+0x84/0x2ae0 kernel/locking/lockdep.c:4296 Code: ff df 8a 04 30 84 c0 0f 85 e3 16 00 00 83 3d 56 58 35 08 00 0f 84 0e 17 00 00 83 3d 25 c7 f5 07 00 74 2c 4c 89 e8 48 c1 e8 03 <80> 3c 30 00 74 12 4c 89 ef e8 3e d1 5a 00 48 be 00 00 00 00 00 fc RSP: 0018:ffffc90004b9f850 EFLAGS: 00010006 Call Trace: lock_acquire+0x140/0x6f0 kernel/locking/lockdep.c:5006 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline] _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151 spin_lock include/linux/spinlock.h:354 [inline] madvise_cold_or_pageout_pte_range+0x52f/0x25c0 mm/madvise.c:389 walk_pmd_range mm/pagewalk.c:89 [inline] walk_pud_range mm/pagewalk.c:160 [inline] walk_p4d_range mm/pagewalk.c:193 [inline] walk_pgd_range mm/pagewalk.c:229 [inline] __walk_page_range+0xe7b/0x1da0 mm/pagewalk.c:331 walk_page_range+0x2c3/0x5c0 mm/pagewalk.c:427 madvise_pageout_page_range mm/madvise.c:521 [inline] madvise_pageout mm/madvise.c:557 [inline] madvise_vma mm/madvise.c:946 [inline] do_madvise+0x12d0/0x2090 mm/madvise.c:1145 __do_sys_madvise mm/madvise.c:1171 [inline] __se_sys_madvise mm/madvise.c:1169 [inline] __x64_sys_madvise+0x76/0x80 mm/madvise.c:1169 do_syscall_64+0x31/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The backing vma was shmem. In case of split page of file-backed THP, madvise zaps the pmd instead of remapping of sub-pages. So we need to check pmd validity after split. Reported-by: [email protected] Fixes: 1a4e58c ("mm: introduce MADV_PAGEOUT") Signed-off-by: Minchan Kim <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 2, 2020
commit 35be885 upstream. Syzkaller reported a buffer overflow in btree_readpage_end_io_hook() when loop mounting a crafted image: detected buffer overflow in memcpy ------------[ cut here ]------------ kernel BUG at lib/string.c:1129! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 26 Comm: kworker/u4:2 Not tainted 5.9.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: btrfs-endio-meta btrfs_work_helper RIP: 0010:fortify_panic+0xf/0x20 lib/string.c:1129 RSP: 0018:ffffc90000e27980 EFLAGS: 00010286 RAX: 0000000000000022 RBX: ffff8880a80dca64 RCX: 0000000000000000 RDX: ffff8880a90860c0 RSI: ffffffff815dba07 RDI: fffff520001c4f22 RBP: ffff8880a80dca00 R08: 0000000000000022 R09: ffff8880ae7318e7 R10: 0000000000000000 R11: 0000000000077578 R12: 00000000ffffff6e R13: 0000000000000008 R14: ffffc90000e27a40 R15: 1ffff920001c4f3c FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000557335f440d0 CR3: 000000009647d000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: memcpy include/linux/string.h:405 [inline] btree_readpage_end_io_hook.cold+0x206/0x221 fs/btrfs/disk-io.c:642 end_bio_extent_readpage+0x4de/0x10c0 fs/btrfs/extent_io.c:2854 bio_endio+0x3cf/0x7f0 block/bio.c:1449 end_workqueue_fn+0x114/0x170 fs/btrfs/disk-io.c:1695 btrfs_work_helper+0x221/0xe20 fs/btrfs/async-thread.c:318 process_one_work+0x94c/0x1670 kernel/workqueue.c:2269 worker_thread+0x64c/0x1120 kernel/workqueue.c:2415 kthread+0x3b5/0x4a0 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294 Modules linked in: ---[ end trace b68924293169feef ]--- RIP: 0010:fortify_panic+0xf/0x20 lib/string.c:1129 RSP: 0018:ffffc90000e27980 EFLAGS: 00010286 RAX: 0000000000000022 RBX: ffff8880a80dca64 RCX: 0000000000000000 RDX: ffff8880a90860c0 RSI: ffffffff815dba07 RDI: fffff520001c4f22 RBP: ffff8880a80dca00 R08: 0000000000000022 R09: ffff8880ae7318e7 R10: 0000000000000000 R11: 0000000000077578 R12: 00000000ffffff6e R13: 0000000000000008 R14: ffffc90000e27a40 R15: 1ffff920001c4f3c FS: 0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f95b7c4d008 CR3: 000000009647d000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 The overflow happens, because in btree_readpage_end_io_hook() we assume that we have found a 4 byte checksum instead of the real possible 32 bytes we have for the checksums. With the fix applied: [ 35.726623] BTRFS: device fsid 815caf9a-dc43-4d2a-ac54-764b8333d765 devid 1 transid 5 /dev/loop0 scanned by syz-repro (215) [ 35.738994] BTRFS info (device loop0): disk space caching is enabled [ 35.738998] BTRFS info (device loop0): has skinny extents [ 35.743337] BTRFS warning (device loop0): loop0 checksum verify failed on 1052672 wanted 0xf9c035fc8d239a54 found 0x67a25c14b7eabcf9 level 0 [ 35.743420] BTRFS error (device loop0): failed to read chunk root [ 35.745899] BTRFS error (device loop0): open_ctree failed Reported-by: [email protected] Fixes: d517857 ("btrfs: directly call into crypto framework for checksumming") CC: [email protected] # 5.4+ Signed-off-by: Johannes Thumshirn <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Oct 7, 2020
commit f85086f upstream. In register_mem_sect_under_node() the system_state's value is checked to detect whether the call is made during boot time or during an hot-plug operation. Unfortunately, that check against SYSTEM_BOOTING is wrong because regular memory is registered at SYSTEM_SCHEDULING state. In addition, memory hot-plug operation can be triggered at this system state by the ACPI [1]. So checking against the system state is not enough. The consequence is that on system with interleaved node's ranges like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] This can be seen on PowerPC LPAR after multiple memory hot-plug and hot-unplug operations are done. At the next reboot the node's memory ranges can be interleaved and since the call to link_mem_sections() is made in topology_init() while the system is in the SYSTEM_SCHEDULING state, the node's id is not checked, and the sections registered to multiple nodes: $ ls -l /sys/devices/system/memory/memory21/node* total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 In that case, the system is able to boot but if later one of theses memory blocks is hot-unplugged and then hot-plugged, the sysfs inconsistency is detected and this is triggering a BUG_ON(): kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ raspberrypi#25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This patch addresses the root cause by not relying on the system_state value to detect whether the call is due to a hot-plug operation. An extra parameter is added to link_mem_sections() detailing whether the operation is due to a hot-plug operation. [1] According to Oscar Salvador, using this qemu command line, ACPI memory hotplug operations are raised at SYSTEM_SCHEDULING state: $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \ -m size=$MEM,slots=255,maxmem=4294967296k \ -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \ -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \ -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \ -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \ -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \ -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \ -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \ -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \ Fixes: 4fbce63 ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()") Signed-off-by: Laurent Dufour <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Oscar Salvador <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Nathan Lynch <[email protected]> Cc: Scott Cheloha <[email protected]> Cc: Tony Luck <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
[ Upstream commit 7b0033d ] In my test case, concurrent calls to f2fs shutdown report the following stack trace: Oops: general protection fault, probably for non-canonical address 0xc6cfff63bb5513fc: 0000 [#1] PREEMPT SMP PTI CPU: 0 UID: 0 PID: 678 Comm: f2fs_rep_shutdo Not tainted 6.12.0-rc5-next-20241029-g6fb2fa9805c5-dirty raspberrypi#85 Call Trace: <TASK> ? show_regs+0x8b/0xa0 ? __die_body+0x26/0xa0 ? die_addr+0x54/0x90 ? exc_general_protection+0x24b/0x5c0 ? asm_exc_general_protection+0x26/0x30 ? kthread_stop+0x46/0x390 f2fs_stop_gc_thread+0x6c/0x110 f2fs_do_shutdown+0x309/0x3a0 f2fs_ioc_shutdown+0x150/0x1c0 __f2fs_ioctl+0xffd/0x2ac0 f2fs_ioctl+0x76/0xe0 vfs_ioctl+0x23/0x60 __x64_sys_ioctl+0xce/0xf0 x64_sys_call+0x2b1b/0x4540 do_syscall_64+0xa7/0x240 entry_SYSCALL_64_after_hwframe+0x76/0x7e The root cause is a race condition in f2fs_stop_gc_thread() called from different f2fs shutdown paths: [CPU0] [CPU1] ---------------------- ----------------------- f2fs_stop_gc_thread f2fs_stop_gc_thread gc_th = sbi->gc_thread gc_th = sbi->gc_thread kfree(gc_th) sbi->gc_thread = NULL < gc_th != NULL > kthread_stop(gc_th->f2fs_gc_task) //UAF The commit c7f114d ("f2fs: fix to avoid use-after-free in f2fs_stop_gc_thread()") attempted to fix this issue by using a read semaphore to prevent races between shutdown and remount threads, but it fails to prevent all race conditions. Fix it by converting to write lock of s_umount in f2fs_do_shutdown(). Fixes: 7950e9a ("f2fs: stop gc/discard thread after fs shutdown") Signed-off-by: Long Li <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
[ Upstream commit f8c989a ] The last reference for `cache_head` can be reduced to zero in `c_show` and `e_show`(using `rcu_read_lock` and `rcu_read_unlock`). Consequently, `svc_export_put` and `expkey_put` will be invoked, leading to two issues: 1. The `svc_export_put` will directly free ex_uuid. However, `e_show`/`c_show` will access `ex_uuid` after `cache_put`, which can trigger a use-after-free issue, shown below. ================================================================== BUG: KASAN: slab-use-after-free in svc_export_show+0x362/0x430 [nfsd] Read of size 1 at addr ff11000010fdc120 by task cat/870 CPU: 1 UID: 0 PID: 870 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x53/0x70 print_address_description.constprop.0+0x2c/0x3a0 print_report+0xb9/0x280 kasan_report+0xae/0xe0 svc_export_show+0x362/0x430 [nfsd] c_show+0x161/0x390 [sunrpc] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 proc_reg_read+0xe1/0x140 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Allocated by task 830: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 __kmalloc_node_track_caller_noprof+0x1bc/0x400 kmemdup_noprof+0x22/0x50 svc_export_parse+0x8a9/0xb80 [nfsd] cache_do_downcall+0x71/0xa0 [sunrpc] cache_write_procfs+0x8e/0xd0 [sunrpc] proc_reg_write+0xe1/0x140 vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 868: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x37/0x50 kfree+0xf3/0x3e0 svc_export_put+0x87/0xb0 [nfsd] cache_purge+0x17f/0x1f0 [sunrpc] nfsd_destroy_serv+0x226/0x2d0 [nfsd] nfsd_svc+0x125/0x1e0 [nfsd] write_threads+0x16a/0x2a0 [nfsd] nfsctl_transaction_write+0x74/0xa0 [nfsd] vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e 2. We cannot sleep while using `rcu_read_lock`/`rcu_read_unlock`. However, `svc_export_put`/`expkey_put` will call path_put, which subsequently triggers a sleeping operation due to the following `dput`. ============================= WARNING: suspicious RCU usage 5.10.0-dirty raspberrypi#141 Not tainted ----------------------------- ... Call Trace: dump_stack+0x9a/0xd0 ___might_sleep+0x231/0x240 dput+0x39/0x600 path_put+0x1b/0x30 svc_export_put+0x17/0x80 e_show+0x1c9/0x200 seq_read_iter+0x63f/0x7c0 seq_read+0x226/0x2d0 vfs_read+0x113/0x2c0 ksys_read+0xc9/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x67/0xd1 Fix these issues by using `rcu_work` to help release `svc_expkey`/`svc_export`. This approach allows for an asynchronous context to invoke `path_put` and also facilitates the freeing of `uuid/exp/key` after an RCU grace period. Fixes: 9ceddd9 ("knfsd: Allow lockless lookups of the exports") Signed-off-by: Yang Erkun <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
[ Upstream commit ce89e74 ] There's issue as follows: RPC: Registered rdma transport module. RPC: Registered rdma backchannel transport module. RPC: Unregistered rdma transport module. RPC: Unregistered rdma backchannel transport module. BUG: unable to handle page fault for address: fffffbfff80c609a PGD 123fee067 P4D 123fee067 PUD 123fea067 PMD 10c624067 PTE 0 Oops: Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI RIP: 0010:percpu_counter_destroy_many+0xf7/0x2a0 Call Trace: <TASK> __die+0x1f/0x70 page_fault_oops+0x2cd/0x860 spurious_kernel_fault+0x36/0x450 do_kern_addr_fault+0xca/0x100 exc_page_fault+0x128/0x150 asm_exc_page_fault+0x26/0x30 percpu_counter_destroy_many+0xf7/0x2a0 mmdrop+0x209/0x350 finish_task_switch.isra.0+0x481/0x840 schedule_tail+0xe/0xd0 ret_from_fork+0x23/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> If register_sysctl() return NULL, then svc_rdma_proc_cleanup() will not destroy the percpu counters which init in svc_rdma_proc_init(). If CONFIG_HOTPLUG_CPU is enabled, residual nodes may be in the 'percpu_counters' list. The above issue may occur once the module is removed. If the CONFIG_HOTPLUG_CPU configuration is not enabled, memory leakage occurs. To solve above issue just destroy all percpu counters when register_sysctl() return NULL. Fixes: 1e7e557 ("svcrdma: Restore read and write stats") Fixes: 22df5a2 ("svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter") Fixes: df971cd ("svcrdma: Convert rdma_stat_recv to a per-CPU counter") Signed-off-by: Ye Bin <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
[ Upstream commit fe4bf8d ] There are cases where a PCIe extended capability should be hidden from the user. For example, an unknown capability (i.e., capability with ID greater than PCI_EXT_CAP_ID_MAX) or a capability that is intentionally chosen to be hidden from the user. Hiding a capability is done by virtualizing and modifying the 'Next Capability Offset' field of the previous capability so it points to the capability after the one that should be hidden. The special case where the first capability in the list should be hidden is handled differently because there is no previous capability that can be modified. In this case, the capability ID and version are zeroed while leaving the next pointer intact. This hides the capability and leaves an anchor for the rest of the capability list. However, today, hiding the first capability in the list is not done properly if the capability is unknown, as struct vfio_pci_core_device->pci_config_map is set to the capability ID during initialization but the capability ID is not properly checked later when used in vfio_config_do_rw(). This leads to the following warning [1] and to an out-of-bounds access to ecap_perms array. Fix it by checking cap_id in vfio_config_do_rw(), and if it is greater than PCI_EXT_CAP_ID_MAX, use an alternative struct perm_bits for direct read only access instead of the ecap_perms array. Note that this is safe since the above is the only case where cap_id can exceed PCI_EXT_CAP_ID_MAX (except for the special capabilities, which are already checked before). [1] WARNING: CPU: 118 PID: 5329 at drivers/vfio/pci/vfio_pci_config.c:1900 vfio_pci_config_rw+0x395/0x430 [vfio_pci_core] CPU: 118 UID: 0 PID: 5329 Comm: simx-qemu-syste Not tainted 6.12.0+ #1 (snip) Call Trace: <TASK> ? show_regs+0x69/0x80 ? __warn+0x8d/0x140 ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core] ? report_bug+0x18f/0x1a0 ? handle_bug+0x63/0xa0 ? exc_invalid_op+0x19/0x70 ? asm_exc_invalid_op+0x1b/0x20 ? vfio_pci_config_rw+0x395/0x430 [vfio_pci_core] ? vfio_pci_config_rw+0x244/0x430 [vfio_pci_core] vfio_pci_rw+0x101/0x1b0 [vfio_pci_core] vfio_pci_core_read+0x1d/0x30 [vfio_pci_core] vfio_device_fops_read+0x27/0x40 [vfio] vfs_read+0xbd/0x340 ? vfio_device_fops_unl_ioctl+0xbb/0x740 [vfio] ? __rseq_handle_notify_resume+0xa4/0x4b0 __x64_sys_pread64+0x96/0xc0 x64_sys_call+0x1c3d/0x20d0 do_syscall_64+0x4d/0x120 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: 89e1f7d ("vfio: Add PCI device driver") Signed-off-by: Avihai Horon <[email protected]> Reviewed-by: Yi Liu <[email protected]> Tested-by: Yi Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alex Williamson <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
[ Upstream commit ebaf813 ] Passing MSG_PEEK flag to skb_recv_datagram() increments skb refcount (skb->users) and iucv_sock_recvmsg() does not decrement skb refcount at exit. This results in skb memory leak in skb_queue_purge() and WARN_ON in iucv_sock_destruct() during socket close. To fix this decrease skb refcount by one if MSG_PEEK is set in order to prevent memory leak and WARN_ON. WARNING: CPU: 2 PID: 6292 at net/iucv/af_iucv.c:286 iucv_sock_destruct+0x144/0x1a0 [af_iucv] CPU: 2 PID: 6292 Comm: afiucv_test_msg Kdump: loaded Tainted: G W 6.10.0-rc7 #1 Hardware name: IBM 3931 A01 704 (z/VM 7.3.0) Call Trace: [<001587c682c4aa98>] iucv_sock_destruct+0x148/0x1a0 [af_iucv] [<001587c682c4a9d0>] iucv_sock_destruct+0x80/0x1a0 [af_iucv] [<001587c704117a32>] __sk_destruct+0x52/0x550 [<001587c704104a54>] __sock_release+0xa4/0x230 [<001587c704104c0c>] sock_close+0x2c/0x40 [<001587c702c5f5a8>] __fput+0x2e8/0x970 [<001587c7024148c4>] task_work_run+0x1c4/0x2c0 [<001587c7023b0716>] do_exit+0x996/0x1050 [<001587c7023b13aa>] do_group_exit+0x13a/0x360 [<001587c7023b1626>] __s390x_sys_exit_group+0x56/0x60 [<001587c7022bccca>] do_syscall+0x27a/0x380 [<001587c7049a6a0c>] __do_syscall+0x9c/0x160 [<001587c7049ce8a8>] system_call+0x70/0x98 Last Breaking-Event-Address: [<001587c682c4a9d4>] iucv_sock_destruct+0x84/0x1a0 [af_iucv] Fixes: eac3731 ("[S390]: Add AF_IUCV socket support") Reviewed-by: Alexandra Winter <[email protected]> Reviewed-by: Thorsten Winkler <[email protected]> Signed-off-by: Sidraya Jayagond <[email protected]> Signed-off-by: Alexandra Winter <[email protected]> Reviewed-by: David Wei <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
[ Upstream commit a66dfaf ] This fixes possible deadlocks like the following caused by hci_cmd_sync_dequeue causing the destroy function to run: INFO: task kworker/u19:0:143 blocked for more than 120 seconds. Tainted: G W O 6.8.0-2024-03-19-intel-next-iLS-24ww14 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u19:0 state:D stack:0 pid:143 tgid:143 ppid:2 flags:0x00004000 Workqueue: hci0 hci_cmd_sync_work [bluetooth] Call Trace: <TASK> __schedule+0x374/0xaf0 schedule+0x3c/0xf0 schedule_preempt_disabled+0x1c/0x30 __mutex_lock.constprop.0+0x3ef/0x7a0 __mutex_lock_slowpath+0x13/0x20 mutex_lock+0x3c/0x50 mgmt_set_connectable_complete+0xa4/0x150 [bluetooth] ? kfree+0x211/0x2a0 hci_cmd_sync_dequeue+0xae/0x130 [bluetooth] ? __pfx_cmd_complete_rsp+0x10/0x10 [bluetooth] cmd_complete_rsp+0x26/0x80 [bluetooth] mgmt_pending_foreach+0x4d/0x70 [bluetooth] __mgmt_power_off+0x8d/0x180 [bluetooth] ? _raw_spin_unlock_irq+0x23/0x40 hci_dev_close_sync+0x445/0x5b0 [bluetooth] hci_set_powered_sync+0x149/0x250 [bluetooth] set_powered_sync+0x24/0x60 [bluetooth] hci_cmd_sync_work+0x90/0x150 [bluetooth] process_one_work+0x13e/0x300 worker_thread+0x2f7/0x420 ? __pfx_worker_thread+0x10/0x10 kthread+0x107/0x140 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x3d/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Tested-by: Kiran K <[email protected]> Fixes: f53e1c9 ("Bluetooth: MGMT: Fix possible crash on mgmt_index_removed") Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
[ Upstream commit d73dc7b ] [Syzbot reported two possible deadlocks] The first possible deadlock is: WARNING: possible recursive locking detected 6.12.0-rc1-syzkaller-00027-g4a9fe2a8ac53 #0 Not tainted -------------------------------------------- syz-executor363/2651 is trying to acquire lock: ffffffff89b120e8 (chaoskey_list_lock){+.+.}-{3:3}, at: chaoskey_release+0x15d/0x2c0 drivers/usb/misc/chaoskey.c:322 but task is already holding lock: ffffffff89b120e8 (chaoskey_list_lock){+.+.}-{3:3}, at: chaoskey_release+0x7f/0x2c0 drivers/usb/misc/chaoskey.c:299 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(chaoskey_list_lock); lock(chaoskey_list_lock); *** DEADLOCK *** The second possible deadlock is: WARNING: possible circular locking dependency detected 6.12.0-rc1-syzkaller-00027-g4a9fe2a8ac53 #0 Not tainted ------------------------------------------------------ kworker/0:2/804 is trying to acquire lock: ffffffff899dadb0 (minor_rwsem){++++}-{3:3}, at: usb_deregister_dev+0x7c/0x1e0 drivers/usb/core/file.c:186 but task is already holding lock: ffffffff89b120e8 (chaoskey_list_lock){+.+.}-{3:3}, at: chaoskey_disconnect+0xa8/0x2a0 drivers/usb/misc/chaoskey.c:235 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (chaoskey_list_lock){+.+.}-{3:3}: __mutex_lock_common kernel/locking/mutex.c:608 [inline] __mutex_lock+0x175/0x9c0 kernel/locking/mutex.c:752 chaoskey_open+0xdd/0x220 drivers/usb/misc/chaoskey.c:274 usb_open+0x186/0x220 drivers/usb/core/file.c:47 chrdev_open+0x237/0x6a0 fs/char_dev.c:414 do_dentry_open+0x6cb/0x1390 fs/open.c:958 vfs_open+0x82/0x3f0 fs/open.c:1088 do_open fs/namei.c:3774 [inline] path_openat+0x1e6a/0x2d60 fs/namei.c:3933 do_filp_open+0x1dc/0x430 fs/namei.c:3960 do_sys_openat2+0x17a/0x1e0 fs/open.c:1415 do_sys_open fs/open.c:1430 [inline] __do_sys_openat fs/open.c:1446 [inline] __se_sys_openat fs/open.c:1441 [inline] __x64_sys_openat+0x175/0x210 fs/open.c:1441 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (minor_rwsem){++++}-{3:3}: check_prev_add kernel/locking/lockdep.c:3161 [inline] check_prevs_add kernel/locking/lockdep.c:3280 [inline] validate_chain kernel/locking/lockdep.c:3904 [inline] __lock_acquire+0x250b/0x3ce0 kernel/locking/lockdep.c:5202 lock_acquire.part.0+0x11b/0x380 kernel/locking/lockdep.c:5825 down_write+0x93/0x200 kernel/locking/rwsem.c:1577 usb_deregister_dev+0x7c/0x1e0 drivers/usb/core/file.c:186 chaoskey_disconnect+0xb7/0x2a0 drivers/usb/misc/chaoskey.c:236 usb_unbind_interface+0x1e8/0x970 drivers/usb/core/driver.c:461 device_remove drivers/base/dd.c:569 [inline] device_remove+0x122/0x170 drivers/base/dd.c:561 __device_release_driver drivers/base/dd.c:1273 [inline] device_release_driver_internal+0x44a/0x610 drivers/base/dd.c:1296 bus_remove_device+0x22f/0x420 drivers/base/bus.c:576 device_del+0x396/0x9f0 drivers/base/core.c:3864 usb_disable_device+0x36c/0x7f0 drivers/usb/core/message.c:1418 usb_disconnect+0x2e1/0x920 drivers/usb/core/hub.c:2304 hub_port_connect drivers/usb/core/hub.c:5361 [inline] hub_port_connect_change drivers/usb/core/hub.c:5661 [inline] port_event drivers/usb/core/hub.c:5821 [inline] hub_event+0x1bed/0x4f40 drivers/usb/core/hub.c:5903 process_one_work+0x9c5/0x1ba0 kernel/workqueue.c:3229 process_scheduled_works kernel/workqueue.c:3310 [inline] worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(chaoskey_list_lock); lock(minor_rwsem); lock(chaoskey_list_lock); lock(minor_rwsem); *** DEADLOCK *** [Analysis] The first is AA lock, it because wrong logic, it need a unlock. The second is AB lock, it needs to rearrange the order of lock usage. Fixes: 422dc0a ("USB: chaoskey: fail open after removal") Reported-by: [email protected] Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=685e14d04fe35692d3bc Signed-off-by: Edward Adam Davis <[email protected]> Tested-by: [email protected] Reported-by: [email protected] Tested-by: [email protected] Tested-by: [email protected] Link: https://lore.kernel.org/r/[email protected] Cc: Oliver Neukum <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
commit 339b84a upstream. If a BUG_ON() can be hit in the wild, it shouldn't be a BUG_ON() For reference, this has popped up once in the CI, and we'll need more info to debug it: 03240 ------------[ cut here ]------------ 03240 kernel BUG at lib/closure.c:21! 03240 kernel BUG at lib/closure.c:21! 03240 Internal error: Oops - BUG: 00000000f2000800 [#1] SMP 03240 Modules linked in: 03240 CPU: 15 PID: 40534 Comm: kworker/u80:1 Not tainted 6.10.0-rc4-ktest-ga56da69799bd #25570 03240 Hardware name: linux,dummy-virt (DT) 03240 Workqueue: btree_update btree_interior_update_work 03240 pstate: 00001005 (nzcv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--) 03240 pc : closure_put+0x224/0x2a0 03240 lr : closure_put+0x24/0x2a0 03240 sp : ffff0000d12071c0 03240 x29: ffff0000d12071c0 x28: dfff800000000000 x27: ffff0000d1207360 03240 x26: 0000000000000040 x25: 0000000000000040 x24: 0000000000000040 03240 x23: ffff0000c1f20180 x22: 0000000000000000 x21: ffff0000c1f20168 03240 x20: 0000000040000000 x19: ffff0000c1f20140 x18: 0000000000000001 03240 x17: 0000000000003aa0 x16: 0000000000003ad0 x15: 1fffe0001c326974 03240 x14: 0000000000000a1e x13: 0000000000000000 x12: 1fffe000183e402d 03240 x11: ffff6000183e402d x10: dfff800000000000 x9 : ffff6000183e402e 03240 x8 : 0000000000000001 x7 : 00009fffe7c1bfd3 x6 : ffff0000c1f2016b 03240 x5 : ffff0000c1f20168 x4 : ffff6000183e402e x3 : ffff800081391954 03240 x2 : 0000000000000001 x1 : 0000000000000000 x0 : 00000000a8000000 03240 Call trace: 03240 closure_put+0x224/0x2a0 03240 bch2_check_for_deadlock+0x910/0x1028 03240 bch2_six_check_for_deadlock+0x1c/0x30 03240 six_lock_slowpath.isra.0+0x29c/0xed0 03240 six_lock_ip_waiter+0xa8/0xf8 03240 __bch2_btree_node_lock_write+0x14c/0x298 03240 bch2_trans_lock_write+0x6d4/0xb10 03240 __bch2_trans_commit+0x135c/0x5520 03240 btree_interior_update_work+0x1248/0x1c10 03240 process_scheduled_works+0x53c/0xd90 03240 worker_thread+0x370/0x8c8 03240 kthread+0x258/0x2e8 03240 ret_from_fork+0x10/0x20 03240 Code: aa1303e0 d63f0020 a94363f7 17ffff8c (d4210000) 03240 ---[ end trace 0000000000000000 ]--- 03240 Kernel panic - not syncing: Oops - BUG: Fatal exception 03240 SMP: stopping secondary CPUs 03241 SMP: failed to stop secondary CPUs 13,15 03241 Kernel Offset: disabled 03241 CPU features: 0x00,00000003,80000008,4240500b 03241 Memory Limit: none 03241 ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]--- 03246 ========= FAILED TIMEOUT copygc_torture_no_checksum in 7200s Signed-off-by: Kent Overstreet <[email protected]> [ Resolve minor conflicts to fix CVE-2024-42252 ] Signed-off-by: Bin Lan <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
commit c7acef9 upstream. Dennis reports a boot crash on recent Lenovo laptops with a USB4 dock. Since commit 0fc7088 ("thunderbolt: Reset USB4 v2 host router") and commit 59a54c5 ("thunderbolt: Reset topology created by the boot firmware"), USB4 v2 and v1 Host Routers are reset on probe of the thunderbolt driver. The reset clears the Presence Detect State and Data Link Layer Link Active bits at the USB4 Host Router's Root Port and thus causes hot removal of the dock. The crash occurs when pciehp is unbound from one of the dock's Downstream Ports: pciehp creates a pci_slot on bind and destroys it on unbind. The pci_slot contains a pointer to the pci_bus below the Downstream Port, but a reference on that pci_bus is never acquired. The pci_bus is destroyed before the pci_slot, so a use-after-free ensues when pci_slot_release() accesses slot->bus. In principle this should not happen because pci_stop_bus_device() unbinds pciehp (and therefore destroys the pci_slot) before the pci_bus is destroyed by pci_remove_bus_device(). However the stacktrace provided by Dennis shows that pciehp is unbound from pci_remove_bus_device() instead of pci_stop_bus_device(). To understand the significance of this, one needs to know that the PCI core uses a two step process to remove a portion of the hierarchy: It first unbinds all drivers in the sub-hierarchy in pci_stop_bus_device() and then actually removes the devices in pci_remove_bus_device(). There is no precaution to prevent driver binding in-between pci_stop_bus_device() and pci_remove_bus_device(). In Dennis' case, it seems removal of the hierarchy by pciehp races with driver binding by pci_bus_add_devices(). pciehp is bound to the Downstream Port after pci_stop_bus_device() has run, so it is unbound by pci_remove_bus_device() instead of pci_stop_bus_device(). Because the pci_bus has already been destroyed at that point, accesses to it result in a use-after-free. One might conclude that driver binding needs to be prevented after pci_stop_bus_device() has run. However it seems risky that pci_slot points to pci_bus without holding a reference. Solely relying on correct ordering of driver unbind versus pci_bus destruction is certainly not defensive programming. If pci_slot has a need to access data in pci_bus, it ought to acquire a reference. Amend pci_create_slot() accordingly. Dennis reports that the crash is not reproducible with this change. Abridged stacktrace: pcieport 0000:00:07.0: PME: Signaling with IRQ 156 pcieport 0000:00:07.0: pciehp: Slot raspberrypi#12 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+ pci_bus 0000:20: dev 00, created physical slot 12 pcieport 0000:00:07.0: pciehp: Slot(12): Card not present ... pcieport 0000:21:02.0: pciehp: pcie_disable_notification: SLOTCTRL d8 write cmd 0 Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI CPU: 13 UID: 0 PID: 134 Comm: irq/156-pciehp Not tainted 6.11.0-devel+ #1 RIP: 0010:dev_driver_string+0x12/0x40 pci_destroy_slot pciehp_remove pcie_port_remove_service device_release_driver_internal bus_remove_device device_del device_unregister remove_iter device_for_each_child pcie_portdrv_remove pci_device_remove device_release_driver_internal bus_remove_device device_del pci_remove_bus_device (recursive invocation) pci_remove_bus_device pciehp_unconfigure_device pciehp_disable_slot pciehp_handle_presence_or_link_change pciehp_ist Link: https://lore.kernel.org/r/4bfd4c0e976c1776cd08e76603903b338cf25729.1728579288.git.lukas@wunner.de Reported-by: Dennis Wassenberg <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Tested-by: Dennis Wassenberg <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Mika Westerberg <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
commit 4bdec0d upstream. Neither SMB3.0 or SMB3.02 supports encryption negotiate context, so when SMB2_GLOBAL_CAP_ENCRYPTION flag is set in the negotiate response, the client uses AES-128-CCM as the default cipher. See MS-SMB2 3.3.5.4. Commit b0abcd6 ("smb: client: fix UAF in async decryption") added a @server->cipher_type check to conditionally call smb3_crypto_aead_allocate(), but that check would always be false as @server->cipher_type is unset for SMB3.02. Fix the following KASAN splat by setting @server->cipher_type for SMB3.02 as well. mount.cifs //srv/share /mnt -o vers=3.02,seal,... BUG: KASAN: null-ptr-deref in crypto_aead_setkey+0x2c/0x130 Read of size 8 at addr 0000000000000020 by task mount.cifs/1095 CPU: 1 UID: 0 PID: 1095 Comm: mount.cifs Not tainted 6.12.0 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-3.fc41 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 ? crypto_aead_setkey+0x2c/0x130 kasan_report+0xda/0x110 ? crypto_aead_setkey+0x2c/0x130 crypto_aead_setkey+0x2c/0x130 crypt_message+0x258/0xec0 [cifs] ? __asan_memset+0x23/0x50 ? __pfx_crypt_message+0x10/0x10 [cifs] ? mark_lock+0xb0/0x6a0 ? hlock_class+0x32/0xb0 ? mark_lock+0xb0/0x6a0 smb3_init_transform_rq+0x352/0x3f0 [cifs] ? lock_acquire.part.0+0xf4/0x2a0 smb_send_rqst+0x144/0x230 [cifs] ? __pfx_smb_send_rqst+0x10/0x10 [cifs] ? hlock_class+0x32/0xb0 ? smb2_setup_request+0x225/0x3a0 [cifs] ? __pfx_cifs_compound_last_callback+0x10/0x10 [cifs] compound_send_recv+0x59b/0x1140 [cifs] ? __pfx_compound_send_recv+0x10/0x10 [cifs] ? __create_object+0x5e/0x90 ? hlock_class+0x32/0xb0 ? do_raw_spin_unlock+0x9a/0xf0 cifs_send_recv+0x23/0x30 [cifs] SMB2_tcon+0x3ec/0xb30 [cifs] ? __pfx_SMB2_tcon+0x10/0x10 [cifs] ? lock_acquire.part.0+0xf4/0x2a0 ? __pfx_lock_release+0x10/0x10 ? do_raw_spin_trylock+0xc6/0x120 ? lock_acquire+0x3f/0x90 ? _get_xid+0x16/0xd0 [cifs] ? __pfx_SMB2_tcon+0x10/0x10 [cifs] ? cifs_get_smb_ses+0xcdd/0x10a0 [cifs] cifs_get_smb_ses+0xcdd/0x10a0 [cifs] ? __pfx_cifs_get_smb_ses+0x10/0x10 [cifs] ? cifs_get_tcp_session+0xaa0/0xca0 [cifs] cifs_mount_get_session+0x8a/0x210 [cifs] dfs_mount_share+0x1b0/0x11d0 [cifs] ? __pfx___lock_acquire+0x10/0x10 ? __pfx_dfs_mount_share+0x10/0x10 [cifs] ? lock_acquire.part.0+0xf4/0x2a0 ? find_held_lock+0x8a/0xa0 ? hlock_class+0x32/0xb0 ? lock_release+0x203/0x5d0 cifs_mount+0xb3/0x3d0 [cifs] ? do_raw_spin_trylock+0xc6/0x120 ? __pfx_cifs_mount+0x10/0x10 [cifs] ? lock_acquire+0x3f/0x90 ? find_nls+0x16/0xa0 ? smb3_update_mnt_flags+0x372/0x3b0 [cifs] cifs_smb3_do_mount+0x1e2/0xc80 [cifs] ? __pfx_vfs_parse_fs_string+0x10/0x10 ? __pfx_cifs_smb3_do_mount+0x10/0x10 [cifs] smb3_get_tree+0x1bf/0x330 [cifs] vfs_get_tree+0x4a/0x160 path_mount+0x3c1/0xfb0 ? kasan_quarantine_put+0xc7/0x1d0 ? __pfx_path_mount+0x10/0x10 ? kmem_cache_free+0x118/0x3e0 ? user_path_at+0x74/0xa0 __x64_sys_mount+0x1a6/0x1e0 ? __pfx___x64_sys_mount+0x10/0x10 ? mark_held_locks+0x1a/0x90 do_syscall_64+0xbb/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Cc: Tom Talpey <[email protected]> Reported-by: Jianhong Yin <[email protected]> Cc: [email protected] # v6.12 Fixes: b0abcd6 ("smb: client: fix UAF in async decryption") Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
commit 5bee35e upstream. The drvdata is not available in release. Let's just use container_of() to get the ubd instance. Otherwise, removing a ubd device will result in a crash: RIP: 0033:blk_mq_free_tag_set+0x1f/0xba RSP: 00000000e2083bf0 EFLAGS: 00010246 RAX: 000000006021463a RBX: 0000000000000348 RCX: 0000000062604d00 RDX: 0000000004208060 RSI: 00000000605241a0 RDI: 0000000000000348 RBP: 00000000e2083c10 R08: 0000000062414010 R09: 00000000601603f7 R10: 000000000000133a R11: 000000006038c4bd R12: 0000000000000000 R13: 0000000060213a5c R14: 0000000062405d20 R15: 00000000604f7aa0 Kernel panic - not syncing: Segfault with no mm CPU: 0 PID: 17 Comm: kworker/0:1 Not tainted 6.8.0-rc3-00107-gba3f67c11638 #1 Workqueue: events mc_work_proc Stack: 00000000 604f7ef0 62c5d000 62405d20 e2083c30 6002c776 6002c755 600e47ff e2083c60 6025ffe3 04208060 603d36e0 Call Trace: [<6002c776>] ubd_device_release+0x21/0x55 [<6002c755>] ? ubd_device_release+0x0/0x55 [<600e47ff>] ? kfree+0x0/0x100 [<6025ffe3>] device_release+0x70/0xba [<60381d6a>] kobject_put+0xb5/0xe2 [<6026027b>] put_device+0x19/0x1c [<6026a036>] platform_device_put+0x26/0x29 [<6026ac5a>] platform_device_unregister+0x2c/0x2e [<6002c52e>] ubd_remove+0xb8/0xd6 [<6002bb74>] ? mconsole_reply+0x0/0x50 [<6002b926>] mconsole_remove+0x160/0x1cc [<6002bbbc>] ? mconsole_reply+0x48/0x50 [<6003379c>] ? um_set_signals+0x3b/0x43 [<60061c55>] ? update_min_vruntime+0x14/0x70 [<6006251f>] ? dequeue_task_fair+0x164/0x235 [<600620aa>] ? update_cfs_group+0x0/0x40 [<603a0e77>] ? __schedule+0x0/0x3ed [<60033761>] ? um_set_signals+0x0/0x43 [<6002af6a>] mc_work_proc+0x77/0x91 [<600520b4>] process_scheduled_works+0x1af/0x2c3 [<6004ede3>] ? assign_work+0x0/0x58 [<600527a1>] worker_thread+0x2f7/0x37a [<6004ee3b>] ? set_pf_worker+0x0/0x64 [<6005765d>] ? arch_local_irq_save+0x0/0x2d [<60058e07>] ? kthread_exit+0x0/0x3a [<600524aa>] ? worker_thread+0x0/0x37a [<60058f9f>] kthread+0x130/0x135 [<6002068e>] new_thread_handler+0x85/0xb6 Cc: [email protected] Signed-off-by: Tiwei Bie <[email protected]> Acked-By: Anton Ivanov <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
commit d1db692 upstream. The drvdata is not available in release. Let's just use container_of() to get the uml_net instance. Otherwise, removing a network device will result in a crash: RIP: 0033:net_device_release+0x10/0x6f RSP: 00000000e20c7c40 EFLAGS: 00010206 RAX: 000000006002e4e7 RBX: 00000000600f1baf RCX: 00000000624074e0 RDX: 0000000062778000 RSI: 0000000060551c80 RDI: 00000000627af028 RBP: 00000000e20c7c50 R08: 00000000603ad594 R09: 00000000e20c7b70 R10: 000000000000135a R11: 00000000603ad422 R12: 0000000000000000 R13: 0000000062c7af00 R14: 0000000062406d60 R15: 00000000627700b6 Kernel panic - not syncing: Segfault with no mm CPU: 0 UID: 0 PID: 29 Comm: kworker/0:2 Not tainted 6.12.0-rc6-g59b723cd2adb #1 Workqueue: events mc_work_proc Stack: 627af028 62c7af00 e20c7c80 60276fcd 62778000 603f5820 627af028 00000000 e20c7cb0 603a2bcd 627af000 62770010 Call Trace: [<60276fcd>] device_release+0x70/0xba [<603a2bcd>] kobject_put+0xba/0xe7 [<60277265>] put_device+0x19/0x1c [<60281266>] platform_device_put+0x26/0x29 [<60281e5f>] platform_device_unregister+0x2c/0x2e [<6002ec9c>] net_remove+0x63/0x69 [<60031316>] ? mconsole_reply+0x0/0x50 [<600310c8>] mconsole_remove+0x160/0x1cc [<60087d40>] ? __remove_hrtimer+0x38/0x74 [<60087ff8>] ? hrtimer_try_to_cancel+0x8c/0x98 [<6006b3cf>] ? dl_server_stop+0x3f/0x48 [<6006b390>] ? dl_server_stop+0x0/0x48 [<600672e8>] ? dequeue_entities+0x327/0x390 [<60038fa6>] ? um_set_signals+0x0/0x43 [<6003070c>] mc_work_proc+0x77/0x91 [<60057664>] process_scheduled_works+0x1b3/0x2dd [<60055f32>] ? assign_work+0x0/0x58 [<60057f0a>] worker_thread+0x1e9/0x293 [<6005406f>] ? set_pf_worker+0x0/0x64 [<6005d65d>] ? arch_local_irq_save+0x0/0x2d [<6005d748>] ? kthread_exit+0x0/0x3a [<60057d21>] ? worker_thread+0x0/0x293 [<6005dbf1>] kthread+0x126/0x12b [<600219c5>] new_thread_handler+0x85/0xb6 Cc: [email protected] Signed-off-by: Tiwei Bie <[email protected]> Acked-By: Anton Ivanov <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
commit 51b39d7 upstream. The drvdata is not available in release. Let's just use container_of() to get the vector_device instance. Otherwise, removing a vector device will result in a crash: RIP: 0033:vector_device_release+0xf/0x50 RSP: 00000000e187bc40 EFLAGS: 00010202 RAX: 0000000060028f61 RBX: 00000000600f1baf RCX: 00000000620074e0 RDX: 000000006220b9c0 RSI: 0000000060551c80 RDI: 0000000000000000 RBP: 00000000e187bc50 R08: 00000000603ad594 R09: 00000000e187bb70 R10: 000000000000135a R11: 00000000603ad422 R12: 00000000623ae028 R13: 000000006287a200 R14: 0000000062006d30 R15: 00000000623700b6 Kernel panic - not syncing: Segfault with no mm CPU: 0 UID: 0 PID: 16 Comm: kworker/0:1 Not tainted 6.12.0-rc6-g59b723cd2adb #1 Workqueue: events mc_work_proc Stack: 60028f61 623ae028 e187bc80 60276fcd 6220b9c0 603f5820 623ae028 00000000 e187bcb0 603a2bcd 623ae000 62370010 Call Trace: [<60028f61>] ? vector_device_release+0x0/0x50 [<60276fcd>] device_release+0x70/0xba [<603a2bcd>] kobject_put+0xba/0xe7 [<60277265>] put_device+0x19/0x1c [<60281266>] platform_device_put+0x26/0x29 [<60281e5f>] platform_device_unregister+0x2c/0x2e [<60029422>] vector_remove+0x52/0x58 [<60031316>] ? mconsole_reply+0x0/0x50 [<600310c8>] mconsole_remove+0x160/0x1cc [<603b19f4>] ? strlen+0x0/0x15 [<60066611>] ? __dequeue_entity+0x1a9/0x206 [<600666a7>] ? set_next_entity+0x39/0x63 [<6006666e>] ? set_next_entity+0x0/0x63 [<60038fa6>] ? um_set_signals+0x0/0x43 [<6003070c>] mc_work_proc+0x77/0x91 [<60057664>] process_scheduled_works+0x1b3/0x2dd [<60055f32>] ? assign_work+0x0/0x58 [<60057f0a>] worker_thread+0x1e9/0x293 [<6005406f>] ? set_pf_worker+0x0/0x64 [<6005d65d>] ? arch_local_irq_save+0x0/0x2d [<6005d748>] ? kthread_exit+0x0/0x3a [<60057d21>] ? worker_thread+0x0/0x293 [<6005dbf1>] kthread+0x126/0x12b [<600219c5>] new_thread_handler+0x85/0xb6 Cc: [email protected] Signed-off-by: Tiwei Bie <[email protected]> Acked-By: Anton Ivanov <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
commit 7afb867 upstream. open_cached_dir() may either race with the tcon reconnection even before compound_send_recv() or directly trigger a reconnection via SMB2_open_init() or SMB_query_info_init(). The reconnection process invokes invalidate_all_cached_dirs() via cifs_mark_open_files_invalid(), which removes all cfids from the cfids->entries list but doesn't drop a ref if has_lease isn't true. This results in the currently-being-constructed cfid not being on the list, but still having a refcount of 2. It leaks if returned from open_cached_dir(). Fix this by setting cfid->has_lease when the ref is actually taken; the cfid will not be used by other threads until it has a valid time. Addresses these kmemleaks: unreferenced object 0xffff8881090c4000 (size 1024): comm "bash", pid 1860, jiffies 4295126592 hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........"....... 00 ca 45 22 81 88 ff ff f8 dc 4f 04 81 88 ff ff ..E"......O..... backtrace (crc 6f58c20f): [<ffffffff8b895a1e>] __kmalloc_cache_noprof+0x2be/0x350 [<ffffffff8bda06e3>] open_cached_dir+0x993/0x1fb0 [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50 [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0 [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200 [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0 [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e unreferenced object 0xffff8881044fdcf8 (size 8): comm "bash", pid 1860, jiffies 4295126592 hex dump (first 8 bytes): 00 cc cc cc cc cc cc cc ........ backtrace (crc 10c106a9): [<ffffffff8b89a3d3>] __kmalloc_node_track_caller_noprof+0x363/0x480 [<ffffffff8b7d7256>] kstrdup+0x36/0x60 [<ffffffff8bda0700>] open_cached_dir+0x9b0/0x1fb0 [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50 [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0 [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200 [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0 [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e And addresses these BUG splats when unmounting the SMB filesystem: BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs] WARNING: CPU: 3 PID: 3433 at fs/dcache.c:1536 umount_check+0xd0/0x100 Modules linked in: CPU: 3 UID: 0 PID: 3433 Comm: bash Not tainted 6.12.0-rc4-g850925a8133c-dirty raspberrypi#49 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:umount_check+0xd0/0x100 Code: 8d 7c 24 40 e8 31 5a f4 ff 49 8b 54 24 40 41 56 49 89 e9 45 89 e8 48 89 d9 41 57 48 89 de 48 c7 c7 80 e7 db ac e8 f0 72 9a ff <0f> 0b 58 31 c0 5a 5b 5d 41 5c 41 5d 41 5e 41 5f e9 2b e5 5d 01 41 RSP: 0018:ffff88811cc27978 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff888140590ba0 RCX: ffffffffaaf20bae RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881f6fb6f40 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed1023984ee3 R10: ffff88811cc2771f R11: 00000000016cfcc0 R12: ffff888134383e08 R13: 0000000000000002 R14: ffff8881462ec668 R15: ffffffffaceab4c0 FS: 00007f23bfa98740(0000) GS:ffff8881f6f80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556de4a6f808 CR3: 0000000123c80000 CR4: 0000000000350ef0 Call Trace: <TASK> d_walk+0x6a/0x530 shrink_dcache_for_umount+0x6a/0x200 generic_shutdown_super+0x52/0x2a0 kill_anon_super+0x22/0x40 cifs_kill_sb+0x159/0x1e0 deactivate_locked_super+0x66/0xe0 cleanup_mnt+0x140/0x210 task_work_run+0xfb/0x170 syscall_exit_to_user_mode+0x29f/0x2b0 do_syscall_64+0xa1/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f23bfb93ae7 Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 0d 11 93 0d 00 f7 d8 64 89 01 b8 ff ff ff ff eb bf 0f 1f 44 00 00 b8 50 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 92 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007ffee9138598 EFLAGS: 00000246 ORIG_RAX: 0000000000000050 RAX: 0000000000000000 RBX: 0000558f1803e9a0 RCX: 00007f23bfb93ae7 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000558f1803e9a0 RBP: 0000558f1803e600 R08: 0000000000000007 R09: 0000558f17fab610 R10: d91d5ec34ab757b0 R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000000015 R15: 0000000000000000 </TASK> irq event stamp: 1163486 hardirqs last enabled at (1163485): [<ffffffffac98d344>] _raw_spin_unlock_irqrestore+0x34/0x60 hardirqs last disabled at (1163486): [<ffffffffac97dcfc>] __schedule+0xc7c/0x19a0 softirqs last enabled at (1163482): [<ffffffffab79a3ee>] __smb_send_rqst+0x3de/0x990 softirqs last disabled at (1163480): [<ffffffffac2314f1>] release_sock+0x21/0xf0 ---[ end trace 0000000000000000 ]--- VFS: Busy inodes after unmount of cifs (cifs) ------------[ cut here ]------------ kernel BUG at fs/super.c:661! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 1 UID: 0 PID: 3433 Comm: bash Tainted: G W 6.12.0-rc4-g850925a8133c-dirty raspberrypi#49 Tainted: [W]=WARN Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:generic_shutdown_super+0x290/0x2a0 Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246 RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319 R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0 R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0 Call Trace: <TASK> kill_anon_super+0x22/0x40 cifs_kill_sb+0x159/0x1e0 deactivate_locked_super+0x66/0xe0 cleanup_mnt+0x140/0x210 task_work_run+0xfb/0x170 syscall_exit_to_user_mode+0x29f/0x2b0 do_syscall_64+0xa1/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f23bfb93ae7 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:generic_shutdown_super+0x290/0x2a0 Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246 RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319 R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0 R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0 This reproduces eventually with an SMB mount and two shells running these loops concurrently - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10 echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20 echo "recovering"; iptables -F OUTPUT; sleep 10; done Fixes: ebe98f1 ("cifs: enable caching of directories for which a lease is held") Fixes: 5c86919 ("smb: client: fix use-after-free in smb2_query_info_compound()") Cc: [email protected] Signed-off-by: Paul Aurich <[email protected]> Signed-off-by: Steve French <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
commit 2862eee upstream. The function `c_show` was called with protection from RCU. This only ensures that `cp` will not be freed. Therefore, the reference count for `cp` can drop to zero, which will trigger a refcount use-after-free warning when `cache_get` is called. To resolve this issue, use `cache_get_rcu` to ensure that `cp` remains active. ------------[ cut here ]------------ refcount_t: addition on 0; use-after-free. WARNING: CPU: 7 PID: 822 at lib/refcount.c:25 refcount_warn_saturate+0xb1/0x120 CPU: 7 UID: 0 PID: 822 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:refcount_warn_saturate+0xb1/0x120 Call Trace: <TASK> c_show+0x2fc/0x380 [sunrpc] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 proc_reg_read+0xe1/0x140 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Cc: [email protected] # v4.20+ Signed-off-by: Yang Erkun <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
commit fd0af4c upstream. The power suppliers are always requested to suspend asynchronously, dev_pm_domain_detach() requires the caller to ensure proper synchronization of this function with power management callbacks. otherwise the detach may led to kernel panic, like below: [ 1457.107934] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000040 [ 1457.116777] Mem abort info: [ 1457.119589] ESR = 0x0000000096000004 [ 1457.123358] EC = 0x25: DABT (current EL), IL = 32 bits [ 1457.128692] SET = 0, FnV = 0 [ 1457.131764] EA = 0, S1PTW = 0 [ 1457.134920] FSC = 0x04: level 0 translation fault [ 1457.139812] Data abort info: [ 1457.142707] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 1457.148196] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 1457.153256] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 1457.158563] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001138b6000 [ 1457.165000] [0000000000000040] pgd=0000000000000000, p4d=0000000000000000 [ 1457.171792] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP [ 1457.178045] Modules linked in: v4l2_jpeg wave6_vpu_ctrl(-) [last unloaded: mxc_jpeg_encdec] [ 1457.186383] CPU: 0 PID: 51938 Comm: kworker/0:3 Not tainted 6.6.36-gd23d64eea511 raspberrypi#66 [ 1457.194112] Hardware name: NXP i.MX95 19X19 board (DT) [ 1457.199236] Workqueue: pm pm_runtime_work [ 1457.203247] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 1457.210188] pc : genpd_runtime_suspend+0x20/0x290 [ 1457.214886] lr : __rpm_callback+0x48/0x1d8 [ 1457.218968] sp : ffff80008250bc50 [ 1457.222270] x29: ffff80008250bc50 x28: 0000000000000000 x27: 0000000000000000 [ 1457.229394] x26: 0000000000000000 x25: 0000000000000008 x24: 00000000000f4240 [ 1457.236518] x23: 0000000000000000 x22: ffff00008590f0e4 x21: 0000000000000008 [ 1457.243642] x20: ffff80008099c434 x19: ffff00008590f000 x18: ffffffffffffffff [ 1457.250766] x17: 5300326563697665 x16: 645f676e696c6f6f x15: 63343a6d726f6674 [ 1457.257890] x14: 0000000000000004 x13: 00000000000003a4 x12: 0000000000000002 [ 1457.265014] x11: 0000000000000000 x10: 0000000000000a60 x9 : ffff80008250bbb0 [ 1457.272138] x8 : ffff000092937200 x7 : ffff0003fdf6af80 x6 : 0000000000000000 [ 1457.279262] x5 : 00000000410fd050 x4 : 0000000000200000 x3 : 0000000000000000 [ 1457.286386] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00008590f000 [ 1457.293510] Call trace: [ 1457.295946] genpd_runtime_suspend+0x20/0x290 [ 1457.300296] __rpm_callback+0x48/0x1d8 [ 1457.304038] rpm_callback+0x6c/0x78 [ 1457.307515] rpm_suspend+0x10c/0x570 [ 1457.311077] pm_runtime_work+0xc4/0xc8 [ 1457.314813] process_one_work+0x138/0x248 [ 1457.318816] worker_thread+0x320/0x438 [ 1457.322552] kthread+0x110/0x114 [ 1457.325767] ret_from_fork+0x10/0x20 Fixes: 2db16c6 ("media: imx-jpeg: Add V4L2 driver for i.MX8 JPEG Encoder/Decoder") Cc: <[email protected]> Signed-off-by: Ming Qian <[email protected]> Reviewed-by: TaoJiang <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
commit dbc1691 upstream. Boot with slub_debug=UFPZ. If allocated object failed in alloc_consistency_checks, all objects of the slab will be marked as used, and then the slab will be removed from the partial list. When an object belonging to the slab got freed later, the remove_full() function is called. Because the slab is neither on the partial list nor on the full list, it eventually lead to a list corruption (actually a list poison being detected). So we need to mark and isolate the slab page with metadata corruption, do not put it back in circulation. Because the debug caches avoid all the fastpaths, reusing the frozen bit to mark slab page with metadata corruption seems to be fine. [ 4277.385669] list_del corruption, ffffea00044b3e50->next is LIST_POISON1 (dead000000000100) [ 4277.387023] ------------[ cut here ]------------ [ 4277.387880] kernel BUG at lib/list_debug.c:56! [ 4277.388680] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 4277.389562] CPU: 5 PID: 90 Comm: kworker/5:1 Kdump: loaded Tainted: G OE 6.6.1-1 #1 [ 4277.392113] Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker [xfs] [ 4277.393551] RIP: 0010:__list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.394518] Code: 48 91 82 e8 37 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 28 49 91 82 e8 26 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 58 49 91 [ 4277.397292] RSP: 0018:ffffc90000333b38 EFLAGS: 00010082 [ 4277.398202] RAX: 000000000000004e RBX: ffffea00044b3e50 RCX: 0000000000000000 [ 4277.399340] RDX: 0000000000000002 RSI: ffffffff828f8715 RDI: 00000000ffffffff [ 4277.400545] RBP: ffffea00044b3e40 R08: 0000000000000000 R09: ffffc900003339f0 [ 4277.401710] R10: 0000000000000003 R11: ffffffff82d44088 R12: ffff888112cf9910 [ 4277.402887] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8881000424c0 [ 4277.404049] FS: 0000000000000000(0000) GS:ffff88842fd40000(0000) knlGS:0000000000000000 [ 4277.405357] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4277.406389] CR2: 00007f2ad0b24000 CR3: 0000000102a3a006 CR4: 00000000007706e0 [ 4277.407589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4277.408780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4277.410000] PKRU: 55555554 [ 4277.410645] Call Trace: [ 4277.411234] <TASK> [ 4277.411777] ? die+0x32/0x80 [ 4277.412439] ? do_trap+0xd6/0x100 [ 4277.413150] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.414158] ? do_error_trap+0x6a/0x90 [ 4277.414948] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.415915] ? exc_invalid_op+0x4c/0x60 [ 4277.416710] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.417675] ? asm_exc_invalid_op+0x16/0x20 [ 4277.418482] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.419466] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.420410] free_to_partial_list+0x515/0x5e0 [ 4277.421242] ? xfs_iext_remove+0x41a/0xa10 [xfs] [ 4277.422298] xfs_iext_remove+0x41a/0xa10 [xfs] [ 4277.423316] ? xfs_inodegc_worker+0xb4/0x1a0 [xfs] [ 4277.424383] xfs_bmap_del_extent_delay+0x4fe/0x7d0 [xfs] [ 4277.425490] __xfs_bunmapi+0x50d/0x840 [xfs] [ 4277.426445] xfs_itruncate_extents_flags+0x13a/0x490 [xfs] [ 4277.427553] xfs_inactive_truncate+0xa3/0x120 [xfs] [ 4277.428567] xfs_inactive+0x22d/0x290 [xfs] [ 4277.429500] xfs_inodegc_worker+0xb4/0x1a0 [xfs] [ 4277.430479] process_one_work+0x171/0x340 [ 4277.431227] worker_thread+0x277/0x390 [ 4277.431962] ? __pfx_worker_thread+0x10/0x10 [ 4277.432752] kthread+0xf0/0x120 [ 4277.433382] ? __pfx_kthread+0x10/0x10 [ 4277.434134] ret_from_fork+0x2d/0x50 [ 4277.434837] ? __pfx_kthread+0x10/0x10 [ 4277.435566] ret_from_fork_asm+0x1b/0x30 [ 4277.436280] </TASK> Fixes: 643b113 ("slub: enable tracking of full slabs") Suggested-by: Hyeonggon Yoo <[email protected]> Suggested-by: Vlastimil Babka <[email protected]> Cc: <[email protected]> Signed-off-by: yuan.gao <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Acked-by: Christoph Lameter <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
commit be8f982 upstream. The function `e_show` was called with protection from RCU. This only ensures that `exp` will not be freed. Therefore, the reference count for `exp` can drop to zero, which will trigger a refcount use-after-free warning when `exp_get` is called. To resolve this issue, use `cache_get_rcu` to ensure that `exp` remains active. ------------[ cut here ]------------ refcount_t: addition on 0; use-after-free. WARNING: CPU: 3 PID: 819 at lib/refcount.c:25 refcount_warn_saturate+0xb1/0x120 CPU: 3 UID: 0 PID: 819 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:refcount_warn_saturate+0xb1/0x120 ... Call Trace: <TASK> e_show+0x20b/0x230 [nfsd] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: bf18f16 ("NFSD: Using exp_get for export getting") Cc: [email protected] # 4.20+ Signed-off-by: Yang Erkun <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 10, 2024
commit b61badd upstream. [ +0.000021] BUG: KASAN: slab-use-after-free in drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000027] Read of size 8 at addr ffff8881b8605f88 by task amd_pci_unplug/2147 [ +0.000023] CPU: 6 PID: 2147 Comm: amd_pci_unplug Not tainted 6.10.0+ #1 [ +0.000016] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.000016] Call Trace: [ +0.000008] <TASK> [ +0.000009] dump_stack_lvl+0x76/0xa0 [ +0.000017] print_report+0xce/0x5f0 [ +0.000017] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000019] ? srso_return_thunk+0x5/0x5f [ +0.000015] ? kasan_complete_mode_report_info+0x72/0x200 [ +0.000016] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000019] kasan_report+0xbe/0x110 [ +0.000015] ? drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000023] __asan_report_load8_noabort+0x14/0x30 [ +0.000014] drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched] [ +0.000020] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000016] ? __pfx_drm_sched_entity_flush+0x10/0x10 [gpu_sched] [ +0.000020] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? __kasan_check_write+0x14/0x30 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? enable_work+0x124/0x220 [ +0.000015] ? __pfx_enable_work+0x10/0x10 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? free_large_kmalloc+0x85/0xf0 [ +0.000016] drm_sched_entity_destroy+0x18/0x30 [gpu_sched] [ +0.000020] amdgpu_vce_sw_fini+0x55/0x170 [amdgpu] [ +0.000735] ? __kasan_check_read+0x11/0x20 [ +0.000016] vce_v4_0_sw_fini+0x80/0x110 [amdgpu] [ +0.000726] amdgpu_device_fini_sw+0x331/0xfc0 [amdgpu] [ +0.000679] ? mutex_unlock+0x80/0xe0 [ +0.000017] ? __pfx_amdgpu_device_fini_sw+0x10/0x10 [amdgpu] [ +0.000662] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? __kasan_check_write+0x14/0x30 [ +0.000013] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? mutex_unlock+0x80/0xe0 [ +0.000016] amdgpu_driver_release_kms+0x16/0x80 [amdgpu] [ +0.000663] drm_minor_release+0xc9/0x140 [drm] [ +0.000081] drm_release+0x1fd/0x390 [drm] [ +0.000082] __fput+0x36c/0xad0 [ +0.000018] __fput_sync+0x3c/0x50 [ +0.000014] __x64_sys_close+0x7d/0xe0 [ +0.000014] x64_sys_call+0x1bc6/0x2680 [ +0.000014] do_syscall_64+0x70/0x130 [ +0.000014] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? irqentry_exit_to_user_mode+0x60/0x190 [ +0.000015] ? srso_return_thunk+0x5/0x5f [ +0.000014] ? irqentry_exit+0x43/0x50 [ +0.000012] ? srso_return_thunk+0x5/0x5f [ +0.000013] ? exc_page_fault+0x7c/0x110 [ +0.000015] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000014] RIP: 0033:0x7ffff7b14f67 [ +0.000013] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 73 ba f7 ff [ +0.000026] RSP: 002b:00007fffffffe378 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ +0.000019] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffff7b14f67 [ +0.000014] RDX: 0000000000000000 RSI: 00007ffff7f6f47a RDI: 0000000000000003 [ +0.000014] RBP: 00007fffffffe3a0 R08: 0000555555569890 R09: 0000000000000000 [ +0.000014] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5c8 [ +0.000013] R13: 00005555555552a9 R14: 0000555555557d48 R15: 00007ffff7ffd040 [ +0.000020] </TASK> [ +0.000016] Allocated by task 383 on cpu 7 at 26.880319s: [ +0.000014] kasan_save_stack+0x28/0x60 [ +0.000008] kasan_save_track+0x18/0x70 [ +0.000007] kasan_save_alloc_info+0x38/0x60 [ +0.000007] __kasan_kmalloc+0xc1/0xd0 [ +0.000007] kmalloc_trace_noprof+0x180/0x380 [ +0.000007] drm_sched_init+0x411/0xec0 [gpu_sched] [ +0.000012] amdgpu_device_init+0x695f/0xa610 [amdgpu] [ +0.000658] amdgpu_driver_load_kms+0x1a/0x120 [amdgpu] [ +0.000662] amdgpu_pci_probe+0x361/0xf30 [amdgpu] [ +0.000651] local_pci_probe+0xe7/0x1b0 [ +0.000009] pci_device_probe+0x248/0x890 [ +0.000008] really_probe+0x1fd/0x950 [ +0.000008] __driver_probe_device+0x307/0x410 [ +0.000007] driver_probe_device+0x4e/0x150 [ +0.000007] __driver_attach+0x223/0x510 [ +0.000006] bus_for_each_dev+0x102/0x1a0 [ +0.000007] driver_attach+0x3d/0x60 [ +0.000006] bus_add_driver+0x2ac/0x5f0 [ +0.000006] driver_register+0x13d/0x490 [ +0.000008] __pci_register_driver+0x1ee/0x2b0 [ +0.000007] llc_sap_close+0xb0/0x160 [llc] [ +0.000009] do_one_initcall+0x9c/0x3e0 [ +0.000008] do_init_module+0x241/0x760 [ +0.000008] load_module+0x51ac/0x6c30 [ +0.000006] __do_sys_init_module+0x234/0x270 [ +0.000007] __x64_sys_init_module+0x73/0xc0 [ +0.000006] x64_sys_call+0xe3/0x2680 [ +0.000006] do_syscall_64+0x70/0x130 [ +0.000007] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000015] Freed by task 2147 on cpu 6 at 160.507651s: [ +0.000013] kasan_save_stack+0x28/0x60 [ +0.000007] kasan_save_track+0x18/0x70 [ +0.000007] kasan_save_free_info+0x3b/0x60 [ +0.000007] poison_slab_object+0x115/0x1c0 [ +0.000007] __kasan_slab_free+0x34/0x60 [ +0.000007] kfree+0xfa/0x2f0 [ +0.000007] drm_sched_fini+0x19d/0x410 [gpu_sched] [ +0.000012] amdgpu_fence_driver_sw_fini+0xc4/0x2f0 [amdgpu] [ +0.000662] amdgpu_device_fini_sw+0x77/0xfc0 [amdgpu] [ +0.000653] amdgpu_driver_release_kms+0x16/0x80 [amdgpu] [ +0.000655] drm_minor_release+0xc9/0x140 [drm] [ +0.000071] drm_release+0x1fd/0x390 [drm] [ +0.000071] __fput+0x36c/0xad0 [ +0.000008] __fput_sync+0x3c/0x50 [ +0.000007] __x64_sys_close+0x7d/0xe0 [ +0.000007] x64_sys_call+0x1bc6/0x2680 [ +0.000007] do_syscall_64+0x70/0x130 [ +0.000007] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000014] The buggy address belongs to the object at ffff8881b8605f80 which belongs to the cache kmalloc-64 of size 64 [ +0.000020] The buggy address is located 8 bytes inside of freed 64-byte region [ffff8881b8605f80, ffff8881b8605fc0) [ +0.000028] The buggy address belongs to the physical page: [ +0.000011] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1b8605 [ +0.000008] anon flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff) [ +0.000007] page_type: 0xffffefff(slab) [ +0.000009] raw: 0017ffffc0000000 ffff8881000428c0 0000000000000000 dead000000000001 [ +0.000006] raw: 0000000000000000 0000000000200020 00000001ffffefff 0000000000000000 [ +0.000006] page dumped because: kasan: bad access detected [ +0.000012] Memory state around the buggy address: [ +0.000011] ffff8881b8605e80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ +0.000015] ffff8881b8605f00: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ +0.000015] >ffff8881b8605f80: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ +0.000013] ^ [ +0.000011] ffff8881b8606000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc [ +0.000014] ffff8881b8606080: fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb fb [ +0.000013] ================================================================== The issue reproduced on VG20 during the IGT pci_unplug test. The root cause of the issue is that the function drm_sched_fini is called before drm_sched_entity_kill. In drm_sched_fini, the drm_sched_rq structure is freed, but this structure is later accessed by each entity within the run queue, leading to invalid memory access. To resolve this, the order of cleanup calls is updated: Before: amdgpu_fence_driver_sw_fini amdgpu_device_ip_fini After: amdgpu_device_ip_fini amdgpu_fence_driver_sw_fini This updated order ensures that all entities in the IPs are cleaned up first, followed by proper cleanup of the schedulers. Additional Investigation: During debugging, another issue was identified in the amdgpu_vce_sw_fini function. The vce.vcpu_bo buffer must be freed only as the final step in the cleanup process to prevent any premature access during earlier cleanup stages. v2: Using Christian suggestion call drm_sched_entity_destroy before drm_sched_fini. Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Signed-off-by: Vitaly Prosyak <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 16, 2024
commit 5954821 upstream. Due to incorrect dev->product reporting by certain devices, null pointer dereferences occur when dev->product is empty, leading to potential system crashes. This issue was found on EXCELSIOR DL37-D05 device with Loongson-LS3A6000-7A2000-DL37 motherboard. Kernel logs: [ 56.470885] usb 4-3: new full-speed USB device number 4 using ohci-pci [ 56.671638] usb 4-3: string descriptor 0 read error: -22 [ 56.671644] usb 4-3: New USB device found, idVendor=056a, idProduct=0374, bcdDevice= 1.07 [ 56.671647] usb 4-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 56.678839] hid-generic 0003:056A:0374.0004: hiddev0,hidraw3: USB HID v1.10 Device [HID 056a:0374] on usb-0000:00:05.0-3/input0 [ 56.697719] CPU 2 Unable to handle kernel paging request at virtual address 0000000000000000, era == 90000000066e35c8, ra == ffff800004f98a80 [ 56.697732] Oops[#1]: [ 56.697734] CPU: 2 PID: 2742 Comm: (udev-worker) Tainted: G OE 6.6.0-loong64-desktop raspberrypi#25.00.2000.015 [ 56.697737] Hardware name: Inspur CE520L2/C09901N000000000, BIOS 2.09.00 10/11/2024 [ 56.697739] pc 90000000066e35c8 ra ffff800004f98a80 tp 9000000125478000 sp 900000012547b8a0 [ 56.697741] a0 0000000000000000 a1 ffff800004818b28 a2 0000000000000000 a3 0000000000000000 [ 56.697743] a4 900000012547b8f0 a5 0000000000000000 a6 0000000000000000 a7 0000000000000000 [ 56.697745] t0 ffff800004818b2d t1 0000000000000000 t2 0000000000000003 t3 0000000000000005 [ 56.697747] t4 0000000000000000 t5 0000000000000000 t6 0000000000000000 t7 0000000000000000 [ 56.697748] t8 0000000000000000 u0 0000000000000000 s9 0000000000000000 s0 900000011aa48028 [ 56.697750] s1 0000000000000000 s2 0000000000000000 s3 ffff800004818e80 s4 ffff800004810000 [ 56.697751] s5 90000001000b98d0 s6 ffff800004811f88 s7 ffff800005470440 s8 0000000000000000 [ 56.697753] ra: ffff800004f98a80 wacom_update_name+0xe0/0x300 [wacom] [ 56.697802] ERA: 90000000066e35c8 strstr+0x28/0x120 [ 56.697806] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 56.697816] PRMD: 0000000c (PPLV0 +PIE +PWE) [ 56.697821] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 56.697827] ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7) [ 56.697831] ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) [ 56.697835] BADV: 0000000000000000 [ 56.697836] PRID: 0014d000 (Loongson-64bit, Loongson-3A6000) [ 56.697838] Modules linked in: wacom(+) bnep bluetooth rfkill qrtr nls_iso8859_1 nls_cp437 snd_hda_codec_conexant snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore input_leds mousedev led_class joydev deepin_netmonitor(OE) fuse nfnetlink dmi_sysfs ip_tables x_tables overlay amdgpu amdxcp drm_exec gpu_sched drm_buddy radeon drm_suballoc_helper i2c_algo_bit drm_ttm_helper r8169 ttm drm_display_helper spi_loongson_pci xhci_pci cec xhci_pci_renesas spi_loongson_core hid_generic realtek gpio_loongson_64bit [ 56.697887] Process (udev-worker) (pid: 2742, threadinfo=00000000aee0d8b4, task=00000000a9eff1f3) [ 56.697890] Stack : 0000000000000000 ffff800004817e00 0000000000000000 0000251c00000000 [ 56.697896] 0000000000000000 00000011fffffffd 0000000000000000 0000000000000000 [ 56.697901] 0000000000000000 1b67a968695184b9 0000000000000000 90000001000b98d0 [ 56.697906] 90000001000bb8d0 900000011aa48028 0000000000000000 ffff800004f9d74c [ 56.697911] 90000001000ba000 ffff800004f9ce58 0000000000000000 ffff800005470440 [ 56.697916] ffff800004811f88 90000001000b98d0 9000000100da2aa8 90000001000bb8d0 [ 56.697921] 0000000000000000 90000001000ba000 900000011aa48028 ffff800004f9d74c [ 56.697926] ffff8000054704e8 90000001000bb8b8 90000001000ba000 0000000000000000 [ 56.697931] 90000001000bb8d0 9000000006307564 9000000005e666e0 90000001752359b8 [ 56.697936] 9000000008cbe400 900000000804d000 9000000005e666e0 0000000000000000 [ 56.697941] ... [ 56.697944] Call Trace: [ 56.697945] [<90000000066e35c8>] strstr+0x28/0x120 [ 56.697950] [<ffff800004f98a80>] wacom_update_name+0xe0/0x300 [wacom] [ 56.698000] [<ffff800004f9ce58>] wacom_parse_and_register+0x338/0x900 [wacom] [ 56.698050] [<ffff800004f9d74c>] wacom_probe+0x32c/0x420 [wacom] [ 56.698099] [<9000000006307564>] hid_device_probe+0x144/0x260 [ 56.698103] [<9000000005e65d68>] really_probe+0x208/0x540 [ 56.698109] [<9000000005e661dc>] __driver_probe_device+0x13c/0x1e0 [ 56.698112] [<9000000005e66620>] driver_probe_device+0x40/0x100 [ 56.698116] [<9000000005e6680c>] __device_attach_driver+0x12c/0x180 [ 56.698119] [<9000000005e62bc8>] bus_for_each_drv+0x88/0x160 [ 56.698123] [<9000000005e66468>] __device_attach+0x108/0x260 [ 56.698126] [<9000000005e63918>] device_reprobe+0x78/0x100 [ 56.698129] [<9000000005e62a68>] bus_for_each_dev+0x88/0x160 [ 56.698132] [<9000000006304e54>] __hid_bus_driver_added+0x34/0x80 [ 56.698134] [<9000000005e62bc8>] bus_for_each_drv+0x88/0x160 [ 56.698137] [<9000000006304df0>] __hid_register_driver+0x70/0xa0 [ 56.698142] [<9000000004e10fe4>] do_one_initcall+0x104/0x320 [ 56.698146] [<9000000004f38150>] do_init_module+0x90/0x2c0 [ 56.698151] [<9000000004f3a3d8>] init_module_from_file+0xb8/0x120 [ 56.698155] [<9000000004f3a590>] idempotent_init_module+0x150/0x3a0 [ 56.698159] [<9000000004f3a890>] sys_finit_module+0xb0/0x140 [ 56.698163] [<900000000671e4e8>] do_syscall+0x88/0xc0 [ 56.698166] [<9000000004e12404>] handle_syscall+0xc4/0x160 [ 56.698171] Code: 0011958f 00150224 5800cd85 <2a00022c> 0015000 4000c180 0015022c 03400000 03400000 [ 56.698192] ---[ end trace 0000000000000000 ]--- Fixes: 09dc28a ("HID: wacom: Improve generic name generation") Reported-by: Zhenxing Chen <[email protected]> Co-developed-by: Xu Rao <[email protected]> Signed-off-by: Xu Rao <[email protected]> Signed-off-by: WangYuli <[email protected]> Link: https://patch.msgid.link/[email protected] Cc: [email protected] Signed-off-by: Benjamin Tissoires <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 16, 2024
commit ee1dfbd upstream. In commit 6e86a15 ("can: dev: provide optional GPIO based termination support") GPIO based termination support was added. For no particular reason that patch uses gpiod_set_value() to set the GPIO. This leads to the following warning, if the systems uses a sleeping GPIO, i.e. behind an I2C port expander: | WARNING: CPU: 0 PID: 379 at /drivers/gpio/gpiolib.c:3496 gpiod_set_value+0x50/0x6c | CPU: 0 UID: 0 PID: 379 Comm: ip Not tainted 6.11.0-20241016-1 #1 823affae360cc91126e4d316d7a614a8bf86236c Replace gpiod_set_value() by gpiod_set_value_cansleep() to allow the use of sleeping GPIOs. Cc: Nicolai Buchwitz <[email protected]> Cc: Lino Sanfilippo <[email protected]> Cc: [email protected] Reported-by: Leonard Göhrs <[email protected]> Tested-by: Leonard Göhrs <[email protected]> Fixes: 6e86a15 ("can: dev: provide optional GPIO based termination support") Link: https://patch.msgid.link/20241121-dev-fix-can_set_termination-v1-1-41fa6e29216d@pengutronix.de Signed-off-by: Marc Kleine-Budde <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 16, 2024
commit 07c903d upstream. System crash is observed with stack trace warning of use after free. There are 2 signals to tell dpc_thread to terminate (UNLOADING flag and kthread_stop). On setting the UNLOADING flag when dpc_thread happens to run at the time and sees the flag, this causes dpc_thread to exit and clean up itself. When kthread_stop is called for final cleanup, this causes use after free. Remove UNLOADING signal to terminate dpc_thread. Use the kthread_stop as the main signal to exit dpc_thread. [596663.812935] kernel BUG at mm/slub.c:294! [596663.812950] invalid opcode: 0000 [#1] SMP PTI [596663.812957] CPU: 13 PID: 1475935 Comm: rmmod Kdump: loaded Tainted: G IOE --------- - - 4.18.0-240.el8.x86_64 #1 [596663.812960] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/20/2012 [596663.812974] RIP: 0010:__slab_free+0x17d/0x360 ... [596663.813008] Call Trace: [596663.813022] ? __dentry_kill+0x121/0x170 [596663.813030] ? _cond_resched+0x15/0x30 [596663.813034] ? _cond_resched+0x15/0x30 [596663.813039] ? wait_for_completion+0x35/0x190 [596663.813048] ? try_to_wake_up+0x63/0x540 [596663.813055] free_task+0x5a/0x60 [596663.813061] kthread_stop+0xf3/0x100 [596663.813103] qla2x00_remove_one+0x284/0x440 [qla2xxx] Cc: [email protected] Signed-off-by: Quinn Tran <[email protected]> Signed-off-by: Nilesh Javali <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Himanshu Madhani <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 16, 2024
…imary CPU commit b3fce42 upstream. Commit 5944ce0 ("arch_topology: Build cacheinfo from primary CPU") adds functionality that architectures can use to optionally allocate and build cacheinfo early during boot. Commit 6539cff ("cacheinfo: Add arch specific early level initializer") lets secondary CPUs correct (and reallocate memory) cacheinfo data if needed. If the early build functionality is not used and cacheinfo does not need correction, memory for cacheinfo is never allocated. x86 does not use the early build functionality. Consequently, during the cacheinfo CPU hotplug callback, last_level_cache_is_valid() attempts to dereference a NULL pointer: BUG: kernel NULL pointer dereference, address: 0000000000000100 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not present page PGD 0 P4D 0 Oops: 0000 [#1] PREEPMT SMP NOPTI CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1 RIP: 0010: last_level_cache_is_valid+0x95/0xe0a Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback if not done earlier. Moreover, before determining the validity of the last-level cache info, ensure that it has been allocated. Simply checking for non-zero cache_leaves() is not sufficient, as some architectures (e.g., Intel processors) have non-zero cache_leaves() before allocation. Dereferencing NULL cacheinfo can occur in update_per_cpu_data_slice_size(). This function iterates over all online CPUs. However, a CPU may have come online recently, but its cacheinfo may not have been allocated yet. While here, remove an unnecessary indentation in allocate_cache_info(). [ bp: Massage. ] Fixes: 6539cff ("cacheinfo: Add arch specific early level initializer") Signed-off-by: Ricardo Neri <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Radu Rendec <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Reviewed-by: Andreas Herrmann <[email protected]> Reviewed-by: Sudeep Holla <[email protected]> Cc: [email protected] # 6.3+ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 16, 2024
commit ab244dd upstream. Jordy reported issue against XSKMAP which also applies to DEVMAP - the index used for accessing map entry, due to being a signed integer, causes the OOB writes. Fix is simple as changing the type from int to u32, however, when compared to XSKMAP case, one more thing needs to be addressed. When map is released from system via dev_map_free(), we iterate through all of the entries and an iterator variable is also an int, which implies OOB accesses. Again, change it to be u32. Example splat below: [ 160.724676] BUG: unable to handle page fault for address: ffffc8fc2c001000 [ 160.731662] #PF: supervisor read access in kernel mode [ 160.736876] #PF: error_code(0x0000) - not-present page [ 160.742095] PGD 0 P4D 0 [ 160.744678] Oops: Oops: 0000 [#1] PREEMPT SMP [ 160.749106] CPU: 1 UID: 0 PID: 520 Comm: kworker/u145:12 Not tainted 6.12.0-rc1+ raspberrypi#487 [ 160.757050] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 160.767642] Workqueue: events_unbound bpf_map_free_deferred [ 160.773308] RIP: 0010:dev_map_free+0x77/0x170 [ 160.777735] Code: 00 e8 fd 91 ed ff e8 b8 73 ed ff 41 83 7d 18 19 74 6e 41 8b 45 24 49 8b bd f8 00 00 00 31 db 85 c0 74 48 48 63 c3 48 8d 04 c7 <48> 8b 28 48 85 ed 74 30 48 8b 7d 18 48 85 ff 74 05 e8 b3 52 fa ff [ 160.796777] RSP: 0018:ffffc9000ee1fe38 EFLAGS: 00010202 [ 160.802086] RAX: ffffc8fc2c001000 RBX: 0000000080000000 RCX: 0000000000000024 [ 160.809331] RDX: 0000000000000000 RSI: 0000000000000024 RDI: ffffc9002c001000 [ 160.816576] RBP: 0000000000000000 R08: 0000000000000023 R09: 0000000000000001 [ 160.823823] R10: 0000000000000001 R11: 00000000000ee6b2 R12: dead000000000122 [ 160.831066] R13: ffff88810c928e00 R14: ffff8881002df405 R15: 0000000000000000 [ 160.838310] FS: 0000000000000000(0000) GS:ffff8897e0c40000(0000) knlGS:0000000000000000 [ 160.846528] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 160.852357] CR2: ffffc8fc2c001000 CR3: 0000000005c32006 CR4: 00000000007726f0 [ 160.859604] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 160.866847] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 160.874092] PKRU: 55555554 [ 160.876847] Call Trace: [ 160.879338] <TASK> [ 160.881477] ? __die+0x20/0x60 [ 160.884586] ? page_fault_oops+0x15a/0x450 [ 160.888746] ? search_extable+0x22/0x30 [ 160.892647] ? search_bpf_extables+0x5f/0x80 [ 160.896988] ? exc_page_fault+0xa9/0x140 [ 160.900973] ? asm_exc_page_fault+0x22/0x30 [ 160.905232] ? dev_map_free+0x77/0x170 [ 160.909043] ? dev_map_free+0x58/0x170 [ 160.912857] bpf_map_free_deferred+0x51/0x90 [ 160.917196] process_one_work+0x142/0x370 [ 160.921272] worker_thread+0x29e/0x3b0 [ 160.925082] ? rescuer_thread+0x4b0/0x4b0 [ 160.929157] kthread+0xd4/0x110 [ 160.932355] ? kthread_park+0x80/0x80 [ 160.936079] ret_from_fork+0x2d/0x50 [ 160.943396] ? kthread_park+0x80/0x80 [ 160.950803] ret_from_fork_asm+0x11/0x20 [ 160.958482] </TASK> Fixes: 546ac1f ("bpf: add devmap, a map for storing net device references") CC: [email protected] Reported-by: Jordy Zomer <[email protected]> Suggested-by: Jordy Zomer <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 16, 2024
commit 32cd3db upstream. Jordy says: " In the xsk_map_delete_elem function an unsigned integer (map->max_entries) is compared with a user-controlled signed integer (k). Due to implicit type conversion, a large unsigned value for map->max_entries can bypass the intended bounds check: if (k >= map->max_entries) return -EINVAL; This allows k to hold a negative value (between -2147483648 and -2), which is then used as an array index in m->xsk_map[k], which results in an out-of-bounds access. spin_lock_bh(&m->lock); map_entry = &m->xsk_map[k]; // Out-of-bounds map_entry old_xs = unrcu_pointer(xchg(map_entry, NULL)); // Oob write if (old_xs) xsk_map_sock_delete(old_xs, map_entry); spin_unlock_bh(&m->lock); The xchg operation can then be used to cause an out-of-bounds write. Moreover, the invalid map_entry passed to xsk_map_sock_delete can lead to further memory corruption. " It indeed results in following splat: [76612.897343] BUG: unable to handle page fault for address: ffffc8fc2e461108 [76612.904330] #PF: supervisor write access in kernel mode [76612.909639] #PF: error_code(0x0002) - not-present page [76612.914855] PGD 0 P4D 0 [76612.917431] Oops: Oops: 0002 [#1] PREEMPT SMP [76612.921859] CPU: 11 UID: 0 PID: 10318 Comm: a.out Not tainted 6.12.0-rc1+ raspberrypi#470 [76612.929189] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [76612.939781] RIP: 0010:xsk_map_delete_elem+0x2d/0x60 [76612.944738] Code: 00 00 41 54 55 53 48 63 2e 3b 6f 24 73 38 4c 8d a7 f8 00 00 00 48 89 fb 4c 89 e7 e8 2d bf 05 00 48 8d b4 eb 00 01 00 00 31 ff <48> 87 3e 48 85 ff 74 05 e8 16 ff ff ff 4c 89 e7 e8 3e bc 05 00 31 [76612.963774] RSP: 0018:ffffc9002e407df8 EFLAGS: 00010246 [76612.969079] RAX: 0000000000000000 RBX: ffffc9002e461000 RCX: 0000000000000000 [76612.976323] RDX: 0000000000000001 RSI: ffffc8fc2e461108 RDI: 0000000000000000 [76612.983569] RBP: ffffffff80000001 R08: 0000000000000000 R09: 0000000000000007 [76612.990812] R10: ffffc9002e407e18 R11: ffff888108a38858 R12: ffffc9002e4610f8 [76612.998060] R13: ffff888108a38858 R14: 00007ffd1ae0ac78 R15: ffffc9002e4610c0 [76613.005303] FS: 00007f80b6f59740(0000) GS:ffff8897e0ec0000(0000) knlGS:0000000000000000 [76613.013517] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [76613.019349] CR2: ffffc8fc2e461108 CR3: 000000011e3ef001 CR4: 00000000007726f0 [76613.026595] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [76613.033841] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [76613.041086] PKRU: 55555554 [76613.043842] Call Trace: [76613.046331] <TASK> [76613.048468] ? __die+0x20/0x60 [76613.051581] ? page_fault_oops+0x15a/0x450 [76613.055747] ? search_extable+0x22/0x30 [76613.059649] ? search_bpf_extables+0x5f/0x80 [76613.063988] ? exc_page_fault+0xa9/0x140 [76613.067975] ? asm_exc_page_fault+0x22/0x30 [76613.072229] ? xsk_map_delete_elem+0x2d/0x60 [76613.076573] ? xsk_map_delete_elem+0x23/0x60 [76613.080914] __sys_bpf+0x19b7/0x23c0 [76613.084555] __x64_sys_bpf+0x1a/0x20 [76613.088194] do_syscall_64+0x37/0xb0 [76613.091832] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [76613.096962] RIP: 0033:0x7f80b6d1e88d [76613.100592] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48 [76613.119631] RSP: 002b:00007ffd1ae0ac68 EFLAGS: 00000206 ORIG_RAX: 0000000000000141 [76613.131330] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f80b6d1e88d [76613.142632] RDX: 0000000000000098 RSI: 00007ffd1ae0ad20 RDI: 0000000000000003 [76613.153967] RBP: 00007ffd1ae0adc0 R08: 0000000000000000 R09: 0000000000000000 [76613.166030] R10: 00007f80b6f77040 R11: 0000000000000206 R12: 00007ffd1ae0aed8 [76613.177130] R13: 000055ddf42ce1e9 R14: 000055ddf42d0d98 R15: 00007f80b6fab040 [76613.188129] </TASK> Fix this by simply changing key type from int to u32. Fixes: fbfc504 ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP") CC: [email protected] Reported-by: Jordy Zomer <[email protected]> Suggested-by: Jordy Zomer <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 16, 2024
[ Upstream commit b9e9ed9 ] For htab of maps, when the map is removed from the htab, it may hold the last reference of the map. bpf_map_fd_put_ptr() will invoke bpf_map_free_id() to free the id of the removed map element. However, bpf_map_fd_put_ptr() is invoked while holding a bucket lock (raw_spin_lock_t), and bpf_map_free_id() attempts to acquire map_idr_lock (spinlock_t), triggering the following lockdep warning: ============================= [ BUG: Invalid wait context ] 6.11.0-rc4+ raspberrypi#49 Not tainted ----------------------------- test_maps/4881 is trying to lock: ffffffff84884578 (map_idr_lock){+...}-{3:3}, at: bpf_map_free_id.part.0+0x21/0x70 other info that might help us debug this: context-{5:5} 2 locks held by test_maps/4881: #0: ffffffff846caf60 (rcu_read_lock){....}-{1:3}, at: bpf_fd_htab_map_update_elem+0xf9/0x270 #1: ffff888149ced148 (&htab->lockdep_key#2){....}-{2:2}, at: htab_map_update_elem+0x178/0xa80 stack backtrace: CPU: 0 UID: 0 PID: 4881 Comm: test_maps Not tainted 6.11.0-rc4+ raspberrypi#49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... Call Trace: <TASK> dump_stack_lvl+0x6e/0xb0 dump_stack+0x10/0x20 __lock_acquire+0x73e/0x36c0 lock_acquire+0x182/0x450 _raw_spin_lock_irqsave+0x43/0x70 bpf_map_free_id.part.0+0x21/0x70 bpf_map_put+0xcf/0x110 bpf_map_fd_put_ptr+0x9a/0xb0 free_htab_elem+0x69/0xe0 htab_map_update_elem+0x50f/0xa80 bpf_fd_htab_map_update_elem+0x131/0x270 htab_map_update_elem+0x50f/0xa80 bpf_fd_htab_map_update_elem+0x131/0x270 bpf_map_update_value+0x266/0x380 __sys_bpf+0x21bb/0x36b0 __x64_sys_bpf+0x45/0x60 x64_sys_call+0x1b2a/0x20d0 do_syscall_64+0x5d/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e One way to fix the lockdep warning is using raw_spinlock_t for map_idr_lock as well. However, bpf_map_alloc_id() invokes idr_alloc_cyclic() after acquiring map_idr_lock, it will trigger a similar lockdep warning because the slab's lock (s->cpu_slab->lock) is still a spinlock. Instead of changing map_idr_lock's type, fix the issue by invoking htab_put_fd_value() after htab_unlock_bucket(). However, only deferring the invocation of htab_put_fd_value() is not enough, because the old map pointers in htab of maps can not be saved during batched deletion. Therefore, also defer the invocation of free_htab_elem(), so these to-be-freed elements could be linked together similar to lru map. There are four callers for ->map_fd_put_ptr: (1) alloc_htab_elem() (through htab_put_fd_value()) It invokes ->map_fd_put_ptr() under a raw_spinlock_t. The invocation of htab_put_fd_value() can not simply move after htab_unlock_bucket(), because the old element has already been stashed in htab->extra_elems. It may be reused immediately after htab_unlock_bucket() and the invocation of htab_put_fd_value() after htab_unlock_bucket() may release the newly-added element incorrectly. Therefore, saving the map pointer of the old element for htab of maps before unlocking the bucket and releasing the map_ptr after unlock. Beside the map pointer in the old element, should do the same thing for the special fields in the old element as well. (2) free_htab_elem() (through htab_put_fd_value()) Its caller includes __htab_map_lookup_and_delete_elem(), htab_map_delete_elem() and __htab_map_lookup_and_delete_batch(). For htab_map_delete_elem(), simply invoke free_htab_elem() after htab_unlock_bucket(). For __htab_map_lookup_and_delete_batch(), just like lru map, linking the to-be-freed element into node_to_free list and invoking free_htab_elem() for these element after unlock. It is safe to reuse batch_flink as the link for node_to_free, because these elements have been removed from the hash llist. Because htab of maps doesn't support lookup_and_delete operation, __htab_map_lookup_and_delete_elem() doesn't have the problem, so kept it as is. (3) fd_htab_map_free() It invokes ->map_fd_put_ptr without raw_spinlock_t. (4) bpf_fd_htab_map_update_elem() It invokes ->map_fd_put_ptr without raw_spinlock_t. After moving free_htab_elem() outside htab bucket lock scope, using pcpu_freelist_push() instead of __pcpu_freelist_push() to disable the irq before freeing elements, and protecting the invocations of bpf_mem_cache_free() with migrate_{disable|enable} pair. Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 16, 2024
…ode. [ Upstream commit d5c367e ] creating a large files during checkpoint disable until it runs out of space and then delete it, then remount to enable checkpoint again, and then unmount the filesystem triggers the f2fs_bug_on as below: ------------[ cut here ]------------ kernel BUG at fs/f2fs/inode.c:896! CPU: 2 UID: 0 PID: 1286 Comm: umount Not tainted 6.11.0-rc7-dirty raspberrypi#360 Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI RIP: 0010:f2fs_evict_inode+0x58c/0x610 Call Trace: __die_body+0x15/0x60 die+0x33/0x50 do_trap+0x10a/0x120 f2fs_evict_inode+0x58c/0x610 do_error_trap+0x60/0x80 f2fs_evict_inode+0x58c/0x610 exc_invalid_op+0x53/0x60 f2fs_evict_inode+0x58c/0x610 asm_exc_invalid_op+0x16/0x20 f2fs_evict_inode+0x58c/0x610 evict+0x101/0x260 dispose_list+0x30/0x50 evict_inodes+0x140/0x190 generic_shutdown_super+0x2f/0x150 kill_block_super+0x11/0x40 kill_f2fs_super+0x7d/0x140 deactivate_locked_super+0x2a/0x70 cleanup_mnt+0xb3/0x140 task_work_run+0x61/0x90 The root cause is: creating large files during disable checkpoint period results in not enough free segments, so when writing back root inode will failed in f2fs_enable_checkpoint. When umount the file system after enabling checkpoint, the root inode is dirty in f2fs_evict_inode function, which triggers BUG_ON. The steps to reproduce are as follows: dd if=/dev/zero of=f2fs.img bs=1M count=55 mount f2fs.img f2fs_dir -o checkpoint=disable:10% dd if=/dev/zero of=big bs=1M count=50 sync rm big mount -o remount,checkpoint=enable f2fs_dir umount f2fs_dir Let's redirty inode when there is not free segments during checkpoint is disable. Signed-off-by: Qi Han <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 16, 2024
…to avoid deadlock [ Upstream commit 6cf7b65 ] A deadlock may happen since the i3c_master_register() acquires &i3cbus->lock twice. See the log below. Use i3cdev->desc->info instead of calling i3c_device_info() to avoid acquiring the lock twice. v2: - Modified the title and commit message ============================================ WARNING: possible recursive locking detected 6.11.0-mainline -------------------------------------------- init/1 is trying to acquire lock: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_bus_normaluse_lock but task is already holding lock: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_master_register other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&i3cbus->lock); lock(&i3cbus->lock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by init/1: #0: fcffff809b6798f8 (&dev->mutex){....}-{3:3}, at: __driver_attach #1: f1ffff80a6a40dc0 (&i3cbus->lock){++++}-{3:3}, at: i3c_master_register stack backtrace: CPU: 6 UID: 0 PID: 1 Comm: init Call trace: dump_backtrace+0xfc/0x17c show_stack+0x18/0x28 dump_stack_lvl+0x40/0xc0 dump_stack+0x18/0x24 print_deadlock_bug+0x388/0x390 __lock_acquire+0x18bc/0x32ec lock_acquire+0x134/0x2b0 down_read+0x50/0x19c i3c_bus_normaluse_lock+0x14/0x24 i3c_device_get_info+0x24/0x58 i3c_device_uevent+0x34/0xa4 dev_uevent+0x310/0x384 kobject_uevent_env+0x244/0x414 kobject_uevent+0x14/0x20 device_add+0x278/0x460 device_register+0x20/0x34 i3c_master_register_new_i3c_devs+0x78/0x154 i3c_master_register+0x6a0/0x6d4 mtk_i3c_master_probe+0x3b8/0x4d8 platform_probe+0xa0/0xe0 really_probe+0x114/0x454 __driver_probe_device+0xa0/0x15c driver_probe_device+0x3c/0x1ac __driver_attach+0xc4/0x1f0 bus_for_each_dev+0x104/0x160 driver_attach+0x24/0x34 bus_add_driver+0x14c/0x294 driver_register+0x68/0x104 __platform_driver_register+0x20/0x30 init_module+0x20/0xfe4 do_one_initcall+0x184/0x464 do_init_module+0x58/0x1ec load_module+0xefc/0x10c8 __arm64_sys_finit_module+0x238/0x33c invoke_syscall+0x58/0x10c el0_svc_common+0xa8/0xdc do_el0_svc+0x1c/0x28 el0_svc+0x50/0xac el0t_64_sync_handler+0x70/0xbc el0t_64_sync+0x1a8/0x1ac Signed-off-by: Defa Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexandre Belloni <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 16, 2024
[ Upstream commit 88fd2b7 ] Commit bab1c29 ("LoongArch: Fix sleeping in atomic context in setup_tlb_handler()") changes the gfp flag from GFP_KERNEL to GFP_ATOMIC for alloc_pages_node(). However, for PREEMPT_RT kernels we can still get a "sleeping in atomic context" error: [ 0.372259] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 [ 0.372266] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1 [ 0.372268] preempt_count: 1, expected: 0 [ 0.372270] RCU nest depth: 1, expected: 1 [ 0.372272] 3 locks held by swapper/1/0: [ 0.372274] #0: 900000000c9f5e60 (&pcp->lock){+.+.}-{3:3}, at: get_page_from_freelist+0x524/0x1c60 [ 0.372294] #1: 90000000087013b8 (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x50/0x140 [ 0.372305] #2: 900000047fffd388 (&zone->lock){+.+.}-{3:3}, at: __rmqueue_pcplist+0x30c/0xea0 [ 0.372314] irq event stamp: 0 [ 0.372316] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [ 0.372322] hardirqs last disabled at (0): [<9000000005947320>] copy_process+0x9c0/0x26e0 [ 0.372329] softirqs last enabled at (0): [<9000000005947320>] copy_process+0x9c0/0x26e0 [ 0.372335] softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 0.372341] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.12.0-rc7+ raspberrypi#1891 [ 0.372346] Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022 [ 0.372349] Stack : 0000000000000089 9000000005a0db9c 90000000071519c8 9000000100388000 [ 0.372486] 900000010038b890 0000000000000000 900000010038b898 9000000007e53788 [ 0.372492] 900000000815bcc8 900000000815bcc0 900000010038b700 0000000000000001 [ 0.372498] 0000000000000001 4b031894b9d6b725 00000000055ec000 9000000100338fc0 [ 0.372503] 00000000000000c4 0000000000000001 000000000000002d 0000000000000003 [ 0.372509] 0000000000000030 0000000000000003 00000000055ec000 0000000000000003 [ 0.372515] 900000000806d000 9000000007e53788 00000000000000b0 0000000000000004 [ 0.372521] 0000000000000000 0000000000000000 900000000c9f5f10 0000000000000000 [ 0.372526] 90000000076f12d8 9000000007e53788 9000000005924778 0000000000000000 [ 0.372532] 00000000000000b0 0000000000000004 0000000000000000 0000000000070000 [ 0.372537] ... [ 0.372540] Call Trace: [ 0.372542] [<9000000005924778>] show_stack+0x38/0x180 [ 0.372548] [<90000000071519c4>] dump_stack_lvl+0x94/0xe4 [ 0.372555] [<900000000599b880>] __might_resched+0x1a0/0x260 [ 0.372561] [<90000000071675cc>] rt_spin_lock+0x4c/0x140 [ 0.372565] [<9000000005cbb768>] __rmqueue_pcplist+0x308/0xea0 [ 0.372570] [<9000000005cbed84>] get_page_from_freelist+0x564/0x1c60 [ 0.372575] [<9000000005cc0d98>] __alloc_pages_noprof+0x218/0x1820 [ 0.372580] [<900000000593b36c>] tlb_init+0x1ac/0x298 [ 0.372585] [<9000000005924b74>] per_cpu_trap_init+0x114/0x140 [ 0.372589] [<9000000005921964>] cpu_probe+0x4e4/0xa60 [ 0.372592] [<9000000005934874>] start_secondary+0x34/0xc0 [ 0.372599] [<900000000715615c>] smpboot_entry+0x64/0x6c This is because in PREEMPT_RT kernels normal spinlocks are replaced by rt spinlocks and rt_spin_lock() will cause sleeping. Fix it by disabling NUMA optimization completely for PREEMPT_RT kernels. Signed-off-by: Huacai Chen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
spockfish
pushed a commit
that referenced
this pull request
Dec 16, 2024
…A in a MM [ Upstream commit 091c1dd ] We currently assume that there is at least one VMA in a MM, which isn't true. So we might end up having find_vma() return NULL, to then de-reference NULL. So properly handle find_vma() returning NULL. This fixes the report: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024 RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline] RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194 Code: ... RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000 RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044 RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1 R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003 R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8 FS: 00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709 __do_sys_migrate_pages mm/mempolicy.c:1727 [inline] __se_sys_migrate_pages mm/mempolicy.c:1723 [inline] __x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f [[email protected]: add unlikely()] Link: https://lkml.kernel.org/r/[email protected] Fixes: 3974388 ("[PATCH] Swap Migration V5: sys_migrate_pages interface") Signed-off-by: David Hildenbrand <[email protected]> Reported-by: [email protected] Closes: https://lore.kernel.org/lkml/[email protected]/T/ Reviewed-by: Liam R. Howlett <[email protected]> Reviewed-by: Christoph Lameter <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.