test: Revert "test: Update v4.19 surface" #2

kitakar5525 · 2020-05-16T06:57:01Z

Reverts #1

…f fs_info::journal_info commit fcc9973 upstream. [BUG] One run of btrfs/063 triggered the following lockdep warning: ============================================ WARNING: possible recursive locking detected 5.6.0-rc7-custom+ linux-surface#48 Not tainted -------------------------------------------- kworker/u24:0/7 is trying to acquire lock: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs] but task is already holding lock: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sb_internal#2); lock(sb_internal#2); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by kworker/u24:0/7: #0: ffff88817b495948 ((wq_completion)btrfs-endio-write){+.+.}, at: process_one_work+0x557/0xb80 #1: ffff888189ea7db8 ((work_completion)(&work->normal_work)){+.+.}, at: process_one_work+0x557/0xb80 #2: ffff88817d3a46e0 (sb_internal#2){.+.+}, at: start_transaction+0x66c/0x890 [btrfs] #3: ffff888174ca4da8 (&fs_info->reloc_mutex){+.+.}, at: btrfs_record_root_in_trans+0x83/0xd0 [btrfs] stack backtrace: CPU: 0 PID: 7 Comm: kworker/u24:0 Not tainted 5.6.0-rc7-custom+ linux-surface#48 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Workqueue: btrfs-endio-write btrfs_work_helper [btrfs] Call Trace: dump_stack+0xc2/0x11a __lock_acquire.cold+0xce/0x214 lock_acquire+0xe6/0x210 __sb_start_write+0x14e/0x290 start_transaction+0x66c/0x890 [btrfs] btrfs_join_transaction+0x1d/0x20 [btrfs] find_free_extent+0x1504/0x1a50 [btrfs] btrfs_reserve_extent+0xd5/0x1f0 [btrfs] btrfs_alloc_tree_block+0x1ac/0x570 [btrfs] btrfs_copy_root+0x213/0x580 [btrfs] create_reloc_root+0x3bd/0x470 [btrfs] btrfs_init_reloc_root+0x2d2/0x310 [btrfs] record_root_in_trans+0x191/0x1d0 [btrfs] btrfs_record_root_in_trans+0x90/0xd0 [btrfs] start_transaction+0x16e/0x890 [btrfs] btrfs_join_transaction+0x1d/0x20 [btrfs] btrfs_finish_ordered_io+0x55d/0xcd0 [btrfs] finish_ordered_fn+0x15/0x20 [btrfs] btrfs_work_helper+0x116/0x9a0 [btrfs] process_one_work+0x632/0xb80 worker_thread+0x80/0x690 kthread+0x1a3/0x1f0 ret_from_fork+0x27/0x50 It's pretty hard to reproduce, only one hit so far. [CAUSE] This is because we're calling btrfs_join_transaction() without re-using the current running one: btrfs_finish_ordered_io() |- btrfs_join_transaction() <<< Call #1 |- btrfs_record_root_in_trans() |- btrfs_reserve_extent() |- btrfs_join_transaction() <<< Call #2 Normally such btrfs_join_transaction() call should re-use the existing one, without trying to re-start a transaction. But the problem is, in btrfs_join_transaction() call #1, we call btrfs_record_root_in_trans() before initializing current::journal_info. And in btrfs_join_transaction() call #2, we're relying on current::journal_info to avoid such deadlock. [FIX] Call btrfs_record_root_in_trans() after we have initialized current::journal_info. CC: [email protected] # 4.4+ Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

When station connected to AP, and run TX traffic such as TCP/UDP, and system enter suspend state, then mac80211 call ath10k_flush with set drop flag, recently it only send wmi peer flush to firmware and firmware will flush all pending TX packets, for PCIe, firmware will indicate the TX packets status to ath10k, and then ath10k indicate to mac80211 TX complete with the status, then all the packets has been flushed at this moment. For SDIO chip, it is different, its TX complete indication is disabled by default, and it has a tx queue in ath10k, and its tx credit control is enabled, total tx credit is 96, when its credit is not sufficient, then the packets will buffered in the tx queue of ath10k, max packets is TARGET_TLV_NUM_MSDU_DESC_HL which is 1024, for SDIO, when mac80211 call ath10k_flush with set drop flag, maybe it have pending packets in tx queue of ath10k, and if it does not have sufficient tx credit, the packets will stay in queue untill tx credit report from firmware, if it is a noisy environment, tx speed is low and the tx credit report from firmware will delay more time, then the num_pending_tx will remain > 0 untill all packets send to firmware. After the 1st ath10k_flush, mac80211 will call the 2nd ath10k_flush without set drop flag immediately, then it will call to ath10k_mac_wait_tx_complete, and it wait untill num_pending_tx become to 0, in noisy environment, it is esay to wait about near 5 seconds, then it cause the suspend take long time. 1st and 2nd callstack of ath10k_flush [ 303.740427] ath10k_sdio mmc1:0001:1: ath10k_flush drop:1, pending:0-0 [ 303.740495] ------------[ cut here ]------------ [ 303.740739] WARNING: CPU: 1 PID: 3921 at /mnt/host/source/src/third_party/kernel/v4.19/drivers/net/wireless/ath/ath10k/mac.c:7025 ath10k_flush+0x54/0x104 [ath10k_core] [ 303.740757] Modules linked in: bridge stp llc ath10k_sdio ath10k_core rfcomm uinput cros_ec_rpmsg mtk_seninf mtk_cam_isp mtk_vcodec_enc mtk_fd mtk_vcodec_dec mtk_vcodec_common mtk_dip mtk_mdp3 videobuf2_dma_contig videobuf2_memops v4l2_mem2mem videobuf2_v4l2 videobuf2_common hid_google_hammer hci_uart btqca bluetooth dw9768 ov8856 ecdh_generic ov02a10 v4l2_fwnode mtk_scp mtk_rpmsg rpmsg_core mtk_scp_ipi ipt_MASQUERADE fuse iio_trig_sysfs cros_ec_sensors_ring cros_ec_sensors_sync cros_ec_light_prox cros_ec_sensors industrialio_triggered_buffer [ 303.740914] kfifo_buf cros_ec_activity cros_ec_sensors_core lzo_rle lzo_compress ath mac80211 zram cfg80211 joydev [last unloaded: ath10k_core] [ 303.741009] CPU: 1 PID: 3921 Comm: kworker/u16:10 Tainted: G W 4.19.95 #2 [ 303.741027] Hardware name: MediaTek krane sku176 board (DT) [ 303.741061] Workqueue: events_unbound async_run_entry_fn [ 303.741086] pstate: 60000005 (nZCv daif -PAN -UAO) [ 303.741166] pc : ath10k_flush+0x54/0x104 [ath10k_core] [ 303.741244] lr : ath10k_flush+0x54/0x104 [ath10k_core] [ 303.741260] sp : ffffffdf080e77a0 [ 303.741276] x29: ffffffdf080e77a0 x28: ffffffdef3730040 [ 303.741300] x27: ffffff907c2240a0 x26: ffffffde6ff39afc [ 303.741321] x25: ffffffdef3730040 x24: ffffff907bf61018 [ 303.741343] x23: ffffff907c2240a0 x22: ffffffde6ff39a50 [ 303.741364] x21: 0000000000000001 x20: ffffffde6ff39a50 [ 303.741385] x19: ffffffde6bac2420 x18: 0000000000017200 [ 303.741407] x17: ffffff907c24a000 x16: 0000000000000037 [ 303.741428] x15: ffffff907b49a568 x14: ffffff907cf332c1 [ 303.741476] x13: 00000000000922e4 x12: 0000000000000000 [ 303.741497] x11: 0000000000000001 x10: 0000000000000007 [ 303.741518] x9 : f2256b8c1de4bc00 x8 : f2256b8c1de4bc00 [ 303.741539] x7 : ffffff907ab5e764 x6 : 0000000000000000 [ 303.741560] x5 : 0000000000000080 x4 : 0000000000000001 [ 303.741582] x3 : ffffffdf080e74a8 x2 : ffffff907aa91244 [ 303.741603] x1 : ffffffdf080e74a8 x0 : 0000000000000024 [ 303.741624] Call trace: [ 303.741701] ath10k_flush+0x54/0x104 [ath10k_core] [ 303.741941] __ieee80211_flush_queues+0x1dc/0x358 [mac80211] [ 303.742098] ieee80211_flush_queues+0x34/0x44 [mac80211] [ 303.742253] ieee80211_set_disassoc+0xc0/0x5ec [mac80211] [ 303.742399] ieee80211_mgd_deauth+0x720/0x7d4 [mac80211] [ 303.742535] ieee80211_deauth+0x24/0x30 [mac80211] [ 303.742720] cfg80211_mlme_deauth+0x250/0x3bc [cfg80211] [ 303.742849] cfg80211_mlme_down+0x90/0xd0 [cfg80211] [ 303.742971] cfg80211_disconnect+0x340/0x3a0 [cfg80211] [ 303.743087] __cfg80211_leave+0xe4/0x17c [cfg80211] [ 303.743203] cfg80211_leave+0x38/0x50 [cfg80211] [ 303.743319] wiphy_suspend+0x84/0x5bc [cfg80211] [ 303.743335] dpm_run_callback+0x170/0x304 [ 303.743346] __device_suspend+0x2dc/0x3e8 [ 303.743356] async_suspend+0x2c/0xb0 [ 303.743370] async_run_entry_fn+0x48/0xf8 [ 303.743383] process_one_work+0x304/0x604 [ 303.743394] worker_thread+0x248/0x3f4 [ 303.743403] kthread+0x120/0x130 [ 303.743416] ret_from_fork+0x10/0x18 [ 303.743812] ath10k_sdio mmc1:0001:1: ath10k_flush drop:0, pending:0-0 [ 303.743858] ------------[ cut here ]------------ [ 303.744057] WARNING: CPU: 1 PID: 3921 at /mnt/host/source/src/third_party/kernel/v4.19/drivers/net/wireless/ath/ath10k/mac.c:7025 ath10k_flush+0x54/0x104 [ath10k_core] [ 303.744075] Modules linked in: bridge stp llc ath10k_sdio ath10k_core rfcomm uinput cros_ec_rpmsg mtk_seninf mtk_cam_isp mtk_vcodec_enc mtk_fd mtk_vcodec_dec mtk_vcodec_common mtk_dip mtk_mdp3 videobuf2_dma_contig videobuf2_memops v4l2_mem2mem videobuf2_v4l2 videobuf2_common hid_google_hammer hci_uart btqca bluetooth dw9768 ov8856 ecdh_generic ov02a10 v4l2_fwnode mtk_scp mtk_rpmsg rpmsg_core mtk_scp_ipi ipt_MASQUERADE fuse iio_trig_sysfs cros_ec_sensors_ring cros_ec_sensors_sync cros_ec_light_prox cros_ec_sensors industrialio_triggered_buffer kfifo_buf cros_ec_activity cros_ec_sensors_core lzo_rle lzo_compress ath mac80211 zram cfg80211 joydev [last unloaded: ath10k_core] [ 303.744256] CPU: 1 PID: 3921 Comm: kworker/u16:10 Tainted: G W 4.19.95 #2 [ 303.744273] Hardware name: MediaTek krane sku176 board (DT) [ 303.744301] Workqueue: events_unbound async_run_entry_fn [ 303.744325] pstate: 60000005 (nZCv daif -PAN -UAO) [ 303.744403] pc : ath10k_flush+0x54/0x104 [ath10k_core] [ 303.744480] lr : ath10k_flush+0x54/0x104 [ath10k_core] [ 303.744496] sp : ffffffdf080e77a0 [ 303.744512] x29: ffffffdf080e77a0 x28: ffffffdef3730040 [ 303.744534] x27: ffffff907c2240a0 x26: ffffffde6ff39afc [ 303.744556] x25: ffffffdef3730040 x24: ffffff907bf61018 [ 303.744577] x23: ffffff907c2240a0 x22: ffffffde6ff39a50 [ 303.744598] x21: 0000000000000000 x20: ffffffde6ff39a50 [ 303.744620] x19: ffffffde6bac2420 x18: 000000000001831c [ 303.744641] x17: ffffff907c24a000 x16: 0000000000000037 [ 303.744662] x15: ffffff907b49a568 x14: ffffff907cf332c1 [ 303.744683] x13: 00000000000922ea x12: 0000000000000000 [ 303.744704] x11: 0000000000000001 x10: 0000000000000007 [ 303.744747] x9 : f2256b8c1de4bc00 x8 : f2256b8c1de4bc00 [ 303.744768] x7 : ffffff907ab5e764 x6 : 0000000000000000 [ 303.744789] x5 : 0000000000000080 x4 : 0000000000000001 [ 303.744810] x3 : ffffffdf080e74a8 x2 : ffffff907aa91244 [ 303.744831] x1 : ffffffdf080e74a8 x0 : 0000000000000024 [ 303.744853] Call trace: [ 303.744929] ath10k_flush+0x54/0x104 [ath10k_core] [ 303.745098] __ieee80211_flush_queues+0x1dc/0x358 [mac80211] [ 303.745277] ieee80211_flush_queues+0x34/0x44 [mac80211] [ 303.745424] ieee80211_set_disassoc+0x108/0x5ec [mac80211] [ 303.745569] ieee80211_mgd_deauth+0x720/0x7d4 [mac80211] [ 303.745706] ieee80211_deauth+0x24/0x30 [mac80211] [ 303.745853] cfg80211_mlme_deauth+0x250/0x3bc [cfg80211] [ 303.745979] cfg80211_mlme_down+0x90/0xd0 [cfg80211] [ 303.746103] cfg80211_disconnect+0x340/0x3a0 [cfg80211] [ 303.746219] __cfg80211_leave+0xe4/0x17c [cfg80211] [ 303.746335] cfg80211_leave+0x38/0x50 [cfg80211] [ 303.746452] wiphy_suspend+0x84/0x5bc [cfg80211] [ 303.746467] dpm_run_callback+0x170/0x304 [ 303.746477] __device_suspend+0x2dc/0x3e8 [ 303.746487] async_suspend+0x2c/0xb0 [ 303.746498] async_run_entry_fn+0x48/0xf8 [ 303.746510] process_one_work+0x304/0x604 [ 303.746521] worker_thread+0x248/0x3f4 [ 303.746530] kthread+0x120/0x130 [ 303.746542] ret_from_fork+0x10/0x18 one sample's debugging log: it wait 3190 ms(5000 - 1810). 1st ath10k_flush, it has 120 packets in tx queue of ath10k: <...>-1513 [000] .... 25374.786005: ath10k_log_err: ath10k_sdio mmc1:0001:1 ath10k_flush drop:1, pending:120-0 <...>-1513 [000] ...1 25374.788375: ath10k_log_warn: ath10k_sdio mmc1:0001:1 ath10k_htt_tx_mgmt_inc_pending htt->num_pending_mgmt_tx:0 <...>-1500 [001] .... 25374.790143: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx work, eid:1, count:121 2st ath10k_flush, it has 121 packets in tx queue of ath10k: <...>-1513 [000] .... 25374.790571: ath10k_log_err: ath10k_sdio mmc1:0001:1 ath10k_flush drop:0, pending:121-0 <...>-1513 [000] .... 25374.791990: ath10k_log_err: ath10k_sdio mmc1:0001:1 ath10k_mac_wait_tx_complete state:1 pending:121-0 <...>-1508 [001] .... 25374.792696: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 credit update: delta:46 <...>-1508 [001] .... 25374.792700: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 credit total:46 <...>-1508 [001] .... 25374.792729: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx work, eid:1, count:121 <...>-1508 [001] .... 25374.792937: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx status:0, eid:1, req count:88, count:32, len:49792 <...>-1508 [001] .... 25374.793031: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx status:0, eid:1, req count:75, count:14, len:21784 kworker/u16:0-25773 [003] .... 25374.793701: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx complete, eid:1, pending complete count:46 <...>-1881 [000] .... 25375.073178: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 credit update: delta:24 <...>-1881 [000] .... 25375.073182: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 credit total:24 <...>-1881 [000] .... 25375.073429: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx work, eid:1, count:75 <...>-1879 [001] .... 25375.074090: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx complete, eid:1, pending complete count:24 <...>-1881 [000] .... 25375.074123: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx status:0, eid:1, req count:51, count:24, len:37344 <...>-1879 [001] .... 25375.270126: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 credit update: delta:26 <...>-1879 [001] .... 25375.270130: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 credit total:26 <...>-1488 [000] .... 25375.270174: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx work, eid:1, count:51 <...>-1488 [000] .... 25375.270529: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx status:0, eid:1, req count:25, count:26, len:40456 <...>-1879 [001] .... 25375.270693: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx complete, eid:1, pending complete count:26 <...>-1488 [001] .... 25377.775885: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 credit update: delta:12 <...>-1488 [001] .... 25377.775890: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 credit total:12 <...>-1488 [001] .... 25377.775933: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx work, eid:1, count:25 <...>-1488 [001] .... 25377.776059: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx status:0, eid:1, req count:13, count:12, len:18672 <...>-1879 [001] .... 25377.776100: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx complete, eid:1, pending complete count:12 <...>-1488 [001] .... 25377.878079: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 credit update: delta:15 <...>-1488 [001] .... 25377.878087: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 credit total:15 <...>-1879 [000] .... 25377.878323: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx work, eid:1, count:13 <...>-1879 [000] .... 25377.878487: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx status:0, eid:1, req count:0, count:13, len:20228 <...>-1879 [000] .... 25377.878497: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx complete, eid:1, pending complete count:13 <...>-1488 [001] .... 25377.919927: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 credit update: delta:11 <...>-1488 [001] .... 25377.919932: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 credit total:13 <...>-1488 [001] .... 25377.919976: ath10k_log_dbg: ath10k_sdio mmc1:0001:1 bundle tx work, eid:1, count:0 <...>-1881 [000] .... 25377.982645: ath10k_log_warn: ath10k_sdio mmc1:0001:1 HTT_T2H_MSG_TYPE_MGMT_TX_COMPLETION status:0 <...>-1513 [001] .... 25377.982973: ath10k_log_err: ath10k_sdio mmc1:0001:1 ath10k_mac_wait_tx_complete time_left:1810, pending:0-0 Flush all pending TX packets for the 1st ath10k_flush reduced the wait time of the 2nd ath10k_flush and then suspend take short time. This Patch only effect SDIO chips. Tested with QCA6174 SDIO with firmware WLAN.RMH.4.4.1-00042. Signed-off-by: Wen Gong <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected] (cherry picked from commit dd7fc55 git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git ath-next) BUG=b:147569228 TEST=test success with suspend Change-Id: Ie8b3de267e8a1ad8c0030e5b2d1f3c81da386393 Signed-off-by: Wen Gong <[email protected]> Signed-off-by: Claire Chang <[email protected]> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2097996 Reviewed-by: Sean Paul <[email protected]> Reviewed-by: Nicolas Boichat <[email protected]>

… consoles We want to enable kgdb to debug the early parts of the kernel. Unfortunately kgdb normally is a client of the tty API in the kernel and serial drivers don't register to the tty layer until fairly late in the boot process. Serial drivers do, however, commonly register a boot console. Let's enable the kgdboc driver to work with boot consoles to provide early debugging. This change co-opts the existing read() function pointer that's part of "struct console". It's assumed that if a boot console (with the flag CON_BOOT) has implemented read() that both the read() and write() function are polling functions. That means they work without interrupts and read() will return immediately (with 0 bytes read) if there's nothing to read. This should be a safe assumption since it appears that no current boot consoles implement read() right now and there seems no reason to do so unless they wanted to support "kgdboc_earlycon". The normal/expected way to make all this work is to use "kgdboc_earlycon" and "kgdboc" together. You should point them both to the same physical serial connection. At boot time, as the system transitions from the boot console to the normal console (and registers a tty), kgdb will switch over. One awkward part of all this, though, is that there can be a window where the boot console goes away and we can't quite transtion over to the main kgdboc that uses the tty layer. There are two main problems: 1. The act of registering the tty doesn't cause any call into kgdboc so there is a window of time when the tty is there but kgdboc's init code hasn't been called so we can't transition to it. 2. On some serial drivers the normal console inits (and replaces the boot console) quite early in the system. Presumably these drivers were coded up before earlycon worked as well as it does today and probably they don't need to do this anymore, but it causes us problems nontheless. Problem #1 is not too big of a deal somewhat due to the luck of probe ordering. kgdboc is last in the tty/serial/Makefile so its probe gets right after all other tty devices. It's not fun to rely on this, but it does work for the most part. Problem #2 is a big deal, but only for some serial drivers. Other serial drivers end up registering the console (which gets rid of the boot console) and tty at nearly the same time. The way we'll deal with the window when the system has stopped using the boot console and the time when we're setup using the tty is to keep using the boot console. This may sound surprising, but it has been found to work well in practice. If it doesn't work, it shouldn't be too hard for a given serial driver to make it keep working. Specifically, it's expected that the read()/write() function provided in the boot console should be the same (or nearly the same) as the normal kgdb polling functions. That means continuing to use them should work just fine. To make things even more likely to work work we'll also trap the recently added exit() function in the boot console we're using and delay any calls to it until we're all done with the boot console. NOTE: there could be ways to use all this in weird / unexpected ways. If you do something like this, it's a bit of a buyer beware situation. Specifically: - If you specify only "kgdboc_earlycon" but not "kgdboc" then (depending on your serial driver) things will probably work OK, but you'll get a warning printed the first time you use kgdb after the boot console is gone. You'd only be able to do this, of course, if the serial driver you're running atop provided an early boot console. - If your "kgdboc_earlycon" and "kgdboc" devices are not the same device things should work OK, but it'll be your job to switch over which device you're monitoring (including figuring out how to switch over gdb in-flight if you're using it). When trying to enable "kgdboc_earlycon" it should be noted that the names that are registered through the boot console layer and the tty layer are not the same for the same port. For example when debugging on one board I'd need to pass "kgdboc_earlycon=qcom_geni kgdboc=ttyMSM0" to enable things properly. Since digging up the boot console name is a pain and there will rarely be more than one boot console enabled, you can provide the "kgdboc_earlycon" parameter without specifying the name of the boot console. In this case we'll just pick the first boot that implements read() that we find. This new "kgdboc_earlycon" parameter should be contrasted to the existing "ekgdboc" parameter. While both provide a way to debug very early, the usage and mechanisms are quite different. Specifically "kgdboc_earlycon" is meant to be used in tandem with "kgdboc" and there is a transition from one to the other. The "ekgdboc" parameter, on the other hand, replaces the "kgdboc" parameter. It runs the same logic as the "kgdboc" parameter but just relies on your TTY driver being present super early. The only known usage of the old "ekgdboc" parameter is documented as "ekgdboc=kbd earlyprintk=vga". It should be noted that "kbd" has special treatment allowing it to init early as a tty device. Signed-off-by: Douglas Anderson <[email protected]> Reviewed-by: Greg Kroah-Hartman <[email protected]> Tested-by: Sumit Garg <[email protected]> Link: https://lore.kernel.org/r/20200507130644.v4.8.I8fba5961bf452ab92350654aa61957f23ecf0100@changeid Signed-off-by: Daniel Thompson <[email protected]> (cherry picked from commit 2209956 git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux.git kgdb/for-next) BUG=b:148533752 TEST=w/ series kgdb_earlycon works on trogdor Change-Id: I8fba5961bf452ab92350654aa61957f23ecf0100 Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2135954 Reviewed-by: Sean Paul <[email protected]> Reviewed-by: Evan Green <[email protected]> Tested-by: Douglas Anderson <[email protected]> Commit-Queue: Douglas Anderson <[email protected]>

This BUG halt was reported a while back, but the patch somehow got missed: PID: 2879 TASK: c16adaa0 CPU: 1 COMMAND: "sctpn" #0 [f418dd28] crash_kexec at c04a7d8c #1 [f418dd7c] oops_end at c0863e02 #2 [f418dd90] do_invalid_op at c040aaca #3 [f418de28] error_code (via invalid_op) at c08631a5 EAX: f34baac0 EBX: 00000090 ECX: f418deb0 EDX: f5542950 EBP: 00000000 DS: 007b ESI: f34ba800 ES: 007b EDI: f418dea0 GS: 00e0 CS: 0060 EIP: c046fa5e ERR: ffffffff EFLAGS: 00010286 #4 [f418de5c] add_timer at c046fa5e linux-surface#5 [f418de68] sctp_do_sm at f8db8c77 [sctp] linux-surface#6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp] linux-surface#7 [f418df48] inet_shutdown at c080baf9 linux-surface#8 [f418df5c] sys_shutdown at c079eedf linux-surface#9 [f418df70] sys_socketcall at c079fe88 EAX: ffffffda EBX: 0000000d ECX: bfceea90 EDX: 0937af98 DS: 007b ESI: 0000000c ES: 007b EDI: b7150ae4 SS: 007b ESP: bfceea7c EBP: bfceeaa8 GS: 0033 CS: 0073 EIP: b775c424 ERR: 00000066 EFLAGS: 00000282 It appears that the side effect that starts the shutdown timer was processed multiple times, which can happen as multiple paths can trigger it. This of course leads to the BUG halt in add_timer getting called. Fix seems pretty straightforward, just check before the timer is added if its already been started. If it has mod the timer instead to min(current expiration, new expiration) Its been tested but not confirmed to fix the problem, as the issue has only occured in production environments where test kernels are enjoined from being installed. It appears to be a sane fix to me though. Also, recentely, Jere found a reproducer posted on list to confirm that this resolves the issues Signed-off-by: Neil Horman <[email protected]> CC: Vlad Yasevich <[email protected]> CC: "David S. Miller" <[email protected]> CC: [email protected] CC: [email protected] CC: [email protected] Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Ido Schimmel says: ==================== netdevsim: Two small fixes Fix two bugs observed while analyzing regression failures. Patch #1 fixes a bug where sometimes the drop counter of a packet trap policer would not increase. Patch #2 adds a missing initialization of a variable in a related selftest. ==================== Signed-off-by: David S. Miller <[email protected]>

Ido Schimmel says: ==================== mlxsw: Various fixes Patch #1 from Jiri fixes a use-after-free discovered while fuzzing mlxsw / devlink with syzkaller. Patch #2 from Amit works around a limitation in new versions of arping, which is used in several selftests. ==================== Signed-off-by: David S. Miller <[email protected]>

…inux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix a warning and a leak [ver #2] Here are a couple of fixes for AF_RXRPC: (1) Fix an uninitialised variable warning. (2) Fix a leak of the ticket on error in rxkad. ==================== Signed-off-by: David S. Miller <[email protected]>

[ Upstream commit 20a785a ] This BUG halt was reported a while back, but the patch somehow got missed: PID: 2879 TASK: c16adaa0 CPU: 1 COMMAND: "sctpn" #0 [f418dd28] crash_kexec at c04a7d8c #1 [f418dd7c] oops_end at c0863e02 #2 [f418dd90] do_invalid_op at c040aaca #3 [f418de28] error_code (via invalid_op) at c08631a5 EAX: f34baac0 EBX: 00000090 ECX: f418deb0 EDX: f5542950 EBP: 00000000 DS: 007b ESI: f34ba800 ES: 007b EDI: f418dea0 GS: 00e0 CS: 0060 EIP: c046fa5e ERR: ffffffff EFLAGS: 00010286 #4 [f418de5c] add_timer at c046fa5e linux-surface#5 [f418de68] sctp_do_sm at f8db8c77 [sctp] linux-surface#6 [f418df30] sctp_primitive_SHUTDOWN at f8dcc1b5 [sctp] linux-surface#7 [f418df48] inet_shutdown at c080baf9 linux-surface#8 [f418df5c] sys_shutdown at c079eedf linux-surface#9 [f418df70] sys_socketcall at c079fe88 EAX: ffffffda EBX: 0000000d ECX: bfceea90 EDX: 0937af98 DS: 007b ESI: 0000000c ES: 007b EDI: b7150ae4 SS: 007b ESP: bfceea7c EBP: bfceeaa8 GS: 0033 CS: 0073 EIP: b775c424 ERR: 00000066 EFLAGS: 00000282 It appears that the side effect that starts the shutdown timer was processed multiple times, which can happen as multiple paths can trigger it. This of course leads to the BUG halt in add_timer getting called. Fix seems pretty straightforward, just check before the timer is added if its already been started. If it has mod the timer instead to min(current expiration, new expiration) Its been tested but not confirmed to fix the problem, as the issue has only occured in production environments where test kernels are enjoined from being installed. It appears to be a sane fix to me though. Also, recentely, Jere found a reproducer posted on list to confirm that this resolves the issues Signed-off-by: Neil Horman <[email protected]> CC: Vlad Yasevich <[email protected]> CC: "David S. Miller" <[email protected]> CC: [email protected] CC: [email protected] CC: [email protected] Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit f6766ff ] We need to check mddev->del_work before flush workqueu since the purpose of flush is to ensure the previous md is disappeared. Otherwise the similar deadlock appeared if LOCKDEP is enabled, it is due to md_open holds the bdev->bd_mutex before flush workqueue. kernel: [ 154.522645] ====================================================== kernel: [ 154.522647] WARNING: possible circular locking dependency detected kernel: [ 154.522650] 5.6.0-rc7-lp151.27-default linux-surface#25 Tainted: G O kernel: [ 154.522651] ------------------------------------------------------ kernel: [ 154.522653] mdadm/2482 is trying to acquire lock: kernel: [ 154.522655] ffff888078529128 ((wq_completion)md_misc){+.+.}, at: flush_workqueue+0x84/0x4b0 kernel: [ 154.522673] kernel: [ 154.522673] but task is already holding lock: kernel: [ 154.522675] ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590 kernel: [ 154.522691] kernel: [ 154.522691] which lock already depends on the new lock. kernel: [ 154.522691] kernel: [ 154.522694] kernel: [ 154.522694] the existing dependency chain (in reverse order) is: kernel: [ 154.522696] kernel: [ 154.522696] -> #4 (&bdev->bd_mutex){+.+.}: kernel: [ 154.522704] __mutex_lock+0x87/0x950 kernel: [ 154.522706] __blkdev_get+0x79/0x590 kernel: [ 154.522708] blkdev_get+0x65/0x140 kernel: [ 154.522709] blkdev_get_by_dev+0x2f/0x40 kernel: [ 154.522716] lock_rdev+0x3d/0x90 [md_mod] kernel: [ 154.522719] md_import_device+0xd6/0x1b0 [md_mod] kernel: [ 154.522723] new_dev_store+0x15e/0x210 [md_mod] kernel: [ 154.522728] md_attr_store+0x7a/0xc0 [md_mod] kernel: [ 154.522732] kernfs_fop_write+0x117/0x1b0 kernel: [ 154.522735] vfs_write+0xad/0x1a0 kernel: [ 154.522737] ksys_write+0xa4/0xe0 kernel: [ 154.522745] do_syscall_64+0x64/0x2b0 kernel: [ 154.522748] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522749] kernel: [ 154.522749] -> #3 (&mddev->reconfig_mutex){+.+.}: kernel: [ 154.522752] __mutex_lock+0x87/0x950 kernel: [ 154.522756] new_dev_store+0xc9/0x210 [md_mod] kernel: [ 154.522759] md_attr_store+0x7a/0xc0 [md_mod] kernel: [ 154.522761] kernfs_fop_write+0x117/0x1b0 kernel: [ 154.522763] vfs_write+0xad/0x1a0 kernel: [ 154.522765] ksys_write+0xa4/0xe0 kernel: [ 154.522767] do_syscall_64+0x64/0x2b0 kernel: [ 154.522769] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522770] kernel: [ 154.522770] -> #2 (kn->count#253){++++}: kernel: [ 154.522775] __kernfs_remove+0x253/0x2c0 kernel: [ 154.522778] kernfs_remove+0x1f/0x30 kernel: [ 154.522780] kobject_del+0x28/0x60 kernel: [ 154.522783] mddev_delayed_delete+0x24/0x30 [md_mod] kernel: [ 154.522786] process_one_work+0x2a7/0x5f0 kernel: [ 154.522788] worker_thread+0x2d/0x3d0 kernel: [ 154.522793] kthread+0x117/0x130 kernel: [ 154.522795] ret_from_fork+0x3a/0x50 kernel: [ 154.522796] kernel: [ 154.522796] -> #1 ((work_completion)(&mddev->del_work)){+.+.}: kernel: [ 154.522800] process_one_work+0x27e/0x5f0 kernel: [ 154.522802] worker_thread+0x2d/0x3d0 kernel: [ 154.522804] kthread+0x117/0x130 kernel: [ 154.522806] ret_from_fork+0x3a/0x50 kernel: [ 154.522807] kernel: [ 154.522807] -> #0 ((wq_completion)md_misc){+.+.}: kernel: [ 154.522813] __lock_acquire+0x1392/0x1690 kernel: [ 154.522816] lock_acquire+0xb4/0x1a0 kernel: [ 154.522818] flush_workqueue+0xab/0x4b0 kernel: [ 154.522821] md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522823] __blkdev_get+0xea/0x590 kernel: [ 154.522825] blkdev_get+0x65/0x140 kernel: [ 154.522828] do_dentry_open+0x1d1/0x380 kernel: [ 154.522831] path_openat+0x567/0xcc0 kernel: [ 154.522834] do_filp_open+0x9b/0x110 kernel: [ 154.522836] do_sys_openat2+0x201/0x2a0 kernel: [ 154.522838] do_sys_open+0x57/0x80 kernel: [ 154.522840] do_syscall_64+0x64/0x2b0 kernel: [ 154.522842] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522844] kernel: [ 154.522844] other info that might help us debug this: kernel: [ 154.522844] kernel: [ 154.522846] Chain exists of: kernel: [ 154.522846] (wq_completion)md_misc --> &mddev->reconfig_mutex --> &bdev->bd_mutex kernel: [ 154.522846] kernel: [ 154.522850] Possible unsafe locking scenario: kernel: [ 154.522850] kernel: [ 154.522852] CPU0 CPU1 kernel: [ 154.522853] ---- ---- kernel: [ 154.522854] lock(&bdev->bd_mutex); kernel: [ 154.522856] lock(&mddev->reconfig_mutex); kernel: [ 154.522858] lock(&bdev->bd_mutex); kernel: [ 154.522860] lock((wq_completion)md_misc); kernel: [ 154.522861] kernel: [ 154.522861] *** DEADLOCK *** kernel: [ 154.522861] kernel: [ 154.522864] 1 lock held by mdadm/2482: kernel: [ 154.522865] #0: ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590 kernel: [ 154.522868] kernel: [ 154.522868] stack backtrace: kernel: [ 154.522873] CPU: 1 PID: 2482 Comm: mdadm Tainted: G O 5.6.0-rc7-lp151.27-default linux-surface#25 kernel: [ 154.522875] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 kernel: [ 154.522878] Call Trace: kernel: [ 154.522881] dump_stack+0x8f/0xcb kernel: [ 154.522884] check_noncircular+0x194/0x1b0 kernel: [ 154.522888] ? __lock_acquire+0x1392/0x1690 kernel: [ 154.522890] __lock_acquire+0x1392/0x1690 kernel: [ 154.522893] lock_acquire+0xb4/0x1a0 kernel: [ 154.522895] ? flush_workqueue+0x84/0x4b0 kernel: [ 154.522898] flush_workqueue+0xab/0x4b0 kernel: [ 154.522900] ? flush_workqueue+0x84/0x4b0 kernel: [ 154.522905] ? md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522908] md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522910] __blkdev_get+0xea/0x590 kernel: [ 154.522912] ? bd_acquire+0xc0/0xc0 kernel: [ 154.522914] blkdev_get+0x65/0x140 kernel: [ 154.522916] ? bd_acquire+0xc0/0xc0 kernel: [ 154.522918] do_dentry_open+0x1d1/0x380 kernel: [ 154.522921] path_openat+0x567/0xcc0 kernel: [ 154.522923] ? __lock_acquire+0x380/0x1690 kernel: [ 154.522926] do_filp_open+0x9b/0x110 kernel: [ 154.522929] ? __alloc_fd+0xe5/0x1f0 kernel: [ 154.522935] ? kmem_cache_alloc+0x28c/0x630 kernel: [ 154.522939] ? do_sys_openat2+0x201/0x2a0 kernel: [ 154.522941] do_sys_openat2+0x201/0x2a0 kernel: [ 154.522944] do_sys_open+0x57/0x80 kernel: [ 154.522946] do_syscall_64+0x64/0x2b0 kernel: [ 154.522948] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522951] RIP: 0033:0x7f98d279d9ae And md_alloc also flushed the same workqueue, but the thing is different here. Because all the paths call md_alloc don't hold bdev->bd_mutex, and the flush is necessary to avoid race condition, so leave it as it is. Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Song Liu <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit 54505a1 upstream. The commits cd0e00c and 92d7223 broke boot on the Alpha Avanti platform. The patches move memory barriers after a write before the write. The result is that if there's iowrite followed by ioread, there is no barrier between them. The Alpha architecture allows reordering of the accesses to the I/O space, and the missing barrier between write and read causes hang with serial port and real time clock. This patch makes barriers confiorm to the specification. 1. We add mb() before readX_relaxed and writeX_relaxed - memory-barriers.txt claims that these functions must be ordered w.r.t. each other. Alpha doesn't order them, so we need an explicit barrier. 2. We add mb() before reads from the I/O space - so that if there's a write followed by a read, there should be a barrier between them. Signed-off-by: Mikulas Patocka <[email protected]> Fixes: cd0e00c ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering") Fixes: 92d7223 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2") Cc: [email protected] # v4.17+ Acked-by: Ivan Kokshaysky <[email protected]> Reviewed-by: Maciej W. Rozycki <[email protected]> Signed-off-by: Matt Turner <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 2d3a8e2 ] In blkdev_get() we call __blkdev_get() to do some internal jobs and if there is some errors in __blkdev_get(), the bdput() is called which means we have released the refcount of the bdev (actually the refcount of the bdev inode). This means we cannot access bdev after that point. But acctually bdev is still accessed in blkdev_get() after calling __blkdev_get(). This results in use-after-free if the refcount is the last one we released in __blkdev_get(). Let's take a look at the following scenerio: CPU0 CPU1 CPU2 blkdev_open blkdev_open Remove disk bd_acquire blkdev_get __blkdev_get del_gendisk bdev_unhash_inode bd_acquire bdev_get_gendisk bd_forget failed because of unhashed bdput bdput (the last one) bdev_evict_inode access bdev => use after free [ 459.350216] BUG: KASAN: use-after-free in __lock_acquire+0x24c1/0x31b0 [ 459.351190] Read of size 8 at addr ffff88806c815a80 by task syz-executor.0/20132 [ 459.352347] [ 459.352594] CPU: 0 PID: 20132 Comm: syz-executor.0 Not tainted 4.19.90 #2 [ 459.353628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 459.354947] Call Trace: [ 459.355337] dump_stack+0x111/0x19e [ 459.355879] ? __lock_acquire+0x24c1/0x31b0 [ 459.356523] print_address_description+0x60/0x223 [ 459.357248] ? __lock_acquire+0x24c1/0x31b0 [ 459.357887] kasan_report.cold+0xae/0x2d8 [ 459.358503] __lock_acquire+0x24c1/0x31b0 [ 459.359120] ? _raw_spin_unlock_irq+0x24/0x40 [ 459.359784] ? lockdep_hardirqs_on+0x37b/0x580 [ 459.360465] ? _raw_spin_unlock_irq+0x24/0x40 [ 459.361123] ? finish_task_switch+0x125/0x600 [ 459.361812] ? finish_task_switch+0xee/0x600 [ 459.362471] ? mark_held_locks+0xf0/0xf0 [ 459.363108] ? __schedule+0x96f/0x21d0 [ 459.363716] lock_acquire+0x111/0x320 [ 459.364285] ? blkdev_get+0xce/0xbe0 [ 459.364846] ? blkdev_get+0xce/0xbe0 [ 459.365390] __mutex_lock+0xf9/0x12a0 [ 459.365948] ? blkdev_get+0xce/0xbe0 [ 459.366493] ? bdev_evict_inode+0x1f0/0x1f0 [ 459.367130] ? blkdev_get+0xce/0xbe0 [ 459.367678] ? destroy_inode+0xbc/0x110 [ 459.368261] ? mutex_trylock+0x1a0/0x1a0 [ 459.368867] ? __blkdev_get+0x3e6/0x1280 [ 459.369463] ? bdev_disk_changed+0x1d0/0x1d0 [ 459.370114] ? blkdev_get+0xce/0xbe0 [ 459.370656] blkdev_get+0xce/0xbe0 [ 459.371178] ? find_held_lock+0x2c/0x110 [ 459.371774] ? __blkdev_get+0x1280/0x1280 [ 459.372383] ? lock_downgrade+0x680/0x680 [ 459.373002] ? lock_acquire+0x111/0x320 [ 459.373587] ? bd_acquire+0x21/0x2c0 [ 459.374134] ? do_raw_spin_unlock+0x4f/0x250 [ 459.374780] blkdev_open+0x202/0x290 [ 459.375325] do_dentry_open+0x49e/0x1050 [ 459.375924] ? blkdev_get_by_dev+0x70/0x70 [ 459.376543] ? __x64_sys_fchdir+0x1f0/0x1f0 [ 459.377192] ? inode_permission+0xbe/0x3a0 [ 459.377818] path_openat+0x148c/0x3f50 [ 459.378392] ? kmem_cache_alloc+0xd5/0x280 [ 459.379016] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 459.379802] ? path_lookupat.isra.0+0x900/0x900 [ 459.380489] ? __lock_is_held+0xad/0x140 [ 459.381093] do_filp_open+0x1a1/0x280 [ 459.381654] ? may_open_dev+0xf0/0xf0 [ 459.382214] ? find_held_lock+0x2c/0x110 [ 459.382816] ? lock_downgrade+0x680/0x680 [ 459.383425] ? __lock_is_held+0xad/0x140 [ 459.384024] ? do_raw_spin_unlock+0x4f/0x250 [ 459.384668] ? _raw_spin_unlock+0x1f/0x30 [ 459.385280] ? __alloc_fd+0x448/0x560 [ 459.385841] do_sys_open+0x3c3/0x500 [ 459.386386] ? filp_open+0x70/0x70 [ 459.386911] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 459.387610] ? trace_hardirqs_off_caller+0x55/0x1c0 [ 459.388342] ? do_syscall_64+0x1a/0x520 [ 459.388930] do_syscall_64+0xc3/0x520 [ 459.389490] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 459.390248] RIP: 0033:0x416211 [ 459.390720] Code: 75 14 b8 02 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 19 00 00 c3 48 83 ec 08 e8 0a fa ff ff 48 89 04 24 b8 02 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fa ff ff 48 89 d0 48 83 c4 08 48 3d 01 [ 459.393483] RSP: 002b:00007fe45dfe9a60 EFLAGS: 00000293 ORIG_RAX: 0000000000000002 [ 459.394610] RAX: ffffffffffffffda RBX: 00007fe45dfea6d4 RCX: 0000000000416211 [ 459.395678] RDX: 00007fe45dfe9b0a RSI: 0000000000000002 RDI: 00007fe45dfe9b00 [ 459.396758] RBP: 000000000076bf20 R08: 0000000000000000 R09: 000000000000000a [ 459.397930] R10: 0000000000000075 R11: 0000000000000293 R12: 00000000ffffffff [ 459.399022] R13: 0000000000000bd9 R14: 00000000004cdb80 R15: 000000000076bf2c [ 459.400168] [ 459.400430] Allocated by task 20132: [ 459.401038] kasan_kmalloc+0xbf/0xe0 [ 459.401652] kmem_cache_alloc+0xd5/0x280 [ 459.402330] bdev_alloc_inode+0x18/0x40 [ 459.402970] alloc_inode+0x5f/0x180 [ 459.403510] iget5_locked+0x57/0xd0 [ 459.404095] bdget+0x94/0x4e0 [ 459.404607] bd_acquire+0xfa/0x2c0 [ 459.405113] blkdev_open+0x110/0x290 [ 459.405702] do_dentry_open+0x49e/0x1050 [ 459.406340] path_openat+0x148c/0x3f50 [ 459.406926] do_filp_open+0x1a1/0x280 [ 459.407471] do_sys_open+0x3c3/0x500 [ 459.408010] do_syscall_64+0xc3/0x520 [ 459.408572] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 459.409415] [ 459.409679] Freed by task 1262: [ 459.410212] __kasan_slab_free+0x129/0x170 [ 459.410919] kmem_cache_free+0xb2/0x2a0 [ 459.411564] rcu_process_callbacks+0xbb2/0x2320 [ 459.412318] __do_softirq+0x225/0x8ac Fix this by delaying bdput() to the end of blkdev_get() which means we have finished accessing bdev. Fixes: 77ea887 ("implement in-kernel gendisk events handling") Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Tested-by: Sedat Dilek <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Dan Carpenter <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Ming Lei <[email protected]> Cc: Jan Kara <[email protected]> Cc: Dan Carpenter <[email protected]> Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

commit e5a15e1 upstream. The following kernel panic was captured when running nfs server over ocfs2, at that time ocfs2_test_inode_bit() was checking whether one inode locating at "blkno" 5 was valid, that is ocfs2 root inode, its "suballoc_slot" was OCFS2_INVALID_SLOT(65535) and it was allocted from //global_inode_alloc, but here it wrongly assumed that it was got from per slot inode alloctor which would cause array overflow and trigger kernel panic. BUG: unable to handle kernel paging request at 0000000000001088 IP: [<ffffffff816f6898>] _raw_spin_lock+0x18/0xf0 PGD 1e06ba067 PUD 1e9e7d067 PMD 0 Oops: 0002 [#1] SMP CPU: 6 PID: 24873 Comm: nfsd Not tainted 4.1.12-124.36.1.el6uek.x86_64 #2 Hardware name: Huawei CH121 V3/IT11SGCA1, BIOS 3.87 02/02/2018 RIP: _raw_spin_lock+0x18/0xf0 RSP: e02b:ffff88005ae97908 EFLAGS: 00010206 RAX: ffff88005ae98000 RBX: 0000000000001088 RCX: 0000000000000000 RDX: 0000000000020000 RSI: 0000000000000009 RDI: 0000000000001088 RBP: ffff88005ae97928 R08: 0000000000000000 R09: ffff880212878e00 R10: 0000000000007ff0 R11: 0000000000000000 R12: 0000000000001088 R13: ffff8800063c0aa8 R14: ffff8800650c27d0 R15: 000000000000ffff FS: 0000000000000000(0000) GS:ffff880218180000(0000) knlGS:ffff880218180000 CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000001088 CR3: 00000002033d0000 CR4: 0000000000042660 Call Trace: igrab+0x1e/0x60 ocfs2_get_system_file_inode+0x63/0x3a0 [ocfs2] ocfs2_test_inode_bit+0x328/0xa00 [ocfs2] ocfs2_get_parent+0xba/0x3e0 [ocfs2] reconnect_path+0xb5/0x300 exportfs_decode_fh+0xf6/0x2b0 fh_verify+0x350/0x660 [nfsd] nfsd4_putfh+0x4d/0x60 [nfsd] nfsd4_proc_compound+0x3d3/0x6f0 [nfsd] nfsd_dispatch+0xe0/0x290 [nfsd] svc_process_common+0x412/0x6a0 [sunrpc] svc_process+0x123/0x210 [sunrpc] nfsd+0xff/0x170 [nfsd] kthread+0xcb/0xf0 ret_from_fork+0x61/0x90 Code: 83 c2 02 0f b7 f2 e8 18 dc 91 ff 66 90 eb bf 0f 1f 40 00 55 48 89 e5 41 56 41 55 41 54 53 0f 1f 44 00 00 48 89 fb ba 00 00 02 00 <f0> 0f c1 17 89 d0 45 31 e4 45 31 ed c1 e8 10 66 39 d0 41 89 c6 RIP _raw_spin_lock+0x18/0xf0 CR2: 0000000000001088 ---[ end trace 7264463cd1aac8f9 ]--- Kernel panic - not syncing: Fatal exception Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Junxiao Bi <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Joel Becker <[email protected]> Cc: Jun Piao <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

[ Upstream commit 440ab9e ] At times when I'm using kgdb I see a splat on my console about suspicious RCU usage. I managed to come up with a case that could reproduce this that looked like this: WARNING: suspicious RCU usage 5.7.0-rc4+ #609 Not tainted ----------------------------- kernel/pid.c:395 find_task_by_pid_ns() needs rcu_read_lock() protection! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by swapper/0/1: #0: ffffff81b6b8e988 (&dev->mutex){....}-{3:3}, at: __device_attach+0x40/0x13c #1: ffffffd01109e9e8 (dbg_master_lock){....}-{2:2}, at: kgdb_cpu_enter+0x20c/0x7ac #2: ffffffd01109ea90 (dbg_slave_lock){....}-{2:2}, at: kgdb_cpu_enter+0x3ec/0x7ac stack backtrace: CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc4+ #609 Hardware name: Google Cheza (rev3+) (DT) Call trace: dump_backtrace+0x0/0x1b8 show_stack+0x1c/0x24 dump_stack+0xd4/0x134 lockdep_rcu_suspicious+0xf0/0x100 find_task_by_pid_ns+0x5c/0x80 getthread+0x8c/0xb0 gdb_serial_stub+0x9d4/0xd04 kgdb_cpu_enter+0x284/0x7ac kgdb_handle_exception+0x174/0x20c kgdb_brk_fn+0x24/0x30 call_break_hook+0x6c/0x7c brk_handler+0x20/0x5c do_debug_exception+0x1c8/0x22c el1_sync_handler+0x3c/0xe4 el1_sync+0x7c/0x100 rpmh_rsc_probe+0x38/0x420 platform_drv_probe+0x94/0xb4 really_probe+0x134/0x300 driver_probe_device+0x68/0x100 __device_attach_driver+0x90/0xa8 bus_for_each_drv+0x84/0xcc __device_attach+0xb4/0x13c device_initial_probe+0x18/0x20 bus_probe_device+0x38/0x98 device_add+0x38c/0x420 If I understand properly we should just be able to blanket kgdb under one big RCU read lock and the problem should go away. We'll add it to the beast-of-a-function known as kgdb_cpu_enter(). With this I no longer get any splats and things seem to work fine. Signed-off-by: Douglas Anderson <[email protected]> Link: https://lore.kernel.org/r/20200602154729.v2.1.I70e0d4fd46d5ed2aaf0c98a355e8e1b7a5bb7e4e@changeid Signed-off-by: Daniel Thompson <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

In pci_disable_sriov(), i.e., # echo 0 > /sys/class/net/enp11s0f1np1/device/sriov_numvfs iommu_release_device iommu_group_remove_device arm_smmu_domain_free kfree(smmu_domain) Later, iommu_release_device arm_smmu_release_device arm_smmu_detach_dev spin_lock_irqsave(&smmu_domain->devices_lock, would trigger an use-after-free. Fixed it by call arm_smmu_release_device() first before iommu_group_remove_device(). BUG: KASAN: use-after-free in __lock_acquire+0x3458/0x4440 __lock_acquire at kernel/locking/lockdep.c:4250 Read of size 8 at addr ffff0089df1a6f68 by task bash/3356 CPU: 5 PID: 3356 Comm: bash Not tainted 5.8.0-rc3-next-20200630 #2 Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019 Call trace: dump_backtrace+0x0/0x398 show_stack+0x14/0x20 dump_stack+0x140/0x1b8 print_address_description.isra.12+0x54/0x4a8 kasan_report+0x134/0x1b8 __asan_report_load8_noabort+0x2c/0x50 __lock_acquire+0x3458/0x4440 lock_acquire+0x204/0xf10 _raw_spin_lock_irqsave+0xf8/0x180 arm_smmu_detach_dev+0xd8/0x4a0 arm_smmu_detach_dev at drivers/iommu/arm-smmu-v3.c:2776 arm_smmu_release_device+0xb4/0x1c8 arm_smmu_disable_pasid at drivers/iommu/arm-smmu-v3.c:2754 (inlined by) arm_smmu_release_device at drivers/iommu/arm-smmu-v3.c:3000 iommu_release_device+0xc0/0x178 iommu_release_device at drivers/iommu/iommu.c:302 iommu_bus_notifier+0x118/0x160 notifier_call_chain+0xa4/0x128 __blocking_notifier_call_chain+0x70/0xa8 blocking_notifier_call_chain+0x14/0x20 device_del+0x618/0xa00 pci_remove_bus_device+0x108/0x2d8 pci_stop_and_remove_bus_device+0x1c/0x28 pci_iov_remove_virtfn+0x228/0x368 sriov_disable+0x8c/0x348 pci_disable_sriov+0x5c/0x70 mlx5_core_sriov_configure+0xd8/0x260 [mlx5_core] sriov_numvfs_store+0x240/0x318 dev_attr_store+0x38/0x68 sysfs_kf_write+0xdc/0x128 kernfs_fop_write+0x23c/0x448 __vfs_write+0x54/0xe8 vfs_write+0x124/0x3f0 ksys_write+0xe8/0x1b8 __arm64_sys_write+0x68/0x98 do_el0_svc+0x124/0x220 el0_sync_handler+0x260/0x408 el0_sync+0x140/0x180 Allocated by task 3356: save_stack+0x24/0x50 __kasan_kmalloc.isra.13+0xc4/0xe0 kasan_kmalloc+0xc/0x18 kmem_cache_alloc_trace+0x1ec/0x318 arm_smmu_domain_alloc+0x54/0x148 iommu_group_alloc_default_domain+0xc0/0x440 iommu_probe_device+0x1c0/0x308 iort_iommu_configure+0x434/0x518 acpi_dma_configure+0xf0/0x128 pci_dma_configure+0x114/0x160 really_probe+0x124/0x6d8 driver_probe_device+0xc4/0x180 __device_attach_driver+0x184/0x1e8 bus_for_each_drv+0x114/0x1a0 __device_attach+0x19c/0x2a8 device_attach+0x10/0x18 pci_bus_add_device+0x70/0xf8 pci_iov_add_virtfn+0x7b4/0xb40 sriov_enable+0x5c8/0xc30 pci_enable_sriov+0x64/0x80 mlx5_core_sriov_configure+0x58/0x260 [mlx5_core] sriov_numvfs_store+0x1c0/0x318 dev_attr_store+0x38/0x68 sysfs_kf_write+0xdc/0x128 kernfs_fop_write+0x23c/0x448 __vfs_write+0x54/0xe8 vfs_write+0x124/0x3f0 ksys_write+0xe8/0x1b8 __arm64_sys_write+0x68/0x98 do_el0_svc+0x124/0x220 el0_sync_handler+0x260/0x408 el0_sync+0x140/0x180 Freed by task 3356: save_stack+0x24/0x50 __kasan_slab_free+0x124/0x198 kasan_slab_free+0x10/0x18 slab_free_freelist_hook+0x110/0x298 kfree+0x128/0x668 arm_smmu_domain_free+0xf4/0x1a0 iommu_group_release+0xec/0x160 kobject_put+0xf4/0x238 kobject_del+0x110/0x190 kobject_put+0x1e4/0x238 iommu_group_remove_device+0x394/0x938 iommu_release_device+0x9c/0x178 iommu_release_device at drivers/iommu/iommu.c:300 iommu_bus_notifier+0x118/0x160 notifier_call_chain+0xa4/0x128 __blocking_notifier_call_chain+0x70/0xa8 blocking_notifier_call_chain+0x14/0x20 device_del+0x618/0xa00 pci_remove_bus_device+0x108/0x2d8 pci_stop_and_remove_bus_device+0x1c/0x28 pci_iov_remove_virtfn+0x228/0x368 sriov_disable+0x8c/0x348 pci_disable_sriov+0x5c/0x70 mlx5_core_sriov_configure+0xd8/0x260 [mlx5_core] sriov_numvfs_store+0x240/0x318 dev_attr_store+0x38/0x68 sysfs_kf_write+0xdc/0x128 kernfs_fop_write+0x23c/0x448 __vfs_write+0x54/0xe8 vfs_write+0x124/0x3f0 ksys_write+0xe8/0x1b8 __arm64_sys_write+0x68/0x98 do_el0_svc+0x124/0x220 el0_sync_handler+0x260/0x408 el0_sync+0x140/0x180 The buggy address belongs to the object at ffff0089df1a6e00 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 360 bytes inside of 512-byte region [ffff0089df1a6e00, ffff0089df1a7000) The buggy address belongs to the page: page:ffffffe02257c680 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff0089df1a1400 flags: 0x7ffff800000200(slab) raw: 007ffff800000200 ffffffe02246b8c8 ffffffe02257ff88 ffff000000320680 raw: ffff0089df1a1400 00000000002a000e 00000001ffffffff ffff0089df1a5001 page dumped because: kasan: bad access detected page->mem_cgroup:ffff0089df1a5001 Memory state around the buggy address: ffff0089df1a6e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff0089df1a6e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff0089df1a6f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff0089df1a6f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff0089df1a7000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Fixes: a6a4c7e ("iommu: Add probe_device() and release_device() call-backs") Signed-off-by: Qian Cai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]> (cherry picked from commit 9ac8545) BUG=b:150320512 TEST=Boot trogdor with modem Signed-off-by: Linux Patches Robot <linux-patches-robot@chromeos-missing-patches.google.com.iam.gserviceaccount.com> Change-Id: I287943c8b0d127d699951196dd189f4927244b1b Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2297666 Reviewed-by: Sean Paul <[email protected]> Reviewed-by: Douglas Anderson <[email protected]> Commit-Queue: Douglas Anderson <[email protected]> Tested-by: Douglas Anderson <[email protected]>

[ Upstream commit cb551b8 ] In BRM_status_show(), if the condition "!ioc->is_warpdrive" tested on entry to the function is true, a "goto out" is called. This results in unlocking ioc->pci_access_mutex without this mutex lock being taken. This generates the following splat: [ 1148.539883] mpt3sas_cm2: BRM_status_show: BRM attribute is only for warpdrive [ 1148.547184] [ 1148.548708] ===================================== [ 1148.553501] WARNING: bad unlock balance detected! [ 1148.558277] 5.8.0-rc3+ #827 Not tainted [ 1148.562183] ------------------------------------- [ 1148.566959] cat/5008 is trying to release lock (&ioc->pci_access_mutex) at: [ 1148.574035] [<ffffffffc070b7a3>] BRM_status_show+0xd3/0x100 [mpt3sas] [ 1148.580574] but there are no more locks to release! [ 1148.585524] [ 1148.585524] other info that might help us debug this: [ 1148.599624] 3 locks held by cat/5008: [ 1148.607085] #0: ffff92aea3e392c0 (&p->lock){+.+.}-{3:3}, at: seq_read+0x34/0x480 [ 1148.618509] #1: ffff922ef14c4888 (&of->mutex){+.+.}-{3:3}, at: kernfs_seq_start+0x2a/0xb0 [ 1148.630729] #2: ffff92aedb5d7310 (kn->active#224){.+.+}-{0:0}, at: kernfs_seq_start+0x32/0xb0 [ 1148.643347] [ 1148.643347] stack backtrace: [ 1148.655259] CPU: 73 PID: 5008 Comm: cat Not tainted 5.8.0-rc3+ #827 [ 1148.665309] Hardware name: HGST H4060-S/S2600STB, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 1148.678394] Call Trace: [ 1148.684750] dump_stack+0x78/0xa0 [ 1148.691802] lock_release.cold+0x45/0x4a [ 1148.699451] __mutex_unlock_slowpath+0x35/0x270 [ 1148.707675] BRM_status_show+0xd3/0x100 [mpt3sas] [ 1148.716092] dev_attr_show+0x19/0x40 [ 1148.723664] sysfs_kf_seq_show+0x87/0x100 [ 1148.731193] seq_read+0xbc/0x480 [ 1148.737882] vfs_read+0xa0/0x160 [ 1148.744514] ksys_read+0x58/0xd0 [ 1148.751129] do_syscall_64+0x4c/0xa0 [ 1148.757941] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1148.766240] RIP: 0033:0x7f1230566542 [ 1148.772957] Code: Bad RIP value. [ 1148.779206] RSP: 002b:00007ffeac1bcac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 1148.790063] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f1230566542 [ 1148.800284] RDX: 0000000000020000 RSI: 00007f1223460000 RDI: 0000000000000003 [ 1148.810474] RBP: 00007f1223460000 R08: 00007f122345f010 R09: 0000000000000000 [ 1148.820641] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000000 [ 1148.830728] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 Fix this by returning immediately instead of jumping to the out label. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Johannes Thumshirn <[email protected]> Acked-by: Sreekanth Reddy <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

Function that converts orientation string into orientation value. BUG=b:153109152 TEST=All below tests were successful 1. Boot with DP (using DingDong) 2. Boot with HooToo hub connected with HDMI display behind the hub 3. Connect both #1 and #2 on both ports and boot 4. Swap the port in #3 5. Hotplug DP(DingDong) 6. Hotplug HooToo hub with HDMI behind 7. Boot with USB3 device 8. Hotplug USB3 device 9. Boot with USB3 device on port #1 and HooToo hub with HDMI connected on port#0 10. Boot with USB3 device on port #1 and DP(DingDong) connected on port#0 11. Swap ports on linux-surface#9 12. Swap port on linux-surface#10 13. Boot with Dell TBT Dock connected with DP behind it 14. Boot with Dell TBT dock connected with DP behind it on port#1 and HooToo with HDMI behind on port #0 15. Hotplug DP behing Dell TBT dock 16. Boot up with USB4 dock 17. Boot up with DP connected behind USB4 dock 18. Hotplug DP behind USB4 device 19. Hotplug TBT hard drive behind USB4 20. Hotplug Lenovo TBT dock and the hotplug TBT hard drive behind it Signed-off-by: Heikki Krogerus <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit 8c49c9e) Change-Id: I6135241357af2b484c5d1e51e72352cd2d054179 Signed-off-by: Azhar Shaikh <[email protected]> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2324233 Reviewed-by: Sean Paul <[email protected]> Reviewed-by: Prashant Malani <[email protected]>

…orientation The SBU and HSL orientation may be fixed/static from the mux PoW. Apparently the retimer may take care of the orientation of these lines. Handling the static SBU (AUX) and HSL orientation with device properties. If the SBU orientation is static, a device property "sbu-orintation" can be used. When the property exists, the driver always sets the SBU orientation according to the property value, and when it's not set, the driver uses the cable plug orientation with SBU. And with static HSL orientation, "hsl-orientation" device property can be used in the same way. BUG=b:153109152 TEST=All below tests were successful 1. Boot with DP (using DingDong) 2. Boot with HooToo hub connected with HDMI display behind the hub 3. Connect both #1 and #2 on both ports and boot 4. Swap the port in #3 5. Hotplug DP(DingDong) 6. Hotplug HooToo hub with HDMI behind 7. Boot with USB3 device 8. Hotplug USB3 device 9. Boot with USB3 device on port #1 and HooToo hub with HDMI connected on port#0 10. Boot with USB3 device on port #1 and DP(DingDong) connected on port#0 11. Swap ports on linux-surface#9 12. Swap port on linux-surface#10 13. Boot with Dell TBT Dock connected with DP behind it 14. Boot with Dell TBT dock connected with DP behind it on port#1 and HooToo with HDMI behind on port #0 15. Hotplug DP behing Dell TBT dock 16. Boot up with USB4 dock 17. Boot up with DP connected behind USB4 dock 18. Hotplug DP behind USB4 device 19. Hotplug TBT hard drive behind USB4 20. Hotplug Lenovo TBT dock and the hotplug TBT hard drive behind it Signed-off-by: Heikki Krogerus <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> (cherry picked from commit ff4a30d) Change-Id: I8f64d95f2c0a705fe3c72a1513ce5c74ce9945ee Signed-off-by: Azhar Shaikh <[email protected]> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2324234 Reviewed-by: Sean Paul <[email protected]> Reviewed-by: Prashant Malani <[email protected]>

With USB4 mode the mux driver needs the Enter_USB Data Object (EUDO) that was used when the USB mode was entered. Though the object is not available in the driver, it is possible to construct it from the information we have. BUG=b:140645106 TEST=All below tests were successful 1. Boot with DP (using DingDong) 2. Boot with HooToo hub connected with HDMI display behind the hub 3. Connect both #1 and #2 on both ports and boot 4. Swap the port in #3 5. Hotplug DP(DingDong) 6. Hotplug HooToo hub with HDMI behind 7. Boot with USB3 device 8. Hotplug USB3 device 9. Boot with USB3 device on port #1 and HooToo hub with HDMI connected on port#0 10. Boot with USB3 device on port #1 and DP(DingDong) connected on port#0 11. Swap ports on linux-surface#9 12. Swap port on linux-surface#10 13. Boot with Dell TBT Dock connected with DP behind it 14. Boot with Dell TBT dock connected with DP behind it on port#1 and HooToo with HDMI behind on port #0 15. Hotplug DP behing Dell TBT dock 16. Boot up with USB4 dock 17. Boot up with DP connected behind USB4 dock 18. Hotplug DP behind USB4 device 19. Hotplug TBT hard drive behind USB4 20. Hotplug Lenovo TBT dock and the hotplug TBT hard drive behind it TOREVERT=Once the upstream patch lands in maintainer's branch, this patch should be reverted. Signed-off-by: Heikki Krogerus <[email protected]> Signed-off-by: Prashant Malani <[email protected]> (am from https://lore.kernel.org/patchwork/patch/1271073/) Signed-off-by: Azhar Shaikh <[email protected]> Change-Id: I3fb43b02f03387c88923dca3f89e63465866eb64 Signed-off-by: Azhar Shaikh <[email protected]> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2324236 Reviewed-by: Sean Paul <[email protected]> Reviewed-by: Prashant Malani <[email protected]> Commit-Queue: Prashant Malani <[email protected]>

…e_switch_set_role() usb_role_switch_set_role() has the second argument as enum for usb_role. Currently depending upon the data role i.e. UFP(0) or DFP(1) is sent. This eventually translates to USB_ROLE_NONE in case of UFP and USB_ROLE_DEVICE in case of DFP. Correct this by sending correct enum values as USB_ROLE_DEVICE in case of UFP and USB_ROLE_HOST in case of DFP. BUG=b:161934188 TEST=All below tests were successful 1. Boot with DP (using DingDong) 2. Boot with HooToo hub connected with HDMI display behind the hub 3. Connect both #1 and #2 on both ports and boot 4. Swap the port in #3 5. Hotplug DP(DingDong) 6. Hotplug HooToo hub with HDMI behind 7. Boot with USB3 device 8. Hotplug USB3 device 9. Boot with USB3 device on port #1 and HooToo hub with HDMI connected on port#0 10. Boot with USB3 device on port #1 and DP(DingDong) connected on port#0 11. Swap ports on linux-surface#9 12. Swap port on linux-surface#10 13. Boot with Dell TBT Dock connected with DP behind it 14. Boot with Dell TBT dock connected with DP behind it on port#1 and HooToo with HDMI behind on port #0 15. Hotplug DP behing Dell TBT dock 16. Boot up with USB4 dock 17. Boot up with DP connected behind USB4 dock 18. Hotplug DP behind USB4 device 19. Hotplug TBT hard drive behind USB4 20. Hotplug Lenovo TBT dock and the hotplug TBT hard drive behind it TOREVERT=Once the upstream patch lands in maintainer's branch, this patch should be reverted. Fixes: 7e7def1 ("platform/chrome: cros_ec_typec: Add USB mux control") Signed-off-by: Azhar Shaikh <[email protected]> Change-Id: I1f4025999cf7edcc23dd1c08d49a09bbea29bc90 (am from https://lore.kernel.org/patchwork/patch/1282220/) Signed-off-by: Azhar Shaikh <[email protected]> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2324238 Reviewed-by: Sean Paul <[email protected]> Reviewed-by: Prashant Malani <[email protected]> Commit-Queue: Prashant Malani <[email protected]>

commit fde9f39 upstream. This patch fixes a race condition that causes a use-after-free during amdgpu_dm_atomic_commit_tail. This can occur when 2 non-blocking commits are requested and the second one finishes before the first. Essentially, this bug occurs when the following sequence of events happens: 1. Non-blocking commit #1 is requested w/ a new dm_state #1 and is deferred to the workqueue. 2. Non-blocking commit #2 is requested w/ a new dm_state #2 and is deferred to the workqueue. 3. Commit #2 starts before commit #1, dm_state #1 is used in the commit_tail and commit #2 completes, freeing dm_state #1. 4. Commit #1 starts after commit #2 completes, uses the freed dm_state 1 and dereferences a freelist pointer while setting the context. Since this bug has only been spotted with fast commits, this patch fixes the bug by clearing the dm_state instead of using the old dc_state for fast updates. In addition, since dm_state is only used for its dc_state and amdgpu_dm_atomic_commit_tail will retain the dc_state if none is found, removing the dm_state should not have any consequences in fast updates. This use-after-free bug has existed for a while now, but only caused a noticeable issue starting from 5.7-rc1 due to 3202fa6 ("slub: relocate freelist pointer to middle of object") moving the freelist pointer from dm_state->base (which was unused) to dm_state->context (which is dereferenced). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207383 Fixes: bd200d1 ("drm/amd/display: Don't replace the dc_state for fast updates") Reported-by: Duncan <[email protected]> Signed-off-by: Mazin Rezk <[email protected]> Reviewed-by: Nicholas Kazlauskas <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>

This adds support for device data role, and data role swapping. The driver no longer relies on the cached role, as it may not be valid (for example after bootup). Instead, the role is always checked by readding the port status from IOM. Note. After this, the orientation is always only cached, so the driver does not support scenario where the role is set before orientation. It means the typec drivers must always set the orientation first before role. BUG=b:153109152 TEST=All below tests were successful 1. Boot with DP (using DingDong) 2. Boot with HooToo hub connected with HDMI display behind the hub 3. Connect both #1 and #2 on both ports and boot 4. Swap the port in #3 5. Hotplug DP(DingDong) 6. Hotplug HooToo hub with HDMI behind 7. Boot with USB3 device 8. Hotplug USB3 device 9. Boot with USB3 device on port #1 and HooToo hub with HDMI connected on port#0 10. Boot with USB3 device on port #1 and DP(DingDong) connected on port#0 11. Swap ports on linux-surface#9 12. Swap port on linux-surface#10 13. Boot with Dell TBT Dock connected with DP behind it 14. Boot with Dell TBT dock connected with DP behind it on port#1 and HooToo with HDMI behind on port #0 15. Hotplug DP behing Dell TBT dock 16. Boot up with USB4 dock 17. Boot up with DP connected behind USB4 dock 18. Hotplug DP behind USB4 device 19. Hotplug TBT hard drive behind USB4 20. Hotplug Lenovo TBT dock and the hotplug TBT hard drive behind it TOREVERT=Once the upstream patch lands in maintainer's branch, this patch should be reverted. Signed-off-by: Heikki Krogerus <[email protected]> Change-Id: Ie30c771db581591c73d70342f3d34c81d6b35a15 Signed-off-by: Azhar Shaikh <[email protected]> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2324235 Tested-by: Chiranjeevi Rapolu <[email protected]> Reviewed-by: Prashant Malani <[email protected]>

…te mode Previously HPD deassert was not sent as part of displayport HPD request due to which the hotplug of monitor was not working. So added code that configures HPD assert and deassert on the displayport based on the previous transition states. BUG=b:161337127 TEST=All below tests were successful 1. Boot with DP (using DingDong) 2. Boot with HooToo hub connected with HDMI display behind the hub 3. Connect both #1 and #2 on both ports and boot 4. Swap the port in #3 5. Hotplug DP(DingDong) 6. Hotplug HooToo hub with HDMI behind 7. Boot with USB3 device 8. Hotplug USB3 device 9. Boot with USB3 device on port #1 and HooToo hub with HDMI connected on port#0 10. Boot with USB3 device on port #1 and DP(DingDong) connected on port#0 11. Swap ports on linux-surface#9 12. Swap port on linux-surface#10 13. Boot with Dell TBT Dock connected with DP behind it 14. Boot with Dell TBT dock connected with DP behind it on port#1 and HooToo with HDMI behind on port #0 15. Hotplug DP behing Dell TBT dock 16. Boot up with USB4 dock 17. Boot up with DP connected behind USB4 dock 18. Hotplug DP behind USB4 device 19. Hotplug TBT hard drive behind USB4 20. Hotplug Lenovo TBT dock and the hotplug TBT hard drive behind it TOREVERT=Once the upstream patch lands in maintainer's branch, this patch should be reverted. Signed-off-by: Utkarsh Patel <[email protected]> Change-Id: I2793de1111bcb10a39de3f97214597493cc17ef2 Signed-off-by: Azhar Shaikh <[email protected]> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2324237 Tested-by: Chiranjeevi Rapolu <[email protected]> Reviewed-by: Prashant Malani <[email protected]> Reviewed-by: Rajmohan Mani <[email protected]>

…ng disconnect On disconnect port partner is removed and usb role is set to NONE. But then in cros_typec_port_update() the role is set again. Avoid this by moving usb_role_switch_set_role() to cros_typec_configure_mux(). Signed-off-by: Azhar Shaikh <[email protected]> Suggested-by: Prashant Malani <[email protected]> (am from https://lore.kernel.org/patchwork/patch/1282221/) BUG=b:161934188 TEST=All below tests were successful 1. Boot with DP (using DingDong) 2. Boot with HooToo hub connected with HDMI display behind the hub 3. Connect both #1 and #2 on both ports and boot 4. Swap the port in #3 5. Hotplug DP(DingDong) 6. Hotplug HooToo hub with HDMI behind 7. Boot with USB3 device 8. Hotplug USB3 device 9. Boot with USB3 device on port #1 and HooToo hub with HDMI connected on port#0 10. Boot with USB3 device on port #1 and DP(DingDong) connected on port#0 11. Swap ports on linux-surface#9 12. Swap port on linux-surface#10 13. Boot with Dell TBT Dock connected with DP behind it 14. Boot with Dell TBT dock connected with DP behind it on port#1 and HooToo with HDMI behind on port #0 15. Hotplug DP behing Dell TBT dock 16. Boot up with USB4 dock 17. Boot up with DP connected behind USB4 dock 18. Hotplug DP behind USB4 device 19. Hotplug TBT hard drive behind USB4 20. Hotplug Lenovo TBT dock and the hotplug TBT hard drive behind it TOREVERT=Once the upstream patch lands in maintainer's branch, this patch should be reverted. Change-Id: Iafeab966acaacd95bde533097b4a051642f631ff Signed-off-by: Azhar Shaikh <[email protected]> Signed-off-by: Rajmohan Mani <[email protected]> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2324239 Reviewed-by: Sean Paul <[email protected]> Reviewed-by: Prashant Malani <[email protected]> Tested-by: Chiranjeevi Rapolu <[email protected]>

This is basically a partial revert of commit cb6961d ("CHROMIUM: chromeos/config: Enable TCSS Extcon driver") Disabling TCSS extcon driver, enables the connector class driver. Also removing the OF config option since it is not needed. cat <<END | tee -a \ chromeos/config/x86_64/chromeos-intel-pineview.flavour.config \ chromeos/config/x86_64/chromiumos-x86_64.flavour.config \ CONFIG_OF=n CONFIG_EXTCON=n CONFIG_EXTCON_TCSS_CROS_EC=n END ./chromeos/scripts/kernelconfig olddefconfig BUG=b:140645106 TEST=CONFIG_CROS_EC_TYPEC is now enabled All below tests were successful 1. Boot with DP (using DingDong) 2. Boot with HooToo hub connected with HDMI display behind the hub 3. Connect both #1 and #2 on both ports and boot 4. Swap the port in #3 5. Hotplug DP(DingDong) 6. Hotplug HooToo hub with HDMI behind 7. Boot with USB3 device 8. Hotplug USB3 device 9. Boot with USB3 device on port #1 and HooToo hub with HDMI connected on port#0 10. Boot with USB3 device on port #1 and DP(DingDong) connected on port#0 11. Swap ports on linux-surface#9 12. Swap port on linux-surface#10 13. Boot with Dell TBT Dock connected with DP behind it 14. Boot with Dell TBT dock connected with DP behind it on port#1 and HooToo with HDMI behind on port #0 15. Hotplug DP behing Dell TBT dock 16. Boot up with USB4 dock 17. Boot up with DP connected behind USB4 dock 18. Hotplug DP behind USB4 device 19. Hotplug TBT hard drive behind USB4 20. Hotplug Lenovo TBT dock and the hotplug TBT hard drive behind it Change-Id: I17147442e4dbf6737f5ec43e90d742e54d1012c9 Signed-off-by: Azhar Shaikh <[email protected]> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2324232 Tested-by: Chiranjeevi Rapolu <[email protected]> Reviewed-by: Prashant Malani <[email protected]>

[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 linux-surface#5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 linux-surface#6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 linux-surface#7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 linux-surface#8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 linux-surface#9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 linux-surface#10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 linux-surface#11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 linux-surface#12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 linux-surface#13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 linux-surface#14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 linux-surface#15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 linux-surface#16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 linux-surface#17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 linux-surface#18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 linux-surface#19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 linux-surface#20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 linux-surface#21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 linux-surface#22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 linux-surface#23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 linux-surface#24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 linux-surface#25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 linux-surface#26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <[email protected]> Reviewed-by: Steven Rostedt (VMware) <[email protected]> Link: https://lore.kernel.org/linux-trace-devel/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Sasha Levin <[email protected]>

#1: Port https://crrev.com/c/439873 to the 5.4 tree #2: Port https://crrev.com/c/440445 to the 5.4 tree #3: Port https://crrev.com/c/456118 to the 5.4 tree The first two patches are needed to correctly enable the backlight for Eve; they were submitted upstream in 2017 but were rolled back. The third patch was not rolled back upstream, but the code related to it changed quite a bit between 4.4 and 5.4: the write of DP_EDP_PWMGEN_BIT_COUNT used to happen on every screen enable via intel_dp_aux_set_pwm_freq (that's how it works in 4.4), but in 5.4 the write was refactored into intel_dp_aux_calc_max_backlight and only called once per screen setup. This patch brings back the old functionality of intel_dp_aux_set_pwm_freq to ensure we set that register on each screen enable, but this is at the cost of duplication between intel_dp_aux_calc_max_backlight. Unfortunately, I don't see a better way to do this without making the cros code diverge even further from 4.4 and 5.4. I'd like to submit this CL into the chromium tree, and use b/163412221 to track its upstreaming and the ultimate revert of this change. Test: * `emerge-eve-kernelnext sys-kernel/chromeos-kernel-5_4` * `./update_kernel.sh --remote=192.168.0.22 --remote_bootarg` * On DUT, set options `i915.enable_dpcd_backlight=1 i915.enable_dbc=1` (using e.g. `/usr/share/vboot/bin/make_dev_ssd.sh`) * reboot * Change brightness on DUT * Allow DUT screen to fall asleep * Wake DUT screen, change brightness again BUG=b:162255390 TEST=See above Change-Id: I9eafa28226f1c6b8332fcf9730259bea05cf7975 Signed-off-by: Kevin Chowski <[email protected]> Reviewed-on: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2344844 Reviewed-by: Puthikorn Voravootivat <[email protected]>

@jakub

Justin Iurman says: ==================== Correct the IOAM behavior for undefined trace type bits (@jakub @david: there will be a conflict for #2 when merging net->net-next, due to commit [1]. The conflict is only 5-10 lines for #2 (#1 should be fine) inside the file tools/testing/selftests/net/ioam6.sh, so quite short though possibly ugly. Sorry for that, I didn't expect to post this one... Had I known, I'd have made the opposite.) Modify both the input and output behaviors regarding the trace type when one of the undefined bits is set. The goal is to keep the interoperability when new fields (aka new bits inside the range 12-21) will be defined. The draft [2] says the following: --------------------------------------------------------------- "Bit 12-21 Undefined. These values are available for future assignment in the IOAM Trace-Type Registry (Section 8.2). Every future node data field corresponding to one of these bits MUST be 4-octets long. An IOAM encapsulating node MUST set the value of each undefined bit to 0. If an IOAM transit node receives a packet with one or more of these bits set to 1, it MUST either: 1. Add corresponding node data filled with the reserved value 0xFFFFFFFF, after the node data fields for the IOAM-Trace-Type bits defined above, such that the total node data added by this node in units of 4-octets is equal to NodeLen, or 2. Not add any node data fields to the packet, even for the IOAM-Trace-Type bits defined above." --------------------------------------------------------------- The output behavior has been modified to respect the fact that "an IOAM encap node MUST set the value of each undefined bit to 0" (i.e., undefined bits can't be set anymore). As for the input behavior, current implementation is based on the second choice (i.e., "not add any data fields to the packet [...]"). With this solution, any interoperability is lost (i.e., if a new bit is defined, then an "old" kernel implementation wouldn't fill IOAM data when such new bit is set inside the trace type). The input behavior is therefore relaxed and these undefined bits are now allowed to be set. It is only possible thanks to the sentence "every future node data field corresponding to one of these bits MUST be 4-octets long". Indeed, the default empty value (the one for 4-octet fields) is inserted whenever an undefined bit is set. [1] cfbe9b0 [2] https://datatracker.ietf.org/doc/html/draft-ietf-ippm-ioam-data#section-5.4.1 ==================== Signed-off-by: David S. Miller <[email protected]>

…/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.15, take #2 - Properly refcount pages used as a concatenated stage-2 PGD - Fix missing unlock when detecting the use of MTE+VM_SHARED

On the preemption path when updating a Xen guest's runstate times, this lock is taken inside the scheduler rq->lock, which is a raw spinlock. This was shown in a lockdep warning: [ 89.138354] ============================= [ 89.138356] [ BUG: Invalid wait context ] [ 89.138358] 5.15.0-rc5+ #834 Tainted: G S I E [ 89.138360] ----------------------------- [ 89.138361] xen_shinfo_test/2575 is trying to lock: [ 89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm] [ 89.138442] other info that might help us debug this: [ 89.138444] context-{5:5} [ 89.138445] 4 locks held by xen_shinfo_test/2575: [ 89.138447] #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm] [ 89.138483] #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm] [ 89.138526] #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0 [ 89.138534] #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm] ... [ 89.138695] get_kvmclock_ns+0x1f/0x130 [kvm] [ 89.138734] kvm_xen_update_runstate+0x14/0x90 [kvm] [ 89.138783] kvm_xen_update_runstate_guest+0x15/0xd0 [kvm] [ 89.138830] kvm_arch_vcpu_put+0xe6/0x170 [kvm] [ 89.138870] kvm_sched_out+0x2f/0x40 [kvm] [ 89.138900] __schedule+0x5de/0xbd0 Cc: [email protected] Reported-by: [email protected] Fixes: 30b5c85 ("KVM: x86/xen: Add support for vCPU runstate information") Signed-off-by: David Woodhouse <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>

i915 will soon gain an eviction path that trylock a whole lot of locks for eviction, getting dmesg failures like below: BUG: MAX_LOCK_DEPTH too low! turning off the locking correctness validator. depth: 48 max: 48! 48 locks held by i915_selftest/5776: #0: ffff888101a79240 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x88/0x160 #1: ffffc900009778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.63+0x39/0x1b0 [i915] #2: ffff88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.63+0x5f/0x1b0 [i915] #3: ffff88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c4/0x9d0 [i915] #4: ffff88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915] linux-surface#5: ffff88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915] ... linux-surface#46: ffff88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915] linux-surface#47: ffff88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915] INFO: lockdep is turned off. Fixing eviction to nest into ww_class_acquire is a high priority, but it requires a rework of the entire driver, which can only be done one step at a time. As an intermediate solution, add an acquire context to ww_mutex_trylock, which allows us to do proper nesting annotations on the trylocks, making the above lockdep splat disappear. This is also useful in regulator_lock_nested, which may avoid dropping regulator_nesting_mutex in the uncontended path, so use it there. TTM may be another user for this, where we could lock a buffer in a fastpath with list locks held, without dropping all locks we hold. [peterz: rework actual ww_mutex_trylock() implementations] Signed-off-by: Maarten Lankhorst <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]

As reported by syzbot and experienced by Pavel, using cpus_read_lock() in wake_up_all_idle_cpus() generates lock inversion (against mmap_sem and possibly others). Instead, shrink the preempt disable region by iterating all CPUs and checking the online status for each individual CPU while having preemption disabled. Fixes: 8850cb6 ("sched: Simplify wake_up_*idle*()") Reported-by: [email protected] Reported-by: Pavel Machek <[email protected]> Reported-by: Qian Cai <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Qian Cai <[email protected]>

We got the following lockdep splat while running fstests (specifically btrfs/003 and btrfs/020 in a row) with the new rc. This was uncovered by 87579e9 ("loop: use worker per cgroup instead of kworker") which converted loop to using workqueues, which comes with lockdep annotations that don't exist with kworkers. The lockdep splat is as follows: WARNING: possible circular locking dependency detected 5.14.0-rc2-custom+ linux-surface#34 Not tainted ------------------------------------------------------ losetup/156417 is trying to acquire lock: ffff9c7645b02d38 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x84/0x600 but task is already holding lock: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> linux-surface#5 (&lo->lo_mutex){+.+.}-{3:3}: __mutex_lock+0xba/0x7c0 lo_open+0x28/0x60 [loop] blkdev_get_whole+0x28/0xf0 blkdev_get_by_dev.part.0+0x168/0x3c0 blkdev_open+0xd2/0xe0 do_dentry_open+0x163/0x3a0 path_openat+0x74d/0xa40 do_filp_open+0x9c/0x140 do_sys_openat2+0xb1/0x170 __x64_sys_openat+0x54/0x90 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #4 (&disk->open_mutex){+.+.}-{3:3}: __mutex_lock+0xba/0x7c0 blkdev_get_by_dev.part.0+0xd1/0x3c0 blkdev_get_by_path+0xc0/0xd0 btrfs_scan_one_device+0x52/0x1f0 [btrfs] btrfs_control_ioctl+0xac/0x170 [btrfs] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #3 (uuid_mutex){+.+.}-{3:3}: __mutex_lock+0xba/0x7c0 btrfs_rm_device+0x48/0x6a0 [btrfs] btrfs_ioctl+0x2d1c/0x3110 [btrfs] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #2 (sb_writers#11){.+.+}-{0:0}: lo_write_bvec+0x112/0x290 [loop] loop_process_work+0x25f/0xcb0 [loop] process_one_work+0x28f/0x5d0 worker_thread+0x55/0x3c0 kthread+0x140/0x170 ret_from_fork+0x22/0x30 -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}: process_one_work+0x266/0x5d0 worker_thread+0x55/0x3c0 kthread+0x140/0x170 ret_from_fork+0x22/0x30 -> #0 ((wq_completion)loop0){+.+.}-{0:0}: __lock_acquire+0x1130/0x1dc0 lock_acquire+0xf5/0x320 flush_workqueue+0xae/0x600 drain_workqueue+0xa0/0x110 destroy_workqueue+0x36/0x250 __loop_clr_fd+0x9a/0x650 [loop] lo_ioctl+0x29d/0x780 [loop] block_ioctl+0x3f/0x50 __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Chain exists of: (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&lo->lo_mutex); lock(&disk->open_mutex); lock(&lo->lo_mutex); lock((wq_completion)loop0); *** DEADLOCK *** 1 lock held by losetup/156417: #0: ffff9c7647395468 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x650 [loop] stack backtrace: CPU: 8 PID: 156417 Comm: losetup Not tainted 5.14.0-rc2-custom+ linux-surface#34 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack_lvl+0x57/0x72 check_noncircular+0x10a/0x120 __lock_acquire+0x1130/0x1dc0 lock_acquire+0xf5/0x320 ? flush_workqueue+0x84/0x600 flush_workqueue+0xae/0x600 ? flush_workqueue+0x84/0x600 drain_workqueue+0xa0/0x110 destroy_workqueue+0x36/0x250 __loop_clr_fd+0x9a/0x650 [loop] lo_ioctl+0x29d/0x780 [loop] ? __lock_acquire+0x3a0/0x1dc0 ? update_dl_rq_load_avg+0x152/0x360 ? lock_is_held_type+0xa5/0x120 ? find_held_lock.constprop.0+0x2b/0x80 block_ioctl+0x3f/0x50 __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f645884de6b Usually the uuid_mutex exists to protect the fs_devices that map together all of the devices that match a specific uuid. In rm_device we're messing with the uuid of a device, so it makes sense to protect that here. However in doing that it pulls in a whole host of lockdep dependencies, as we call mnt_may_write() on the sb before we grab the uuid_mutex, thus we end up with the dependency chain under the uuid_mutex being added under the normal sb write dependency chain, which causes problems with loop devices. We don't need the uuid mutex here however. If we call btrfs_scan_one_device() before we scratch the super block we will find the fs_devices and not find the device itself and return EBUSY because the fs_devices is open. If we call it after the scratch happens it will not appear to be a valid btrfs file system. We do not need to worry about other fs_devices modifying operations here because we're protected by the exclusive operations locking. So drop the uuid_mutex here in order to fix the lockdep splat. A more detailed explanation from the discussion: We are worried about rm and scan racing with each other, before this change we'll zero the device out under the UUID mutex so when scan does run it'll make sure that it can go through the whole device scan thing without rm messing with us. We aren't worried if the scratch happens first, because the result is we don't think this is a btrfs device and we bail out. The only case we are concerned with is we scratch _after_ scan is able to read the superblock and gets a seemingly valid super block, so lets consider this case. Scan will call device_list_add() with the device we're removing. We'll call find_fsid_with_metadata_uuid() and get our fs_devices for this UUID. At this point we lock the fs_devices->device_list_mutex. This is what protects us in this case, but we have two cases here. 1. We aren't to the device removal part of the RM. We found our device, and device name matches our path, we go down and we set total_devices to our super number of devices, which doesn't affect anything because we haven't done the remove yet. 2. We are past the device removal part, which is protected by the device_list_mutex. Scan doesn't find the device, it goes down and does the if (fs_devices->opened) return -EBUSY; check and we bail out. Nothing about this situation is ideal, but the lockdep splat is real, and the fix is safe, tho admittedly a bit scary looking. Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Reviewed-by: David Sterba <[email protected]> [ copy more from the discussion ] Signed-off-by: David Sterba <[email protected]>

For device removal and replace we call btrfs_find_device_by_devspec, which if we give it a device path and nothing else will call btrfs_get_dev_args_from_path, which opens the block device and reads the super block and then looks up our device based on that. However at this point we're holding the sb write "lock", so reading the block device pulls in the dependency of ->open_mutex, which produces the following lockdep splat ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc2+ #405 Not tainted ------------------------------------------------------ losetup/11576 is trying to acquire lock: ffff9bbe8cded938 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0 but task is already holding lock: ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&lo->lo_mutex){+.+.}-{3:3}: __mutex_lock+0x7d/0x750 lo_open+0x28/0x60 [loop] blkdev_get_whole+0x25/0xf0 blkdev_get_by_dev.part.0+0x168/0x3c0 blkdev_open+0xd2/0xe0 do_dentry_open+0x161/0x390 path_openat+0x3cc/0xa20 do_filp_open+0x96/0x120 do_sys_openat2+0x7b/0x130 __x64_sys_openat+0x46/0x70 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #3 (&disk->open_mutex){+.+.}-{3:3}: __mutex_lock+0x7d/0x750 blkdev_get_by_dev.part.0+0x56/0x3c0 blkdev_get_by_path+0x98/0xa0 btrfs_get_bdev_and_sb+0x1b/0xb0 btrfs_find_device_by_devspec+0x12b/0x1c0 btrfs_rm_device+0x127/0x610 btrfs_ioctl+0x2a31/0x2e70 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #2 (sb_writers#12){.+.+}-{0:0}: lo_write_bvec+0xc2/0x240 [loop] loop_process_work+0x238/0xd00 [loop] process_one_work+0x26b/0x560 worker_thread+0x55/0x3c0 kthread+0x140/0x160 ret_from_fork+0x1f/0x30 -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}: process_one_work+0x245/0x560 worker_thread+0x55/0x3c0 kthread+0x140/0x160 ret_from_fork+0x1f/0x30 -> #0 ((wq_completion)loop0){+.+.}-{0:0}: __lock_acquire+0x10ea/0x1d90 lock_acquire+0xb5/0x2b0 flush_workqueue+0x91/0x5e0 drain_workqueue+0xa0/0x110 destroy_workqueue+0x36/0x250 __loop_clr_fd+0x9a/0x660 [loop] block_ioctl+0x3f/0x50 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Chain exists of: (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&lo->lo_mutex); lock(&disk->open_mutex); lock(&lo->lo_mutex); lock((wq_completion)loop0); *** DEADLOCK *** 1 lock held by losetup/11576: #0: ffff9bbe88e4fc68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop] stack backtrace: CPU: 0 PID: 11576 Comm: losetup Not tainted 5.14.0-rc2+ #405 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 Call Trace: dump_stack_lvl+0x57/0x72 check_noncircular+0xcf/0xf0 ? stack_trace_save+0x3b/0x50 __lock_acquire+0x10ea/0x1d90 lock_acquire+0xb5/0x2b0 ? flush_workqueue+0x67/0x5e0 ? lockdep_init_map_type+0x47/0x220 flush_workqueue+0x91/0x5e0 ? flush_workqueue+0x67/0x5e0 ? verify_cpu+0xf0/0x100 drain_workqueue+0xa0/0x110 destroy_workqueue+0x36/0x250 __loop_clr_fd+0x9a/0x660 [loop] ? blkdev_ioctl+0x8d/0x2a0 block_ioctl+0x3f/0x50 __x64_sys_ioctl+0x80/0xb0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f31b02404cb Instead what we want to do is populate our device lookup args before we grab any locks, and then pass these args into btrfs_rm_device(). From there we can find the device and do the appropriate removal. Suggested-by: Anand Jain <[email protected]> Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: David Sterba <[email protected]>

Attempting to defragment a Btrfs file containing a transparent huge page immediately deadlocks with the following stack trace: #0 context_switch (kernel/sched/core.c:4940:2) #1 __schedule (kernel/sched/core.c:6287:8) #2 schedule (kernel/sched/core.c:6366:3) #3 io_schedule (kernel/sched/core.c:8389:2) #4 wait_on_page_bit_common (mm/filemap.c:1356:4) linux-surface#5 __lock_page (mm/filemap.c:1648:2) linux-surface#6 lock_page (./include/linux/pagemap.h:625:3) linux-surface#7 pagecache_get_page (mm/filemap.c:1910:4) linux-surface#8 find_or_create_page (./include/linux/pagemap.h:420:9) linux-surface#9 defrag_prepare_one_page (fs/btrfs/ioctl.c:1068:9) linux-surface#10 defrag_one_range (fs/btrfs/ioctl.c:1326:14) linux-surface#11 defrag_one_cluster (fs/btrfs/ioctl.c:1421:9) linux-surface#12 btrfs_defrag_file (fs/btrfs/ioctl.c:1523:9) linux-surface#13 btrfs_ioctl_defrag (fs/btrfs/ioctl.c:3117:9) linux-surface#14 btrfs_ioctl (fs/btrfs/ioctl.c:4872:10) linux-surface#15 vfs_ioctl (fs/ioctl.c:51:10) linux-surface#16 __do_sys_ioctl (fs/ioctl.c:874:11) linux-surface#17 __se_sys_ioctl (fs/ioctl.c:860:1) linux-surface#18 __x64_sys_ioctl (fs/ioctl.c:860:1) linux-surface#19 do_syscall_x64 (arch/x86/entry/common.c:50:14) linux-surface#20 do_syscall_64 (arch/x86/entry/common.c:80:7) linux-surface#21 entry_SYSCALL_64+0x7c/0x15b (arch/x86/entry/entry_64.S:113) A huge page is represented by a compound page, which consists of a struct page for each PAGE_SIZE page within the huge page. The first struct page is the "head page", and the remaining are "tail pages". Defragmentation attempts to lock each page in the range. However, lock_page() on a tail page actually locks the corresponding head page. So, if defragmentation tries to lock more than one struct page in a compound page, it tries to lock the same head page twice and deadlocks with itself. Ideally, we should be able to defragment transparent huge pages. However, THP for filesystems is currently read-only, so a lot of code is not ready to use huge pages for I/O. For now, let's just return ETXTBUSY. This can be reproduced with the following on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y: $ cat create_thp_file.c #include <fcntl.h> #include <stdbool.h> #include <stdio.h> #include <stdint.h> #include <stdlib.h> #include <unistd.h> #include <sys/mman.h> static const char zeroes[1024 * 1024]; static const size_t FILE_SIZE = 2 * 1024 * 1024; int main(int argc, char **argv) { if (argc != 2) { fprintf(stderr, "usage: %s PATH\n", argv[0]); return EXIT_FAILURE; } int fd = creat(argv[1], 0777); if (fd == -1) { perror("creat"); return EXIT_FAILURE; } size_t written = 0; while (written < FILE_SIZE) { ssize_t ret = write(fd, zeroes, sizeof(zeroes) < FILE_SIZE - written ? sizeof(zeroes) : FILE_SIZE - written); if (ret < 0) { perror("write"); return EXIT_FAILURE; } written += ret; } close(fd); fd = open(argv[1], O_RDONLY); if (fd == -1) { perror("open"); return EXIT_FAILURE; } /* * Reserve some address space so that we can align the file mapping to * the huge page size. */ void *placeholder_map = mmap(NULL, FILE_SIZE * 2, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (placeholder_map == MAP_FAILED) { perror("mmap (placeholder)"); return EXIT_FAILURE; } void *aligned_address = (void *)(((uintptr_t)placeholder_map + FILE_SIZE - 1) & ~(FILE_SIZE - 1)); void *map = mmap(aligned_address, FILE_SIZE, PROT_READ | PROT_EXEC, MAP_SHARED | MAP_FIXED, fd, 0); if (map == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; } if (madvise(map, FILE_SIZE, MADV_HUGEPAGE) < 0) { perror("madvise"); return EXIT_FAILURE; } char *line = NULL; size_t line_capacity = 0; FILE *smaps_file = fopen("/proc/self/smaps", "r"); if (!smaps_file) { perror("fopen"); return EXIT_FAILURE; } for (;;) { for (size_t off = 0; off < FILE_SIZE; off += 4096) ((volatile char *)map)[off]; ssize_t ret; bool this_mapping = false; while ((ret = getline(&line, &line_capacity, smaps_file)) > 0) { unsigned long start, end, huge; if (sscanf(line, "%lx-%lx", &start, &end) == 2) { this_mapping = (start <= (uintptr_t)map && (uintptr_t)map < end); } else if (this_mapping && sscanf(line, "FilePmdMapped: %ld", &huge) == 1 && huge > 0) { return EXIT_SUCCESS; } } sleep(6); rewind(smaps_file); fflush(smaps_file); } } $ ./create_thp_file huge $ btrfs fi defrag -czstd ./huge Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Omar Sandoval <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>

…ux/kernel/git/tip/tip Pull scheduler updates from Thomas Gleixner: - Revert the printk format based wchan() symbol resolution as it can leak the raw value in case that the symbol is not resolvable. - Make wchan() more robust and work with all kind of unwinders by enforcing that the task stays blocked while unwinding is in progress. - Prevent sched_fork() from accessing an invalid sched_task_group - Improve asymmetric packing logic - Extend scheduler statistics to RT and DL scheduling classes and add statistics for bandwith burst to the SCHED_FAIR class. - Properly account SCHED_IDLE entities - Prevent a potential deadlock when initial priority is assigned to a newly created kthread. A recent change to plug a race between cpuset and __sched_setscheduler() introduced a new lock dependency which is now triggered. Break the lock dependency chain by moving the priority assignment to the thread function. - Fix the idle time reporting in /proc/uptime for NOHZ enabled systems. - Improve idle balancing in general and especially for NOHZ enabled systems. - Provide proper interfaces for live patching so it does not have to fiddle with scheduler internals. - Add cluster aware scheduling support. - A small set of tweaks for RT (irqwork, wait_task_inactive(), various scheduler options and delaying mmdrop) - The usual small tweaks and improvements all over the place * tag 'sched-core-2021-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits) sched/fair: Cleanup newidle_balance sched/fair: Remove sysctl_sched_migration_cost condition sched/fair: Wait before decaying max_newidle_lb_cost sched/fair: Skip update_blocked_averages if we are defering load balance sched/fair: Account update_blocked_averages in newidle_balance cost x86: Fix __get_wchan() for !STACKTRACE sched,x86: Fix L2 cache mask sched/core: Remove rq_relock() sched: Improve wake_up_all_idle_cpus() take #2 irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT irq_work: Allow irq_work_sync() to sleep if irq_work() no IRQ support. sched/rt: Annotate the RT balancing logic irqwork as IRQ_WORK_HARD_IRQ sched: Add cluster scheduler level for x86 sched: Add cluster scheduler level in core and related Kconfig for ARM64 topology: Represent clusters of CPUs within a die sched: Disable -Wunused-but-set-variable sched: Add wrapper for get_wchan() to keep task blocked x86: Fix get_wchan() to support the ORC unwinder proc: Use task_is_running() for wchan in /proc/$pid/stat ...

Often some test cases like btrfs/161 trigger lockdep splats that complain about possible unsafe lock scenario due to the fact that during mount, when reading the chunk tree we end up calling blkdev_get_by_path() while holding a read lock on a leaf of the chunk tree. That produces a lockdep splat like the following: [ 3653.683975] ====================================================== [ 3653.685148] WARNING: possible circular locking dependency detected [ 3653.686301] 5.15.0-rc7-btrfs-next-103 #1 Not tainted [ 3653.687239] ------------------------------------------------------ [ 3653.688400] mount/447465 is trying to acquire lock: [ 3653.689320] ffff8c6b0c76e528 (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.691054] but task is already holding lock: [ 3653.692155] ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 3653.693978] which lock already depends on the new lock. [ 3653.695510] the existing dependency chain (in reverse order) is: [ 3653.696915] -> #3 (btrfs-chunk-00){++++}-{3:3}: [ 3653.698053] down_read_nested+0x4b/0x140 [ 3653.698893] __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 3653.699988] btrfs_read_lock_root_node+0x31/0x40 [btrfs] [ 3653.701205] btrfs_search_slot+0x537/0xc00 [btrfs] [ 3653.702234] btrfs_insert_empty_items+0x32/0x70 [btrfs] [ 3653.703332] btrfs_init_new_device+0x563/0x15b0 [btrfs] [ 3653.704439] btrfs_ioctl+0x2110/0x3530 [btrfs] [ 3653.705405] __x64_sys_ioctl+0x83/0xb0 [ 3653.706215] do_syscall_64+0x3b/0xc0 [ 3653.706990] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.708040] -> #2 (sb_internal#2){.+.+}-{0:0}: [ 3653.708994] lock_release+0x13d/0x4a0 [ 3653.709533] up_write+0x18/0x160 [ 3653.710017] btrfs_sync_file+0x3f3/0x5b0 [btrfs] [ 3653.710699] __loop_update_dio+0xbd/0x170 [loop] [ 3653.711360] lo_ioctl+0x3b1/0x8a0 [loop] [ 3653.711929] block_ioctl+0x48/0x50 [ 3653.712442] __x64_sys_ioctl+0x83/0xb0 [ 3653.712991] do_syscall_64+0x3b/0xc0 [ 3653.713519] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.714233] -> #1 (&lo->lo_mutex){+.+.}-{3:3}: [ 3653.715026] __mutex_lock+0x92/0x900 [ 3653.715648] lo_open+0x28/0x60 [loop] [ 3653.716275] blkdev_get_whole+0x28/0x90 [ 3653.716867] blkdev_get_by_dev.part.0+0x142/0x320 [ 3653.717537] blkdev_open+0x5e/0xa0 [ 3653.718043] do_dentry_open+0x163/0x390 [ 3653.718604] path_openat+0x3f0/0xa80 [ 3653.719128] do_filp_open+0xa9/0x150 [ 3653.719652] do_sys_openat2+0x97/0x160 [ 3653.720197] __x64_sys_openat+0x54/0x90 [ 3653.720766] do_syscall_64+0x3b/0xc0 [ 3653.721285] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.721986] -> #0 (&disk->open_mutex){+.+.}-{3:3}: [ 3653.722775] __lock_acquire+0x130e/0x2210 [ 3653.723348] lock_acquire+0xd7/0x310 [ 3653.723867] __mutex_lock+0x92/0x900 [ 3653.724394] blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.725041] blkdev_get_by_path+0xb8/0xd0 [ 3653.725614] btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] [ 3653.726332] open_fs_devices+0xd7/0x2c0 [btrfs] [ 3653.726999] btrfs_read_chunk_tree+0x3ad/0x870 [btrfs] [ 3653.727739] open_ctree+0xb8e/0x17bf [btrfs] [ 3653.728384] btrfs_mount_root.cold+0x12/0xde [btrfs] [ 3653.729130] legacy_get_tree+0x30/0x50 [ 3653.729676] vfs_get_tree+0x28/0xc0 [ 3653.730192] vfs_kern_mount.part.0+0x71/0xb0 [ 3653.730800] btrfs_mount+0x11d/0x3a0 [btrfs] [ 3653.731427] legacy_get_tree+0x30/0x50 [ 3653.731970] vfs_get_tree+0x28/0xc0 [ 3653.732486] path_mount+0x2d4/0xbe0 [ 3653.732997] __x64_sys_mount+0x103/0x140 [ 3653.733560] do_syscall_64+0x3b/0xc0 [ 3653.734080] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.734782] other info that might help us debug this: [ 3653.735784] Chain exists of: &disk->open_mutex --> sb_internal#2 --> btrfs-chunk-00 [ 3653.737123] Possible unsafe locking scenario: [ 3653.737865] CPU0 CPU1 [ 3653.738435] ---- ---- [ 3653.739007] lock(btrfs-chunk-00); [ 3653.739449] lock(sb_internal#2); [ 3653.740193] lock(btrfs-chunk-00); [ 3653.740955] lock(&disk->open_mutex); [ 3653.741431] *** DEADLOCK *** [ 3653.742176] 3 locks held by mount/447465: [ 3653.742739] #0: ffff8c6acf85c0e8 (&type->s_umount_key#44/1){+.+.}-{3:3}, at: alloc_super+0xd5/0x3b0 [ 3653.744114] #1: ffffffffc0b28f70 (uuid_mutex){+.+.}-{3:3}, at: btrfs_read_chunk_tree+0x59/0x870 [btrfs] [ 3653.745563] #2: ffff8c6b0a9f39e0 (btrfs-chunk-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x24/0x110 [btrfs] [ 3653.747066] stack backtrace: [ 3653.747723] CPU: 4 PID: 447465 Comm: mount Not tainted 5.15.0-rc7-btrfs-next-103 #1 [ 3653.748873] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 3653.750592] Call Trace: [ 3653.750967] dump_stack_lvl+0x57/0x72 [ 3653.751526] check_noncircular+0xf3/0x110 [ 3653.752136] ? stack_trace_save+0x4b/0x70 [ 3653.752748] __lock_acquire+0x130e/0x2210 [ 3653.753356] lock_acquire+0xd7/0x310 [ 3653.753898] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.754596] ? lock_is_held_type+0xe8/0x140 [ 3653.755125] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.755729] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.756338] __mutex_lock+0x92/0x900 [ 3653.756794] ? blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.757400] ? do_raw_spin_unlock+0x4b/0xa0 [ 3653.757930] ? _raw_spin_unlock+0x29/0x40 [ 3653.758437] ? bd_prepare_to_claim+0x129/0x150 [ 3653.758999] ? trace_module_get+0x2b/0xd0 [ 3653.759508] ? try_module_get.part.0+0x50/0x80 [ 3653.760072] blkdev_get_by_dev.part.0+0xe7/0x320 [ 3653.760661] ? devcgroup_check_permission+0xc1/0x1f0 [ 3653.761288] blkdev_get_by_path+0xb8/0xd0 [ 3653.761797] btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs] [ 3653.762454] open_fs_devices+0xd7/0x2c0 [btrfs] [ 3653.763055] ? clone_fs_devices+0x8f/0x170 [btrfs] [ 3653.763689] btrfs_read_chunk_tree+0x3ad/0x870 [btrfs] [ 3653.764370] ? kvm_sched_clock_read+0x14/0x40 [ 3653.764922] open_ctree+0xb8e/0x17bf [btrfs] [ 3653.765493] ? super_setup_bdi_name+0x79/0xd0 [ 3653.766043] btrfs_mount_root.cold+0x12/0xde [btrfs] [ 3653.766780] ? rcu_read_lock_sched_held+0x3f/0x80 [ 3653.767488] ? kfree+0x1f2/0x3c0 [ 3653.767979] legacy_get_tree+0x30/0x50 [ 3653.768548] vfs_get_tree+0x28/0xc0 [ 3653.769076] vfs_kern_mount.part.0+0x71/0xb0 [ 3653.769718] btrfs_mount+0x11d/0x3a0 [btrfs] [ 3653.770381] ? rcu_read_lock_sched_held+0x3f/0x80 [ 3653.771086] ? kfree+0x1f2/0x3c0 [ 3653.771574] legacy_get_tree+0x30/0x50 [ 3653.772136] vfs_get_tree+0x28/0xc0 [ 3653.772673] path_mount+0x2d4/0xbe0 [ 3653.773201] __x64_sys_mount+0x103/0x140 [ 3653.773793] do_syscall_64+0x3b/0xc0 [ 3653.774333] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 3653.775094] RIP: 0033:0x7f648bc45aaa This happens because through btrfs_read_chunk_tree(), which is called only during mount, ends up acquiring the mutex open_mutex of a block device while holding a read lock on a leaf of the chunk tree while other paths need to acquire other locks before locking extent buffers of the chunk tree. Since at mount time when we call btrfs_read_chunk_tree() we know that we don't have other tasks running in parallel and modifying the chunk tree, we can simply skip locking of chunk tree extent buffers. So do that and move the assertion that checks the fs is not yet mounted to the top block of btrfs_read_chunk_tree(), with a comment before doing it. Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>

Ido Schimmel says: ==================== mlxsw: Two small fixes Patch #1 fixes a recent regression that prevents the driver from loading with old firmware versions. Patch #2 protects the driver from a NULL pointer dereference when working on top of a buggy firmware. This was never observed in an actual system, only on top of an emulator during development. ==================== Signed-off-by: David S. Miller <[email protected]>

…/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.16, take #2 - Fix constant sign extension affecting TCR_EL2 and preventing running on ARMv8.7 models due to spurious bits being set - Fix use of helpers using PSTATE early on exit by always sampling it as soon as the exit takes place - Move pkvm's 32bit handling into a common helper

commit 5648c07 upstream. Add the following Telit FD980 composition 0x1056: Cfg #1: mass storage Cfg #2: rndis, tty, adb, tty, tty, tty, tty Signed-off-by: Daniele Palmas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: [email protected] Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 23d2b94 upstream. I got below panic when doing fuzz test: Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 4056 Comm: syz-executor.3 Tainted: G B 5.14.0-rc1-00195-gcff5c4254439-dirty #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x7a/0x9b panic+0x2cd/0x5af end_report.cold+0x5a/0x5a kasan_report+0xec/0x110 ip_check_mc_rcu+0x556/0x5d0 __mkroute_output+0x895/0x1740 ip_route_output_key_hash_rcu+0x2d0/0x1050 ip_route_output_key_hash+0x182/0x2e0 ip_route_output_flow+0x28/0x130 udp_sendmsg+0x165d/0x2280 udpv6_sendmsg+0x121e/0x24f0 inet6_sendmsg+0xf7/0x140 sock_sendmsg+0xe9/0x180 ____sys_sendmsg+0x2b8/0x7a0 ___sys_sendmsg+0xf0/0x160 __sys_sendmmsg+0x17e/0x3c0 __x64_sys_sendmmsg+0x9e/0x100 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x462eb9 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f3df5af1c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462eb9 RDX: 0000000000000312 RSI: 0000000020001700 RDI: 0000000000000007 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3df5af26bc R13: 00000000004c372d R14: 0000000000700b10 R15: 00000000ffffffff It is one use-after-free in ip_check_mc_rcu. In ip_mc_del_src, the ip_sf_list of pmc has been freed under pmc->lock protection. But access to ip_sf_list in ip_check_mc_rcu is not protected by the lock. Signed-off-by: Liu Jian <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Lee Jones <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…_lock Taking sb_writers whilst holding mmap_lock isn't allowed and will result in a lockdep warning like that below. The problem comes from cachefiles needing to take the sb_writers lock in order to do a write to the cache, but being asked to do this by netfslib called from readpage, readahead or write_begin[1]. Fix this by always offloading the write to the cache off to a worker thread. The main thread doesn't need to wait for it, so deadlock can be avoided. This can be tested by running the quick xfstests on something like afs or ceph with lockdep enabled. WARNING: possible circular locking dependency detected 5.15.0-rc1-build2+ #292 Not tainted ------------------------------------------------------ holetest/65517 is trying to acquire lock: ffff88810c81d730 (mapping.invalidate_lock#3){.+.+}-{3:3}, at: filemap_fault+0x276/0x7a5 but task is already holding lock: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&mm->mmap_lock#2){++++}-{3:3}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b __might_fault+0x87/0xb1 strncpy_from_user+0x25/0x18c removexattr+0x7c/0xe5 __do_sys_fremovexattr+0x73/0x96 do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> #1 (sb_writers#10){.+.+}-{0:0}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b cachefiles_write+0x2b3/0x4bb netfs_rreq_do_write_to_cache+0x3b5/0x432 netfs_readpage+0x2de/0x39d filemap_read_page+0x51/0x94 filemap_get_pages+0x26f/0x413 filemap_read+0x182/0x427 new_sync_read+0xf0/0x161 vfs_read+0x118/0x16e ksys_read+0xb8/0x12e do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (mapping.invalidate_lock#3){.+.+}-{3:3}: check_noncircular+0xe4/0x129 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b down_read+0x40/0x4a filemap_fault+0x276/0x7a5 __do_fault+0x96/0xbf do_fault+0x262/0x35a __handle_mm_fault+0x171/0x1b5 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c exc_page_fault+0x85/0xa5 asm_exc_page_fault+0x1e/0x30 other info that might help us debug this: Chain exists of: mapping.invalidate_lock#3 --> sb_writers#10 --> &mm->mmap_lock#2 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_lock#2); lock(sb_writers#10); lock(&mm->mmap_lock#2); lock(mapping.invalidate_lock#3); *** DEADLOCK *** 1 lock held by holetest/65517: #0: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c stack backtrace: CPU: 0 PID: 65517 Comm: holetest Not tainted 5.15.0-rc1-build2+ #292 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: dump_stack_lvl+0x45/0x59 check_noncircular+0xe4/0x129 ? print_circular_bug+0x207/0x207 ? validate_chain+0x461/0x4a8 ? add_chain_block+0x88/0xd9 ? hlist_add_head_rcu+0x49/0x53 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 ? check_prev_add+0x3a4/0x3a4 ? mark_lock+0xa5/0x1c6 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b ? filemap_fault+0x276/0x7a5 ? rcu_read_unlock+0x59/0x59 ? add_to_page_cache_lru+0x13c/0x13c ? lock_is_held_type+0x7b/0xd3 down_read+0x40/0x4a ? filemap_fault+0x276/0x7a5 filemap_fault+0x276/0x7a5 ? pagecache_get_page+0x2dd/0x2dd ? __lock_acquire+0x8bc/0x949 ? pte_offset_kernel.isra.0+0x6d/0xc3 __do_fault+0x96/0xbf ? do_fault+0x124/0x35a do_fault+0x262/0x35a ? handle_pte_fault+0x1c1/0x20d __handle_mm_fault+0x171/0x1b5 ? handle_pte_fault+0x20d/0x20d ? __lock_release+0x151/0x254 ? mark_held_locks+0x1f/0x78 ? rcu_read_unlock+0x3a/0x59 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c ? pgtable_bad+0x70/0x70 ? rcu_read_lock_bh_held+0xab/0xab exc_page_fault+0x85/0xa5 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 RIP: 0033:0x40192f Code: ff 48 89 c3 48 8b 05 50 28 00 00 48 85 ed 7e 23 31 d2 4b 8d 0c 2f eb 0a 0f 1f 00 48 8b 05 39 28 00 00 48 0f af c2 48 83 c2 01 <48> 89 1c 01 48 39 d5 7f e8 8b 0d f2 27 00 00 31 c0 85 c9 74 0e 8b RSP: 002b:00007f9931867eb0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00007f9931868700 RCX: 00007f993206ac00 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007ffc13e06ee0 RBP: 0000000000000100 R08: 0000000000000000 R09: 00007f9931868700 R10: 00007f99318689d0 R11: 0000000000000202 R12: 00007ffc13e06ee0 R13: 0000000000000c00 R14: 00007ffc13e06e00 R15: 00007f993206a000 Fixes: 726218f ("netfs: Define an interface to talk to a cache") Signed-off-by: David Howells <[email protected]> Tested-by: Jeff Layton <[email protected]> cc: Jan Kara <[email protected]> cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected]/ [1] Link: https://lore.kernel.org/r/163887597541.1596626.2668163316598972956.stgit@warthog.procyon.org.uk/ # v1

In line 800 (#1), nfp_cpp_area_alloc() allocates and initializes a CPP area structure. But in line 807 (#2), when the cache is allocated failed, this CPP area structure is not freed, which will result in memory leak. We can fix it by freeing the CPP area when the cache is allocated failed (#2). 792 int nfp_cpp_area_cache_add(struct nfp_cpp *cpp, size_t size) 793 { 794 struct nfp_cpp_area_cache *cache; 795 struct nfp_cpp_area *area; 800 area = nfp_cpp_area_alloc(cpp, NFP_CPP_ID(7, NFP_CPP_ACTION_RW, 0), 801 0, size); // #1: allocates and initializes 802 if (!area) 803 return -ENOMEM; 805 cache = kzalloc(sizeof(*cache), GFP_KERNEL); 806 if (!cache) 807 return -ENOMEM; // #2: missing free 817 return 0; 818 } Fixes: 4cb584e ("nfp: add CPP access core") Signed-off-by: Jianglei Nie <[email protected]> Acked-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Ido Schimmel says: ==================== mlxsw: MAC profiles occupancy fix Patch #1 fixes a router interface (RIF) MAC profiles occupancy bug that was merged in the last cycle. Patch #2 adds a selftest that fails without the fix. ==================== Signed-off-by: David S. Miller <[email protected]>

Line 1169 (#3) allocates a memory chunk for victim_name by kmalloc(), but when the function returns in line 1184 (#4) victim_name allocated by line 1169 (#3) is not freed, which will lead to a memory leak. There is a similar snippet of code in this function as allocating a memory chunk for victim_name in line 1104 (#1) as well as releasing the memory in line 1116 (#2). We should kfree() victim_name when the return value of backref_in_log() is less than zero and before the function returns in line 1184 (#4). 1057 static inline int __add_inode_ref(struct btrfs_trans_handle *trans, 1058 struct btrfs_root *root, 1059 struct btrfs_path *path, 1060 struct btrfs_root *log_root, 1061 struct btrfs_inode *dir, 1062 struct btrfs_inode *inode, 1063 u64 inode_objectid, u64 parent_objectid, 1064 u64 ref_index, char *name, int namelen, 1065 int *search_done) 1066 { 1104 victim_name = kmalloc(victim_name_len, GFP_NOFS); // #1: kmalloc (victim_name-1) 1105 if (!victim_name) 1106 return -ENOMEM; 1112 ret = backref_in_log(log_root, &search_key, 1113 parent_objectid, victim_name, 1114 victim_name_len); 1115 if (ret < 0) { 1116 kfree(victim_name); // #2: kfree (victim_name-1) 1117 return ret; 1118 } else if (!ret) { 1169 victim_name = kmalloc(victim_name_len, GFP_NOFS); // #3: kmalloc (victim_name-2) 1170 if (!victim_name) 1171 return -ENOMEM; 1180 ret = backref_in_log(log_root, &search_key, 1181 parent_objectid, victim_name, 1182 victim_name_len); 1183 if (ret < 0) { 1184 return ret; // #4: missing kfree (victim_name-2) 1185 } else if (!ret) { 1241 return 0; 1242 } Fixes: d3316c8 ("btrfs: Properly handle backref_in_log retval") CC: [email protected] # 5.10+ Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Jianglei Nie <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>

The fixed commit attempts to close inject.output even if it was never opened e.g. $ perf record uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ] $ perf inject -i perf.data --vm-time-correlation=dry-run Segmentation fault (core dumped) $ gdb --quiet perf Reading symbols from perf... (gdb) r inject -i perf.data --vm-time-correlation=dry-run Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48 48 iofclose.c: No such file or directory. (gdb) bt #0 0x00007eff8afeef5b in _IO_new_fclose (fp=0x0) at iofclose.c:48 #1 0x0000557fc7b74f92 in perf_data__close (data=data@entry=0x7ffcdafa6578) at util/data.c:376 #2 0x0000557fc7a6b807 in cmd_inject (argc=<optimized out>, argv=<optimized out>) at builtin-inject.c:1085 #3 0x0000557fc7ac4783 in run_builtin (p=0x557fc8074878 <commands+600>, argc=4, argv=0x7ffcdafb6a60) at perf.c:313 #4 0x0000557fc7a25d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365 linux-surface#5 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409 linux-surface#6 main (argc=4, argv=0x7ffcdafb6a60) at perf.c:539 (gdb) Fixes: 02e6246 ("perf inject: Close inject.output on exit") Signed-off-by: Adrian Hunter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

The fixed commit attempts to get the output file descriptor even if the file was never opened e.g. $ perf record uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (7 samples) ] $ perf inject -i perf.data --vm-time-correlation=dry-run Segmentation fault (core dumped) $ gdb --quiet perf Reading symbols from perf... (gdb) r inject -i perf.data --vm-time-correlation=dry-run Starting program: /home/ahunter/bin/perf inject -i perf.data --vm-time-correlation=dry-run [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. __GI___fileno (fp=0x0) at fileno.c:35 35 fileno.c: No such file or directory. (gdb) bt #0 __GI___fileno (fp=0x0) at fileno.c:35 #1 0x00005621e48dd987 in perf_data__fd (data=0x7fff4c68bd08) at util/data.h:72 #2 perf_data__fd (data=0x7fff4c68bd08) at util/data.h:69 #3 cmd_inject (argc=<optimized out>, argv=0x7fff4c69c1f0) at builtin-inject.c:1017 #4 0x00005621e4936783 in run_builtin (p=0x5621e4ee6878 <commands+600>, argc=4, argv=0x7fff4c69c1f0) at perf.c:313 linux-surface#5 0x00005621e4897d5c in handle_internal_command (argv=<optimized out>, argc=<optimized out>) at perf.c:365 linux-surface#6 run_argv (argcp=<optimized out>, argv=<optimized out>) at perf.c:409 linux-surface#7 main (argc=4, argv=0x7fff4c69c1f0) at perf.c:539 (gdb) Fixes: 0ae0389 ("perf tools: Pass a fd to perf_file_header__read_pipe()") Signed-off-by: Adrian Hunter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Riccardo Mancini <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

Hayes Wang says: ==================== r8152: fix bugs Patch #1 fix the issue of force speed mode for RTL8156. Patch #2 fix the issue of unexpected ocp_base. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

Add a test that sends large udp packet (which is fragmented) via a stateless nft nat rule, i.e. 'ip saddr set 10.2.3.4' and check that the datagram is received by peer. On kernels without commit 4e1860a ("netfilter: nft_payload: do not update layer 4 checksum when mangling fragments")', this will fail with: cmp: EOF on /tmp/tmp.V1q0iXJyQF which is empty -rw------- 1 root root 4096 Jan 24 22:03 /tmp/tmp.Aaqnq4rBKS -rw------- 1 root root 0 Jan 24 22:03 /tmp/tmp.V1q0iXJyQF ERROR: in and output file mismatch when checking udp with stateless nat FAIL: nftables v1.0.0 (Fearless Fosdick #2) On patched kernels, this will show: PASS: IP statless for ns2-PFp89amx Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>

Quota disable ioctl starts a transaction before waiting for the qgroup rescan worker completes. However, this wait can be infinite and results in deadlock because of circular dependency among the quota disable ioctl, the qgroup rescan worker and the other task with transaction such as block group relocation task. The deadlock happens with the steps following: 1) Task A calls ioctl to disable quota. It starts a transaction and waits for qgroup rescan worker completes. 2) Task B such as block group relocation task starts a transaction and joins to the transaction that task A started. Then task B commits to the transaction. In this commit, task B waits for a commit by task A. 3) Task C as the qgroup rescan worker starts its job and starts a transaction. In this transaction start, task C waits for completion of the transaction that task A started and task B committed. This deadlock was found with fstests test case btrfs/115 and a zoned null_blk device. The test case enables and disables quota, and the block group reclaim was triggered during the quota disable by chance. The deadlock was also observed by running quota enable and disable in parallel with 'btrfs balance' command on regular null_blk devices. An example report of the deadlock: [372.469894] INFO: task kworker/u16:6:103 blocked for more than 122 seconds. [372.479944] Not tainted 5.16.0-rc8 linux-surface#7 [372.485067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [372.493898] task:kworker/u16:6 state:D stack: 0 pid: 103 ppid: 2 flags:0x00004000 [372.503285] Workqueue: btrfs-qgroup-rescan btrfs_work_helper [btrfs] [372.510782] Call Trace: [372.514092] <TASK> [372.521684] __schedule+0xb56/0x4850 [372.530104] ? io_schedule_timeout+0x190/0x190 [372.538842] ? lockdep_hardirqs_on+0x7e/0x100 [372.547092] ? _raw_spin_unlock_irqrestore+0x3e/0x60 [372.555591] schedule+0xe0/0x270 [372.561894] btrfs_commit_transaction+0x18bb/0x2610 [btrfs] [372.570506] ? btrfs_apply_pending_changes+0x50/0x50 [btrfs] [372.578875] ? free_unref_page+0x3f2/0x650 [372.585484] ? finish_wait+0x270/0x270 [372.591594] ? release_extent_buffer+0x224/0x420 [btrfs] [372.599264] btrfs_qgroup_rescan_worker+0xc13/0x10c0 [btrfs] [372.607157] ? lock_release+0x3a9/0x6d0 [372.613054] ? btrfs_qgroup_account_extent+0xda0/0xda0 [btrfs] [372.620960] ? do_raw_spin_lock+0x11e/0x250 [372.627137] ? rwlock_bug.part.0+0x90/0x90 [372.633215] ? lock_is_held_type+0xe4/0x140 [372.639404] btrfs_work_helper+0x1ae/0xa90 [btrfs] [372.646268] process_one_work+0x7e9/0x1320 [372.652321] ? lock_release+0x6d0/0x6d0 [372.658081] ? pwq_dec_nr_in_flight+0x230/0x230 [372.664513] ? rwlock_bug.part.0+0x90/0x90 [372.670529] worker_thread+0x59e/0xf90 [372.676172] ? process_one_work+0x1320/0x1320 [372.682440] kthread+0x3b9/0x490 [372.687550] ? _raw_spin_unlock_irq+0x24/0x50 [372.693811] ? set_kthread_struct+0x100/0x100 [372.700052] ret_from_fork+0x22/0x30 [372.705517] </TASK> [372.709747] INFO: task btrfs-transacti:2347 blocked for more than 123 seconds. [372.729827] Not tainted 5.16.0-rc8 linux-surface#7 [372.745907] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [372.767106] task:btrfs-transacti state:D stack: 0 pid: 2347 ppid: 2 flags:0x00004000 [372.787776] Call Trace: [372.801652] <TASK> [372.812961] __schedule+0xb56/0x4850 [372.830011] ? io_schedule_timeout+0x190/0x190 [372.852547] ? lockdep_hardirqs_on+0x7e/0x100 [372.871761] ? _raw_spin_unlock_irqrestore+0x3e/0x60 [372.886792] schedule+0xe0/0x270 [372.901685] wait_current_trans+0x22c/0x310 [btrfs] [372.919743] ? btrfs_put_transaction+0x3d0/0x3d0 [btrfs] [372.938923] ? finish_wait+0x270/0x270 [372.959085] ? join_transaction+0xc75/0xe30 [btrfs] [372.977706] start_transaction+0x938/0x10a0 [btrfs] [372.997168] transaction_kthread+0x19d/0x3c0 [btrfs] [373.013021] ? btrfs_cleanup_transaction.isra.0+0xfc0/0xfc0 [btrfs] [373.031678] kthread+0x3b9/0x490 [373.047420] ? _raw_spin_unlock_irq+0x24/0x50 [373.064645] ? set_kthread_struct+0x100/0x100 [373.078571] ret_from_fork+0x22/0x30 [373.091197] </TASK> [373.105611] INFO: task btrfs:3145 blocked for more than 123 seconds. [373.114147] Not tainted 5.16.0-rc8 linux-surface#7 [373.120401] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [373.130393] task:btrfs state:D stack: 0 pid: 3145 ppid: 3141 flags:0x00004000 [373.140998] Call Trace: [373.145501] <TASK> [373.149654] __schedule+0xb56/0x4850 [373.155306] ? io_schedule_timeout+0x190/0x190 [373.161965] ? lockdep_hardirqs_on+0x7e/0x100 [373.168469] ? _raw_spin_unlock_irqrestore+0x3e/0x60 [373.175468] schedule+0xe0/0x270 [373.180814] wait_for_commit+0x104/0x150 [btrfs] [373.187643] ? test_and_set_bit+0x20/0x20 [btrfs] [373.194772] ? kmem_cache_free+0x124/0x550 [373.201191] ? btrfs_put_transaction+0x69/0x3d0 [btrfs] [373.208738] ? finish_wait+0x270/0x270 [373.214704] ? __btrfs_end_transaction+0x347/0x7b0 [btrfs] [373.222342] btrfs_commit_transaction+0x44d/0x2610 [btrfs] [373.230233] ? join_transaction+0x255/0xe30 [btrfs] [373.237334] ? btrfs_record_root_in_trans+0x4d/0x170 [btrfs] [373.245251] ? btrfs_apply_pending_changes+0x50/0x50 [btrfs] [373.253296] relocate_block_group+0x105/0xc20 [btrfs] [373.260533] ? mutex_lock_io_nested+0x1270/0x1270 [373.267516] ? btrfs_wait_nocow_writers+0x85/0x180 [btrfs] [373.275155] ? merge_reloc_roots+0x710/0x710 [btrfs] [373.283602] ? btrfs_wait_ordered_extents+0xd30/0xd30 [btrfs] [373.291934] ? kmem_cache_free+0x124/0x550 [373.298180] btrfs_relocate_block_group+0x35c/0x930 [btrfs] [373.306047] btrfs_relocate_chunk+0x85/0x210 [btrfs] [373.313229] btrfs_balance+0x12f4/0x2d20 [btrfs] [373.320227] ? lock_release+0x3a9/0x6d0 [373.326206] ? btrfs_relocate_chunk+0x210/0x210 [btrfs] [373.333591] ? lock_is_held_type+0xe4/0x140 [373.340031] ? rcu_read_lock_sched_held+0x3f/0x70 [373.346910] btrfs_ioctl_balance+0x548/0x700 [btrfs] [373.354207] btrfs_ioctl+0x7f2/0x71b0 [btrfs] [373.360774] ? lockdep_hardirqs_on_prepare+0x410/0x410 [373.367957] ? lockdep_hardirqs_on_prepare+0x410/0x410 [373.375327] ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs] [373.383841] ? find_held_lock+0x2c/0x110 [373.389993] ? lock_release+0x3a9/0x6d0 [373.395828] ? mntput_no_expire+0xf7/0xad0 [373.402083] ? lock_is_held_type+0xe4/0x140 [373.408249] ? vfs_fileattr_set+0x9f0/0x9f0 [373.414486] ? selinux_file_ioctl+0x349/0x4e0 [373.420938] ? trace_raw_output_lock+0xb4/0xe0 [373.427442] ? selinux_inode_getsecctx+0x80/0x80 [373.434224] ? lockdep_hardirqs_on+0x7e/0x100 [373.440660] ? force_qs_rnp+0x2a0/0x6b0 [373.446534] ? lock_is_held_type+0x9b/0x140 [373.452763] ? __blkcg_punt_bio_submit+0x1b0/0x1b0 [373.459732] ? security_file_ioctl+0x50/0x90 [373.466089] __x64_sys_ioctl+0x127/0x190 [373.472022] do_syscall_64+0x3b/0x90 [373.477513] entry_SYSCALL_64_after_hwframe+0x44/0xae [373.484823] RIP: 0033:0x7f8f4af7e2bb [373.490493] RSP: 002b:00007ffcbf936178 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [373.500197] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8f4af7e2bb [373.509451] RDX: 00007ffcbf936220 RSI: 00000000c4009420 RDI: 0000000000000003 [373.518659] RBP: 00007ffcbf93774a R08: 0000000000000013 R09: 00007f8f4b02d4e0 [373.527872] R10: 00007f8f4ae87740 R11: 0000000000000246 R12: 0000000000000001 [373.537222] R13: 00007ffcbf936220 R14: 0000000000000000 R15: 0000000000000002 [373.546506] </TASK> [373.550878] INFO: task btrfs:3146 blocked for more than 123 seconds. [373.559383] Not tainted 5.16.0-rc8 linux-surface#7 [373.565748] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [373.575748] task:btrfs state:D stack: 0 pid: 3146 ppid: 2168 flags:0x00000000 [373.586314] Call Trace: [373.590846] <TASK> [373.595121] __schedule+0xb56/0x4850 [373.600901] ? __lock_acquire+0x23db/0x5030 [373.607176] ? io_schedule_timeout+0x190/0x190 [373.613954] schedule+0xe0/0x270 [373.619157] schedule_timeout+0x168/0x220 [373.625170] ? usleep_range_state+0x150/0x150 [373.631653] ? mark_held_locks+0x9e/0xe0 [373.637767] ? do_raw_spin_lock+0x11e/0x250 [373.643993] ? lockdep_hardirqs_on_prepare+0x17b/0x410 [373.651267] ? _raw_spin_unlock_irq+0x24/0x50 [373.657677] ? lockdep_hardirqs_on+0x7e/0x100 [373.664103] wait_for_completion+0x163/0x250 [373.670437] ? bit_wait_timeout+0x160/0x160 [373.676585] btrfs_quota_disable+0x176/0x9a0 [btrfs] [373.683979] ? btrfs_quota_enable+0x12f0/0x12f0 [btrfs] [373.691340] ? down_write+0xd0/0x130 [373.696880] ? down_write_killable+0x150/0x150 [373.703352] btrfs_ioctl+0x3945/0x71b0 [btrfs] [373.710061] ? find_held_lock+0x2c/0x110 [373.716192] ? lock_release+0x3a9/0x6d0 [373.722047] ? __handle_mm_fault+0x23cd/0x3050 [373.728486] ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs] [373.737032] ? set_pte+0x6a/0x90 [373.742271] ? do_raw_spin_unlock+0x55/0x1f0 [373.748506] ? lock_is_held_type+0xe4/0x140 [373.754792] ? vfs_fileattr_set+0x9f0/0x9f0 [373.761083] ? selinux_file_ioctl+0x349/0x4e0 [373.767521] ? selinux_inode_getsecctx+0x80/0x80 [373.774247] ? __up_read+0x182/0x6e0 [373.780026] ? count_memcg_events.constprop.0+0x46/0x60 [373.787281] ? up_write+0x460/0x460 [373.792932] ? security_file_ioctl+0x50/0x90 [373.799232] __x64_sys_ioctl+0x127/0x190 [373.805237] do_syscall_64+0x3b/0x90 [373.810947] entry_SYSCALL_64_after_hwframe+0x44/0xae [373.818102] RIP: 0033:0x7f1383ea02bb [373.823847] RSP: 002b:00007fffeb4d71f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010 [373.833641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1383ea02bb [373.842961] RDX: 00007fffeb4d7210 RSI: 00000000c0109428 RDI: 0000000000000003 [373.852179] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078 [373.861408] R10: 00007f1383daec78 R11: 0000000000000202 R12: 00007fffeb4d874a [373.870647] R13: 0000000000493099 R14: 0000000000000001 R15: 0000000000000000 [373.879838] </TASK> [373.884018] Showing all locks held in the system: [373.894250] 3 locks held by kworker/4:1/58: [373.900356] 1 lock held by khungtaskd/63: [373.906333] #0: ffffffff8945ff60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 [373.917307] 3 locks held by kworker/u16:6/103: [373.923938] #0: ffff888127b4f138 ((wq_completion)btrfs-qgroup-rescan){+.+.}-{0:0}, at: process_one_work+0x712/0x1320 [373.936555] #1: ffff88810b817dd8 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x73f/0x1320 [373.951109] #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_qgroup_rescan_worker+0x1f6/0x10c0 [btrfs] [373.964027] 2 locks held by less/1803: [373.969982] #0: ffff88813ed56098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x80 [373.981295] #1: ffffc90000b3b2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x9e2/0x1060 [373.992969] 1 lock held by btrfs-transacti/2347: [373.999893] #0: ffff88813d4887a8 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0xe3/0x3c0 [btrfs] [374.015872] 3 locks held by btrfs/3145: [374.022298] #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl_balance+0xc3/0x700 [btrfs] [374.034456] #1: ffff88813d48a0a0 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_balance+0xfe5/0x2d20 [btrfs] [374.047646] #2: ffff88813d488838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x354/0x930 [btrfs] [374.063295] 4 locks held by btrfs/3146: [374.069647] #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl+0x38b1/0x71b0 [btrfs] [374.081601] #1: ffff88813d488bb8 (&fs_info->subvol_sem){+.+.}-{3:3}, at: btrfs_ioctl+0x38fd/0x71b0 [btrfs] [374.094283] #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_disable+0xc8/0x9a0 [btrfs] [374.106885] #3: ffff88813d489800 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_disable+0xd5/0x9a0 [btrfs] [374.126780] ============================================= To avoid the deadlock, wait for the qgroup rescan worker to complete before starting the transaction for the quota disable ioctl. Clear BTRFS_FS_QUOTA_ENABLE flag before the wait and the transaction to request the worker to complete. On transaction start failure, set the BTRFS_FS_QUOTA_ENABLE flag again. These BTRFS_FS_QUOTA_ENABLE flag changes can be done safely since the function btrfs_quota_disable is not called concurrently because of fs_info->subvol_sem. Also check the BTRFS_FS_QUOTA_ENABLE flag in qgroup_rescan_init to avoid another qgroup rescan worker to start after the previous qgroup worker completed. CC: [email protected] # 5.4+ Suggested-by: Nikolay Borisov <[email protected]> Reviewed-by: Filipe Manana <[email protected]> Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Signed-off-by: David Sterba <[email protected]>

…egulator The interrupt pin of the external ethernet phy is used, instead of the enable-gpio pin of the tf-io regulator. The GPIOE_2 pin is located in the gpio_ao bank. This causes phy interrupt problems at system startup. [ 76.645190] irq 36: nobody cared (try booting with the "irqpoll" option) [ 76.649617] CPU: 0 PID: 1416 Comm: irq/36-0.0:00 Not tainted 5.16.0 #2 [ 76.649629] Hardware name: Hardkernel ODROID-HC4 (DT) [ 76.649635] Call trace: [ 76.649638] dump_backtrace+0x0/0x1c8 [ 76.649658] show_stack+0x14/0x60 [ 76.649667] dump_stack_lvl+0x64/0x7c [ 76.649676] dump_stack+0x14/0x2c [ 76.649683] __report_bad_irq+0x38/0xe8 [ 76.649695] note_interrupt+0x220/0x3a0 [ 76.649704] handle_irq_event_percpu+0x58/0x88 [ 76.649713] handle_irq_event+0x44/0xd8 [ 76.649721] handle_fasteoi_irq+0xa8/0x130 [ 76.649730] generic_handle_domain_irq+0x38/0x58 [ 76.649738] gic_handle_irq+0x9c/0xb8 [ 76.649747] call_on_irq_stack+0x28/0x38 [ 76.649755] do_interrupt_handler+0x7c/0x80 [ 76.649763] el1_interrupt+0x34/0x80 [ 76.649772] el1h_64_irq_handler+0x14/0x20 [ 76.649781] el1h_64_irq+0x74/0x78 [ 76.649788] irq_finalize_oneshot.part.56+0x68/0xf8 [ 76.649796] irq_thread_fn+0x5c/0x98 [ 76.649804] irq_thread+0x13c/0x260 [ 76.649812] kthread+0x144/0x178 [ 76.649822] ret_from_fork+0x10/0x20 [ 76.649830] handlers: [ 76.653170] [<0000000025a6cd31>] irq_default_primary_handler threaded [<0000000093580eb7>] phy_interrupt [ 76.661256] Disabling IRQ linux-surface#36 Fixes: 1f80a5c ("arm64: dts: meson-sm1-odroid: add missing enable gpio and supply for tf_io regulator") Signed-off-by: Lutz Koschorreck <[email protected]> Reviewed-by: Neil Armstrong <[email protected]> Signed-off-by: Neil Armstrong <[email protected]> [narmstrong: removed spurious invalid & blank lines from commit message] Link: https://lore.kernel.org/r/20220127130537.GA187347@odroid-VirtualBox

…/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 5.17, take #2 - A couple of fixes when handling an exception while a SError has been delivered - Workaround for Cortex-A510's single-step[ erratum

When using the flushoncommit mount option, during almost every transaction commit we trigger a warning from __writeback_inodes_sb_nr(): $ cat fs/fs-writeback.c: (...) static void __writeback_inodes_sb_nr(struct super_block *sb, ... { (...) WARN_ON(!rwsem_is_locked(&sb->s_umount)); (...) } (...) The trace produced in dmesg looks like the following: [947.473890] WARNING: CPU: 5 PID: 930 at fs/fs-writeback.c:2610 __writeback_inodes_sb_nr+0x7e/0xb3 [947.481623] Modules linked in: nfsd nls_cp437 cifs asn1_decoder cifs_arc4 fscache cifs_md4 ipmi_ssif [947.489571] CPU: 5 PID: 930 Comm: btrfs-transacti Not tainted 95.16.3-srb-asrock-00001-g36437ad63879 #186 [947.497969] RIP: 0010:__writeback_inodes_sb_nr+0x7e/0xb3 [947.502097] Code: 24 10 4c 89 44 24 18 c6 (...) [947.519760] RSP: 0018:ffffc90000777e10 EFLAGS: 00010246 [947.523818] RAX: 0000000000000000 RBX: 0000000000963300 RCX: 0000000000000000 [947.529765] RDX: 0000000000000000 RSI: 000000000000fa51 RDI: ffffc90000777e50 [947.535740] RBP: ffff888101628a90 R08: ffff888100955800 R09: ffff888100956000 [947.541701] R10: 0000000000000002 R11: 0000000000000001 R12: ffff888100963488 [947.547645] R13: ffff888100963000 R14: ffff888112fb7200 R15: ffff888100963460 [947.553621] FS: 0000000000000000(0000) GS:ffff88841fd40000(0000) knlGS:0000000000000000 [947.560537] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [947.565122] CR2: 0000000008be50c4 CR3: 000000000220c000 CR4: 00000000001006e0 [947.571072] Call Trace: [947.572354] <TASK> [947.573266] btrfs_commit_transaction+0x1f1/0x998 [947.576785] ? start_transaction+0x3ab/0x44e [947.579867] ? schedule_timeout+0x8a/0xdd [947.582716] transaction_kthread+0xe9/0x156 [947.585721] ? btrfs_cleanup_transaction.isra.0+0x407/0x407 [947.590104] kthread+0x131/0x139 [947.592168] ? set_kthread_struct+0x32/0x32 [947.595174] ret_from_fork+0x22/0x30 [947.597561] </TASK> [947.598553] ---[ end trace 644721052755541c ]--- This is because we started using writeback_inodes_sb() to flush delalloc when committing a transaction (when using -o flushoncommit), in order to avoid deadlocks with filesystem freeze operations. This change was made by commit ce8ea7c ("btrfs: don't call btrfs_start_delalloc_roots in flushoncommit"). After that change we started producing that warning, and every now and then a user reports this since the warning happens too often, it spams dmesg/syslog, and a user is unsure if this reflects any problem that might compromise the filesystem's reliability. We can not just lock the sb->s_umount semaphore before calling writeback_inodes_sb(), because that would at least deadlock with filesystem freezing, since at fs/super.c:freeze_super() sync_filesystem() is called while we are holding that semaphore in write mode, and that can trigger a transaction commit, resulting in a deadlock. It would also trigger the same type of deadlock in the unmount path. Possibly, it could also introduce some other locking dependencies that lockdep would report. To fix this call try_to_writeback_inodes_sb() instead of writeback_inodes_sb(), because that will try to read lock sb->s_umount and then will only call writeback_inodes_sb() if it was able to lock it. This is fine because the cases where it can't read lock sb->s_umount are during a filesystem unmount or during a filesystem freeze - in those cases sb->s_umount is write locked and sync_filesystem() is called, which calls writeback_inodes_sb(). In other words, in all cases where we can't take a read lock on sb->s_umount, writeback is already being triggered elsewhere. An alternative would be to call btrfs_start_delalloc_roots() with a number of pages different from LONG_MAX, for example matching the number of delalloc bytes we currently have, in which case we would end up starting all delalloc with filemap_fdatawrite_wbc() and not with an async flush via filemap_flush() - that is only possible after the rather recent commit e076ab2 ("btrfs: shrink delalloc pages instead of full inodes"). However that creates a whole new can of worms due to new lock dependencies, which lockdep complains, like for example: [ 8948.247280] ====================================================== [ 8948.247823] WARNING: possible circular locking dependency detected [ 8948.248353] 5.17.0-rc1-btrfs-next-111 #1 Not tainted [ 8948.248786] ------------------------------------------------------ [ 8948.249320] kworker/u16:18/933570 is trying to acquire lock: [ 8948.249812] ffff9b3de1591690 (sb_internal#2){.+.+}-{0:0}, at: find_free_extent+0x141e/0x1590 [btrfs] [ 8948.250638] but task is already holding lock: [ 8948.251140] ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs] [ 8948.252018] which lock already depends on the new lock. [ 8948.252710] the existing dependency chain (in reverse order) is: [ 8948.253343] -> #2 (&root->delalloc_mutex){+.+.}-{3:3}: [ 8948.253950] __mutex_lock+0x90/0x900 [ 8948.254354] start_delalloc_inodes+0x78/0x400 [btrfs] [ 8948.254859] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs] [ 8948.255408] btrfs_commit_transaction+0x32f/0xc00 [btrfs] [ 8948.255942] btrfs_mksubvol+0x380/0x570 [btrfs] [ 8948.256406] btrfs_mksnapshot+0x81/0xb0 [btrfs] [ 8948.256870] __btrfs_ioctl_snap_create+0x17f/0x190 [btrfs] [ 8948.257413] btrfs_ioctl_snap_create_v2+0xbb/0x140 [btrfs] [ 8948.257961] btrfs_ioctl+0x1196/0x3630 [btrfs] [ 8948.258418] __x64_sys_ioctl+0x83/0xb0 [ 8948.258793] do_syscall_64+0x3b/0xc0 [ 8948.259146] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 8948.259709] -> #1 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}: [ 8948.260330] __mutex_lock+0x90/0x900 [ 8948.260692] btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs] [ 8948.261234] btrfs_commit_transaction+0x32f/0xc00 [btrfs] [ 8948.261766] btrfs_set_free_space_cache_v1_active+0x38/0x60 [btrfs] [ 8948.262379] btrfs_start_pre_rw_mount+0x119/0x180 [btrfs] [ 8948.262909] open_ctree+0x1511/0x171e [btrfs] [ 8948.263359] btrfs_mount_root.cold+0x12/0xde [btrfs] [ 8948.263863] legacy_get_tree+0x30/0x50 [ 8948.264242] vfs_get_tree+0x28/0xc0 [ 8948.264594] vfs_kern_mount.part.0+0x71/0xb0 [ 8948.265017] btrfs_mount+0x11d/0x3a0 [btrfs] [ 8948.265462] legacy_get_tree+0x30/0x50 [ 8948.265851] vfs_get_tree+0x28/0xc0 [ 8948.266203] path_mount+0x2d4/0xbe0 [ 8948.266554] __x64_sys_mount+0x103/0x140 [ 8948.266940] do_syscall_64+0x3b/0xc0 [ 8948.267300] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 8948.267790] -> #0 (sb_internal#2){.+.+}-{0:0}: [ 8948.268322] __lock_acquire+0x12e8/0x2260 [ 8948.268733] lock_acquire+0xd7/0x310 [ 8948.269092] start_transaction+0x44c/0x6e0 [btrfs] [ 8948.269591] find_free_extent+0x141e/0x1590 [btrfs] [ 8948.270087] btrfs_reserve_extent+0x14b/0x280 [btrfs] [ 8948.270588] cow_file_range+0x17e/0x490 [btrfs] [ 8948.271051] btrfs_run_delalloc_range+0x345/0x7a0 [btrfs] [ 8948.271586] writepage_delalloc+0xb5/0x170 [btrfs] [ 8948.272071] __extent_writepage+0x156/0x3c0 [btrfs] [ 8948.272579] extent_write_cache_pages+0x263/0x460 [btrfs] [ 8948.273113] extent_writepages+0x76/0x130 [btrfs] [ 8948.273573] do_writepages+0xd2/0x1c0 [ 8948.273942] filemap_fdatawrite_wbc+0x68/0x90 [ 8948.274371] start_delalloc_inodes+0x17f/0x400 [btrfs] [ 8948.274876] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs] [ 8948.275417] flush_space+0x1f2/0x630 [btrfs] [ 8948.275863] btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs] [ 8948.276438] process_one_work+0x252/0x5a0 [ 8948.276829] worker_thread+0x55/0x3b0 [ 8948.277189] kthread+0xf2/0x120 [ 8948.277506] ret_from_fork+0x22/0x30 [ 8948.277868] other info that might help us debug this: [ 8948.278548] Chain exists of: sb_internal#2 --> &fs_info->delalloc_root_mutex --> &root->delalloc_mutex [ 8948.279601] Possible unsafe locking scenario: [ 8948.280102] CPU0 CPU1 [ 8948.280508] ---- ---- [ 8948.280915] lock(&root->delalloc_mutex); [ 8948.281271] lock(&fs_info->delalloc_root_mutex); [ 8948.281915] lock(&root->delalloc_mutex); [ 8948.282487] lock(sb_internal#2); [ 8948.282800] *** DEADLOCK *** [ 8948.283333] 4 locks held by kworker/u16:18/933570: [ 8948.283750] #0: ffff9b3dc00a9d48 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0 [ 8948.284609] #1: ffffa90349dafe70 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1d2/0x5a0 [ 8948.285637] #2: ffff9b3e14db5040 (&fs_info->delalloc_root_mutex){+.+.}-{3:3}, at: btrfs_start_delalloc_roots+0x97/0x2a0 [btrfs] [ 8948.286674] #3: ffff9b3e09c717d8 (&root->delalloc_mutex){+.+.}-{3:3}, at: start_delalloc_inodes+0x78/0x400 [btrfs] [ 8948.287596] stack backtrace: [ 8948.287975] CPU: 3 PID: 933570 Comm: kworker/u16:18 Not tainted 5.17.0-rc1-btrfs-next-111 #1 [ 8948.288677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 8948.289649] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] [ 8948.290298] Call Trace: [ 8948.290517] <TASK> [ 8948.290700] dump_stack_lvl+0x59/0x73 [ 8948.291026] check_noncircular+0xf3/0x110 [ 8948.291375] ? start_transaction+0x228/0x6e0 [btrfs] [ 8948.291826] __lock_acquire+0x12e8/0x2260 [ 8948.292241] lock_acquire+0xd7/0x310 [ 8948.292714] ? find_free_extent+0x141e/0x1590 [btrfs] [ 8948.293241] ? lock_is_held_type+0xea/0x140 [ 8948.293601] start_transaction+0x44c/0x6e0 [btrfs] [ 8948.294055] ? find_free_extent+0x141e/0x1590 [btrfs] [ 8948.294518] find_free_extent+0x141e/0x1590 [btrfs] [ 8948.294957] ? _raw_spin_unlock+0x29/0x40 [ 8948.295312] ? btrfs_get_alloc_profile+0x124/0x290 [btrfs] [ 8948.295813] btrfs_reserve_extent+0x14b/0x280 [btrfs] [ 8948.296270] cow_file_range+0x17e/0x490 [btrfs] [ 8948.296691] btrfs_run_delalloc_range+0x345/0x7a0 [btrfs] [ 8948.297175] ? find_lock_delalloc_range+0x247/0x270 [btrfs] [ 8948.297678] writepage_delalloc+0xb5/0x170 [btrfs] [ 8948.298123] __extent_writepage+0x156/0x3c0 [btrfs] [ 8948.298570] extent_write_cache_pages+0x263/0x460 [btrfs] [ 8948.299061] extent_writepages+0x76/0x130 [btrfs] [ 8948.299495] do_writepages+0xd2/0x1c0 [ 8948.299817] ? sched_clock_cpu+0xd/0x110 [ 8948.300160] ? lock_release+0x155/0x4a0 [ 8948.300494] filemap_fdatawrite_wbc+0x68/0x90 [ 8948.300874] ? do_raw_spin_unlock+0x4b/0xa0 [ 8948.301243] start_delalloc_inodes+0x17f/0x400 [btrfs] [ 8948.301706] ? lock_release+0x155/0x4a0 [ 8948.302055] btrfs_start_delalloc_roots+0x194/0x2a0 [btrfs] [ 8948.302564] flush_space+0x1f2/0x630 [btrfs] [ 8948.302970] btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs] [ 8948.303510] process_one_work+0x252/0x5a0 [ 8948.303860] ? process_one_work+0x5a0/0x5a0 [ 8948.304221] worker_thread+0x55/0x3b0 [ 8948.304543] ? process_one_work+0x5a0/0x5a0 [ 8948.304904] kthread+0xf2/0x120 [ 8948.305184] ? kthread_complete_and_exit+0x20/0x20 [ 8948.305598] ret_from_fork+0x22/0x30 [ 8948.305921] </TASK> It all comes from the fact that btrfs_start_delalloc_roots() takes the delalloc_root_mutex, in the transaction commit path we are holding a read lock on one of the superblock's freeze semaphores (via sb_start_intwrite()), the async reclaim task can also do a call to btrfs_start_delalloc_roots(), which ends up triggering writeback with calls to filemap_fdatawrite_wbc(), resulting in extent allocation which in turn can call btrfs_start_transaction(), which will result in taking the freeze semaphore via sb_start_intwrite(), forming a nasty dependency on all those locks which can be taken in different orders by different code paths. So just adopt the simple approach of calling try_to_writeback_inodes_sb() at btrfs_start_delalloc_flush(). Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: Filipe Manana <[email protected]> [ add more link reports ] Signed-off-by: David Sterba <[email protected]>

Yonghong Song says: ==================== The patch [1] exposed a bpf_timer initialization bug in function check_and_init_map_value(). With bug fix here, the patch [1] can be applied with all selftests passed. Please see individual patches for fix details. [1] https://lore.kernel.org/bpf/[email protected]/ Changelog: v3 -> v4: . move header file in patch #1 to avoid bpf-next merge conflict v2 -> v3: . switch patch #1 and patch #2 for better bisecting v1 -> v2: . add Fixes tag for patch #1 . rebase against bpf tree ==================== Signed-off-by: Alexei Starovoitov <[email protected]>

This driver, like several others, uses a chained IRQ for each GPIO bank, and forwards .irq_set_wake to the GPIO bank's upstream IRQ. As a result, a call to irq_set_irq_wake() needs to lock both the upstream and downstream irq_desc's. Lockdep considers this to be a possible deadlock when the irq_desc's share lockdep classes, which they do by default: ============================================ WARNING: possible recursive locking detected 5.17.0-rc3-00394-gc849047c2473 #1 Not tainted -------------------------------------------- init/307 is trying to acquire lock: c2dfe27c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0 but task is already holding lock: c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by init/307: #0: c1f29f18 (system_transition_mutex){+.+.}-{3:3}, at: __do_sys_reboot+0x90/0x23c #1: c20f7760 (&dev->mutex){....}-{3:3}, at: device_shutdown+0xf4/0x224 #2: c2e804d8 (&dev->mutex){....}-{3:3}, at: device_shutdown+0x104/0x224 #3: c3c0ac7c (&irq_desc_lock_class){-.-.}-{2:2}, at: __irq_get_desc_lock+0x58/0xa0 stack backtrace: CPU: 0 PID: 307 Comm: init Not tainted 5.17.0-rc3-00394-gc849047c2473 #1 Hardware name: Allwinner sun8i Family unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x68/0x90 dump_stack_lvl from __lock_acquire+0x1680/0x31a0 __lock_acquire from lock_acquire+0x148/0x3dc lock_acquire from _raw_spin_lock_irqsave+0x50/0x6c _raw_spin_lock_irqsave from __irq_get_desc_lock+0x58/0xa0 __irq_get_desc_lock from irq_set_irq_wake+0x2c/0x19c irq_set_irq_wake from irq_set_irq_wake+0x13c/0x19c [tail call from sunxi_pinctrl_irq_set_wake] irq_set_irq_wake from gpio_keys_suspend+0x80/0x1a4 gpio_keys_suspend from gpio_keys_shutdown+0x10/0x2c gpio_keys_shutdown from device_shutdown+0x180/0x224 device_shutdown from __do_sys_reboot+0x134/0x23c __do_sys_reboot from ret_fast_syscall+0x0/0x1c However, this can never deadlock because the upstream and downstream IRQs are never the same (nor do they even involve the same irqchip). Silence this erroneous lockdep splat by applying what appears to be the usual fix of moving the GPIO IRQs to separate lockdep classes. Fixes: a59c99d ("pinctrl: sunxi: Forward calls to irq_set_irq_wake") Reported-by: Guenter Roeck <[email protected]> Signed-off-by: Samuel Holland <[email protected]> Reviewed-by: Jernej Skrabec <[email protected]> Tested-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Linus Walleij <[email protected]>

Ido Schimmel says: ==================== selftests: mlxsw: A couple of fixes Patch #1 fixes a breakage due to a change in iproute2 output. The real problem is not iproute2, but the fact that the check was not strict enough. Fixed by using JSON output instead. Targeting at net so that the test will pass as part of old and new kernels regardless of iproute2 version. Patch #2 fixes an issue uncovered by the first one. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

in tunnel mode, if outer interface(ipv4) is less, it is easily to let inner IPV6 mtu be less than 1280. If so, a Packet Too Big ICMPV6 message is received. When send again, packets are fragmentized with 1280, they are still rejected with ICMPV6(Packet Too Big) by xfrmi_xmit2(). According to RFC4213 Section3.2.2: if (IPv4 path MTU - 20) is less than 1280 if packet is larger than 1280 bytes Send ICMPv6 "packet too big" with MTU=1280 Drop packet else Encapsulate but do not set the Don't Fragment flag in the IPv4 header. The resulting IPv4 packet might be fragmented by the IPv4 layer on the encapsulator or by some router along the IPv4 path. endif else if packet is larger than (IPv4 path MTU - 20) Send ICMPv6 "packet too big" with MTU = (IPv4 path MTU - 20). Drop packet. else Encapsulate and set the Don't Fragment flag in the IPv4 header. endif endif Packets should be fragmentized with ipv4 outer interface, so change it. After it is fragemtized with ipv4, there will be double fragmenation. No.48 & No.51 are ipv6 fragment packets, No.48 is double fragmentized, then tunneled with IPv4(No.49& No.50), which obey spec. And received peer cannot decrypt it rightly. 48 2002::10 2002::11 1296(length) IPv6 fragment (off=0 more=y ident=0xa20da5bc nxt=50) 49 0x0000 (0) 2002::10 2002::11 1304 IPv6 fragment (off=0 more=y ident=0x7448042c nxt=44) 50 0x0000 (0) 2002::10 2002::11 200 ESP (SPI=0x00035000) 51 2002::10 2002::11 180 Echo (ping) request 52 0x56dc 2002::10 2002::11 248 IPv6 fragment (off=1232 more=n ident=0xa20da5bc nxt=50) xfrm6_noneed_fragment has fixed above issues. Finally, it acted like below: 1 0x6206 192.168.1.138 192.168.1.1 1316 Fragmented IP protocol (proto=Encap Security Payload 50, off=0, ID=6206) [Reassembled in #2] 2 0x6206 2002::10 2002::11 88 IPv6 fragment (off=0 more=y ident=0x1f440778 nxt=50) 3 0x0000 2002::10 2002::11 248 ICMPv6 Echo (ping) request Signed-off-by: Lina Wang <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>

Revert "test: Update v4.19 surface"

4a3bef9

kitakar5525 merged commit 3b82099 into test/v4.19-surface May 16, 2020

kitakar5525 deleted the revert-1-update/v4.19-surface branch May 16, 2020 10:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: Revert "test: Update v4.19 surface" #2

test: Revert "test: Update v4.19 surface" #2

kitakar5525 commented May 16, 2020

test: Revert "test: Update v4.19 surface" #2

test: Revert "test: Update v4.19 surface" #2

Conversation

kitakar5525 commented May 16, 2020