-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(Closed, see #29) OEMB-issue: add quirk for Surface 3 with bad DMI table #28
Closed
kitakar5525
wants to merge
2
commits into
linux-surface:v5.3-surface-devel
from
kitakar5525:Surface3-OEMB
Closed
(Closed, see #29) OEMB-issue: add quirk for Surface 3 with bad DMI table #28
kitakar5525
wants to merge
2
commits into
linux-surface:v5.3-surface-devel
from
kitakar5525:Surface3-OEMB
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(am from http://git.osdn.net/view?p=android-x86/kernel.git;a=commitdiff;h=18e2e857c57633b25b3b4120f212224a108cd883) Patch for Surface 3 which is suffering from the "OEMB" problem. On the affected systems, DMI table is broken and breaks DMI matching used for quirk. This patch will add the (broken) DMI info for the affected Surface 3. Note here that this issue will not necessarily happen after playing around with Android-x86. I heard a report from a person on the IRC channel that on the affected system, only Manjaro was used. On affected systems, dmidecode will look like this: $ sudo dmidecode [...] BIOS Information Vendor: American Megatrends Inc. [...] System Information Manufacturer: OEMB Product Name: OEMB [...] Expected: $ sudo dmidecode [...] BIOS Information Vendor: (???, I think something like "Microsoft Corporation") [...] System Information Manufacturer: Microsoft Corporation Product Name: Surface 3 [...] === (Original commit comment) Some Microsoft Surface 3 owners including me encountered a strange issue when play Android-x86 on the tablet. The DMI table was erased due to unknown reason and sound doesn't work in Android-x86 (but it's normal in Windows). See more details: https://groups.google.com/d/msg/android-x86/z6GDuvV2oWk/mzyg0RQiCAAJ Since the DMI table is incorrect, kernel won't enable quirk for it. To workaround such an issue, add quirk for the bad data "OEMB". It should not affect any product with correct DMI data.
(made referring to http://git.osdn.net/view?p=android-x86/kernel.git;a=commitdiff;h=18e2e857c57633b25b3b4120f212224a108cd883) Patch for Surface 3 which is suffering from the "OEMB" problem. On the affected systems, DMI table is broken and breaks DMI matching used for quirk. This patch will add the (broken) DMI info for the affected Surface 3. Note here that this issue will not necessarily happen after playing around with Android-x86. I heard a report from a person on the IRC channel that on the affected system, only Manjaro was used. On affected systems, dmidecode will look like this: $ sudo dmidecode [...] BIOS Information Vendor: American Megatrends Inc. [...] System Information Manufacturer: OEMB Product Name: OEMB [...] Expected: $ sudo dmidecode [...] BIOS Information Vendor: (???, I think something like "Microsoft Corporation") [...] System Information Manufacturer: Microsoft Corporation Product Name: Surface 3 [...]
kitakar5525
changed the title
OEMB-issue: add quirk for Surface 3 with bad DMI table
(retracted, add author name to commit message instead) OEMB-issue: add quirk for Surface 3 with bad DMI table
Jan 11, 2020
kitakar5525
changed the title
(retracted, add author name to commit message instead) OEMB-issue: add quirk for Surface 3 with bad DMI table
(Closed, see #29) OEMB-issue: add quirk for Surface 3 with bad DMI table
Jan 12, 2020
qzed
pushed a commit
that referenced
this pull request
Jul 21, 2020
We should not trigger a warning when a memory allocation fails. Remove the WARN_ON(). The warning is constantly triggered by syzkaller when it is injecting faults: [ 2230.758664] FAULT_INJECTION: forcing a failure. [ 2230.758664] name failslab, interval 1, probability 0, space 0, times 0 [ 2230.762329] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28 ... [ 2230.898175] WARNING: CPU: 3 PID: 1407 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:6265 mlxsw_sp_router_fib_event+0xfad/0x13e0 [ 2230.898179] Kernel panic - not syncing: panic_on_warn set ... [ 2230.898183] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28 [ 2230.898190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Fixes: 3057224 ("mlxsw: spectrum_router: Implement FIB offload in deferred work") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
kitakar5525
pushed a commit
that referenced
this pull request
Jul 24, 2020
[ Upstream commit d9d5420 ] We should not trigger a warning when a memory allocation fails. Remove the WARN_ON(). The warning is constantly triggered by syzkaller when it is injecting faults: [ 2230.758664] FAULT_INJECTION: forcing a failure. [ 2230.758664] name failslab, interval 1, probability 0, space 0, times 0 [ 2230.762329] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28 ... [ 2230.898175] WARNING: CPU: 3 PID: 1407 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:6265 mlxsw_sp_router_fib_event+0xfad/0x13e0 [ 2230.898179] Kernel panic - not syncing: panic_on_warn set ... [ 2230.898183] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28 [ 2230.898190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Fixes: 3057224 ("mlxsw: spectrum_router: Implement FIB offload in deferred work") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
kitakar5525
pushed a commit
that referenced
this pull request
Jul 24, 2020
[ Upstream commit d9d5420 ] We should not trigger a warning when a memory allocation fails. Remove the WARN_ON(). The warning is constantly triggered by syzkaller when it is injecting faults: [ 2230.758664] FAULT_INJECTION: forcing a failure. [ 2230.758664] name failslab, interval 1, probability 0, space 0, times 0 [ 2230.762329] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28 ... [ 2230.898175] WARNING: CPU: 3 PID: 1407 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:6265 mlxsw_sp_router_fib_event+0xfad/0x13e0 [ 2230.898179] Kernel panic - not syncing: panic_on_warn set ... [ 2230.898183] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28 [ 2230.898190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Fixes: 3057224 ("mlxsw: spectrum_router: Implement FIB offload in deferred work") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
kitakar5525
pushed a commit
that referenced
this pull request
Jul 24, 2020
[ Upstream commit d9d5420 ] We should not trigger a warning when a memory allocation fails. Remove the WARN_ON(). The warning is constantly triggered by syzkaller when it is injecting faults: [ 2230.758664] FAULT_INJECTION: forcing a failure. [ 2230.758664] name failslab, interval 1, probability 0, space 0, times 0 [ 2230.762329] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28 ... [ 2230.898175] WARNING: CPU: 3 PID: 1407 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:6265 mlxsw_sp_router_fib_event+0xfad/0x13e0 [ 2230.898179] Kernel panic - not syncing: panic_on_warn set ... [ 2230.898183] CPU: 3 PID: 1407 Comm: syz-executor.0 Not tainted 5.8.0-rc2+ #28 [ 2230.898190] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Fixes: 3057224 ("mlxsw: spectrum_router: Implement FIB offload in deferred work") Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Oct 1, 2020
[ Upstream commit 96298f6 ] According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5, the incoming L2CAP_ConfigReq should be handled during OPEN state. The section below shows the btmon trace when running L2CAP/COS/CFD/BV-12-C before and after this change. === Before === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 #22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 #23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 #24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 #28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 #29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 #30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 #31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ...... < ACL Data TX: Handle 256 flags 0x00 dlen 14 #32 L2CAP: Command Reject (0x01) ident 3 len 6 Reason: Invalid CID in request (0x0002) Destination CID: 64 Source CID: 65 > HCI Event: Number of Completed Packets (0x13) plen 5 #33 Num handles: 1 Handle: 256 Count: 1 ... === After === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 #22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 #23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 #24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 #28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 #29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 #30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 #31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ..... < ACL Data TX: Handle 256 flags 0x00 dlen 18 #32 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 < ACL Data TX: Handle 256 flags 0x00 dlen 12 #33 L2CAP: Configure Request (0x04) ident 3 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #34 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #35 Num handles: 1 Handle: 256 Count: 1 ... Signed-off-by: Howard Chung <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Oct 1, 2020
[ Upstream commit 96298f6 ] According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5, the incoming L2CAP_ConfigReq should be handled during OPEN state. The section below shows the btmon trace when running L2CAP/COS/CFD/BV-12-C before and after this change. === Before === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 #22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 #23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 #24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 #28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 #29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 #30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 #31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ...... < ACL Data TX: Handle 256 flags 0x00 dlen 14 #32 L2CAP: Command Reject (0x01) ident 3 len 6 Reason: Invalid CID in request (0x0002) Destination CID: 64 Source CID: 65 > HCI Event: Number of Completed Packets (0x13) plen 5 #33 Num handles: 1 Handle: 256 Count: 1 ... === After === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 #22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 #23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 #24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 #28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 #29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 #30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 #31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ..... < ACL Data TX: Handle 256 flags 0x00 dlen 18 #32 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 < ACL Data TX: Handle 256 flags 0x00 dlen 12 #33 L2CAP: Configure Request (0x04) ident 3 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #34 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #35 Num handles: 1 Handle: 256 Count: 1 ... Signed-off-by: Howard Chung <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
May 23, 2021
…xtent When cloning an inline extent there are a few cases, such as when we have an implicit hole at file offset 0, where we start a transaction while holding a read lock on a leaf. Starting the transaction results in a call to sb_start_intwrite(), which results in doing a read lock on a percpu semaphore. Lockdep doesn't like this and complains about it: [46.580704] ====================================================== [46.580752] WARNING: possible circular locking dependency detected [46.580799] 5.13.0-rc1 #28 Not tainted [46.580832] ------------------------------------------------------ [46.580877] cloner/3835 is trying to acquire lock: [46.580918] c00000001301d638 (sb_internal#2){.+.+}-{0:0}, at: clone_copy_inline_extent+0xe4/0x5a0 [46.581167] [46.581167] but task is already holding lock: [46.581217] c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0 [46.581293] [46.581293] which lock already depends on the new lock. [46.581293] [46.581351] [46.581351] the existing dependency chain (in reverse order) is: [46.581410] [46.581410] -> #1 (btrfs-tree-00){++++}-{3:3}: [46.581464] down_read_nested+0x68/0x200 [46.581536] __btrfs_tree_read_lock+0x70/0x1d0 [46.581577] btrfs_read_lock_root_node+0x88/0x200 [46.581623] btrfs_search_slot+0x298/0xb70 [46.581665] btrfs_set_inode_index+0xfc/0x260 [46.581708] btrfs_new_inode+0x26c/0x950 [46.581749] btrfs_create+0xf4/0x2b0 [46.581782] lookup_open.isra.57+0x55c/0x6a0 [46.581855] path_openat+0x418/0xd20 [46.581888] do_filp_open+0x9c/0x130 [46.581920] do_sys_openat2+0x2ec/0x430 [46.581961] do_sys_open+0x90/0xc0 [46.581993] system_call_exception+0x3d4/0x410 [46.582037] system_call_common+0xec/0x278 [46.582078] [46.582078] -> #0 (sb_internal#2){.+.+}-{0:0}: [46.582135] __lock_acquire+0x1e90/0x2c50 [46.582176] lock_acquire+0x2b4/0x5b0 [46.582263] start_transaction+0x3cc/0x950 [46.582308] clone_copy_inline_extent+0xe4/0x5a0 [46.582353] btrfs_clone+0x5fc/0x880 [46.582388] btrfs_clone_files+0xd8/0x1c0 [46.582434] btrfs_remap_file_range+0x3d8/0x590 [46.582481] do_clone_file_range+0x10c/0x270 [46.582558] vfs_clone_file_range+0x1b0/0x310 [46.582605] ioctl_file_clone+0x90/0x130 [46.582651] do_vfs_ioctl+0x874/0x1ac0 [46.582697] sys_ioctl+0x6c/0x120 [46.582733] system_call_exception+0x3d4/0x410 [46.582777] system_call_common+0xec/0x278 [46.582822] [46.582822] other info that might help us debug this: [46.582822] [46.582888] Possible unsafe locking scenario: [46.582888] [46.582942] CPU0 CPU1 [46.582984] ---- ---- [46.583028] lock(btrfs-tree-00); [46.583062] lock(sb_internal#2); [46.583119] lock(btrfs-tree-00); [46.583174] lock(sb_internal#2); [46.583212] [46.583212] *** DEADLOCK *** [46.583212] [46.583266] 6 locks held by cloner/3835: [46.583299] #0: c00000001301d448 (sb_writers#12){.+.+}-{0:0}, at: ioctl_file_clone+0x90/0x130 [46.583382] #1: c00000000f6d3768 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}, at: lock_two_nondirectories+0x58/0xc0 [46.583477] #2: c00000000f6d72a8 (&sb->s_type->i_mutex_key#15/4){+.+.}-{3:3}, at: lock_two_nondirectories+0x9c/0xc0 [46.583574] #3: c00000000f6d7138 (&ei->i_mmap_lock){+.+.}-{3:3}, at: btrfs_remap_file_range+0xd0/0x590 [46.583657] #4: c00000000f6d35f8 (&ei->i_mmap_lock/1){+.+.}-{3:3}, at: btrfs_remap_file_range+0xe0/0x590 [46.583743] #5: c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0 [46.583828] [46.583828] stack backtrace: [46.583872] CPU: 1 PID: 3835 Comm: cloner Not tainted 5.13.0-rc1 #28 [46.583931] Call Trace: [46.583955] [c0000000167c7200] [c000000000c1ee78] dump_stack+0xec/0x144 (unreliable) [46.584052] [c0000000167c7240] [c000000000274058] print_circular_bug.isra.32+0x3a8/0x400 [46.584123] [c0000000167c72e0] [c0000000002741f4] check_noncircular+0x144/0x190 [46.584191] [c0000000167c73b0] [c000000000278fc0] __lock_acquire+0x1e90/0x2c50 [46.584259] [c0000000167c74f0] [c00000000027aa94] lock_acquire+0x2b4/0x5b0 [46.584317] [c0000000167c75e0] [c000000000a0d6cc] start_transaction+0x3cc/0x950 [46.584388] [c0000000167c7690] [c000000000af47a4] clone_copy_inline_extent+0xe4/0x5a0 [46.584457] [c0000000167c77c0] [c000000000af525c] btrfs_clone+0x5fc/0x880 [46.584514] [c0000000167c7990] [c000000000af5698] btrfs_clone_files+0xd8/0x1c0 [46.584583] [c0000000167c7a00] [c000000000af5b58] btrfs_remap_file_range+0x3d8/0x590 [46.584652] [c0000000167c7ae0] [c0000000005d81dc] do_clone_file_range+0x10c/0x270 [46.584722] [c0000000167c7b40] [c0000000005d84f0] vfs_clone_file_range+0x1b0/0x310 [46.584793] [c0000000167c7bb0] [c00000000058bf80] ioctl_file_clone+0x90/0x130 [46.584861] [c0000000167c7c10] [c00000000058c894] do_vfs_ioctl+0x874/0x1ac0 [46.584922] [c0000000167c7d10] [c00000000058db4c] sys_ioctl+0x6c/0x120 [46.584978] [c0000000167c7d60] [c0000000000364a4] system_call_exception+0x3d4/0x410 [46.585046] [c0000000167c7e10] [c00000000000d45c] system_call_common+0xec/0x278 [46.585114] --- interrupt: c00 at 0x7ffff7e22990 [46.585160] NIP: 00007ffff7e22990 LR: 00000001000010ec CTR: 0000000000000000 [46.585224] REGS: c0000000167c7e80 TRAP: 0c00 Not tainted (5.13.0-rc1) [46.585280] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000244 XER: 00000000 [46.585374] IRQMASK: 0 [46.585374] GPR00: 0000000000000036 00007fffffffdec0 00007ffff7f17100 0000000000000004 [46.585374] GPR04: 000000008020940d 00007fffffffdf40 0000000000000000 0000000000000000 [46.585374] GPR08: 0000000000000004 0000000000000000 0000000000000000 0000000000000000 [46.585374] GPR12: 0000000000000000 00007ffff7ffa940 0000000000000000 0000000000000000 [46.585374] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [46.585374] GPR20: 0000000000000000 000000009123683e 00007fffffffdf40 0000000000000000 [46.585374] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000004 [46.585374] GPR28: 0000000100030260 0000000100030280 0000000000000003 000000000000005f [46.585919] NIP [00007ffff7e22990] 0x7ffff7e22990 [46.585964] LR [00000001000010ec] 0x1000010ec [46.586010] --- interrupt: c00 This should be a false positive, as both locks are acquired in read mode. Nevertheless, we don't need to hold a leaf locked when we start the transaction, so just release the leaf (path) before starting it. Reported-by: Ritesh Harjani <[email protected]> Link: https://lore.kernel.org/linux-btrfs/20210513214404.xks77p566fglzgum@riteshh-domain/ Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
qzed
pushed a commit
that referenced
this pull request
May 24, 2021
[ Upstream commit d5027ca ] Ritesh reported a bug [1] against UML, noting that it crashed on startup. The backtrace shows the following (heavily redacted): (gdb) bt ... #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268 #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2 #28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72 ... #40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359 ... #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486 #45 0x00007f8990968b85 in __getgrnam_r [...] #46 0x00007f89909d6b77 in grantpt [...] #47 0x00007f8990a9394e in __GI_openpty [...] #48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407 #49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598 #50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45 #51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334 #52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144 indicating that the UML function openpty_cb() calls openpty(), which internally calls __getgrnam_r(), which causes the nsswitch machinery to get started. This loads, through lots of indirection that I snipped, the libcom_err.so.2 library, which (in an unknown function, "??") calls sem_init(). Now, of course it wants to get libpthread's sem_init(), since it's linked against libpthread. However, the dynamic linker looks up that symbol against the binary first, and gets the kernel's sem_init(). Hajime Tazaki noted that "objcopy -L" can localize a symbol, so the dynamic linker wouldn't do the lookup this way. I tried, but for some reason that didn't seem to work. Doing the same thing in the linker script instead does seem to work, though I cannot entirely explain - it *also* works if I just add "VERSION { { global: *; }; }" instead, indicating that something else is happening that I don't really understand. It may be that explicitly doing that marks them with some kind of empty version, and that's different from the default. Explicitly marking them with a version breaks kallsyms, so that doesn't seem to be possible. Marking all the symbols as local seems correct, and does seem to address the issue, so do that. Also do it for static link, nsswitch libraries could still be loaded there. [1] https://bugs.debian.org/983379 Reported-by: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Tested-By: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Richard Weinberger <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
May 24, 2021
[ Upstream commit d5027ca ] Ritesh reported a bug [1] against UML, noting that it crashed on startup. The backtrace shows the following (heavily redacted): (gdb) bt ... #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268 #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2 #28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72 ... #40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359 ... #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486 #45 0x00007f8990968b85 in __getgrnam_r [...] #46 0x00007f89909d6b77 in grantpt [...] #47 0x00007f8990a9394e in __GI_openpty [...] #48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407 #49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598 #50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45 #51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334 #52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144 indicating that the UML function openpty_cb() calls openpty(), which internally calls __getgrnam_r(), which causes the nsswitch machinery to get started. This loads, through lots of indirection that I snipped, the libcom_err.so.2 library, which (in an unknown function, "??") calls sem_init(). Now, of course it wants to get libpthread's sem_init(), since it's linked against libpthread. However, the dynamic linker looks up that symbol against the binary first, and gets the kernel's sem_init(). Hajime Tazaki noted that "objcopy -L" can localize a symbol, so the dynamic linker wouldn't do the lookup this way. I tried, but for some reason that didn't seem to work. Doing the same thing in the linker script instead does seem to work, though I cannot entirely explain - it *also* works if I just add "VERSION { { global: *; }; }" instead, indicating that something else is happening that I don't really understand. It may be that explicitly doing that marks them with some kind of empty version, and that's different from the default. Explicitly marking them with a version breaks kallsyms, so that doesn't seem to be possible. Marking all the symbols as local seems correct, and does seem to address the issue, so do that. Also do it for static link, nsswitch libraries could still be loaded there. [1] https://bugs.debian.org/983379 Reported-by: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Tested-By: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Richard Weinberger <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
May 29, 2021
[ Upstream commit d5027ca ] Ritesh reported a bug [1] against UML, noting that it crashed on startup. The backtrace shows the following (heavily redacted): (gdb) bt ... #26 0x0000000060015b5d in sem_init () at ipc/sem.c:268 #27 0x00007f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2 #28 0x00007f8990ab8fb2 in call_init (...) at dl-init.c:72 ... #40 0x00007f89909bf3a6 in nss_load_library (...) at nsswitch.c:359 ... #44 0x00007f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486 #45 0x00007f8990968b85 in __getgrnam_r [...] #46 0x00007f89909d6b77 in grantpt [...] #47 0x00007f8990a9394e in __GI_openpty [...] #48 0x00000000604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407 #49 0x00000000604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598 #50 0x0000000060004a3d in start_uml () at arch/um/kernel/skas/process.c:45 #51 0x00000000600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334 #52 0x000000006000574f in main (...) at arch/um/os-Linux/main.c:144 indicating that the UML function openpty_cb() calls openpty(), which internally calls __getgrnam_r(), which causes the nsswitch machinery to get started. This loads, through lots of indirection that I snipped, the libcom_err.so.2 library, which (in an unknown function, "??") calls sem_init(). Now, of course it wants to get libpthread's sem_init(), since it's linked against libpthread. However, the dynamic linker looks up that symbol against the binary first, and gets the kernel's sem_init(). Hajime Tazaki noted that "objcopy -L" can localize a symbol, so the dynamic linker wouldn't do the lookup this way. I tried, but for some reason that didn't seem to work. Doing the same thing in the linker script instead does seem to work, though I cannot entirely explain - it *also* works if I just add "VERSION { { global: *; }; }" instead, indicating that something else is happening that I don't really understand. It may be that explicitly doing that marks them with some kind of empty version, and that's different from the default. Explicitly marking them with a version breaks kallsyms, so that doesn't seem to be possible. Marking all the symbols as local seems correct, and does seem to address the issue, so do that. Also do it for static link, nsswitch libraries could still be loaded there. [1] https://bugs.debian.org/983379 Reported-by: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Acked-By: Anton Ivanov <[email protected]> Tested-By: Ritesh Raj Sarraf <[email protected]> Signed-off-by: Richard Weinberger <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Jun 3, 2021
…xtent [ Upstream commit 6416954 ] When cloning an inline extent there are a few cases, such as when we have an implicit hole at file offset 0, where we start a transaction while holding a read lock on a leaf. Starting the transaction results in a call to sb_start_intwrite(), which results in doing a read lock on a percpu semaphore. Lockdep doesn't like this and complains about it: [46.580704] ====================================================== [46.580752] WARNING: possible circular locking dependency detected [46.580799] 5.13.0-rc1 #28 Not tainted [46.580832] ------------------------------------------------------ [46.580877] cloner/3835 is trying to acquire lock: [46.580918] c00000001301d638 (sb_internal#2){.+.+}-{0:0}, at: clone_copy_inline_extent+0xe4/0x5a0 [46.581167] [46.581167] but task is already holding lock: [46.581217] c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0 [46.581293] [46.581293] which lock already depends on the new lock. [46.581293] [46.581351] [46.581351] the existing dependency chain (in reverse order) is: [46.581410] [46.581410] -> #1 (btrfs-tree-00){++++}-{3:3}: [46.581464] down_read_nested+0x68/0x200 [46.581536] __btrfs_tree_read_lock+0x70/0x1d0 [46.581577] btrfs_read_lock_root_node+0x88/0x200 [46.581623] btrfs_search_slot+0x298/0xb70 [46.581665] btrfs_set_inode_index+0xfc/0x260 [46.581708] btrfs_new_inode+0x26c/0x950 [46.581749] btrfs_create+0xf4/0x2b0 [46.581782] lookup_open.isra.57+0x55c/0x6a0 [46.581855] path_openat+0x418/0xd20 [46.581888] do_filp_open+0x9c/0x130 [46.581920] do_sys_openat2+0x2ec/0x430 [46.581961] do_sys_open+0x90/0xc0 [46.581993] system_call_exception+0x3d4/0x410 [46.582037] system_call_common+0xec/0x278 [46.582078] [46.582078] -> #0 (sb_internal#2){.+.+}-{0:0}: [46.582135] __lock_acquire+0x1e90/0x2c50 [46.582176] lock_acquire+0x2b4/0x5b0 [46.582263] start_transaction+0x3cc/0x950 [46.582308] clone_copy_inline_extent+0xe4/0x5a0 [46.582353] btrfs_clone+0x5fc/0x880 [46.582388] btrfs_clone_files+0xd8/0x1c0 [46.582434] btrfs_remap_file_range+0x3d8/0x590 [46.582481] do_clone_file_range+0x10c/0x270 [46.582558] vfs_clone_file_range+0x1b0/0x310 [46.582605] ioctl_file_clone+0x90/0x130 [46.582651] do_vfs_ioctl+0x874/0x1ac0 [46.582697] sys_ioctl+0x6c/0x120 [46.582733] system_call_exception+0x3d4/0x410 [46.582777] system_call_common+0xec/0x278 [46.582822] [46.582822] other info that might help us debug this: [46.582822] [46.582888] Possible unsafe locking scenario: [46.582888] [46.582942] CPU0 CPU1 [46.582984] ---- ---- [46.583028] lock(btrfs-tree-00); [46.583062] lock(sb_internal#2); [46.583119] lock(btrfs-tree-00); [46.583174] lock(sb_internal#2); [46.583212] [46.583212] *** DEADLOCK *** [46.583212] [46.583266] 6 locks held by cloner/3835: [46.583299] #0: c00000001301d448 (sb_writers#12){.+.+}-{0:0}, at: ioctl_file_clone+0x90/0x130 [46.583382] #1: c00000000f6d3768 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}, at: lock_two_nondirectories+0x58/0xc0 [46.583477] #2: c00000000f6d72a8 (&sb->s_type->i_mutex_key#15/4){+.+.}-{3:3}, at: lock_two_nondirectories+0x9c/0xc0 [46.583574] #3: c00000000f6d7138 (&ei->i_mmap_lock){+.+.}-{3:3}, at: btrfs_remap_file_range+0xd0/0x590 [46.583657] #4: c00000000f6d35f8 (&ei->i_mmap_lock/1){+.+.}-{3:3}, at: btrfs_remap_file_range+0xe0/0x590 [46.583743] #5: c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0 [46.583828] [46.583828] stack backtrace: [46.583872] CPU: 1 PID: 3835 Comm: cloner Not tainted 5.13.0-rc1 #28 [46.583931] Call Trace: [46.583955] [c0000000167c7200] [c000000000c1ee78] dump_stack+0xec/0x144 (unreliable) [46.584052] [c0000000167c7240] [c000000000274058] print_circular_bug.isra.32+0x3a8/0x400 [46.584123] [c0000000167c72e0] [c0000000002741f4] check_noncircular+0x144/0x190 [46.584191] [c0000000167c73b0] [c000000000278fc0] __lock_acquire+0x1e90/0x2c50 [46.584259] [c0000000167c74f0] [c00000000027aa94] lock_acquire+0x2b4/0x5b0 [46.584317] [c0000000167c75e0] [c000000000a0d6cc] start_transaction+0x3cc/0x950 [46.584388] [c0000000167c7690] [c000000000af47a4] clone_copy_inline_extent+0xe4/0x5a0 [46.584457] [c0000000167c77c0] [c000000000af525c] btrfs_clone+0x5fc/0x880 [46.584514] [c0000000167c7990] [c000000000af5698] btrfs_clone_files+0xd8/0x1c0 [46.584583] [c0000000167c7a00] [c000000000af5b58] btrfs_remap_file_range+0x3d8/0x590 [46.584652] [c0000000167c7ae0] [c0000000005d81dc] do_clone_file_range+0x10c/0x270 [46.584722] [c0000000167c7b40] [c0000000005d84f0] vfs_clone_file_range+0x1b0/0x310 [46.584793] [c0000000167c7bb0] [c00000000058bf80] ioctl_file_clone+0x90/0x130 [46.584861] [c0000000167c7c10] [c00000000058c894] do_vfs_ioctl+0x874/0x1ac0 [46.584922] [c0000000167c7d10] [c00000000058db4c] sys_ioctl+0x6c/0x120 [46.584978] [c0000000167c7d60] [c0000000000364a4] system_call_exception+0x3d4/0x410 [46.585046] [c0000000167c7e10] [c00000000000d45c] system_call_common+0xec/0x278 [46.585114] --- interrupt: c00 at 0x7ffff7e22990 [46.585160] NIP: 00007ffff7e22990 LR: 00000001000010ec CTR: 0000000000000000 [46.585224] REGS: c0000000167c7e80 TRAP: 0c00 Not tainted (5.13.0-rc1) [46.585280] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000244 XER: 00000000 [46.585374] IRQMASK: 0 [46.585374] GPR00: 0000000000000036 00007fffffffdec0 00007ffff7f17100 0000000000000004 [46.585374] GPR04: 000000008020940d 00007fffffffdf40 0000000000000000 0000000000000000 [46.585374] GPR08: 0000000000000004 0000000000000000 0000000000000000 0000000000000000 [46.585374] GPR12: 0000000000000000 00007ffff7ffa940 0000000000000000 0000000000000000 [46.585374] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [46.585374] GPR20: 0000000000000000 000000009123683e 00007fffffffdf40 0000000000000000 [46.585374] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000004 [46.585374] GPR28: 0000000100030260 0000000100030280 0000000000000003 000000000000005f [46.585919] NIP [00007ffff7e22990] 0x7ffff7e22990 [46.585964] LR [00000001000010ec] 0x1000010ec [46.586010] --- interrupt: c00 This should be a false positive, as both locks are acquired in read mode. Nevertheless, we don't need to hold a leaf locked when we start the transaction, so just release the leaf (path) before starting it. Reported-by: Ritesh Harjani <[email protected]> Link: https://lore.kernel.org/linux-btrfs/20210513214404.xks77p566fglzgum@riteshh-domain/ Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Jun 3, 2021
…xtent [ Upstream commit 6416954 ] When cloning an inline extent there are a few cases, such as when we have an implicit hole at file offset 0, where we start a transaction while holding a read lock on a leaf. Starting the transaction results in a call to sb_start_intwrite(), which results in doing a read lock on a percpu semaphore. Lockdep doesn't like this and complains about it: [46.580704] ====================================================== [46.580752] WARNING: possible circular locking dependency detected [46.580799] 5.13.0-rc1 #28 Not tainted [46.580832] ------------------------------------------------------ [46.580877] cloner/3835 is trying to acquire lock: [46.580918] c00000001301d638 (sb_internal#2){.+.+}-{0:0}, at: clone_copy_inline_extent+0xe4/0x5a0 [46.581167] [46.581167] but task is already holding lock: [46.581217] c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0 [46.581293] [46.581293] which lock already depends on the new lock. [46.581293] [46.581351] [46.581351] the existing dependency chain (in reverse order) is: [46.581410] [46.581410] -> #1 (btrfs-tree-00){++++}-{3:3}: [46.581464] down_read_nested+0x68/0x200 [46.581536] __btrfs_tree_read_lock+0x70/0x1d0 [46.581577] btrfs_read_lock_root_node+0x88/0x200 [46.581623] btrfs_search_slot+0x298/0xb70 [46.581665] btrfs_set_inode_index+0xfc/0x260 [46.581708] btrfs_new_inode+0x26c/0x950 [46.581749] btrfs_create+0xf4/0x2b0 [46.581782] lookup_open.isra.57+0x55c/0x6a0 [46.581855] path_openat+0x418/0xd20 [46.581888] do_filp_open+0x9c/0x130 [46.581920] do_sys_openat2+0x2ec/0x430 [46.581961] do_sys_open+0x90/0xc0 [46.581993] system_call_exception+0x3d4/0x410 [46.582037] system_call_common+0xec/0x278 [46.582078] [46.582078] -> #0 (sb_internal#2){.+.+}-{0:0}: [46.582135] __lock_acquire+0x1e90/0x2c50 [46.582176] lock_acquire+0x2b4/0x5b0 [46.582263] start_transaction+0x3cc/0x950 [46.582308] clone_copy_inline_extent+0xe4/0x5a0 [46.582353] btrfs_clone+0x5fc/0x880 [46.582388] btrfs_clone_files+0xd8/0x1c0 [46.582434] btrfs_remap_file_range+0x3d8/0x590 [46.582481] do_clone_file_range+0x10c/0x270 [46.582558] vfs_clone_file_range+0x1b0/0x310 [46.582605] ioctl_file_clone+0x90/0x130 [46.582651] do_vfs_ioctl+0x874/0x1ac0 [46.582697] sys_ioctl+0x6c/0x120 [46.582733] system_call_exception+0x3d4/0x410 [46.582777] system_call_common+0xec/0x278 [46.582822] [46.582822] other info that might help us debug this: [46.582822] [46.582888] Possible unsafe locking scenario: [46.582888] [46.582942] CPU0 CPU1 [46.582984] ---- ---- [46.583028] lock(btrfs-tree-00); [46.583062] lock(sb_internal#2); [46.583119] lock(btrfs-tree-00); [46.583174] lock(sb_internal#2); [46.583212] [46.583212] *** DEADLOCK *** [46.583212] [46.583266] 6 locks held by cloner/3835: [46.583299] #0: c00000001301d448 (sb_writers#12){.+.+}-{0:0}, at: ioctl_file_clone+0x90/0x130 [46.583382] #1: c00000000f6d3768 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}, at: lock_two_nondirectories+0x58/0xc0 [46.583477] #2: c00000000f6d72a8 (&sb->s_type->i_mutex_key#15/4){+.+.}-{3:3}, at: lock_two_nondirectories+0x9c/0xc0 [46.583574] #3: c00000000f6d7138 (&ei->i_mmap_lock){+.+.}-{3:3}, at: btrfs_remap_file_range+0xd0/0x590 [46.583657] #4: c00000000f6d35f8 (&ei->i_mmap_lock/1){+.+.}-{3:3}, at: btrfs_remap_file_range+0xe0/0x590 [46.583743] #5: c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0 [46.583828] [46.583828] stack backtrace: [46.583872] CPU: 1 PID: 3835 Comm: cloner Not tainted 5.13.0-rc1 #28 [46.583931] Call Trace: [46.583955] [c0000000167c7200] [c000000000c1ee78] dump_stack+0xec/0x144 (unreliable) [46.584052] [c0000000167c7240] [c000000000274058] print_circular_bug.isra.32+0x3a8/0x400 [46.584123] [c0000000167c72e0] [c0000000002741f4] check_noncircular+0x144/0x190 [46.584191] [c0000000167c73b0] [c000000000278fc0] __lock_acquire+0x1e90/0x2c50 [46.584259] [c0000000167c74f0] [c00000000027aa94] lock_acquire+0x2b4/0x5b0 [46.584317] [c0000000167c75e0] [c000000000a0d6cc] start_transaction+0x3cc/0x950 [46.584388] [c0000000167c7690] [c000000000af47a4] clone_copy_inline_extent+0xe4/0x5a0 [46.584457] [c0000000167c77c0] [c000000000af525c] btrfs_clone+0x5fc/0x880 [46.584514] [c0000000167c7990] [c000000000af5698] btrfs_clone_files+0xd8/0x1c0 [46.584583] [c0000000167c7a00] [c000000000af5b58] btrfs_remap_file_range+0x3d8/0x590 [46.584652] [c0000000167c7ae0] [c0000000005d81dc] do_clone_file_range+0x10c/0x270 [46.584722] [c0000000167c7b40] [c0000000005d84f0] vfs_clone_file_range+0x1b0/0x310 [46.584793] [c0000000167c7bb0] [c00000000058bf80] ioctl_file_clone+0x90/0x130 [46.584861] [c0000000167c7c10] [c00000000058c894] do_vfs_ioctl+0x874/0x1ac0 [46.584922] [c0000000167c7d10] [c00000000058db4c] sys_ioctl+0x6c/0x120 [46.584978] [c0000000167c7d60] [c0000000000364a4] system_call_exception+0x3d4/0x410 [46.585046] [c0000000167c7e10] [c00000000000d45c] system_call_common+0xec/0x278 [46.585114] --- interrupt: c00 at 0x7ffff7e22990 [46.585160] NIP: 00007ffff7e22990 LR: 00000001000010ec CTR: 0000000000000000 [46.585224] REGS: c0000000167c7e80 TRAP: 0c00 Not tainted (5.13.0-rc1) [46.585280] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000244 XER: 00000000 [46.585374] IRQMASK: 0 [46.585374] GPR00: 0000000000000036 00007fffffffdec0 00007ffff7f17100 0000000000000004 [46.585374] GPR04: 000000008020940d 00007fffffffdf40 0000000000000000 0000000000000000 [46.585374] GPR08: 0000000000000004 0000000000000000 0000000000000000 0000000000000000 [46.585374] GPR12: 0000000000000000 00007ffff7ffa940 0000000000000000 0000000000000000 [46.585374] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [46.585374] GPR20: 0000000000000000 000000009123683e 00007fffffffdf40 0000000000000000 [46.585374] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000004 [46.585374] GPR28: 0000000100030260 0000000100030280 0000000000000003 000000000000005f [46.585919] NIP [00007ffff7e22990] 0x7ffff7e22990 [46.585964] LR [00000001000010ec] 0x1000010ec [46.586010] --- interrupt: c00 This should be a false positive, as both locks are acquired in read mode. Nevertheless, we don't need to hold a leaf locked when we start the transaction, so just release the leaf (path) before starting it. Reported-by: Ritesh Harjani <[email protected]> Link: https://lore.kernel.org/linux-btrfs/20210513214404.xks77p566fglzgum@riteshh-domain/ Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Jul 20, 2021
[ Upstream commit b5befe8 ] An srcu_struct structure that is initialized before rcu_init_geometry() will have its srcu_node hierarchy based on CONFIG_NR_CPUS. Once rcu_init_geometry() is called, this hierarchy is compressed as needed for the actual maximum number of CPUs for this system. Later on, that srcu_struct structure is confused, sometimes referring to its initial CONFIG_NR_CPUS-based hierarchy, and sometimes instead to the new num_possible_cpus() hierarchy. For example, each of its ->mynode fields continues to reference the original leaf rcu_node structures, some of which might no longer exist. On the other hand, srcu_for_each_node_breadth_first() traverses to the new node hierarchy. There are at least two bad possible outcomes to this: 1) a) A callback enqueued early on an srcu_data structure (call it *sdp) is recorded pending on sdp->mynode->srcu_data_have_cbs in srcu_funnel_gp_start() with sdp->mynode pointing to a deep leaf (say 3 levels). b) The grace period ends after rcu_init_geometry() shrinks the nodes level to a single one. srcu_gp_end() walks through the new srcu_node hierarchy without ever reaching the old leaves so the callback is never executed. This is easily reproduced on an 8 CPUs machine with CONFIG_NR_CPUS >= 32 and "rcupdate.rcu_self_test=1". The srcu_barrier() after early tests verification never completes and the boot hangs: [ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds. [ 5413.147564] Not tainted 5.12.0-rc4+ #28 [ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 5413.159753] task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00004000 [ 5413.168099] Call Trace: [ 5413.170555] __schedule+0x36c/0x930 [ 5413.174057] ? wait_for_completion+0x88/0x110 [ 5413.178423] schedule+0x46/0xf0 [ 5413.181575] schedule_timeout+0x284/0x380 [ 5413.185591] ? wait_for_completion+0x88/0x110 [ 5413.189957] ? mark_held_locks+0x61/0x80 [ 5413.193882] ? mark_held_locks+0x61/0x80 [ 5413.197809] ? _raw_spin_unlock_irq+0x24/0x50 [ 5413.202173] ? wait_for_completion+0x88/0x110 [ 5413.206535] wait_for_completion+0xb4/0x110 [ 5413.210724] ? srcu_torture_stats_print+0x110/0x110 [ 5413.215610] srcu_barrier+0x187/0x200 [ 5413.219277] ? rcu_tasks_verify_self_tests+0x50/0x50 [ 5413.224244] ? rdinit_setup+0x2b/0x2b [ 5413.227907] rcu_verify_early_boot_tests+0x2d/0x40 [ 5413.232700] do_one_initcall+0x63/0x310 [ 5413.236541] ? rdinit_setup+0x2b/0x2b [ 5413.240207] ? rcu_read_lock_sched_held+0x52/0x80 [ 5413.244912] kernel_init_freeable+0x253/0x28f [ 5413.249273] ? rest_init+0x250/0x250 [ 5413.252846] kernel_init+0xa/0x110 [ 5413.256257] ret_from_fork+0x22/0x30 2) An srcu_struct structure that is initialized before rcu_init_geometry() and used afterward will always have stale rdp->mynode references, resulting in callbacks to be missed in srcu_gp_end(), just like in the previous scenario. This commit therefore causes init_srcu_struct_nodes to initialize the geometry, if needed. This ensures that the srcu_node hierarchy is properly built and distributed from the get-go. Suggested-by: Paul E. McKenney <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Uladzislau Rezki <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Jul 20, 2021
[ Upstream commit b5befe8 ] An srcu_struct structure that is initialized before rcu_init_geometry() will have its srcu_node hierarchy based on CONFIG_NR_CPUS. Once rcu_init_geometry() is called, this hierarchy is compressed as needed for the actual maximum number of CPUs for this system. Later on, that srcu_struct structure is confused, sometimes referring to its initial CONFIG_NR_CPUS-based hierarchy, and sometimes instead to the new num_possible_cpus() hierarchy. For example, each of its ->mynode fields continues to reference the original leaf rcu_node structures, some of which might no longer exist. On the other hand, srcu_for_each_node_breadth_first() traverses to the new node hierarchy. There are at least two bad possible outcomes to this: 1) a) A callback enqueued early on an srcu_data structure (call it *sdp) is recorded pending on sdp->mynode->srcu_data_have_cbs in srcu_funnel_gp_start() with sdp->mynode pointing to a deep leaf (say 3 levels). b) The grace period ends after rcu_init_geometry() shrinks the nodes level to a single one. srcu_gp_end() walks through the new srcu_node hierarchy without ever reaching the old leaves so the callback is never executed. This is easily reproduced on an 8 CPUs machine with CONFIG_NR_CPUS >= 32 and "rcupdate.rcu_self_test=1". The srcu_barrier() after early tests verification never completes and the boot hangs: [ 5413.141029] INFO: task swapper/0:1 blocked for more than 4915 seconds. [ 5413.147564] Not tainted 5.12.0-rc4+ #28 [ 5413.151927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 5413.159753] task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00004000 [ 5413.168099] Call Trace: [ 5413.170555] __schedule+0x36c/0x930 [ 5413.174057] ? wait_for_completion+0x88/0x110 [ 5413.178423] schedule+0x46/0xf0 [ 5413.181575] schedule_timeout+0x284/0x380 [ 5413.185591] ? wait_for_completion+0x88/0x110 [ 5413.189957] ? mark_held_locks+0x61/0x80 [ 5413.193882] ? mark_held_locks+0x61/0x80 [ 5413.197809] ? _raw_spin_unlock_irq+0x24/0x50 [ 5413.202173] ? wait_for_completion+0x88/0x110 [ 5413.206535] wait_for_completion+0xb4/0x110 [ 5413.210724] ? srcu_torture_stats_print+0x110/0x110 [ 5413.215610] srcu_barrier+0x187/0x200 [ 5413.219277] ? rcu_tasks_verify_self_tests+0x50/0x50 [ 5413.224244] ? rdinit_setup+0x2b/0x2b [ 5413.227907] rcu_verify_early_boot_tests+0x2d/0x40 [ 5413.232700] do_one_initcall+0x63/0x310 [ 5413.236541] ? rdinit_setup+0x2b/0x2b [ 5413.240207] ? rcu_read_lock_sched_held+0x52/0x80 [ 5413.244912] kernel_init_freeable+0x253/0x28f [ 5413.249273] ? rest_init+0x250/0x250 [ 5413.252846] kernel_init+0xa/0x110 [ 5413.256257] ret_from_fork+0x22/0x30 2) An srcu_struct structure that is initialized before rcu_init_geometry() and used afterward will always have stale rdp->mynode references, resulting in callbacks to be missed in srcu_gp_end(), just like in the previous scenario. This commit therefore causes init_srcu_struct_nodes to initialize the geometry, if needed. This ensures that the srcu_node hierarchy is properly built and distributed from the get-go. Suggested-by: Paul E. McKenney <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: Uladzislau Rezki <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
kitakar5525
added a commit
to kitakar5525/linux-kernel
that referenced
this pull request
Aug 5, 2021
Somehow it's not working: kern :info : [ 6.613853] ov5693 i2c-INT33BE:00: camera pdata: port: 0 lanes: 2 order: 00000002 kern :info : [ 7.192186] ov5693 i2c-INT33BE:00: register atomisp i2c module type 1 kern :warn : [ 7.192197] ------------[ cut here ]------------ kern :warn : [ 7.192206] WARNING: CPU: 0 PID: 222 at fs/proc/generic.c:160 __xlate_proc_name+0xbd/0x110() kern :warn : [ 7.192208] name 'camera/frontvid' kern :warn : [ 7.192210] Modules linked in: mousedev input_leds hid_generic iTCO_wdt hid_multitouch pmic_whiskey_cove nls_iso8859_1 nls_cp437 vfat fat intel_rapl intel_soc_dts_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr usbhid atomisp_css2401a0_v21(+) videobuf_vmalloc videobuf_core mei_txe mei processor_thermal_device lpc_ich shpchp intel_soc_dts_iosf thermal atomisp_ov5693(+) v4l2_common videodev battery i2c_hid hid media snd_intel_sst_acpi snd_intel_sst_core snd_soc_sst_mfld_platform snd_soc_core snd_compress dw_dmac snd_pcm_dmaengine fjes dw_dmac_core ac97_bus snd_pcm snd_timer snd i2c_designware_platform soundcore i2c_designware_core 8250_dw spi_pxa2xx_platform tpm_crb kern :warn : [ 7.192270] int3400_thermal int3403_thermal acpi_thermal_rel int340x_thermal_zone ac acpi_pad pinctrl_cherryview evdev tpm_tis tpm mac_hid processor sch_fq_codel sg scsi_mod crypto_user fuse ip_tables x_tables ext4 crc16 mbcache jbd2 mmc_block crc32c_intel xhci_pci xhci_hcd usbcore usb_common sdhci_acpi sdhci led_class mmc_core i915 video button i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm intel_agp intel_gtt kern :warn : [ 7.192311] CPU: 0 PID: 222 Comm: systemd-udevd Not tainted 4.4.275-1-intel-aero44-lts linux-surface#28 kern :warn : [ 7.192313] Hardware name: Xiaomi Inc Mipad2/Mipad, BIOS MIPad-P4.X64.0043.R03.1603071414 03/07/2016 kern :warn : [ 7.192316] 0000000000000286 044a54b37a3a6ee9 ffff88007865f940 ffffffff815c4d3b kern :warn : [ 7.192321] ffff88007865f988 ffffffff8172e35d ffff88007865f978 ffffffff8107b1c6 kern :warn : [ 7.192324] 0000000000000006 ffff880074a40180 ffffffffa04bd3d6 0000000000000000 kern :warn : [ 7.192329] Call Trace: kern :warn : [ 7.192337] [<ffffffff815c4d3b>] dump_stack+0x6d/0x8b kern :warn : [ 7.192343] [<ffffffff8107b1c6>] warn_slowpath_common+0x86/0xd0 kern :warn : [ 7.192348] [<ffffffff8107b26c>] warn_slowpath_fmt+0x5c/0x80 kern :warn : [ 7.192352] [<ffffffff812578dd>] __xlate_proc_name+0xbd/0x110 kern :warn : [ 7.192356] [<ffffffff8125797c>] __proc_create+0x4c/0x270 kern :warn : [ 7.192359] [<ffffffff81257f96>] proc_create_data+0x56/0xd0 kern :warn : [ 7.192367] [<ffffffffa04bc32d>] ov5693_probe+0x90d/0xd80 [atomisp_ov5693] kern :warn : [ 7.192372] [<ffffffffa04bba20>] ? power_up+0x190/0x190 [atomisp_ov5693] kern :warn : [ 7.192377] [<ffffffff81452854>] i2c_device_probe+0x174/0x1f0 kern :warn : [ 7.192382] [<ffffffff81407376>] driver_probe_device+0x206/0x420 kern :warn : [ 7.192386] [<ffffffff8140760e>] __driver_attach+0x7e/0x80 kern :warn : [ 7.192389] [<ffffffff81407590>] ? driver_probe_device+0x420/0x420 kern :warn : [ 7.192392] [<ffffffff81404bea>] bus_for_each_dev+0x8a/0xd0 kern :warn : [ 7.192396] [<ffffffff81406ade>] driver_attach+0x1e/0x20 kern :warn : [ 7.192399] [<ffffffff8140669c>] bus_add_driver+0x1dc/0x240 kern :warn : [ 7.192402] [<ffffffff81407f22>] driver_register+0x92/0xe0 kern :warn : [ 7.192406] [<ffffffff814548a4>] i2c_register_driver+0x44/0xd0 kern :warn : [ 7.192410] [<ffffffffa040c000>] ? 0xffffffffa040c000 kern :warn : [ 7.192415] [<ffffffffa040c017>] ov5693_driver_init+0x17/0x1000 [atomisp_ov5693] kern :warn : [ 7.192419] [<ffffffff810020b6>] do_one_initcall+0xb6/0x1a0 kern :warn : [ 7.192424] [<ffffffff810fcaf2>] do_init_module+0x62/0x220 kern :warn : [ 7.192427] [<ffffffff810ff321>] load_module+0x2631/0x2790 kern :warn : [ 7.192432] [<ffffffff810fb600>] ? ref_module+0x190/0x190 kern :warn : [ 7.192436] [<ffffffff810ff5c6>] SyS_init_module+0x146/0x1a0 kern :warn : [ 7.192442] [<ffffffff815df2a1>] entry_SYSCALL_64_fastpath+0x1e/0x9a kern :warn : [ 7.192445] ---[ end trace ca96e8a34f2d815d ]--- kern :err : [ 7.192449] ov5693 i2c-INT33BE:00: Can't create file /proc/camera/frontvid kern :warn : [ 7.192738] ------------[ cut here ]------------ kern :warn : [ 7.192743] WARNING: CPU: 0 PID: 222 at fs/proc/generic.c:565 remove_proc_entry+0x173/0x1d0() kern :warn : [ 7.192745] name 'camera' kern :warn : [ 7.192747] Modules linked in: mousedev input_leds hid_generic iTCO_wdt hid_multitouch pmic_whiskey_cove nls_iso8859_1 nls_cp437 vfat fat intel_rapl intel_soc_dts_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd pcspkr usbhid atomisp_css2401a0_v21(+) videobuf_vmalloc videobuf_core mei_txe mei processor_thermal_device lpc_ich shpchp intel_soc_dts_iosf thermal atomisp_ov5693(+) v4l2_common videodev battery i2c_hid hid media snd_intel_sst_acpi snd_intel_sst_core snd_soc_sst_mfld_platform snd_soc_core snd_compress dw_dmac snd_pcm_dmaengine fjes dw_dmac_core ac97_bus snd_pcm snd_timer snd i2c_designware_platform soundcore i2c_designware_core 8250_dw spi_pxa2xx_platform tpm_crb kern :warn : [ 7.192792] int3400_thermal int3403_thermal acpi_thermal_rel int340x_thermal_zone ac acpi_pad pinctrl_cherryview evdev tpm_tis tpm mac_hid processor sch_fq_codel sg scsi_mod crypto_user fuse ip_tables x_tables ext4 crc16 mbcache jbd2 mmc_block crc32c_intel xhci_pci xhci_hcd usbcore usb_common sdhci_acpi sdhci led_class mmc_core i915 video button i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm intel_agp intel_gtt kern :warn : [ 7.192825] CPU: 0 PID: 222 Comm: systemd-udevd Tainted: G W 4.4.275-1-intel-aero44-lts linux-surface#28 kern :warn : [ 7.192827] Hardware name: Xiaomi Inc Mipad2/Mipad, BIOS MIPad-P4.X64.0043.R03.1603071414 03/07/2016 kern :warn : [ 7.192829] 0000000000000286 044a54b37a3a6ee9 ffff88007865f9b0 ffffffff815c4d3b kern :warn : [ 7.192833] ffff88007865f9f8 ffffffff8172e35d ffff88007865f9e8 ffffffff8107b1c6 kern :warn : [ 7.192837] 0000000000000006 0000000000000006 0000000000000000 ffffffffa04bd3e6 kern :warn : [ 7.192841] Call Trace: kern :warn : [ 7.192846] [<ffffffff815c4d3b>] dump_stack+0x6d/0x8b kern :warn : [ 7.192850] [<ffffffff8107b1c6>] warn_slowpath_common+0x86/0xd0 kern :warn : [ 7.192854] [<ffffffff8107b26c>] warn_slowpath_fmt+0x5c/0x80 kern :warn : [ 7.192859] [<ffffffff812587a3>] remove_proc_entry+0x173/0x1d0 kern :warn : [ 7.192865] [<ffffffffa04bc3b4>] ov5693_probe+0x994/0xd80 [atomisp_ov5693] kern :warn : [ 7.192870] [<ffffffffa04bba20>] ? power_up+0x190/0x190 [atomisp_ov5693] kern :warn : [ 7.192874] [<ffffffff81452854>] i2c_device_probe+0x174/0x1f0 kern :warn : [ 7.192878] [<ffffffff81407376>] driver_probe_device+0x206/0x420 kern :warn : [ 7.192881] [<ffffffff8140760e>] __driver_attach+0x7e/0x80 kern :warn : [ 7.192884] [<ffffffff81407590>] ? driver_probe_device+0x420/0x420 kern :warn : [ 7.192888] [<ffffffff81404bea>] bus_for_each_dev+0x8a/0xd0 kern :warn : [ 7.192891] [<ffffffff81406ade>] driver_attach+0x1e/0x20 kern :warn : [ 7.192894] [<ffffffff8140669c>] bus_add_driver+0x1dc/0x240 kern :warn : [ 7.192897] [<ffffffff81407f22>] driver_register+0x92/0xe0 kern :warn : [ 7.192901] [<ffffffff814548a4>] i2c_register_driver+0x44/0xd0 kern :warn : [ 7.192904] [<ffffffffa040c000>] ? 0xffffffffa040c000 kern :warn : [ 7.192909] [<ffffffffa040c017>] ov5693_driver_init+0x17/0x1000 [atomisp_ov5693] kern :warn : [ 7.192913] [<ffffffff810020b6>] do_one_initcall+0xb6/0x1a0 kern :warn : [ 7.192917] [<ffffffff810fcaf2>] do_init_module+0x62/0x220 kern :warn : [ 7.192920] [<ffffffff810ff321>] load_module+0x2631/0x2790 kern :warn : [ 7.192924] [<ffffffff810fb600>] ? ref_module+0x190/0x190 kern :warn : [ 7.192928] [<ffffffff810ff5c6>] SyS_init_module+0x146/0x1a0 kern :warn : [ 7.192932] [<ffffffff815df2a1>] entry_SYSCALL_64_fastpath+0x1e/0x9a kern :warn : [ 7.192935] ---[ end trace ca96e8a34f2d815e ]--- kern :err : [ 7.192939] ov5693 i2c-INT33BE:00: ov5693_probe Failed to proc fs kern :warn : [ 7.193242] ov5693: probe of i2c-INT33BE:00 failed with error -1 Signed-off-by: Tsuchiya Yuto <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Nov 20, 2021
[ Upstream commit d412137 ] The perf_buffer fails on system with offline cpus: # test_progs -t perf_buffer test_perf_buffer:PASS:nr_cpus 0 nsec test_perf_buffer:PASS:nr_on_cpus 0 nsec test_perf_buffer:PASS:skel_load 0 nsec test_perf_buffer:PASS:attach_kprobe 0 nsec test_perf_buffer:PASS:perf_buf__new 0 nsec test_perf_buffer:PASS:epoll_fd 0 nsec skipping offline CPU #24 skipping offline CPU #25 skipping offline CPU #26 skipping offline CPU #27 skipping offline CPU #28 skipping offline CPU #29 skipping offline CPU #30 skipping offline CPU #31 test_perf_buffer:PASS:perf_buffer__poll 0 nsec test_perf_buffer:PASS:seen_cpu_cnt 0 nsec test_perf_buffer:FAIL:buf_cnt got 24, expected 32 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Changing the test to check online cpus instead of possible. Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Nov 29, 2021
[ Upstream commit d412137 ] The perf_buffer fails on system with offline cpus: # test_progs -t perf_buffer test_perf_buffer:PASS:nr_cpus 0 nsec test_perf_buffer:PASS:nr_on_cpus 0 nsec test_perf_buffer:PASS:skel_load 0 nsec test_perf_buffer:PASS:attach_kprobe 0 nsec test_perf_buffer:PASS:perf_buf__new 0 nsec test_perf_buffer:PASS:epoll_fd 0 nsec skipping offline CPU #24 skipping offline CPU #25 skipping offline CPU #26 skipping offline CPU #27 skipping offline CPU #28 skipping offline CPU #29 skipping offline CPU #30 skipping offline CPU #31 test_perf_buffer:PASS:perf_buffer__poll 0 nsec test_perf_buffer:PASS:seen_cpu_cnt 0 nsec test_perf_buffer:FAIL:buf_cnt got 24, expected 32 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Changing the test to check online cpus instead of possible. Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Jul 4, 2022
kasan detects access beyond the end of the xibm->bitmap allocation: BUG: KASAN: slab-out-of-bounds in _find_first_zero_bit+0x40/0x140 Read of size 8 at addr c00000001d1d0118 by task swapper/0/1 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc2-00001-g90df023b36dd #28 Call Trace: [c00000001d98f770] [c0000000012baab8] dump_stack_lvl+0xac/0x108 (unreliable) [c00000001d98f7b0] [c00000000068faac] print_report+0x37c/0x710 [c00000001d98f880] [c0000000006902c0] kasan_report+0x110/0x354 [c00000001d98f950] [c000000000692324] __asan_load8+0xa4/0xe0 [c00000001d98f970] [c0000000011c6ed0] _find_first_zero_bit+0x40/0x140 [c00000001d98f9b0] [c0000000000dbfbc] xive_spapr_get_ipi+0xcc/0x260 [c00000001d98fa70] [c0000000000d6d28] xive_setup_cpu_ipi+0x1e8/0x450 [c00000001d98fb30] [c000000004032a20] pSeries_smp_probe+0x5c/0x118 [c00000001d98fb60] [c000000004018b44] smp_prepare_cpus+0x944/0x9ac [c00000001d98fc90] [c000000004009f9c] kernel_init_freeable+0x2d4/0x640 [c00000001d98fd90] [c0000000000131e8] kernel_init+0x28/0x1d0 [c00000001d98fe10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64 Allocated by task 0: kasan_save_stack+0x34/0x70 __kasan_kmalloc+0xb4/0xf0 __kmalloc+0x268/0x540 xive_spapr_init+0x4d0/0x77c pseries_init_irq+0x40/0x27c init_IRQ+0x44/0x84 start_kernel+0x2a4/0x538 start_here_common+0x1c/0x20 The buggy address belongs to the object at c00000001d1d0118 which belongs to the cache kmalloc-8 of size 8 The buggy address is located 0 bytes inside of 8-byte region [c00000001d1d0118, c00000001d1d0120) The buggy address belongs to the physical page: page:c00c000000074740 refcount:1 mapcount:0 mapping:0000000000000000 index:0xc00000001d1d0558 pfn:0x1d1d flags: 0x7ffff000000200(slab|node=0|zone=0|lastcpupid=0x7ffff) raw: 007ffff000000200 c00000001d0003c8 c00000001d0003c8 c00000001d010480 raw: c00000001d1d0558 0000000001e1000a 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: c00000001d1d0000: fc 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc c00000001d1d0080: fc fc 00 fc fc fc fc fc fc fc fc fc fc fc fc fc >c00000001d1d0100: fc fc fc 02 fc fc fc fc fc fc fc fc fc fc fc fc ^ c00000001d1d0180: fc fc fc fc 04 fc fc fc fc fc fc fc fc fc fc fc c00000001d1d0200: fc fc fc fc fc 04 fc fc fc fc fc fc fc fc fc fc This happens because the allocation uses the wrong unit (bits) when it should pass (BITS_TO_LONGS(count) * sizeof(long)) or equivalent. With small numbers of bits, the allocated object can be smaller than sizeof(long), which results in invalid accesses. Use bitmap_zalloc() to allocate and initialize the irq bitmap, paired with bitmap_free() for consistency. Signed-off-by: Nathan Lynch <[email protected]> Reviewed-by: Cédric Le Goater <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
qzed
pushed a commit
that referenced
this pull request
Aug 4, 2022
[ Upstream commit 19fc5bb ] kasan detects access beyond the end of the xibm->bitmap allocation: BUG: KASAN: slab-out-of-bounds in _find_first_zero_bit+0x40/0x140 Read of size 8 at addr c00000001d1d0118 by task swapper/0/1 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc2-00001-g90df023b36dd #28 Call Trace: [c00000001d98f770] [c0000000012baab8] dump_stack_lvl+0xac/0x108 (unreliable) [c00000001d98f7b0] [c00000000068faac] print_report+0x37c/0x710 [c00000001d98f880] [c0000000006902c0] kasan_report+0x110/0x354 [c00000001d98f950] [c000000000692324] __asan_load8+0xa4/0xe0 [c00000001d98f970] [c0000000011c6ed0] _find_first_zero_bit+0x40/0x140 [c00000001d98f9b0] [c0000000000dbfbc] xive_spapr_get_ipi+0xcc/0x260 [c00000001d98fa70] [c0000000000d6d28] xive_setup_cpu_ipi+0x1e8/0x450 [c00000001d98fb30] [c000000004032a20] pSeries_smp_probe+0x5c/0x118 [c00000001d98fb60] [c000000004018b44] smp_prepare_cpus+0x944/0x9ac [c00000001d98fc90] [c000000004009f9c] kernel_init_freeable+0x2d4/0x640 [c00000001d98fd90] [c0000000000131e8] kernel_init+0x28/0x1d0 [c00000001d98fe10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64 Allocated by task 0: kasan_save_stack+0x34/0x70 __kasan_kmalloc+0xb4/0xf0 __kmalloc+0x268/0x540 xive_spapr_init+0x4d0/0x77c pseries_init_irq+0x40/0x27c init_IRQ+0x44/0x84 start_kernel+0x2a4/0x538 start_here_common+0x1c/0x20 The buggy address belongs to the object at c00000001d1d0118 which belongs to the cache kmalloc-8 of size 8 The buggy address is located 0 bytes inside of 8-byte region [c00000001d1d0118, c00000001d1d0120) The buggy address belongs to the physical page: page:c00c000000074740 refcount:1 mapcount:0 mapping:0000000000000000 index:0xc00000001d1d0558 pfn:0x1d1d flags: 0x7ffff000000200(slab|node=0|zone=0|lastcpupid=0x7ffff) raw: 007ffff000000200 c00000001d0003c8 c00000001d0003c8 c00000001d010480 raw: c00000001d1d0558 0000000001e1000a 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: c00000001d1d0000: fc 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc c00000001d1d0080: fc fc 00 fc fc fc fc fc fc fc fc fc fc fc fc fc >c00000001d1d0100: fc fc fc 02 fc fc fc fc fc fc fc fc fc fc fc fc ^ c00000001d1d0180: fc fc fc fc 04 fc fc fc fc fc fc fc fc fc fc fc c00000001d1d0200: fc fc fc fc fc 04 fc fc fc fc fc fc fc fc fc fc This happens because the allocation uses the wrong unit (bits) when it should pass (BITS_TO_LONGS(count) * sizeof(long)) or equivalent. With small numbers of bits, the allocated object can be smaller than sizeof(long), which results in invalid accesses. Use bitmap_zalloc() to allocate and initialize the irq bitmap, paired with bitmap_free() for consistency. Signed-off-by: Nathan Lynch <[email protected]> Reviewed-by: Cédric Le Goater <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Aug 4, 2022
commit d0be834 upstream. This fixes the following trace which is caused by hci_rx_work starting up *after* the final channel reference has been put() during sock_close() but *before* the references to the channel have been destroyed, so instead the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed. refcount_t: increment on 0; use-after-free. BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0 Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705 CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT) Workqueue: hci0 hci_rx_work Call trace: dump_backtrace+0x0/0x378 show_stack+0x20/0x2c dump_stack+0x124/0x148 print_address_description+0x80/0x2e8 __kasan_report+0x168/0x188 kasan_report+0x10/0x18 __asan_load4+0x84/0x8c refcount_dec_and_test+0x20/0xd0 l2cap_chan_put+0x48/0x12c l2cap_recv_frame+0x4770/0x6550 l2cap_recv_acldata+0x44c/0x7a4 hci_acldata_packet+0x100/0x188 hci_rx_work+0x178/0x23c process_one_work+0x35c/0x95c worker_thread+0x4cc/0x960 kthread+0x1a8/0x1c4 ret_from_fork+0x10/0x18 Cc: [email protected] Reported-by: Lee Jones <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Tested-by: Lee Jones <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Aug 4, 2022
This fixes the following trace which is caused by hci_rx_work starting up *after* the final channel reference has been put() during sock_close() but *before* the references to the channel have been destroyed, so instead the code now rely on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed. refcount_t: increment on 0; use-after-free. BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0 Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705 CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S W 4.14.234-00003-g1fb6d0bd49a4-dirty #28 Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 Google Inc. MSM sm8150 Flame DVT (DT) Workqueue: hci0 hci_rx_work Call trace: dump_backtrace+0x0/0x378 show_stack+0x20/0x2c dump_stack+0x124/0x148 print_address_description+0x80/0x2e8 __kasan_report+0x168/0x188 kasan_report+0x10/0x18 __asan_load4+0x84/0x8c refcount_dec_and_test+0x20/0xd0 l2cap_chan_put+0x48/0x12c l2cap_recv_frame+0x4770/0x6550 l2cap_recv_acldata+0x44c/0x7a4 hci_acldata_packet+0x100/0x188 hci_rx_work+0x178/0x23c process_one_work+0x35c/0x95c worker_thread+0x4cc/0x960 kthread+0x1a8/0x1c4 ret_from_fork+0x10/0x18 Cc: [email protected] Reported-by: Lee Jones <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Tested-by: Lee Jones <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Sep 2, 2022
commit 9c80e79 upstream. The assumption in __disable_kprobe() is wrong, and it could try to disarm an already disarmed kprobe and fire the WARN_ONCE() below. [0] We can easily reproduce this issue. 1. Write 0 to /sys/kernel/debug/kprobes/enabled. # echo 0 > /sys/kernel/debug/kprobes/enabled 2. Run execsnoop. At this time, one kprobe is disabled. # /usr/share/bcc/tools/execsnoop & [1] 2460 PCOMM PID PPID RET ARGS # cat /sys/kernel/debug/kprobes/list ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE] ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE] 3. Write 1 to /sys/kernel/debug/kprobes/enabled, which changes kprobes_all_disarmed to false but does not arm the disabled kprobe. # echo 1 > /sys/kernel/debug/kprobes/enabled # cat /sys/kernel/debug/kprobes/list ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE] ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE] 4. Kill execsnoop, when __disable_kprobe() calls disarm_kprobe() for the disabled kprobe and hits the WARN_ONCE() in __disarm_kprobe_ftrace(). # fg /usr/share/bcc/tools/execsnoop ^C Actually, WARN_ONCE() is fired twice, and __unregister_kprobe_top() misses some cleanups and leaves the aggregated kprobe in the hash table. Then, __unregister_trace_kprobe() initialises tk->rp.kp.list and creates an infinite loop like this. aggregated kprobe.list -> kprobe.list -. ^ | '.__.' In this situation, these commands fall into the infinite loop and result in RCU stall or soft lockup. cat /sys/kernel/debug/kprobes/list : show_kprobe_addr() enters into the infinite loop with RCU. /usr/share/bcc/tools/execsnoop : warn_kprobe_rereg() holds kprobe_mutex, and __get_valid_kprobe() is stuck in the loop. To avoid the issue, make sure we don't call disarm_kprobe() for disabled kprobes. [0] Failed to disarm kprobe-ftrace at __x64_sys_execve+0x0/0x40 (error -2) WARNING: CPU: 6 PID: 2460 at kernel/kprobes.c:1130 __disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129) Modules linked in: ena CPU: 6 PID: 2460 Comm: execsnoop Not tainted 5.19.0+ #28 Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017 RIP: 0010:__disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129) Code: 24 8b 02 eb c1 80 3d c4 83 f2 01 00 75 d4 48 8b 75 00 89 c2 48 c7 c7 90 fa 0f 92 89 04 24 c6 05 ab 83 01 e8 e4 94 f0 ff <0f> 0b 8b 04 24 eb b1 89 c6 48 c7 c7 60 fa 0f 92 89 04 24 e8 cc 94 RSP: 0018:ffff9e6ec154bd98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff930f7b00 RCX: 0000000000000001 RDX: 0000000080000001 RSI: ffffffff921461c5 RDI: 00000000ffffffff RBP: ffff89c504286da8 R08: 0000000000000000 R09: c0000000fffeffff R10: 0000000000000000 R11: ffff9e6ec154bc28 R12: ffff89c502394e40 R13: ffff89c502394c00 R14: ffff9e6ec154bc00 R15: 0000000000000000 FS: 00007fe800398740(0000) GS:ffff89c812d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c00057f010 CR3: 0000000103b54006 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> __disable_kprobe (kernel/kprobes.c:1716) disable_kprobe (kernel/kprobes.c:2392) __disable_trace_kprobe (kernel/trace/trace_kprobe.c:340) disable_trace_kprobe (kernel/trace/trace_kprobe.c:429) perf_trace_event_unreg.isra.2 (./include/linux/tracepoint.h:93 kernel/trace/trace_event_perf.c:168) perf_kprobe_destroy (kernel/trace/trace_event_perf.c:295) _free_event (kernel/events/core.c:4971) perf_event_release_kernel (kernel/events/core.c:5176) perf_release (kernel/events/core.c:5186) __fput (fs/file_table.c:321) task_work_run (./include/linux/sched.h:2056 (discriminator 1) kernel/task_work.c:179 (discriminator 1)) exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:169 kernel/entry/common.c:201) syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:384 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296) do_syscall_64 (arch/x86/entry/common.c:87) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) RIP: 0033:0x7fe7ff210654 Code: 15 79 89 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 00 8b 05 9a cd 20 00 48 63 ff 85 c0 75 11 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3a f3 c3 48 83 ec 18 48 89 7c 24 08 e8 34 fc RSP: 002b:00007ffdbd1d3538 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fe7ff210654 RDX: 0000000000000000 RSI: 0000000000002401 RDI: 0000000000000008 RBP: 0000000000000000 R08: 94ae31d6fda838a4 R0900007fe8001c9d30 R10: 00007ffdbd1d34b0 R11: 0000000000000246 R12: 00007ffdbd1d3600 R13: 0000000000000000 R14: fffffffffffffffc R15: 00007ffdbd1d3560 </TASK> Link: https://lkml.kernel.org/r/[email protected] Fixes: 69d54b9 ("kprobes: makes kprobes/enabled works correctly for optimized kprobes.") Signed-off-by: Kuniyuki Iwashima <[email protected]> Reported-by: Ayushman Dutta <[email protected]> Cc: "Naveen N. Rao" <[email protected]> Cc: Anil S Keshavamurthy <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Wang Nan <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Cc: Ayushman Dutta <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Nov 3, 2022
[ Upstream commit 16bbdfe ] Syzkaller produced the below call trace: BUG: KASAN: null-ptr-deref in io_msg_ring+0x3cb/0x9f0 Write of size 8 at addr 0000000000000070 by task repro/16399 CPU: 0 PID: 16399 Comm: repro Not tainted 6.1.0-rc1 #28 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 Call Trace: <TASK> dump_stack_lvl+0xcd/0x134 ? io_msg_ring+0x3cb/0x9f0 kasan_report+0xbc/0xf0 ? io_msg_ring+0x3cb/0x9f0 kasan_check_range+0x140/0x190 io_msg_ring+0x3cb/0x9f0 ? io_msg_ring_prep+0x300/0x300 io_issue_sqe+0x698/0xca0 io_submit_sqes+0x92f/0x1c30 __do_sys_io_uring_enter+0xae4/0x24b0 .... RIP: 0033:0x7f2eaf8f8289 RSP: 002b:00007fff40939718 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2eaf8f8289 RDX: 0000000000000000 RSI: 0000000000006f71 RDI: 0000000000000004 RBP: 00007fff409397a0 R08: 0000000000000000 R09: 0000000000000039 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004006d0 R13: 00007fff40939880 R14: 0000000000000000 R15: 0000000000000000 </TASK> Kernel panic - not syncing: panic_on_warn set ... We don't have a NULL check on file_ptr in io_msg_send_fd() function, so when file_ptr is NUL src_file is also NULL and get_file() dereferences a NULL pointer and leads to above crash. Add a NULL check to fix this issue. Fixes: e6130eb ("io_uring: add support for passing fixed file descriptors") Reported-by: syzkaller <[email protected]> Signed-off-by: Harshit Mogalapalli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Jan 5, 2023
[ Upstream commit c1ca8ef ] This adds a sanity check for the i_op pointer of the inode which is returned after reading Root directory MFT record. We should check the i_op is valid before trying to create the root dentry, otherwise we may encounter a NPD while mounting a image with a funny Root directory MFT record. [ 114.484325] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 114.484811] #PF: supervisor read access in kernel mode [ 114.485084] #PF: error_code(0x0000) - not-present page [ 114.485606] PGD 0 P4D 0 [ 114.485975] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 114.486570] CPU: 0 PID: 237 Comm: mount Tainted: G B 6.0.0-rc4 #28 [ 114.486977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 114.488169] RIP: 0010:d_flags_for_inode+0xe0/0x110 [ 114.488816] Code: 24 f7 ff 49 83 3e 00 74 41 41 83 cd 02 66 44 89 6b 02 eb 92 48 8d 7b 20 e8 6d 24 f7 ff 4c 8b 73 20 49 8d 7e 08 e8 60 241 [ 114.490326] RSP: 0018:ffff8880065e7aa8 EFLAGS: 00000296 [ 114.490695] RAX: 0000000000000001 RBX: ffff888008ccd750 RCX: ffffffff84af2aea [ 114.490986] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff87abd020 [ 114.491364] RBP: ffff8880065e7ac8 R08: 0000000000000001 R09: fffffbfff0f57a05 [ 114.491675] R10: ffffffff87abd027 R11: fffffbfff0f57a04 R12: 0000000000000000 [ 114.491954] R13: 0000000000000008 R14: 0000000000000000 R15: ffff888008ccd750 [ 114.492397] FS: 00007fdc8a627e40(0000) GS:ffff888058200000(0000) knlGS:0000000000000000 [ 114.492797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 114.493150] CR2: 0000000000000008 CR3: 00000000013ba000 CR4: 00000000000006f0 [ 114.493671] Call Trace: [ 114.493890] <TASK> [ 114.494075] __d_instantiate+0x24/0x1c0 [ 114.494505] d_instantiate.part.0+0x35/0x50 [ 114.494754] d_make_root+0x53/0x80 [ 114.494998] ntfs_fill_super+0x1232/0x1b50 [ 114.495260] ? put_ntfs+0x1d0/0x1d0 [ 114.495499] ? vsprintf+0x20/0x20 [ 114.495723] ? set_blocksize+0x95/0x150 [ 114.495964] get_tree_bdev+0x232/0x370 [ 114.496272] ? put_ntfs+0x1d0/0x1d0 [ 114.496502] ntfs_fs_get_tree+0x15/0x20 [ 114.496859] vfs_get_tree+0x4c/0x130 [ 114.497099] path_mount+0x654/0xfe0 [ 114.497507] ? putname+0x80/0xa0 [ 114.497933] ? finish_automount+0x2e0/0x2e0 [ 114.498362] ? putname+0x80/0xa0 [ 114.498571] ? kmem_cache_free+0x1c4/0x440 [ 114.498819] ? putname+0x80/0xa0 [ 114.499069] do_mount+0xd6/0xf0 [ 114.499343] ? path_mount+0xfe0/0xfe0 [ 114.499683] ? __kasan_check_write+0x14/0x20 [ 114.500133] __x64_sys_mount+0xca/0x110 [ 114.500592] do_syscall_64+0x3b/0x90 [ 114.500930] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 114.501294] RIP: 0033:0x7fdc898e948a [ 114.501542] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 114.502716] RSP: 002b:00007ffd793e58f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 114.503175] RAX: ffffffffffffffda RBX: 0000564b2228f060 RCX: 00007fdc898e948a [ 114.503588] RDX: 0000564b2228f260 RSI: 0000564b2228f2e0 RDI: 0000564b22297ce0 [ 114.504925] RBP: 0000000000000000 R08: 0000564b2228f280 R09: 0000000000000020 [ 114.505484] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564b22297ce0 [ 114.505823] R13: 0000564b2228f260 R14: 0000000000000000 R15: 00000000ffffffff [ 114.506562] </TASK> [ 114.506887] Modules linked in: [ 114.507648] CR2: 0000000000000008 [ 114.508884] ---[ end trace 0000000000000000 ]--- [ 114.509675] RIP: 0010:d_flags_for_inode+0xe0/0x110 [ 114.510140] Code: 24 f7 ff 49 83 3e 00 74 41 41 83 cd 02 66 44 89 6b 02 eb 92 48 8d 7b 20 e8 6d 24 f7 ff 4c 8b 73 20 49 8d 7e 08 e8 60 241 [ 114.511762] RSP: 0018:ffff8880065e7aa8 EFLAGS: 00000296 [ 114.512401] RAX: 0000000000000001 RBX: ffff888008ccd750 RCX: ffffffff84af2aea [ 114.513103] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff87abd020 [ 114.513512] RBP: ffff8880065e7ac8 R08: 0000000000000001 R09: fffffbfff0f57a05 [ 114.513831] R10: ffffffff87abd027 R11: fffffbfff0f57a04 R12: 0000000000000000 [ 114.514757] R13: 0000000000000008 R14: 0000000000000000 R15: ffff888008ccd750 [ 114.515411] FS: 00007fdc8a627e40(0000) GS:ffff888058200000(0000) knlGS:0000000000000000 [ 114.515794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 114.516208] CR2: 0000000000000008 CR3: 00000000013ba000 CR4: 00000000000006f0 Signed-off-by: Edward Lo <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Jan 5, 2023
[ Upstream commit 4f1dc7d ] Although the attribute name length is checked before comparing it to some common names (e.g., $I30), the offset isn't. This adds a sanity check for the attribute name offset, guarantee the validity and prevent possible out-of-bound memory accesses. [ 191.720056] BUG: unable to handle page fault for address: ffffebde00000008 [ 191.721060] #PF: supervisor read access in kernel mode [ 191.721586] #PF: error_code(0x0000) - not-present page [ 191.722079] PGD 0 P4D 0 [ 191.722571] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 191.723179] CPU: 0 PID: 244 Comm: mount Not tainted 6.0.0-rc4 #28 [ 191.723749] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 191.724832] RIP: 0010:kfree+0x56/0x3b0 [ 191.725870] Code: 80 48 01 d8 0f 82 65 03 00 00 48 c7 c2 00 00 00 80 48 2b 15 2c 06 dd 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 0a 069 [ 191.727375] RSP: 0018:ffff8880076f7878 EFLAGS: 00000286 [ 191.727897] RAX: ffffebde00000000 RBX: 0000000000000040 RCX: ffffffff8528d5b9 [ 191.728531] RDX: 0000777f80000000 RSI: ffffffff8522d49c RDI: 0000000000000040 [ 191.729183] RBP: ffff8880076f78a0 R08: 0000000000000000 R09: 0000000000000000 [ 191.729628] R10: ffff888008949fd8 R11: ffffed10011293fd R12: 0000000000000040 [ 191.730158] R13: ffff888008949f98 R14: ffff888008949ec0 R15: ffff888008949fb0 [ 191.730645] FS: 00007f3520cd7e40(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000 [ 191.731328] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 191.731667] CR2: ffffebde00000008 CR3: 0000000009704000 CR4: 00000000000006f0 [ 191.732568] Call Trace: [ 191.733231] <TASK> [ 191.733860] kvfree+0x2c/0x40 [ 191.734632] ni_clear+0x180/0x290 [ 191.735085] ntfs_evict_inode+0x45/0x70 [ 191.735495] evict+0x199/0x280 [ 191.735996] iput.part.0+0x286/0x320 [ 191.736438] iput+0x32/0x50 [ 191.736811] iget_failed+0x23/0x30 [ 191.737270] ntfs_iget5+0x337/0x1890 [ 191.737629] ? ntfs_clear_mft_tail+0x20/0x260 [ 191.738201] ? ntfs_get_block_bmap+0x70/0x70 [ 191.738482] ? ntfs_objid_init+0xf6/0x140 [ 191.738779] ? ntfs_reparse_init+0x140/0x140 [ 191.739266] ntfs_fill_super+0x121b/0x1b50 [ 191.739623] ? put_ntfs+0x1d0/0x1d0 [ 191.739984] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 191.740466] ? put_ntfs+0x1d0/0x1d0 [ 191.740787] ? sb_set_blocksize+0x6a/0x80 [ 191.741272] get_tree_bdev+0x232/0x370 [ 191.741829] ? put_ntfs+0x1d0/0x1d0 [ 191.742669] ntfs_fs_get_tree+0x15/0x20 [ 191.743132] vfs_get_tree+0x4c/0x130 [ 191.743457] path_mount+0x654/0xfe0 [ 191.743938] ? putname+0x80/0xa0 [ 191.744271] ? finish_automount+0x2e0/0x2e0 [ 191.744582] ? putname+0x80/0xa0 [ 191.745053] ? kmem_cache_free+0x1c4/0x440 [ 191.745403] ? putname+0x80/0xa0 [ 191.745616] do_mount+0xd6/0xf0 [ 191.745887] ? path_mount+0xfe0/0xfe0 [ 191.746287] ? __kasan_check_write+0x14/0x20 [ 191.746582] __x64_sys_mount+0xca/0x110 [ 191.746850] do_syscall_64+0x3b/0x90 [ 191.747122] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 191.747517] RIP: 0033:0x7f351fee948a [ 191.748332] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 191.749341] RSP: 002b:00007ffd51cf3af8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 191.749960] RAX: ffffffffffffffda RBX: 000055b903733060 RCX: 00007f351fee948a [ 191.750589] RDX: 000055b903733260 RSI: 000055b9037332e0 RDI: 000055b90373bce0 [ 191.751115] RBP: 0000000000000000 R08: 000055b903733280 R09: 0000000000000020 [ 191.751537] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055b90373bce0 [ 191.751946] R13: 000055b903733260 R14: 0000000000000000 R15: 00000000ffffffff [ 191.752519] </TASK> [ 191.752782] Modules linked in: [ 191.753785] CR2: ffffebde00000008 [ 191.754937] ---[ end trace 0000000000000000 ]--- [ 191.755429] RIP: 0010:kfree+0x56/0x3b0 [ 191.755725] Code: 80 48 01 d8 0f 82 65 03 00 00 48 c7 c2 00 00 00 80 48 2b 15 2c 06 dd 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 0a 069 [ 191.756744] RSP: 0018:ffff8880076f7878 EFLAGS: 00000286 [ 191.757218] RAX: ffffebde00000000 RBX: 0000000000000040 RCX: ffffffff8528d5b9 [ 191.757580] RDX: 0000777f80000000 RSI: ffffffff8522d49c RDI: 0000000000000040 [ 191.758016] RBP: ffff8880076f78a0 R08: 0000000000000000 R09: 0000000000000000 [ 191.758570] R10: ffff888008949fd8 R11: ffffed10011293fd R12: 0000000000000040 [ 191.758957] R13: ffff888008949f98 R14: ffff888008949ec0 R15: ffff888008949fb0 [ 191.759317] FS: 00007f3520cd7e40(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000 [ 191.759711] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 191.760118] CR2: ffffebde00000008 CR3: 0000000009704000 CR4: 00000000000006f0 Signed-off-by: Edward Lo <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Jan 5, 2023
[ Upstream commit c1ca8ef ] This adds a sanity check for the i_op pointer of the inode which is returned after reading Root directory MFT record. We should check the i_op is valid before trying to create the root dentry, otherwise we may encounter a NPD while mounting a image with a funny Root directory MFT record. [ 114.484325] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 114.484811] #PF: supervisor read access in kernel mode [ 114.485084] #PF: error_code(0x0000) - not-present page [ 114.485606] PGD 0 P4D 0 [ 114.485975] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 114.486570] CPU: 0 PID: 237 Comm: mount Tainted: G B 6.0.0-rc4 #28 [ 114.486977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 114.488169] RIP: 0010:d_flags_for_inode+0xe0/0x110 [ 114.488816] Code: 24 f7 ff 49 83 3e 00 74 41 41 83 cd 02 66 44 89 6b 02 eb 92 48 8d 7b 20 e8 6d 24 f7 ff 4c 8b 73 20 49 8d 7e 08 e8 60 241 [ 114.490326] RSP: 0018:ffff8880065e7aa8 EFLAGS: 00000296 [ 114.490695] RAX: 0000000000000001 RBX: ffff888008ccd750 RCX: ffffffff84af2aea [ 114.490986] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff87abd020 [ 114.491364] RBP: ffff8880065e7ac8 R08: 0000000000000001 R09: fffffbfff0f57a05 [ 114.491675] R10: ffffffff87abd027 R11: fffffbfff0f57a04 R12: 0000000000000000 [ 114.491954] R13: 0000000000000008 R14: 0000000000000000 R15: ffff888008ccd750 [ 114.492397] FS: 00007fdc8a627e40(0000) GS:ffff888058200000(0000) knlGS:0000000000000000 [ 114.492797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 114.493150] CR2: 0000000000000008 CR3: 00000000013ba000 CR4: 00000000000006f0 [ 114.493671] Call Trace: [ 114.493890] <TASK> [ 114.494075] __d_instantiate+0x24/0x1c0 [ 114.494505] d_instantiate.part.0+0x35/0x50 [ 114.494754] d_make_root+0x53/0x80 [ 114.494998] ntfs_fill_super+0x1232/0x1b50 [ 114.495260] ? put_ntfs+0x1d0/0x1d0 [ 114.495499] ? vsprintf+0x20/0x20 [ 114.495723] ? set_blocksize+0x95/0x150 [ 114.495964] get_tree_bdev+0x232/0x370 [ 114.496272] ? put_ntfs+0x1d0/0x1d0 [ 114.496502] ntfs_fs_get_tree+0x15/0x20 [ 114.496859] vfs_get_tree+0x4c/0x130 [ 114.497099] path_mount+0x654/0xfe0 [ 114.497507] ? putname+0x80/0xa0 [ 114.497933] ? finish_automount+0x2e0/0x2e0 [ 114.498362] ? putname+0x80/0xa0 [ 114.498571] ? kmem_cache_free+0x1c4/0x440 [ 114.498819] ? putname+0x80/0xa0 [ 114.499069] do_mount+0xd6/0xf0 [ 114.499343] ? path_mount+0xfe0/0xfe0 [ 114.499683] ? __kasan_check_write+0x14/0x20 [ 114.500133] __x64_sys_mount+0xca/0x110 [ 114.500592] do_syscall_64+0x3b/0x90 [ 114.500930] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 114.501294] RIP: 0033:0x7fdc898e948a [ 114.501542] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 114.502716] RSP: 002b:00007ffd793e58f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 114.503175] RAX: ffffffffffffffda RBX: 0000564b2228f060 RCX: 00007fdc898e948a [ 114.503588] RDX: 0000564b2228f260 RSI: 0000564b2228f2e0 RDI: 0000564b22297ce0 [ 114.504925] RBP: 0000000000000000 R08: 0000564b2228f280 R09: 0000000000000020 [ 114.505484] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564b22297ce0 [ 114.505823] R13: 0000564b2228f260 R14: 0000000000000000 R15: 00000000ffffffff [ 114.506562] </TASK> [ 114.506887] Modules linked in: [ 114.507648] CR2: 0000000000000008 [ 114.508884] ---[ end trace 0000000000000000 ]--- [ 114.509675] RIP: 0010:d_flags_for_inode+0xe0/0x110 [ 114.510140] Code: 24 f7 ff 49 83 3e 00 74 41 41 83 cd 02 66 44 89 6b 02 eb 92 48 8d 7b 20 e8 6d 24 f7 ff 4c 8b 73 20 49 8d 7e 08 e8 60 241 [ 114.511762] RSP: 0018:ffff8880065e7aa8 EFLAGS: 00000296 [ 114.512401] RAX: 0000000000000001 RBX: ffff888008ccd750 RCX: ffffffff84af2aea [ 114.513103] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff87abd020 [ 114.513512] RBP: ffff8880065e7ac8 R08: 0000000000000001 R09: fffffbfff0f57a05 [ 114.513831] R10: ffffffff87abd027 R11: fffffbfff0f57a04 R12: 0000000000000000 [ 114.514757] R13: 0000000000000008 R14: 0000000000000000 R15: ffff888008ccd750 [ 114.515411] FS: 00007fdc8a627e40(0000) GS:ffff888058200000(0000) knlGS:0000000000000000 [ 114.515794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 114.516208] CR2: 0000000000000008 CR3: 00000000013ba000 CR4: 00000000000006f0 Signed-off-by: Edward Lo <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Jan 5, 2023
[ Upstream commit 4f1dc7d ] Although the attribute name length is checked before comparing it to some common names (e.g., $I30), the offset isn't. This adds a sanity check for the attribute name offset, guarantee the validity and prevent possible out-of-bound memory accesses. [ 191.720056] BUG: unable to handle page fault for address: ffffebde00000008 [ 191.721060] #PF: supervisor read access in kernel mode [ 191.721586] #PF: error_code(0x0000) - not-present page [ 191.722079] PGD 0 P4D 0 [ 191.722571] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 191.723179] CPU: 0 PID: 244 Comm: mount Not tainted 6.0.0-rc4 #28 [ 191.723749] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 191.724832] RIP: 0010:kfree+0x56/0x3b0 [ 191.725870] Code: 80 48 01 d8 0f 82 65 03 00 00 48 c7 c2 00 00 00 80 48 2b 15 2c 06 dd 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 0a 069 [ 191.727375] RSP: 0018:ffff8880076f7878 EFLAGS: 00000286 [ 191.727897] RAX: ffffebde00000000 RBX: 0000000000000040 RCX: ffffffff8528d5b9 [ 191.728531] RDX: 0000777f80000000 RSI: ffffffff8522d49c RDI: 0000000000000040 [ 191.729183] RBP: ffff8880076f78a0 R08: 0000000000000000 R09: 0000000000000000 [ 191.729628] R10: ffff888008949fd8 R11: ffffed10011293fd R12: 0000000000000040 [ 191.730158] R13: ffff888008949f98 R14: ffff888008949ec0 R15: ffff888008949fb0 [ 191.730645] FS: 00007f3520cd7e40(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000 [ 191.731328] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 191.731667] CR2: ffffebde00000008 CR3: 0000000009704000 CR4: 00000000000006f0 [ 191.732568] Call Trace: [ 191.733231] <TASK> [ 191.733860] kvfree+0x2c/0x40 [ 191.734632] ni_clear+0x180/0x290 [ 191.735085] ntfs_evict_inode+0x45/0x70 [ 191.735495] evict+0x199/0x280 [ 191.735996] iput.part.0+0x286/0x320 [ 191.736438] iput+0x32/0x50 [ 191.736811] iget_failed+0x23/0x30 [ 191.737270] ntfs_iget5+0x337/0x1890 [ 191.737629] ? ntfs_clear_mft_tail+0x20/0x260 [ 191.738201] ? ntfs_get_block_bmap+0x70/0x70 [ 191.738482] ? ntfs_objid_init+0xf6/0x140 [ 191.738779] ? ntfs_reparse_init+0x140/0x140 [ 191.739266] ntfs_fill_super+0x121b/0x1b50 [ 191.739623] ? put_ntfs+0x1d0/0x1d0 [ 191.739984] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 191.740466] ? put_ntfs+0x1d0/0x1d0 [ 191.740787] ? sb_set_blocksize+0x6a/0x80 [ 191.741272] get_tree_bdev+0x232/0x370 [ 191.741829] ? put_ntfs+0x1d0/0x1d0 [ 191.742669] ntfs_fs_get_tree+0x15/0x20 [ 191.743132] vfs_get_tree+0x4c/0x130 [ 191.743457] path_mount+0x654/0xfe0 [ 191.743938] ? putname+0x80/0xa0 [ 191.744271] ? finish_automount+0x2e0/0x2e0 [ 191.744582] ? putname+0x80/0xa0 [ 191.745053] ? kmem_cache_free+0x1c4/0x440 [ 191.745403] ? putname+0x80/0xa0 [ 191.745616] do_mount+0xd6/0xf0 [ 191.745887] ? path_mount+0xfe0/0xfe0 [ 191.746287] ? __kasan_check_write+0x14/0x20 [ 191.746582] __x64_sys_mount+0xca/0x110 [ 191.746850] do_syscall_64+0x3b/0x90 [ 191.747122] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 191.747517] RIP: 0033:0x7f351fee948a [ 191.748332] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 191.749341] RSP: 002b:00007ffd51cf3af8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 191.749960] RAX: ffffffffffffffda RBX: 000055b903733060 RCX: 00007f351fee948a [ 191.750589] RDX: 000055b903733260 RSI: 000055b9037332e0 RDI: 000055b90373bce0 [ 191.751115] RBP: 0000000000000000 R08: 000055b903733280 R09: 0000000000000020 [ 191.751537] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055b90373bce0 [ 191.751946] R13: 000055b903733260 R14: 0000000000000000 R15: 00000000ffffffff [ 191.752519] </TASK> [ 191.752782] Modules linked in: [ 191.753785] CR2: ffffebde00000008 [ 191.754937] ---[ end trace 0000000000000000 ]--- [ 191.755429] RIP: 0010:kfree+0x56/0x3b0 [ 191.755725] Code: 80 48 01 d8 0f 82 65 03 00 00 48 c7 c2 00 00 00 80 48 2b 15 2c 06 dd 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 0a 069 [ 191.756744] RSP: 0018:ffff8880076f7878 EFLAGS: 00000286 [ 191.757218] RAX: ffffebde00000000 RBX: 0000000000000040 RCX: ffffffff8528d5b9 [ 191.757580] RDX: 0000777f80000000 RSI: ffffffff8522d49c RDI: 0000000000000040 [ 191.758016] RBP: ffff8880076f78a0 R08: 0000000000000000 R09: 0000000000000000 [ 191.758570] R10: ffff888008949fd8 R11: ffffed10011293fd R12: 0000000000000040 [ 191.758957] R13: ffff888008949f98 R14: ffff888008949ec0 R15: ffff888008949fb0 [ 191.759317] FS: 00007f3520cd7e40(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000 [ 191.759711] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 191.760118] CR2: ffffebde00000008 CR3: 0000000009704000 CR4: 00000000000006f0 Signed-off-by: Edward Lo <[email protected]> Signed-off-by: Konstantin Komarov <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Jan 29, 2023
commit 272970b upstream. The driver shutdown callback (which sends EDL_SOC_RESET to the device over serdev) should not be invoked when HCI device is not open (e.g. if hci_dev_open_sync() failed), because the serdev and its TTY are not open either. Also skip this step if device is powered off (qca_power_shutdown()). The shutdown callback causes use-after-free during system reboot with Qualcomm Atheros Bluetooth: Unable to handle kernel paging request at virtual address 0072662f67726fd7 ... CPU: 6 PID: 1 Comm: systemd-shutdow Tainted: G W 6.1.0-rt5-00325-g8a5f56bcfcca #8 Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) Call trace: tty_driver_flush_buffer+0x4/0x30 serdev_device_write_flush+0x24/0x34 qca_serdev_shutdown+0x80/0x130 [hci_uart] device_shutdown+0x15c/0x260 kernel_restart+0x48/0xac KASAN report: BUG: KASAN: use-after-free in tty_driver_flush_buffer+0x1c/0x50 Read of size 8 at addr ffff16270c2e0018 by task systemd-shutdow/1 CPU: 7 PID: 1 Comm: systemd-shutdow Not tainted 6.1.0-next-20221220-00014-gb85aaf97fb01-dirty #28 Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) Call trace: dump_backtrace.part.0+0xdc/0xf0 show_stack+0x18/0x30 dump_stack_lvl+0x68/0x84 print_report+0x188/0x488 kasan_report+0xa4/0xf0 __asan_load8+0x80/0xac tty_driver_flush_buffer+0x1c/0x50 ttyport_write_flush+0x34/0x44 serdev_device_write_flush+0x48/0x60 qca_serdev_shutdown+0x124/0x274 device_shutdown+0x1e8/0x350 kernel_restart+0x48/0xb0 __do_sys_reboot+0x244/0x2d0 __arm64_sys_reboot+0x54/0x70 invoke_syscall+0x60/0x190 el0_svc_common.constprop.0+0x7c/0x160 do_el0_svc+0x44/0xf0 el0_svc+0x2c/0x6c el0t_64_sync_handler+0xbc/0x140 el0t_64_sync+0x190/0x194 Fixes: 7e7bbdd ("Bluetooth: hci_qca: Fix qca6390 enable failure after warm reboot") Cc: <[email protected]> Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Feb 1, 2023
The driver shutdown callback (which sends EDL_SOC_RESET to the device over serdev) should not be invoked when HCI device is not open (e.g. if hci_dev_open_sync() failed), because the serdev and its TTY are not open either. Also skip this step if device is powered off (qca_power_shutdown()). The shutdown callback causes use-after-free during system reboot with Qualcomm Atheros Bluetooth: Unable to handle kernel paging request at virtual address 0072662f67726fd7 ... CPU: 6 PID: 1 Comm: systemd-shutdow Tainted: G W 6.1.0-rt5-00325-g8a5f56bcfcca #8 Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) Call trace: tty_driver_flush_buffer+0x4/0x30 serdev_device_write_flush+0x24/0x34 qca_serdev_shutdown+0x80/0x130 [hci_uart] device_shutdown+0x15c/0x260 kernel_restart+0x48/0xac KASAN report: BUG: KASAN: use-after-free in tty_driver_flush_buffer+0x1c/0x50 Read of size 8 at addr ffff16270c2e0018 by task systemd-shutdow/1 CPU: 7 PID: 1 Comm: systemd-shutdow Not tainted 6.1.0-next-20221220-00014-gb85aaf97fb01-dirty #28 Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) Call trace: dump_backtrace.part.0+0xdc/0xf0 show_stack+0x18/0x30 dump_stack_lvl+0x68/0x84 print_report+0x188/0x488 kasan_report+0xa4/0xf0 __asan_load8+0x80/0xac tty_driver_flush_buffer+0x1c/0x50 ttyport_write_flush+0x34/0x44 serdev_device_write_flush+0x48/0x60 qca_serdev_shutdown+0x124/0x274 device_shutdown+0x1e8/0x350 kernel_restart+0x48/0xb0 __do_sys_reboot+0x244/0x2d0 __arm64_sys_reboot+0x54/0x70 invoke_syscall+0x60/0x190 el0_svc_common.constprop.0+0x7c/0x160 do_el0_svc+0x44/0xf0 el0_svc+0x2c/0x6c el0t_64_sync_handler+0xbc/0x140 el0t_64_sync+0x190/0x194 Fixes: 7e7bbdd ("Bluetooth: hci_qca: Fix qca6390 enable failure after warm reboot") Cc: <[email protected]> Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Apr 20, 2023
[ Upstream commit e8b7445 ] For quite some time we were chasing a bug which looked like a sudden permanent failure of networking and mmc on some of our devices. The bug was very sensitive to any software changes and even more to any kernel debug options. Finally we got a setup where the problem was reproducible with CONFIG_DMA_API_DEBUG=y and it revealed the issue with the rx dma: [ 16.992082] ------------[ cut here ]------------ [ 16.996779] DMA-API: macb ff0b0000.ethernet: device driver tries to free DMA memory it has not allocated [device address=0x0000000875e3e244] [size=1536 bytes] [ 17.011049] WARNING: CPU: 0 PID: 85 at kernel/dma/debug.c:1011 check_unmap+0x6a0/0x900 [ 17.018977] Modules linked in: xxxxx [ 17.038823] CPU: 0 PID: 85 Comm: irq/55-8000f000 Not tainted 5.4.0 #28 [ 17.045345] Hardware name: xxxxx [ 17.049528] pstate: 60000005 (nZCv daif -PAN -UAO) [ 17.054322] pc : check_unmap+0x6a0/0x900 [ 17.058243] lr : check_unmap+0x6a0/0x900 [ 17.062163] sp : ffffffc010003c40 [ 17.065470] x29: ffffffc010003c40 x28: 000000004000c03c [ 17.070783] x27: ffffffc010da7048 x26: ffffff8878e38800 [ 17.076095] x25: ffffff8879d22810 x24: ffffffc010003cc8 [ 17.081407] x23: 0000000000000000 x22: ffffffc010a08750 [ 17.086719] x21: ffffff8878e3c7c0 x20: ffffffc010acb000 [ 17.092032] x19: 0000000875e3e244 x18: 0000000000000010 [ 17.097343] x17: 0000000000000000 x16: 0000000000000000 [ 17.102647] x15: ffffff8879e4a988 x14: 0720072007200720 [ 17.107959] x13: 0720072007200720 x12: 0720072007200720 [ 17.113261] x11: 0720072007200720 x10: 0720072007200720 [ 17.118565] x9 : 0720072007200720 x8 : 000000000000022d [ 17.123869] x7 : 0000000000000015 x6 : 0000000000000098 [ 17.129173] x5 : 0000000000000000 x4 : 0000000000000000 [ 17.134475] x3 : 00000000ffffffff x2 : ffffffc010a1d370 [ 17.139778] x1 : b420c9d75d27bb00 x0 : 0000000000000000 [ 17.145082] Call trace: [ 17.147524] check_unmap+0x6a0/0x900 [ 17.151091] debug_dma_unmap_page+0x88/0x90 [ 17.155266] gem_rx+0x114/0x2f0 [ 17.158396] macb_poll+0x58/0x100 [ 17.161705] net_rx_action+0x118/0x400 [ 17.165445] __do_softirq+0x138/0x36c [ 17.169100] irq_exit+0x98/0xc0 [ 17.172234] __handle_domain_irq+0x64/0xc0 [ 17.176320] gic_handle_irq+0x5c/0xc0 [ 17.179974] el1_irq+0xb8/0x140 [ 17.183109] xiic_process+0x5c/0xe30 [ 17.186677] irq_thread_fn+0x28/0x90 [ 17.190244] irq_thread+0x208/0x2a0 [ 17.193724] kthread+0x130/0x140 [ 17.196945] ret_from_fork+0x10/0x20 [ 17.200510] ---[ end trace 7240980785f81d6f ]--- [ 237.021490] ------------[ cut here ]------------ [ 237.026129] DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000021d79e7b [ 237.033886] WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:499 add_dma_entry+0x214/0x240 [ 237.041802] Modules linked in: xxxxx [ 237.061637] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.4.0 #28 [ 237.068941] Hardware name: xxxxx [ 237.073116] pstate: 80000085 (Nzcv daIf -PAN -UAO) [ 237.077900] pc : add_dma_entry+0x214/0x240 [ 237.081986] lr : add_dma_entry+0x214/0x240 [ 237.086072] sp : ffffffc010003c30 [ 237.089379] x29: ffffffc010003c30 x28: ffffff8878a0be00 [ 237.094683] x27: 0000000000000180 x26: ffffff8878e387c0 [ 237.099987] x25: 0000000000000002 x24: 0000000000000000 [ 237.105290] x23: 000000000000003b x22: ffffffc010a0fa00 [ 237.110594] x21: 0000000021d79e7b x20: ffffffc010abe600 [ 237.115897] x19: 00000000ffffffef x18: 0000000000000010 [ 237.121201] x17: 0000000000000000 x16: 0000000000000000 [ 237.126504] x15: ffffffc010a0fdc8 x14: 0720072007200720 [ 237.131807] x13: 0720072007200720 x12: 0720072007200720 [ 237.137111] x11: 0720072007200720 x10: 0720072007200720 [ 237.142415] x9 : 0720072007200720 x8 : 0000000000000259 [ 237.147718] x7 : 0000000000000001 x6 : 0000000000000000 [ 237.153022] x5 : ffffffc010003a20 x4 : 0000000000000001 [ 237.158325] x3 : 0000000000000006 x2 : 0000000000000007 [ 237.163628] x1 : 8ac721b3a7dc1c00 x0 : 0000000000000000 [ 237.168932] Call trace: [ 237.171373] add_dma_entry+0x214/0x240 [ 237.175115] debug_dma_map_page+0xf8/0x120 [ 237.179203] gem_rx_refill+0x190/0x280 [ 237.182942] gem_rx+0x224/0x2f0 [ 237.186075] macb_poll+0x58/0x100 [ 237.189384] net_rx_action+0x118/0x400 [ 237.193125] __do_softirq+0x138/0x36c [ 237.196780] irq_exit+0x98/0xc0 [ 237.199914] __handle_domain_irq+0x64/0xc0 [ 237.204000] gic_handle_irq+0x5c/0xc0 [ 237.207654] el1_irq+0xb8/0x140 [ 237.210789] arch_cpu_idle+0x40/0x200 [ 237.214444] default_idle_call+0x18/0x30 [ 237.218359] do_idle+0x200/0x280 [ 237.221578] cpu_startup_entry+0x20/0x30 [ 237.225493] rest_init+0xe4/0xf0 [ 237.228713] arch_call_rest_init+0xc/0x14 [ 237.232714] start_kernel+0x47c/0x4a8 [ 237.236367] ---[ end trace 7240980785f81d70 ]--- Lars was fast to find an explanation: according to the datasheet bit 2 of the rx buffer descriptor entry has a different meaning in the extended mode: Address [2] of beginning of buffer, or in extended buffer descriptor mode (DMA configuration register [28] = 1), indicates a valid timestamp in the buffer descriptor entry. The macb driver didn't mask this bit while getting an address and it eventually caused a memory corruption and a dma failure. The problem is resolved by explicitly clearing the problematic bit if hw timestamping is used. Fixes: 7b42961 ("net: macb: Add support for PTP timestamps in DMA descriptors") Signed-off-by: Roman Gushchin <[email protected]> Co-developed-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Lars-Peter Clausen <[email protected]> Acked-by: Nicolas Ferre <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
Apr 20, 2023
[ Upstream commit e8b7445 ] For quite some time we were chasing a bug which looked like a sudden permanent failure of networking and mmc on some of our devices. The bug was very sensitive to any software changes and even more to any kernel debug options. Finally we got a setup where the problem was reproducible with CONFIG_DMA_API_DEBUG=y and it revealed the issue with the rx dma: [ 16.992082] ------------[ cut here ]------------ [ 16.996779] DMA-API: macb ff0b0000.ethernet: device driver tries to free DMA memory it has not allocated [device address=0x0000000875e3e244] [size=1536 bytes] [ 17.011049] WARNING: CPU: 0 PID: 85 at kernel/dma/debug.c:1011 check_unmap+0x6a0/0x900 [ 17.018977] Modules linked in: xxxxx [ 17.038823] CPU: 0 PID: 85 Comm: irq/55-8000f000 Not tainted 5.4.0 #28 [ 17.045345] Hardware name: xxxxx [ 17.049528] pstate: 60000005 (nZCv daif -PAN -UAO) [ 17.054322] pc : check_unmap+0x6a0/0x900 [ 17.058243] lr : check_unmap+0x6a0/0x900 [ 17.062163] sp : ffffffc010003c40 [ 17.065470] x29: ffffffc010003c40 x28: 000000004000c03c [ 17.070783] x27: ffffffc010da7048 x26: ffffff8878e38800 [ 17.076095] x25: ffffff8879d22810 x24: ffffffc010003cc8 [ 17.081407] x23: 0000000000000000 x22: ffffffc010a08750 [ 17.086719] x21: ffffff8878e3c7c0 x20: ffffffc010acb000 [ 17.092032] x19: 0000000875e3e244 x18: 0000000000000010 [ 17.097343] x17: 0000000000000000 x16: 0000000000000000 [ 17.102647] x15: ffffff8879e4a988 x14: 0720072007200720 [ 17.107959] x13: 0720072007200720 x12: 0720072007200720 [ 17.113261] x11: 0720072007200720 x10: 0720072007200720 [ 17.118565] x9 : 0720072007200720 x8 : 000000000000022d [ 17.123869] x7 : 0000000000000015 x6 : 0000000000000098 [ 17.129173] x5 : 0000000000000000 x4 : 0000000000000000 [ 17.134475] x3 : 00000000ffffffff x2 : ffffffc010a1d370 [ 17.139778] x1 : b420c9d75d27bb00 x0 : 0000000000000000 [ 17.145082] Call trace: [ 17.147524] check_unmap+0x6a0/0x900 [ 17.151091] debug_dma_unmap_page+0x88/0x90 [ 17.155266] gem_rx+0x114/0x2f0 [ 17.158396] macb_poll+0x58/0x100 [ 17.161705] net_rx_action+0x118/0x400 [ 17.165445] __do_softirq+0x138/0x36c [ 17.169100] irq_exit+0x98/0xc0 [ 17.172234] __handle_domain_irq+0x64/0xc0 [ 17.176320] gic_handle_irq+0x5c/0xc0 [ 17.179974] el1_irq+0xb8/0x140 [ 17.183109] xiic_process+0x5c/0xe30 [ 17.186677] irq_thread_fn+0x28/0x90 [ 17.190244] irq_thread+0x208/0x2a0 [ 17.193724] kthread+0x130/0x140 [ 17.196945] ret_from_fork+0x10/0x20 [ 17.200510] ---[ end trace 7240980785f81d6f ]--- [ 237.021490] ------------[ cut here ]------------ [ 237.026129] DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000021d79e7b [ 237.033886] WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:499 add_dma_entry+0x214/0x240 [ 237.041802] Modules linked in: xxxxx [ 237.061637] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.4.0 #28 [ 237.068941] Hardware name: xxxxx [ 237.073116] pstate: 80000085 (Nzcv daIf -PAN -UAO) [ 237.077900] pc : add_dma_entry+0x214/0x240 [ 237.081986] lr : add_dma_entry+0x214/0x240 [ 237.086072] sp : ffffffc010003c30 [ 237.089379] x29: ffffffc010003c30 x28: ffffff8878a0be00 [ 237.094683] x27: 0000000000000180 x26: ffffff8878e387c0 [ 237.099987] x25: 0000000000000002 x24: 0000000000000000 [ 237.105290] x23: 000000000000003b x22: ffffffc010a0fa00 [ 237.110594] x21: 0000000021d79e7b x20: ffffffc010abe600 [ 237.115897] x19: 00000000ffffffef x18: 0000000000000010 [ 237.121201] x17: 0000000000000000 x16: 0000000000000000 [ 237.126504] x15: ffffffc010a0fdc8 x14: 0720072007200720 [ 237.131807] x13: 0720072007200720 x12: 0720072007200720 [ 237.137111] x11: 0720072007200720 x10: 0720072007200720 [ 237.142415] x9 : 0720072007200720 x8 : 0000000000000259 [ 237.147718] x7 : 0000000000000001 x6 : 0000000000000000 [ 237.153022] x5 : ffffffc010003a20 x4 : 0000000000000001 [ 237.158325] x3 : 0000000000000006 x2 : 0000000000000007 [ 237.163628] x1 : 8ac721b3a7dc1c00 x0 : 0000000000000000 [ 237.168932] Call trace: [ 237.171373] add_dma_entry+0x214/0x240 [ 237.175115] debug_dma_map_page+0xf8/0x120 [ 237.179203] gem_rx_refill+0x190/0x280 [ 237.182942] gem_rx+0x224/0x2f0 [ 237.186075] macb_poll+0x58/0x100 [ 237.189384] net_rx_action+0x118/0x400 [ 237.193125] __do_softirq+0x138/0x36c [ 237.196780] irq_exit+0x98/0xc0 [ 237.199914] __handle_domain_irq+0x64/0xc0 [ 237.204000] gic_handle_irq+0x5c/0xc0 [ 237.207654] el1_irq+0xb8/0x140 [ 237.210789] arch_cpu_idle+0x40/0x200 [ 237.214444] default_idle_call+0x18/0x30 [ 237.218359] do_idle+0x200/0x280 [ 237.221578] cpu_startup_entry+0x20/0x30 [ 237.225493] rest_init+0xe4/0xf0 [ 237.228713] arch_call_rest_init+0xc/0x14 [ 237.232714] start_kernel+0x47c/0x4a8 [ 237.236367] ---[ end trace 7240980785f81d70 ]--- Lars was fast to find an explanation: according to the datasheet bit 2 of the rx buffer descriptor entry has a different meaning in the extended mode: Address [2] of beginning of buffer, or in extended buffer descriptor mode (DMA configuration register [28] = 1), indicates a valid timestamp in the buffer descriptor entry. The macb driver didn't mask this bit while getting an address and it eventually caused a memory corruption and a dma failure. The problem is resolved by explicitly clearing the problematic bit if hw timestamping is used. Fixes: 7b42961 ("net: macb: Add support for PTP timestamps in DMA descriptors") Signed-off-by: Roman Gushchin <[email protected]> Co-developed-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Lars-Peter Clausen <[email protected]> Acked-by: Nicolas Ferre <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Sasha Levin <[email protected]>
qzed
pushed a commit
that referenced
this pull request
May 3, 2023
For quite some time we were chasing a bug which looked like a sudden permanent failure of networking and mmc on some of our devices. The bug was very sensitive to any software changes and even more to any kernel debug options. Finally we got a setup where the problem was reproducible with CONFIG_DMA_API_DEBUG=y and it revealed the issue with the rx dma: [ 16.992082] ------------[ cut here ]------------ [ 16.996779] DMA-API: macb ff0b0000.ethernet: device driver tries to free DMA memory it has not allocated [device address=0x0000000875e3e244] [size=1536 bytes] [ 17.011049] WARNING: CPU: 0 PID: 85 at kernel/dma/debug.c:1011 check_unmap+0x6a0/0x900 [ 17.018977] Modules linked in: xxxxx [ 17.038823] CPU: 0 PID: 85 Comm: irq/55-8000f000 Not tainted 5.4.0 #28 [ 17.045345] Hardware name: xxxxx [ 17.049528] pstate: 60000005 (nZCv daif -PAN -UAO) [ 17.054322] pc : check_unmap+0x6a0/0x900 [ 17.058243] lr : check_unmap+0x6a0/0x900 [ 17.062163] sp : ffffffc010003c40 [ 17.065470] x29: ffffffc010003c40 x28: 000000004000c03c [ 17.070783] x27: ffffffc010da7048 x26: ffffff8878e38800 [ 17.076095] x25: ffffff8879d22810 x24: ffffffc010003cc8 [ 17.081407] x23: 0000000000000000 x22: ffffffc010a08750 [ 17.086719] x21: ffffff8878e3c7c0 x20: ffffffc010acb000 [ 17.092032] x19: 0000000875e3e244 x18: 0000000000000010 [ 17.097343] x17: 0000000000000000 x16: 0000000000000000 [ 17.102647] x15: ffffff8879e4a988 x14: 0720072007200720 [ 17.107959] x13: 0720072007200720 x12: 0720072007200720 [ 17.113261] x11: 0720072007200720 x10: 0720072007200720 [ 17.118565] x9 : 0720072007200720 x8 : 000000000000022d [ 17.123869] x7 : 0000000000000015 x6 : 0000000000000098 [ 17.129173] x5 : 0000000000000000 x4 : 0000000000000000 [ 17.134475] x3 : 00000000ffffffff x2 : ffffffc010a1d370 [ 17.139778] x1 : b420c9d75d27bb00 x0 : 0000000000000000 [ 17.145082] Call trace: [ 17.147524] check_unmap+0x6a0/0x900 [ 17.151091] debug_dma_unmap_page+0x88/0x90 [ 17.155266] gem_rx+0x114/0x2f0 [ 17.158396] macb_poll+0x58/0x100 [ 17.161705] net_rx_action+0x118/0x400 [ 17.165445] __do_softirq+0x138/0x36c [ 17.169100] irq_exit+0x98/0xc0 [ 17.172234] __handle_domain_irq+0x64/0xc0 [ 17.176320] gic_handle_irq+0x5c/0xc0 [ 17.179974] el1_irq+0xb8/0x140 [ 17.183109] xiic_process+0x5c/0xe30 [ 17.186677] irq_thread_fn+0x28/0x90 [ 17.190244] irq_thread+0x208/0x2a0 [ 17.193724] kthread+0x130/0x140 [ 17.196945] ret_from_fork+0x10/0x20 [ 17.200510] ---[ end trace 7240980785f81d6f ]--- [ 237.021490] ------------[ cut here ]------------ [ 237.026129] DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000021d79e7b [ 237.033886] WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:499 add_dma_entry+0x214/0x240 [ 237.041802] Modules linked in: xxxxx [ 237.061637] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.4.0 #28 [ 237.068941] Hardware name: xxxxx [ 237.073116] pstate: 80000085 (Nzcv daIf -PAN -UAO) [ 237.077900] pc : add_dma_entry+0x214/0x240 [ 237.081986] lr : add_dma_entry+0x214/0x240 [ 237.086072] sp : ffffffc010003c30 [ 237.089379] x29: ffffffc010003c30 x28: ffffff8878a0be00 [ 237.094683] x27: 0000000000000180 x26: ffffff8878e387c0 [ 237.099987] x25: 0000000000000002 x24: 0000000000000000 [ 237.105290] x23: 000000000000003b x22: ffffffc010a0fa00 [ 237.110594] x21: 0000000021d79e7b x20: ffffffc010abe600 [ 237.115897] x19: 00000000ffffffef x18: 0000000000000010 [ 237.121201] x17: 0000000000000000 x16: 0000000000000000 [ 237.126504] x15: ffffffc010a0fdc8 x14: 0720072007200720 [ 237.131807] x13: 0720072007200720 x12: 0720072007200720 [ 237.137111] x11: 0720072007200720 x10: 0720072007200720 [ 237.142415] x9 : 0720072007200720 x8 : 0000000000000259 [ 237.147718] x7 : 0000000000000001 x6 : 0000000000000000 [ 237.153022] x5 : ffffffc010003a20 x4 : 0000000000000001 [ 237.158325] x3 : 0000000000000006 x2 : 0000000000000007 [ 237.163628] x1 : 8ac721b3a7dc1c00 x0 : 0000000000000000 [ 237.168932] Call trace: [ 237.171373] add_dma_entry+0x214/0x240 [ 237.175115] debug_dma_map_page+0xf8/0x120 [ 237.179203] gem_rx_refill+0x190/0x280 [ 237.182942] gem_rx+0x224/0x2f0 [ 237.186075] macb_poll+0x58/0x100 [ 237.189384] net_rx_action+0x118/0x400 [ 237.193125] __do_softirq+0x138/0x36c [ 237.196780] irq_exit+0x98/0xc0 [ 237.199914] __handle_domain_irq+0x64/0xc0 [ 237.204000] gic_handle_irq+0x5c/0xc0 [ 237.207654] el1_irq+0xb8/0x140 [ 237.210789] arch_cpu_idle+0x40/0x200 [ 237.214444] default_idle_call+0x18/0x30 [ 237.218359] do_idle+0x200/0x280 [ 237.221578] cpu_startup_entry+0x20/0x30 [ 237.225493] rest_init+0xe4/0xf0 [ 237.228713] arch_call_rest_init+0xc/0x14 [ 237.232714] start_kernel+0x47c/0x4a8 [ 237.236367] ---[ end trace 7240980785f81d70 ]--- Lars was fast to find an explanation: according to the datasheet bit 2 of the rx buffer descriptor entry has a different meaning in the extended mode: Address [2] of beginning of buffer, or in extended buffer descriptor mode (DMA configuration register [28] = 1), indicates a valid timestamp in the buffer descriptor entry. The macb driver didn't mask this bit while getting an address and it eventually caused a memory corruption and a dma failure. The problem is resolved by explicitly clearing the problematic bit if hw timestamping is used. Fixes: 7b42961 ("net: macb: Add support for PTP timestamps in DMA descriptors") Signed-off-by: Roman Gushchin <[email protected]> Co-developed-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Lars-Peter Clausen <[email protected]> Acked-by: Nicolas Ferre <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
As a request from a person who is also suffering from "OEMB" issue on IRC channel.
Please port this into also 4.19 and 5.4 kernels when approved this PR, should apply with no conflicts.
By the way, the first patch (ASoC) is from Android-x86 project. The second patch is made by me referring to the first patch (they don't have a similar patch for surface3-wmi, so I made it).
Patch set for Surface 3 which is suffering from the "OEMB" problem.
On the affected systems, DMI table is broken and breaks
DMI matching used for quirk.
This patch will add the (broken) DMI info for the affected Surface 3.
Fixes broken Sound and surface3-wmi drivers on the affected systems.
On affected systems, dmidecode will look like this:
Expected: