rm redundant empty line #49

arjun024 · 2013-10-02T08:23:57Z

a completely trivial patch to remove a completely redundant blank line.

arjun024 · 2013-10-02T17:33:48Z

Ignore

Quit from splice_write if pipe->nrbufs is 0 for avoiding oops in virtio-serial. When an application was doing splice from a kernel buffer to virtio-serial on a guest, the application received signal(SIGINT). This situation will normally happen, but the kernel executed a kernel panic by oops as follows: BUG: unable to handle kernel paging request at ffff882071c8ef28 IP: [<ffffffff812de48f>] sg_init_table+0x2f/0x50 PGD 1fac067 PUD 0 Oops: 0000 [#1] SMP Modules linked in: lockd sunrpc bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd microcode virtio_balloon virtio_net pcspkr soundcore i2c_piix4 i2c_core uinput floppy CPU: 1 PID: 908 Comm: trace-cmd Not tainted 3.10.0+ torvalds#49 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff880071c64650 ti: ffff88007bf24000 task.ti: ffff88007bf24000 RIP: 0010:[<ffffffff812de48f>] [<ffffffff812de48f>] sg_init_table+0x2f/0x50 RSP: 0018:ffff88007bf25dd8 EFLAGS: 00010286 RAX: 0000001fffffffe0 RBX: ffff882071c8ef28 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880071c8ef48 RBP: ffff88007bf25de8 R08: ffff88007fd15d40 R09: ffff880071c8ef48 R10: ffffea0001c71040 R11: ffffffff8139c555 R12: 0000000000000000 R13: ffff88007506a3c0 R14: ffff88007c862500 R15: ffff880071c8ef00 FS: 00007f0a3646c740(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff882071c8ef28 CR3: 000000007acbb000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff880071c8ef48 ffff88007bf25e20 ffff88007bf25e88 ffffffff8139d6fa ffff88007bf25e28 ffffffff8127a3f4 0000000000000000 0000000000000000 ffff880071c8ef48 0000100000000000 0000000000000003 ffff88007bf25e08 Call Trace: [<ffffffff8139d6fa>] port_fops_splice_write+0xaa/0x130 [<ffffffff8127a3f4>] ? selinux_file_permission+0xc4/0x120 [<ffffffff8139d650>] ? wait_port_writable+0x1b0/0x1b0 [<ffffffff811a6fe0>] do_splice_from+0xa0/0x110 [<ffffffff811a951f>] SyS_splice+0x5ff/0x6b0 [<ffffffff8161f8c2>] system_call_fastpath+0x16/0x1b Code: c1 e2 05 48 89 e5 48 83 ec 10 4c 89 65 f8 41 89 f4 31 f6 48 89 5d f0 48 89 fb e8 8d ce ff ff 41 8d 44 24 ff 48 c1 e0 05 48 01 c3 <48> 8b 03 48 83 e0 fe 48 83 c8 02 48 89 03 48 8b 5d f0 4c 8b 65 RIP [<ffffffff812de48f>] sg_init_table+0x2f/0x50 RSP <ffff88007bf25dd8> CR2: ffff882071c8ef28 ---[ end trace 86323505eb42ea8f ]--- It seems to induce pagefault in sg_init_tabel() when pipe->nrbufs is equal to zero. This may happen in a following situation: (1) The application normally does splice(read) from a kernel buffer, then does splice(write) to virtio-serial. (2) The application receives SIGINT when is doing splice(read), so splice(read) is failed by EINTR. However, the application does not finish the operation. (3) The application tries to do splice(write) without pipe->nrbufs. (4) The virtio-console driver tries to touch scatterlist structure sgl in sg_init_table(), but the region is out of bound. To avoid the case, a kernel should check whether pipe->nrbufs is empty or not when splice_write is executed in the virtio-console driver. V3: Add Reviewed-by lines and stable@ line in sign-off area. Signed-off-by: Yoshihiro YUNOMAE <[email protected]> Reviewed-by: Amit Shah <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Cc: Amit Shah <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Signed-off-by: Rusty Russell <[email protected]>

As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online. Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this: [ 0.074509] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 torvalds#6 torvalds#7 OK [ 0.644008] smpboot: Booting Node 1, Processors: torvalds#8 torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 OK [ 1.245006] smpboot: Booting Node 2, Processors: torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 OK [ 1.864005] smpboot: Booting Node 3, Processors: torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31 OK [ 2.489005] smpboot: Booting Node 4, Processors: torvalds#32 torvalds#33 torvalds#34 torvalds#35 torvalds#36 torvalds#37 torvalds#38 torvalds#39 OK [ 3.093005] smpboot: Booting Node 5, Processors: torvalds#40 torvalds#41 torvalds#42 torvalds#43 torvalds#44 torvalds#45 torvalds#46 torvalds#47 OK [ 3.698005] smpboot: Booting Node 6, Processors: torvalds#48 torvalds#49 torvalds#50 torvalds#51 #52 #53 torvalds#54 torvalds#55 OK [ 4.304005] smpboot: Booting Node 7, Processors: torvalds#56 torvalds#57 #58 torvalds#59 torvalds#60 torvalds#61 torvalds#62 torvalds#63 OK [ 4.961413] Brought up 64 CPUs and this: [ 0.072367] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 torvalds#6 torvalds#7 OK [ 0.686329] Brought up 8 CPUs Signed-off-by: Borislav Petkov <[email protected]> Cc: Libin <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 torvalds#6 torvalds#7 [ 0.603005] .... node #1, CPUs: torvalds#8 torvalds#9 torvalds#10 torvalds#11 torvalds#12 torvalds#13 torvalds#14 torvalds#15 [ 1.200005] .... node #2, CPUs: torvalds#16 torvalds#17 torvalds#18 torvalds#19 torvalds#20 torvalds#21 torvalds#22 torvalds#23 [ 1.796005] .... node #3, CPUs: torvalds#24 torvalds#25 torvalds#26 torvalds#27 torvalds#28 torvalds#29 torvalds#30 torvalds#31 [ 2.393005] .... node #4, CPUs: torvalds#32 torvalds#33 torvalds#34 torvalds#35 torvalds#36 torvalds#37 torvalds#38 torvalds#39 [ 2.996005] .... node #5, CPUs: torvalds#40 torvalds#41 torvalds#42 torvalds#43 torvalds#44 torvalds#45 torvalds#46 torvalds#47 [ 3.600005] .... node torvalds#6, CPUs: torvalds#48 torvalds#49 torvalds#50 torvalds#51 #52 #53 torvalds#54 torvalds#55 [ 4.202005] .... node torvalds#7, CPUs: torvalds#56 torvalds#57 #58 torvalds#59 torvalds#60 torvalds#61 torvalds#62 torvalds#63 [ 4.811005] .... node torvalds#8, CPUs: torvalds#64 torvalds#65 torvalds#66 torvalds#67 torvalds#68 torvalds#69 #70 torvalds#71 [ 5.421006] .... node torvalds#9, CPUs: torvalds#72 torvalds#73 torvalds#74 torvalds#75 torvalds#76 torvalds#77 torvalds#78 torvalds#79 [ 6.032005] .... node torvalds#10, CPUs: torvalds#80 torvalds#81 torvalds#82 torvalds#83 torvalds#84 torvalds#85 torvalds#86 torvalds#87 [ 6.648006] .... node torvalds#11, CPUs: torvalds#88 torvalds#89 torvalds#90 torvalds#91 torvalds#92 torvalds#93 torvalds#94 torvalds#95 [ 7.262005] .... node torvalds#12, CPUs: torvalds#96 torvalds#97 torvalds#98 torvalds#99 torvalds#100 torvalds#101 torvalds#102 torvalds#103 [ 7.865005] .... node torvalds#13, CPUs: torvalds#104 torvalds#105 torvalds#106 torvalds#107 torvalds#108 torvalds#109 torvalds#110 torvalds#111 [ 8.466005] .... node torvalds#14, CPUs: torvalds#112 torvalds#113 torvalds#114 torvalds#115 torvalds#116 torvalds#117 torvalds#118 torvalds#119 [ 9.073006] .... node torvalds#15, CPUs: torvalds#120 torvalds#121 torvalds#122 torvalds#123 torvalds#124 torvalds#125 torvalds#126 torvalds#127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

Fuzzing with trinity hit the "impossible" VM_BUG_ON(error) (which Fedora has converted to WARNING) in shmem_getpage_gfp(): WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70() Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ torvalds#49 Call Trace: warn_slowpath_common+0x7f/0xc0 warn_slowpath_null+0x1a/0x20 shmem_getpage_gfp+0xa5c/0xa70 shmem_fault+0x4f/0xa0 __do_fault+0x71/0x5c0 handle_pte_fault+0x97/0xae0 handle_mm_fault+0x289/0x350 __do_page_fault+0x18e/0x530 do_page_fault+0x2b/0x50 page_fault+0x28/0x30 tracesys+0xe1/0xe6 Thanks to Johannes for pointing to truncation: free_swap_and_cache() only does a trylock on the page, so the page lock we've held since before confirming swap is not enough to protect against truncation. What cleanup is needed in this case? Just delete_from_swap_cache(), which takes care of the memcg uncharge. Signed-off-by: Hugh Dickins <[email protected]> Reported-by: Dave Jones <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

commit 68c034f upstream. Quit from splice_write if pipe->nrbufs is 0 for avoiding oops in virtio-serial. When an application was doing splice from a kernel buffer to virtio-serial on a guest, the application received signal(SIGINT). This situation will normally happen, but the kernel executed a kernel panic by oops as follows: BUG: unable to handle kernel paging request at ffff882071c8ef28 IP: [<ffffffff812de48f>] sg_init_table+0x2f/0x50 PGD 1fac067 PUD 0 Oops: 0000 [#1] SMP Modules linked in: lockd sunrpc bnep bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd microcode virtio_balloon virtio_net pcspkr soundcore i2c_piix4 i2c_core uinput floppy CPU: 1 PID: 908 Comm: trace-cmd Not tainted 3.10.0+ #49 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 task: ffff880071c64650 ti: ffff88007bf24000 task.ti: ffff88007bf24000 RIP: 0010:[<ffffffff812de48f>] [<ffffffff812de48f>] sg_init_table+0x2f/0x50 RSP: 0018:ffff88007bf25dd8 EFLAGS: 00010286 RAX: 0000001fffffffe0 RBX: ffff882071c8ef28 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880071c8ef48 RBP: ffff88007bf25de8 R08: ffff88007fd15d40 R09: ffff880071c8ef48 R10: ffffea0001c71040 R11: ffffffff8139c555 R12: 0000000000000000 R13: ffff88007506a3c0 R14: ffff88007c862500 R15: ffff880071c8ef00 FS: 00007f0a3646c740(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff882071c8ef28 CR3: 000000007acbb000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff880071c8ef48 ffff88007bf25e20 ffff88007bf25e88 ffffffff8139d6fa ffff88007bf25e28 ffffffff8127a3f4 0000000000000000 0000000000000000 ffff880071c8ef48 0000100000000000 0000000000000003 ffff88007bf25e08 Call Trace: [<ffffffff8139d6fa>] port_fops_splice_write+0xaa/0x130 [<ffffffff8127a3f4>] ? selinux_file_permission+0xc4/0x120 [<ffffffff8139d650>] ? wait_port_writable+0x1b0/0x1b0 [<ffffffff811a6fe0>] do_splice_from+0xa0/0x110 [<ffffffff811a951f>] SyS_splice+0x5ff/0x6b0 [<ffffffff8161f8c2>] system_call_fastpath+0x16/0x1b Code: c1 e2 05 48 89 e5 48 83 ec 10 4c 89 65 f8 41 89 f4 31 f6 48 89 5d f0 48 89 fb e8 8d ce ff ff 41 8d 44 24 ff 48 c1 e0 05 48 01 c3 <48> 8b 03 48 83 e0 fe 48 83 c8 02 48 89 03 48 8b 5d f0 4c 8b 65 RIP [<ffffffff812de48f>] sg_init_table+0x2f/0x50 RSP <ffff88007bf25dd8> CR2: ffff882071c8ef28 ---[ end trace 86323505eb42ea8f ]--- It seems to induce pagefault in sg_init_tabel() when pipe->nrbufs is equal to zero. This may happen in a following situation: (1) The application normally does splice(read) from a kernel buffer, then does splice(write) to virtio-serial. (2) The application receives SIGINT when is doing splice(read), so splice(read) is failed by EINTR. However, the application does not finish the operation. (3) The application tries to do splice(write) without pipe->nrbufs. (4) The virtio-console driver tries to touch scatterlist structure sgl in sg_init_table(), but the region is out of bound. To avoid the case, a kernel should check whether pipe->nrbufs is empty or not when splice_write is executed in the virtio-console driver. V3: Add Reviewed-by lines and stable@ line in sign-off area. Signed-off-by: Yoshihiro YUNOMAE <[email protected]> Reviewed-by: Amit Shah <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Cc: Amit Shah <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Rusty Russell <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]>

It's no good setting vga_base after the VGA console has been initialised, because if we do that we get this: Unable to handle kernel paging request at virtual address 000b8000 pgd = c0004000 [000b8000] *pgd=07ffc831, *pte=00000000, *ppte=00000000 0Internal error: Oops: 5017 [#1] ARM Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0+ #49 task: c03e2974 ti: c03d8000 task.ti: c03d8000 PC is at vgacon_startup+0x258/0x39c LR is at request_resource+0x10/0x1c pc : [<c01725d0>] lr : [<c0022b50>] psr: 60000053 sp : c03d9f68 ip : 000b8000 fp : c03d9f8c r10: 000055aa r9 : 4401a103 r8 : ffffaa55 r7 : c03e357c r6 : c051b460 r5 : 000000ff r4 : 000c0000 r3 : 000b8000 r2 : c03e0514 r1 : 00000000 r0 : c0304971 Flags: nZCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment kernel which is an access to the 0xb8000 without the PCI offset required to make it work. Fixes: cc22b4c ("ARM: set vga memory base at run-time") Signed-off-by: Russell King <[email protected]> Cc: <[email protected]>

commit 4365922 upstream. It's no good setting vga_base after the VGA console has been initialised, because if we do that we get this: Unable to handle kernel paging request at virtual address 000b8000 pgd = c0004000 [000b8000] *pgd=07ffc831, *pte=00000000, *ppte=00000000 0Internal error: Oops: 5017 [#1] ARM Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0+ torvalds#49 task: c03e2974 ti: c03d8000 task.ti: c03d8000 PC is at vgacon_startup+0x258/0x39c LR is at request_resource+0x10/0x1c pc : [<c01725d0>] lr : [<c0022b50>] psr: 60000053 sp : c03d9f68 ip : 000b8000 fp : c03d9f8c r10: 000055aa r9 : 4401a103 r8 : ffffaa55 r7 : c03e357c r6 : c051b460 r5 : 000000ff r4 : 000c0000 r3 : 000b8000 r2 : c03e0514 r1 : 00000000 r0 : c0304971 Flags: nZCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment kernel which is an access to the 0xb8000 without the PCI offset required to make it work. Fixes: cc22b4c ("ARM: set vga memory base at run-time") Signed-off-by: Russell King <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

commit 4365922 upstream. It's no good setting vga_base after the VGA console has been initialised, because if we do that we get this: Unable to handle kernel paging request at virtual address 000b8000 pgd = c0004000 [000b8000] *pgd=07ffc831, *pte=00000000, *ppte=00000000 0Internal error: Oops: 5017 [#1] ARM Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0+ torvalds#49 task: c03e2974 ti: c03d8000 task.ti: c03d8000 PC is at vgacon_startup+0x258/0x39c LR is at request_resource+0x10/0x1c pc : [<c01725d0>] lr : [<c0022b50>] psr: 60000053 sp : c03d9f68 ip : 000b8000 fp : c03d9f8c r10: 000055aa r9 : 4401a103 r8 : ffffaa55 r7 : c03e357c r6 : c051b460 r5 : 000000ff r4 : 000c0000 r3 : 000b8000 r2 : c03e0514 r1 : 00000000 r0 : c0304971 Flags: nZCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment kernel which is an access to the 0xb8000 without the PCI offset required to make it work. Fixes: cc22b4c ("ARM: set vga memory base at run-time") Signed-off-by: Russell King <[email protected]> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <[email protected]>

commit 4365922 upstream. It's no good setting vga_base after the VGA console has been initialised, because if we do that we get this: Unable to handle kernel paging request at virtual address 000b8000 pgd = c0004000 [000b8000] *pgd=07ffc831, *pte=00000000, *ppte=00000000 0Internal error: Oops: 5017 [#1] ARM Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0+ #49 task: c03e2974 ti: c03d8000 task.ti: c03d8000 PC is at vgacon_startup+0x258/0x39c LR is at request_resource+0x10/0x1c pc : [<c01725d0>] lr : [<c0022b50>] psr: 60000053 sp : c03d9f68 ip : 000b8000 fp : c03d9f8c r10: 000055aa r9 : 4401a103 r8 : ffffaa55 r7 : c03e357c r6 : c051b460 r5 : 000000ff r4 : 000c0000 r3 : 000b8000 r2 : c03e0514 r1 : 00000000 r0 : c0304971 Flags: nZCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment kernel which is an access to the 0xb8000 without the PCI offset required to make it work. Fixes: cc22b4c ("ARM: set vga memory base at run-time") Signed-off-by: Russell King <[email protected]> Signed-off-by: Kamal Mostafa <[email protected]>

commit 4365922 upstream. It's no good setting vga_base after the VGA console has been initialised, because if we do that we get this: Unable to handle kernel paging request at virtual address 000b8000 pgd = c0004000 [000b8000] *pgd=07ffc831, *pte=00000000, *ppte=00000000 0Internal error: Oops: 5017 [#1] ARM Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.0+ #49 task: c03e2974 ti: c03d8000 task.ti: c03d8000 PC is at vgacon_startup+0x258/0x39c LR is at request_resource+0x10/0x1c pc : [<c01725d0>] lr : [<c0022b50>] psr: 60000053 sp : c03d9f68 ip : 000b8000 fp : c03d9f8c r10: 000055aa r9 : 4401a103 r8 : ffffaa55 r7 : c03e357c r6 : c051b460 r5 : 000000ff r4 : 000c0000 r3 : 000b8000 r2 : c03e0514 r1 : 00000000 r0 : c0304971 Flags: nZCv IRQs on FIQs off Mode SVC_32 ISA ARM Segment kernel which is an access to the 0xb8000 without the PCI offset required to make it work. Fixes: cc22b4c ("ARM: set vga memory base at run-time") Signed-off-by: Russell King <[email protected]> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <[email protected]> Cc: Yang Yingliang <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

…robe While working on a different issue, I noticed an annoying use after free bug on my machine when unloading the ixgbe driver: [ 8642.318797] ixgbe 0000:02:00.1: removed PHC on p2p2 [ 8642.742716] ixgbe 0000:02:00.1: complete [ 8642.743784] BUG: unable to handle kernel paging request at ffff8807d3740a90 [ 8642.744828] IP: [<ffffffffa01c77dc>] ixgbe_remove+0xfc/0x1b0 [ixgbe] [ 8642.745886] PGD 20c6067 PUD 81c1f6067 PMD 81c15a067 PTE 80000007d3740060 [ 8642.746956] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC [ 8642.748039] Modules linked in: [...] [ 8642.752929] CPU: 1 PID: 1225 Comm: rmmod Not tainted 3.18.0-rc2+ torvalds#49 [ 8642.754203] Hardware name: Supermicro X10SLM-F/X10SLM-F, BIOS 1.1b 11/01/2013 [ 8642.755505] task: ffff8807e34d3fe0 ti: ffff8807b7204000 task.ti: ffff8807b7204000 [ 8642.756831] RIP: 0010:[<ffffffffa01c77dc>] [<ffffffffa01c77dc>] ixgbe_remove+0xfc/0x1b0 [ixgbe] [...] [ 8642.774335] Stack: [ 8642.775805] ffff8807ee824098 ffff8807ee824098 ffffffffa01f3000 ffff8807ee824000 [ 8642.777326] ffff8807b7207e18 ffffffff8137720f ffff8807ee824098 ffff8807ee824098 [ 8642.778848] ffffffffa01f3068 ffff8807ee8240f8 ffff8807b7207e38 ffffffff8144180f [ 8642.780365] Call Trace: [ 8642.781869] [<ffffffff8137720f>] pci_device_remove+0x3f/0xc0 [ 8642.783395] [<ffffffff8144180f>] __device_release_driver+0x7f/0xf0 [ 8642.784876] [<ffffffff814421f8>] driver_detach+0xb8/0xc0 [ 8642.786352] [<ffffffff814414a9>] bus_remove_driver+0x59/0xe0 [ 8642.787783] [<ffffffff814429d0>] driver_unregister+0x30/0x70 [ 8642.789202] [<ffffffff81375c65>] pci_unregister_driver+0x25/0xa0 [ 8642.790657] [<ffffffffa01eb38e>] ixgbe_exit_module+0x1c/0xc8e [ixgbe] [ 8642.792064] [<ffffffff810f93a2>] SyS_delete_module+0x132/0x1c0 [ 8642.793450] [<ffffffff81012c61>] ? do_notify_resume+0x61/0xa0 [ 8642.794837] [<ffffffff816d2029>] system_call_fastpath+0x12/0x17 The issue is that test_and_set_bit() done on adapter->state is being performed *after* the netdevice has been freed via free_netdev(). When netdev is being allocated on initialization time, it allocates a private area, here struct ixgbe_adapter, that resides after the net_device structure. In ixgbe_probe(), the device init routine, we set up the adapter after alloc_etherdev_mq() on the private area and add a reference for the pci_dev as well via pci_set_drvdata(). Both in the error path of ixgbe_probe(), but also on module unload when ixgbe_remove() is being called, commit 41c6284 ("ixgbe: Fix rcu warnings induced by LER") accesses adapter after free_netdev(). The patch stores the result in a bool and thus fixes above oops on my side. Fixes: 41c6284 ("ixgbe: Fix rcu warnings induced by LER") Cc: stable <[email protected]> Cc: Mark Rustad <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]> Signed-off-by: David S. Miller <[email protected]>

GIT 30a727e4802f48182d33e2bdaca586bfb6a62868 commit 6d4e81ed89c092cb156c8d19cb68c8733cd502b3 Author: Tomeu Vizoso <[email protected]> Date: Mon Nov 24 10:08:03 2014 +0100 cpufreq: Ref the policy object sooner Do it before it's assigned to cpufreq_cpu_data, otherwise when a driver tries to get the cpu frequency during initialization the policy kobj is referenced and we get this warning: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 64 at include/linux/kref.h:47 kobject_get+0x64/0x70() Modules linked in: CPU: 1 PID: 64 Comm: irq/77-tegra-ac Not tainted 3.18.0-rc4-next-20141114ccu-00050-g3eff942 #326 [<c0016fac>] (unwind_backtrace) from [<c001272c>] (show_stack+0x10/0x14) [<c001272c>] (show_stack) from [<c06085d8>] (dump_stack+0x98/0xd8) [<c06085d8>] (dump_stack) from [<c002892c>] (warn_slowpath_common+0x84/0xb4) [<c002892c>] (warn_slowpath_common) from [<c00289f8>] (warn_slowpath_null+0x1c/0x24) [<c00289f8>] (warn_slowpath_null) from [<c0220290>] (kobject_get+0x64/0x70) [<c0220290>] (kobject_get) from [<c03e944c>] (cpufreq_cpu_get+0x88/0xc8) [<c03e944c>] (cpufreq_cpu_get) from [<c03e9500>] (cpufreq_get+0xc/0x64) [<c03e9500>] (cpufreq_get) from [<c0285288>] (actmon_thread_isr+0x134/0x198) [<c0285288>] (actmon_thread_isr) from [<c0069008>] (irq_thread_fn+0x1c/0x40) [<c0069008>] (irq_thread_fn) from [<c0069324>] (irq_thread+0x134/0x174) [<c0069324>] (irq_thread) from [<c0040290>] (kthread+0xdc/0xf4) [<c0040290>] (kthread) from [<c000f4b8>] (ret_from_fork+0x14/0x3c) ---[ end trace b7bd64a81b340c59 ]--- Signed-off-by: Tomeu Vizoso <[email protected]> Acked-by: Viresh Kumar <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> commit e874bf5f7647a9fdf14d72dbb376ec95327e3a81 Author: Lars-Peter Clausen <[email protected]> Date: Tue Nov 25 21:41:03 2014 +0100 ASoC: Disable regmap helpers if regmap is disabled If regmap is disabled there will be no users of the ASoC regmap helpers. Furthermore regmap_exit() will no be defined causing the following compile error: sound/soc/soc-core.c: In function 'snd_soc_component_exit_regmap': sound/soc/soc-core.c:2645:2: error: implicit declaration of function 'regmap_exit' [-Werror=implicit-function-declaration] So disable the helpers if regmap is disabled. Reported-by: kbuild test robot <[email protected]> Fixes: 20feb881988c ASoC: Add helper functions for deferred regmap setup") Signed-off-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Mark Brown <[email protected]> commit a5e9ab291c608c62691b9d565104a30d931998bf Author: Greg Kroah-Hartman <[email protected]> Date: Tue Nov 25 12:46:39 2014 -0800 Revert "serial: of-serial: add PM suspend/resume support" This reverts commit 2dea53bf57783f243c892e99c10c6921e956aa7e. Turns out to be broken :( Cc: Jingchang Lu <[email protected]> Cc: Arnd Bergmann <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit 86261fdd65ce076c0aa05dbf3f5f5fe10aab1bcf Author: Petr Cvek <[email protected]> Date: Tue Nov 25 06:05:33 2014 +0100 i2c: pxa: add support for SCCB devices Add support for SCCB by implementing I2C_M_IGNORE_NAK and I2C_M_STOP flags and advertising functionality flag I2C_FUNC_PROTOCOL_MANGLING. Also fixed missing functionality flag I2C_FUNC_NOSTART. Signed-off-by: Petr Cvek <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> commit 23173eae7b9a5389d3f7031b77cde34f63b814a2 Author: Alexander Kochetkov <[email protected]> Date: Tue Nov 25 02:20:55 2014 +0400 omap: i2c: don't check bus state IP rev3.3 and earlier Commit 0f5768bf894f ("i2c: omap: implement workaround for handling invalid BB-bit values") introduce the error result in boot test fault on OMAP3530 boards. The patch fix the error (disable i2c bus test for OMAP3530). Reported-by: Kevin Hilman <[email protected]> Signed-off-by: Alexander Kochetkov <[email protected]> Fixes: 0f5768bf894f ("i2c: omap: implement workaround for handling invalid BB-bit values") Tested-by: Tony Lindgren <[email protected]> Tested-by: Kevin Hilman <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> commit a620a6bc1c94c22d6c312892be1e0ae171523125 Author: Thadeu Lima de Souza Cascardo <[email protected]> Date: Tue Nov 25 14:21:11 2014 -0200 tg3: fix ring init when there are more TX than RX channels If TX channels are set to 4 and RX channels are set to less than 4, using ethtool -L, the driver will try to initialize more RX channels than it has allocated, causing an oops. This fix only initializes the RX ring if it has been allocated. Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]> Signed-off-by: David S. Miller <[email protected]> commit 69b93607a9862b1db2f0f2e078e1396f8e20fa9b Author: Lars-Peter Clausen <[email protected]> Date: Tue Nov 25 20:29:40 2014 +0100 ASoC: qi_lb60: Pass flags to gpiod_get() Pass flags to gpiod_get() to automatically configure the GPIOs. This is shorter and not passing any flags to gpiod_get() will eventually no longer be supported. Signed-off-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Mark Brown <[email protected]> commit c3658e8d0f10147fc86018be7f11668246c156d3 Author: Eric Dumazet <[email protected]> Date: Tue Nov 25 07:40:04 2014 -0800 tcp: fix possible NULL dereference in tcp_vX_send_reset() After commit ca777eff51f7 ("tcp: remove dst refcount false sharing for prequeue mode") we have to relax check against skb dst in tcp_v[46]_send_reset() if prequeue dropped the dst. If a socket is provided, a full lookup was done to find this socket, so the dst test can be skipped. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88191 Reported-by: Jaša Bartelj <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Daniel Borkmann <[email protected]> Fixes: ca777eff51f7 ("tcp: remove dst refcount false sharing for prequeue mode") Signed-off-by: David S. Miller <[email protected]> commit 7d63a5f9b25ba6b130da8eb2d32a72b1462d0249 Author: Larry Finger <[email protected]> Date: Tue Nov 25 10:32:07 2014 -0600 rtlwifi: Change order in device startup The existing order of steps when starting the PCI devices works for 2.4G devices, but fails to initialize the 5G section of the RTL8821AE hardware. This patch is needed to fix the regression reported in Bug #88811 (https://bugzilla.kernel.org/show_bug.cgi?id=88811). Reported-by: Valerio Passini <[email protected]> Tested-by: Valerio Passini <[email protected]> Signed-off-by: Larry Finger <[email protected]> Cc: Valerio Passini <[email protected]> Signed-off-by: John W. Linville <[email protected]> commit a91ed1901a80b401afa1b718d941d3450d868151 Author: Larry Finger <[email protected]> Date: Tue Nov 25 10:32:06 2014 -0600 rtlwifi: rtl8821ae: Fix 5G detection problem The changes associated with moving this driver from staging to the regular tree missed one section setting the allowable rates for the 5GHz band. This patch is needed to fix the regression reported in Bug #88811 (https://bugzilla.kernel.org/show_bug.cgi?id=88811). Reported-by: Valerio Passini <[email protected]> Tested-by: Valerio Passini <[email protected]> Signed-off-by: Larry Finger <[email protected]> Cc: Valerio Passini <[email protected]> Signed-off-by: John W. Linville <[email protected]> commit 43612d7c04f1a4f5e60104143918fcdf018b66ee Author: Pablo Neira <[email protected]> Date: Tue Nov 25 19:54:47 2014 +0100 Revert "netfilter: conntrack: fix race in __nf_conntrack_confirm against get_next_corpse" This reverts commit 5195c14c8b27cc0b18220ddbf0e5ad3328a04187. If the conntrack clashes with an existing one, it is left out of the unconfirmed list, thus, crashing when dropping the packet and releasing the conntrack since golden rule is that conntracks are always placed in any of the existing lists for traceability reasons. Reported-by: Daniel Borkmann <[email protected]> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=88841 Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]> commit 3dc2b6a8d38cf6c7604ec25f3d50d6ec8da04435 Author: Alexander Duyck <[email protected]> Date: Mon Nov 24 20:08:38 2014 -0800 vxlan: Fix boolean flip in VXLAN_F_UDP_ZERO_CSUM6_[TX|RX] In "vxlan: Call udp_sock_create" there was a logic error that resulted in the default for IPv6 VXLAN tunnels going from using checksums to not using checksums. Since there is currently no support in iproute2 for setting these values it means that a kernel after the change cannot talk over a IPv6 VXLAN tunnel to a kernel prior the change. Fixes: 3ee64f3 ("vxlan: Call udp_sock_create") Cc: Tom Herbert <[email protected]> Signed-off-by: Alexander Duyck <[email protected]> Acked-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]> commit f3750817a9034bd8cbb5ba583cd544e7cf14c7d8 Author: Alexander Duyck <[email protected]> Date: Mon Nov 24 20:08:32 2014 -0800 ip6_udp_tunnel: Fix checksum calculation The UDP checksum calculation for VXLAN tunnels is currently using the socket addresses instead of the actual packet source and destination addresses. As a result the checksum calculated is incorrect in some cases. Also uh->check was being set twice, first it was set to 0, and then it is set again in udp6_set_csum. This change removes the redundant assignment to 0. Fixes: acbf74a7 ("vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.") Cc: Andy Zhou <[email protected]> Signed-off-by: Alexander Duyck <[email protected]> Signed-off-by: David S. Miller <[email protected]> commit 18ca43823f3ce111c6efb8cc90d9f35246527727 Author: Amitkumar Karwar <[email protected]> Date: Tue Nov 25 06:43:06 2014 -0800 mwifiex: add Tx status support for ACTION frames ACK status (0/1) for ACTION frames is informed to cfg80211. We will extend existing logic used for EAPOL frames. The cfg80211 API is different here. Also, we need to explicitly free cloned skb. Signed-off-by: Cathy Luo <[email protected]> Signed-off-by: Avinash Patil <[email protected]> Signed-off-by: Amitkumar Karwar <[email protected]> Signed-off-by: John W. Linville <[email protected]> commit 808bbebcc8fcbcb2b93aefd8b181a0fdccb407c6 Author: Amitkumar Karwar <[email protected]> Date: Tue Nov 25 06:43:05 2014 -0800 mwifiex: add Tx status support for EAPOL packets Firmware notifies the driver through event if EAPOL data packet has been acked or not. We will inform this status to userspace listening on a socket. Signed-off-by: Cathy Luo <[email protected]> Signed-off-by: Avinash Patil <[email protected]> Signed-off-by: Amitkumar Karwar <[email protected]> Signed-off-by: John W. Linville <[email protected]> commit 381e9fffe6b8343c2479939178ef7ded50bf32d3 Author: Amitkumar Karwar <[email protected]> Date: Tue Nov 25 06:43:04 2014 -0800 mwifiex: skip delay main work logic for USB interface. We had introduced delay main work logic to avoid processing interrupts when Rx pending packet count reaches high threshold. interrupt processing is restarted later when packet count reduces lower threashold. This helped to reduce unnecessary overhead and improve throughput for SD and PCIe chipsets. As there are no interrupts for USB, we will skip this logic for USB chipsets. Signed-off-by: Cathy Luo <[email protected]> Signed-off-by: Avinash Patil <[email protected]> Signed-off-by: Amitkumar Karwar <[email protected]> Signed-off-by: John W. Linville <[email protected]> commit 798ea8eec16d33e0553b6be7175a23e8ddf60eee Author: Amitkumar Karwar <[email protected]> Date: Tue Nov 25 06:43:03 2014 -0800 mwifiex: fix scan problem on big endian platforms This patch adds missing endian conversion for beacon size while processing scan response. Reported-by: Daniel Mosquera <[email protected]> Tested-by: Daniel Mosquera <[email protected]> Signed-off-by: Avinash Patil <[email protected]> Signed-off-by: Amitkumar Karwar <[email protected]> Signed-off-by: John W. Linville <[email protected]> commit 51974611154038f1aaf6ce843bdc6445d5684ee3 Author: Amitkumar Karwar <[email protected]> Date: Tue Nov 25 06:43:02 2014 -0800 mwifiex: fix sparse warning This patch fixes following sparse warnings: drivers/net/wireless/mwifiex/util.c:152:19: warning: cast from restricted __le16 drivers/net/wireless/mwifiex/util.c:152:19: warning: restricted __le16 degrades to integer Signed-off-by: Amitkumar Karwar <[email protected]> Signed-off-by: John W. Linville <[email protected]> commit 8b537686a116b060475d94b6f548c78289935fef Author: Lorenzo Bianconi <[email protected]> Date: Tue Nov 25 00:21:41 2014 +0100 ath9k: add TPC capability to TX descriptor path Add TPC capability to TX descriptor path. Cap per-packet TX power according to TX power per-rate tables. Currently TPC is supported just by AR9003 based chips Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: John W. Linville <[email protected]> commit 23f53dd3062628d2215cab810e4cfc22c29d47ee Author: Lorenzo Bianconi <[email protected]> Date: Tue Nov 25 00:21:40 2014 +0100 ath9k: add TX power per-rate tables Add TX power per-rate tables for different MIMO modes (e.g STBC) in order to cap the maximum TX power value per-rate in the TX descriptor path. Cap TX power for self generated frames (ACK, RTS/CTS). Currently TPC is supported just by AR9003 based chips Signed-off-by: Lorenzo Bianconi <[email protected]> Signed-off-by: John W. Linville <[email protected]> commit 4f3fff148679d5850994e53c57eb798cd201c88c Author: Arend van Spriel <[email protected]> Date: Thu Nov 20 22:27:02 2014 +0100 brcmfmac: correct .disconnect() callback while connecting When the driver has sent a join iovar to the firmware it waits for the events to report result of the connection. However, the wpa_supplicant will request a .disconnect() after a timeout. So upon calling .disconnect() the interface state may still be CONNECTING. Clear the CONNECTING bit as well. Reviewed-by: Hante Meuleman <[email protected]> Reviewed-by: Pieter-Paul Giesberts <[email protected]> Signed-off-by: Arend van Spriel <[email protected]> Signed-off-by: John W. Linville <[email protected]> commit 3fedeab10b3bb09744a6467fe7cd157f055137c3 Author: Hariprasad Shenai <[email protected]> Date: Tue Nov 25 08:33:58 2014 +0530 cxgb4/cxgb4vf/csiostor: Add T4/T5 PCI ID Table Add a new file t4_pci_id_tbl.h that contains T4/T5 PCI ID Table so that for all drivers that uses T4/T5 PCI functions changes can be done in one place. checkpatch.pl script reports following error, which if tried to fix ends up in compilation error. ERROR: Macros with complex values should be enclosed in parentheses +#define CH_PCI_DEVICE_ID_TABLE_DEFINE_END \ + { 0, } \ + } WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? new file mode 100644 ERROR: Macros with complex values should be enclosed in parentheses +#define CH_PCI_ID_TABLE_FENTRY(devid) \ + CH_PCI_ID_TABLE_ENTRY((devid) | \ + ((CH_PCI_DEVICE_ID_FUNCTION) << 8)), \ + CH_PCI_ID_TABLE_ENTRY((devid) | \ + ((CH_PCI_DEVICE_ID_FUNCTION2) << 8)) ERROR: Macros with complex values should be enclosed in parentheses +#define CH_PCI_DEVICE_ID_TABLE_DEFINE_END { 0, } } ERROR: Macros with complex values should be enclosed in parentheses +#define CH_PCI_DEVICE_ID_TABLE_DEFINE_END { 0, } } Signed-off-by: Hariprasad Shenai <[email protected]> Signed-off-by: David S. Miller <[email protected]> commit 138a7f49270fde7547afe976a01cef2b9fbf3a0e Author: Andrew Lutomirski <[email protected]> Date: Mon Nov 24 12:02:29 2014 -0800 net-timestamp: Fix a documentation typo SOF_TIMESTAMPING_OPT_ID puts the id in ee_data, not ee_info. Cc: Willem de Bruijn <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]> commit c2d00875c151087c41e23a5afd6bd807019c9a53 Author: Dmitry Monakhov <[email protected]> Date: Tue Nov 25 13:18:39 2014 -0500 ext4: fix potential use after free during resize We need some sort of synchronization while updating ->s_group_desc because there are a lot of users which can access old ->s_group_desc array after it was released. It is reasonable to use lightweight seqcount_t here instead of RCU. Signed-off-by: Dmitry Monakhov <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> commit d6d12b85f7daa15a02cabfc9026225b6e16d9aad Author: Dmitry Monakhov <[email protected]> Date: Tue Nov 25 13:08:04 2014 -0500 ext4: cleanup GFP flags inside resize path We must use GFP_NOFS instead GFP_KERNEL inside ext4_mb_add_groupinfo and ext4_calculate_overhead() because they are called from inside a journal transaction. Call trace: ioctl ->ext4_group_add ->journal_start ->ext4_setup_new_descs ->ext4_mb_add_groupinfo -> GFP_KERNEL ->ext4_flex_group_add ->ext4_update_super ->ext4_calculate_overhead -> GFP_KERNEL ->journal_stop Signed-off-by: Dmitry Monakhov <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> commit 2193dda5eec60373c7a061c129c6ab9d658f78e9 Author: Alan Stern <[email protected]> Date: Tue Nov 25 12:28:46 2014 +0100 USB: host: Remove ehci-octeon and ohci-octeon drivers Remove special-purpose octeon drivers and instead use ehci-platform and ohci-platform as suggested with http://marc.info/?l=linux-mips&m=140139694721623&w=2 [andreas.herrmann: fixed compile error] Cc: David Daney <[email protected]> Cc: Alex Smith <[email protected]> Signed-off-by: Alan Stern <[email protected]> Signed-off-by: Andreas Herrmann <[email protected]> Acked-by: Ralf Baechle <[email protected]> Tested-by: Aaro Koskinen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> commit f83b050b3c223278c4299d8736ffdafceb2c6c7c Author: Jan Kara <[email protected]> Date: Tue Nov 25 11:55:24 2014 -0500 ext4: introduce aging to extent status tree Introduce a simple aging to extent status tree. Each extent has a REFERENCED bit which gets set when the extent is used. Shrinker then skips entries with referenced bit set and clears the bit. Thus frequently used extents have higher chances of staying in memory. Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> commit ee4bb47d2bc8149b11e4d1dc2378730cb166d5a8 Author: Jan Kara <[email protected]> Date: Tue Nov 25 11:53:47 2014 -0500 ext4: cleanup flag definitions for extent status tree Currently flags for extent status tree are defined twice, once shifted and once without a being shifted. Consolidate these definitions into one place and make some computations automatic to make adding flags less error prone. Compiler should be clever enough to figure out these are constants and generate the same code. Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> commit 9915b75359e3cc99a4dd3b39f83f394395f8f1aa Author: Jan Kara <[email protected]> Date: Tue Nov 25 11:51:23 2014 -0500 ext4: limit number of scanned extents in status tree shrinker Currently we scan extent status trees of inodes until we reclaim nr_to_scan extents. This can however require a lot of scanning when there are lots of delayed extents (as those cannot be reclaimed). Change shrinker to work as shrinkers are supposed to and *scan* only nr_to_scan extents regardless of how many extents did we actually reclaim. We however need to be careful and avoid scanning each status tree from the beginning - that could lead to a situation where we would not be able to reclaim anything at all when first nr_to_scan extents in the tree are always unreclaimable. We remember with each inode offset where we stopped scanning and continue from there when we next come across the inode. Note that we also need to update places calling __es_shrink() manually to pass reasonable nr_to_scan to have a chance of reclaiming anything and not just 1. Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> commit 149f510220f5df2e3eada0dfc8a0b18208ffaf65 Author: Jan Kara <[email protected]> Date: Tue Nov 25 11:49:25 2014 -0500 ext4: move handling of list of shrinkable inodes into extent status code Currently callers adding extents to extent status tree were responsible for adding the inode to the list of inodes with freeable extents. This is error prone and puts list handling in unnecessarily many places. Just add inode to the list automatically when the first non-delay extent is added to the tree and remove inode from the list when the last non-delay extent is removed. Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> commit aaf3ce623d42e534492c19e3b314c800e8d97b84 Author: Zheng Liu <[email protected]> Date: Tue Nov 25 11:47:01 2014 -0500 ext4: change LRU to round-robin in extent status tree shrinker In this commit we discard the lru algorithm for inodes with extent status tree because it takes significant effort to maintain a lru list in extent status tree shrinker and the shrinker can take a long time to scan this lru list in order to reclaim some objects. We replace the lru ordering with a simple round-robin. After that we never need to keep a lru list. That means that the list needn't be sorted if the shrinker can not reclaim any objects in the first round. Cc: Andreas Dilger <[email protected]> Signed-off-by: Zheng Liu <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> commit 036d4c3764f230f39fba5b2574a6c29b00a48bf9 Author: Zheng Liu <[email protected]> Date: Tue Nov 25 11:44:37 2014 -0500 ext4: cache extent hole in extent status tree for ext4_da_map_blocks() Currently extent status tree doesn't cache extent hole when a write looks up in extent tree to make sure whether a block has been allocated or not. In this case, we don't put extent hole in extent cache because later this extent might be removed and a new delayed extent might be added back. But it will cause a defect when we do a lot of writes. If we don't put extent hole in extent cache, the following writes also need to access extent tree to look at whether or not a block has been allocated. It brings a cache miss. This commit fixes this defect. Also if the inode doesn't have any extent, this extent hole will be cached as well. Cc: Andreas Dilger <[email protected]> Signed-off-by: Zheng Liu <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> commit f3c08e9bdc8b51aa941aed2fedb4ab468ce7588e Author: Jan Kara <[email protected]> Date: Tue Nov 25 11:41:49 2014 -0500 ext4: fix block reservation for bigalloc filesystems For bigalloc filesystems we have to check whether newly requested inode block isn't already part of a cluster for which we already have delayed allocation reservation. This check happens in ext4_ext_map_blocks() and that function sets EXT4_MAP_FROM_CLUSTER if that's the case. However if ext4_da_map_blocks() finds in extent cache information about the block, we don't call into ext4_ext_map_blocks() and thus we always end up getting new reservation even if the space for cluster is already reserved. This results in overreservation and premature ENOSPC reports. Fix the problem by checking for existing cluster reservation already in ext4_da_map_blocks(). That simplifies the logic and actually allows us to get rid of the EXT4_MAP_FROM_CLUSTER flag completely. Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> commit 439b8bddaa1ebed9f9f8fb2f6f33f5e639d76ab8 Author: Dmitry Lavnikevich <[email protected]> Date: Fri Nov 21 18:29:07 2014 +0300 mfd: da9063: Get irq base dynamically before registering device After registering mfd device with proper irq_base platform_get_irq_byname() calls will return VIRQ instead of local IRQ. This fixes da9063 rtc registration issue: da9063-rtc da9063-rtc: Failed to request ALARM IRQ 1: -22 Signed-off-by: Dmitry Lavnikevich <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 2c20f6de95afef89127163d16c88cd0456c48077 Author: Krzysztof Kozlowski <[email protected]> Date: Wed Nov 12 16:28:02 2014 +0100 mfd: max14577: Fix obvious typo in company name in copyright Fix a typo in name of company in copyright comment. Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 0e50e92669357e4702fcd6a85e2f0d6e92295664 Author: Lee Jones <[email protected]> Date: Tue Nov 11 12:36:46 2014 +0000 mfd: axp20x: Constify axp20x_acpi_match and rid unused warning axp20x.c:239:30: warning: ‘axp20x_acpi_match’ defined but not used [-Wunused-variable] Signed-off-by: Lee Jones <[email protected]> commit 71d679b84ce8ca3207e547488f70c259575d2f2f Author: Dmitry Eremin-Solenikov <[email protected]> Date: Mon Nov 17 18:07:42 2014 +0300 mfd: t7l66xb: prepare/unprepare clocks Change clk_enable/disable() calls to clk_prepare_enable() and clk_disable_unprepare(). Signed-off-by: Dmitry Eremin-Solenikov <[email protected]> commit 7263bd39251e6926ca7fa5591679b26577fdaccb Author: Dmitry Eremin-Solenikov <[email protected]> Date: Mon Nov 17 18:07:43 2014 +0300 mfd: tc6387xb: prepare/unprepare clocks Change clk_enable/disable() calls to clk_prepare_enable() and clk_disable_unprepare(). Signed-off-by: Dmitry Eremin-Solenikov <[email protected]> commit 21cf3318d675b6ceeb5a3ed82ffe467a2b6eaee4 Author: Laurentiu Palcu <[email protected]> Date: Fri Nov 7 14:45:14 2014 +0200 mfd: dln2: add support for USB-SPI module Signed-off-by: Laurentiu Palcu <[email protected]> commit 783f6fc4cecd770dfdb1418c7c890dbeb3bf3c91 Author: Charles Keepax <[email protected]> Date: Tue Nov 4 13:04:07 2014 +0000 mfd: wm5110: Add missing registers for AIF2 channels 3-6 When the extra 4 channels were added to AIF2 the necessary frame control registers were not given defaults and marked readable. This patch fixes this. Signed-off-by: Charles Keepax <[email protected]> commit 90f2d0f7bf069b1a2798156b7dcc8e7d1e874406 Author: Linus Walleij <[email protected]> Date: Tue Oct 28 11:06:56 2014 +0100 mfd: tc3589x: get rid of static base The TC3589x driver is now a device tree-only driver, so we want only dynamic IRQs and GPIO numbers from the tc3589x, no static assignments. Signed-off-by: Linus Walleij <[email protected]> commit 47958c5ab4035bd91f05598f76a61cd9f7f2934c Author: Charles Keepax <[email protected]> Date: Tue Nov 4 15:24:36 2014 +0000 mfd: arizona: Document HP_CTRL_1L and HP_CTRL_1R registers These registers are documented in the datasheet and used as part of the extcon driver. Expose them properly through regmap as the datasheet notes they should be treated as volatile do so. Signed-off-by: Charles Keepax <[email protected]> commit e62cace7b602b29cc9226e64dfb2c47ddfb9558e Author: Charles Keepax <[email protected]> Date: Tue Nov 4 15:26:22 2014 +0000 mfd: wm8997: Mark INTERRUPT_STATUS_2_MASK as readable Technically this register is not used on wm8997 however the regmap core requires a continuous block of IRQs. The simplest solution is just to add the register. Signed-off-by: Charles Keepax <[email protected]> commit a64ab6b4cd098f6c2ea959fe9bf1fd3f8b13b1f3 Author: Dmitry Eremin-Solenikov <[email protected]> Date: Thu Nov 6 11:52:38 2014 +0300 mfd: tc6393xb: Prepare/unprepare clocks Change clk_enable/disable() calls to clk_prepare_enable() and clk_disable_unrepapre(). Signed-off-by: Dmitry Eremin-Solenikov <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 12849b63a4e9e22fb63d0fc967726e8cdf2a19c2 Author: Lee Jones <[email protected]> Date: Mon Nov 10 12:28:36 2014 +0000 mfd: tps65090: Fix bonkers indenting strategy First spotted pointless (incorrect) indent of 'if (ret)', then double indentations of a struct attribute 'mask'. Decided to go through the whole file and make amendments instead and this is the result. Signed-off-by: Lee Jones <[email protected]> commit 1a5fb99de4850cba710d91becfa2c65653048589 Author: Dmitry Eremin-Solenikov <[email protected]> Date: Fri Oct 24 21:19:57 2014 +0400 mfd: tc6393xb: Fail ohci suspend if full state restore is required Some boards with TC6393XB chip require full state restore during system resume thanks to chip's VCC being cut off during suspend (Sharp SL-6000 tosa is one of them). Failing to do so would result in ohci Oops on resume due to internal memory contentes being changed. Fail ohci suspend on tc6393xb is full state restore is required. Recommended workaround is to unbind tmio-ohci driver before suspend and rebind it after resume. Cc: [email protected] Signed-off-by: Dmitry Eremin-Solenikov <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit bde3e706a63d5c258b3e4f5e327bcf032fb1adfe Author: Andy Shevchenko <[email protected]> Date: Mon Nov 3 19:29:23 2014 +0200 mfd: lpc_sch: Don't call mfd_remove_devices() MFD core already cares about failing registration. It will remove successfully registered devices in case of error. Thus, no need to repeatedly call mfd_remove_devices(). Fixes: 5829e9b64e65 (mfd: lpc_sch: Accomodate partial population of the MFD devices) Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 1753b40f5c97e0d0bf2f0a562603cfc592945a5e Author: Joe Perches <[email protected]> Date: Sun Oct 26 22:25:02 2014 -0700 mfd: wm8350-core: Fix probable mask then right shift defect Precedence of & and >> is not the same and is not left to right. shift has higher precedence and should be done after the mask. Add parentheses around the mask. Signed-off-by: Joe Perches <[email protected]> Acked-by: Charles Keepax <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 7d082baa349e59ce3de6452abc05e5de6436aad4 Author: Guenter Roeck <[email protected]> Date: Thu Oct 9 09:18:38 2014 -0700 mfd: ab8500-sysctrl: Drop ab8500_restart ab8500_restart is not called from anywhere in the kernel, so drop it. Signed-off-by: Guenter Roeck <[email protected]> Acked-by: Linus Walleij <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 6bdf891a17148a1b91beb603b09c599dc98eb4fb Author: Lee Jones <[email protected]> Date: Mon Nov 3 16:12:26 2014 +0000 mfd: db8500-prcmu: Provide sane error path values Also rid superfluous gotos and label. Cc: Linus Walleij <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 51a7e02bb629498c32915881ed4fb61ef778282a Author: Pramod Gurav <[email protected]> Date: Thu Oct 30 14:51:35 2014 +0530 mfd: db8500-prcmu: Check return of devm_ioremap for error Error check around return value of devm_ioremap is missing. Add the same to avoid NULL pointer dereference. Signed-off-by: Pramod Gurav <[email protected]> Acked-by: Linus Walleij <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 1b9b46d05f887aec418b3a5f4f55abf79316fcda Author: Tony Lindgren <[email protected]> Date: Sun Nov 2 10:09:38 2014 -0800 mfd: twl4030-power: Fix regression with missing compatible flag Commit e7cd1d1eb16f ("mfd: twl4030-power: Add generic reset configuration") accidentally removed the compatible flag for "ti,twl4030-power" that should be there as documented in the binding. If "ti,twl4030-power" only the poweroff configuration is done by the driver. Fixes: e7cd1d1eb16f ("mfd: twl4030-power: Add generic reset configuration") Cc: [email protected] # v3.16+ Reported-by: "Dr. H. Nikolaus Schaller" <[email protected]> Signed-off-by: Tony Lindgren <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 5cb5d9616a47d5383a85379afa4429382ef46b38 Author: Micky Ching <[email protected]> Date: Fri Oct 10 13:58:44 2014 +0800 mfd: rtsx: Fix PM suspend for 5227 & 5249 Fix rts5227&5249 failed send buffer cmd after suspend, PM_CTRL3 should reset before send any buffer cmd after suspend. Otherwise, buffer cmd will failed, this will lead resume fail. Signed-off-by: Micky Ching <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 130dd5b039dcbab7bcb2fdce0bb3cc7347b08b29 Author: Krzysztof Kozlowski <[email protected]> Date: Tue Oct 21 13:23:16 2014 +0200 mfd/regulator: dt-bindings: max77686: Document regulators off in suspend Add information which regulators can be disabled during system suspend. Suggested-by: Javier Martinez Canillas <[email protected]> Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 6ce286f182e22837a02a368eeb49a0057b2cd5f9 Author: Javier Martinez Canillas <[email protected]> Date: Mon Oct 20 23:05:50 2014 +0200 Revert "mfd: sec-core: Prepare regulators for suspend state to reduce power-consumption" This reverts commit b7cde7078d2344073c310aa65fc2b0a845d2cb5b ("mfd: sec-core: Prepare regulators for suspend state to reduce power-consumption") Commit b7cde7078d23 called regulator_suspend_prepare() to prepare the regulators for a suspend state. But it did from the device pm suspend handler while the regulator suspend prepare function iterates over all regulators and not only the one managed by this device so it doesn't seems to be correct to call it from within a device driver. It is better to call the regulator suspend prepare/finish functions from platform code instead so this patch reverts the mentioned commit. Suggested-by: Doug Anderson <[email protected]> Signed-off-by: Javier Martinez Canillas <[email protected]> Reviewed-by: Chanwoo Choi <[email protected]> Reviewed-by: Doug Anderson <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit efa3ca414be9b930774b5c1a3190f735596f5115 Author: Krzysztof Kozlowski <[email protected]> Date: Mon Oct 20 14:34:46 2014 +0200 mfd: max77693: Map charger device to its own of_node Add a "maxim,max77693-charger" of_compatible to the mfd_cell so the MFD child device (the charger) will have its own of_node set. This will be used by the max77693 charger driver in next patches to obtain battery configuration from DTS. Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 6f467e5f0cdc3c0304bfcabcbaf56970ddea3d52 Author: Will Sheppard <[email protected]> Date: Wed Oct 15 09:38:47 2014 +0100 mfd: arizona-spi: Add lines after declarations - checkpatch catch This was found whilst running checkpatch.pl on arizona-spi. WARNING: Missing a blank line after declarations + struct arizona *arizona = spi_get_drvdata(spi); + arizona_dev_exit(arizona); Signed-off-by: Will Sheppard <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 6cbac55368324ec2a0e59e192c5b596d0f4569f7 Author: Krzysztof Kozlowski <[email protected]> Date: Fri Oct 10 10:23:53 2014 +0200 mfd: max77693: Remove unused define Remove old MAX77693_NUM_IRQ_MUIC_REGS define. Not used anywhere. Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 590b7795b3dc293a36136a4321ba59da60e5853c Author: Boris Brezillon <[email protected]> Date: Mon Oct 6 15:48:44 2014 +0200 mfd: Add documentation for atmel-hlcdc DT bindings The HLCDC IP available on some Atmel SoCs (i.e. at91sam9n12, at91sam9x5 family or sama5d3 family) exposes 2 subdevices: - a display controller (controlled by a DRM driver) - a PWM chip This patch adds documentation for atmel-hlcdc DT bindings. Signed-off-by: Boris Brezillon <[email protected]> Tested-by: Anthony Harivel <[email protected]> Tested-by: Ludovic Desroches <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 2c86e9fb7263dbca2c21a086090d32ba90129f7b Author: Boris Brezillon <[email protected]> Date: Mon Oct 6 15:48:43 2014 +0200 mfd: Add atmel-hlcdc driver The HLCDC IP available on some Atmel SoCs (i.e. at91sam9n12, at91sam9x5 family or sama5d3 family) exposes 2 subdevices: - a display controller (controlled by a DRM driver) - a PWM chip The MFD device provides a regmap and several clocks (those connected to this hardware block) to its subdevices. This way concurrent accesses to the iomem range are handled by the regmap framework, and each subdevice can safely access HLCDC registers. Signed-off-by: Boris Brezillon <[email protected]> Tested-by: Anthony Harivel <[email protected]> Tested-by: Ludovic Desroches <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 6e3f62f0793ebff3f91076490ff0fbb107939701 Author: Johan Hovold <[email protected]> Date: Fri Sep 26 12:55:33 2014 +0200 mfd: core: Fix platform-device id generation Make sure to always honour multi-function devices registered with PLATFORM_DEVID_NONE (-1) or PLATFORM_DEVID_AUTO (-2) as id base. In this case it does not make sense to append the cell id to the mfd-id base and potentially change the requested behaviour. Specifically this will allow multi-function devices to be registered with PLATFORM_DEVID_AUTO while still having non-zero cell ids. Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 16b5fe2966b8dc4a474d6618d382d26933d90b24 Author: Johan Hovold <[email protected]> Date: Fri Sep 26 12:55:32 2014 +0200 HID: hid-sensor-hub: Use mfd_add_hotplug_devices() helper Use mfd_add_hotplug_devices() helper to register the subdevices. Compile-only tested. Signed-off-by: Johan Hovold <[email protected]> Acked-by: Jiri Kosina <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 1ab589c72ef66f5f281d658ab606e02d661031d8 Author: Johan Hovold <[email protected]> Date: Fri Sep 26 12:55:31 2014 +0200 mfd: Use mfd_add_hotplug_devices() helper Use mfd_add_hotplug_devices helper to register the subdevices. Signed-off-by: Johan Hovold <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit bdb0066df96e74a4002125467ebe459feff1ebef Author: Pankaj Dubey <[email protected]> Date: Tue Sep 30 14:05:27 2014 +0530 mfd: syscon: Decouple syscon interface from platform devices Currently a syscon entity can be only registered directly through a platform device that binds to a dedicated syscon driver. However in certain use cases it is desirable to make a device used with another driver a syscon interface provider. For example, certain SoCs (e.g. Exynos) contain system controller blocks which perform various functions such as power domain control, CPU power management, low power mode control, but in addition contain certain IP integration glue, such as various signal masks, coprocessor power control, etc. In such case, there is a need to have a dedicated driver for such system controller but also share registers with other drivers. The latter is where the syscon interface is helpful. In case of DT based platforms, this patch decouples syscon object from syscon platform driver, and allows to create syscon objects first time when it is required by calling of syscon_regmap_lookup_by APIs and keep a list of such syscon objects along with syscon provider device_nodes and regmap handles. For non-DT based platforms, this patch keeps syscon platform driver structure so that syscon can be probed and such non-DT based drivers can use syscon_regmap_lookup_by_pdev API and access regmap handles. Once all users of "syscon_regmap_lookup_by_pdev" migrated to DT based, we can completely remove platform driver of syscon, and keep only helper functions to get regmap handles. Suggested-by: Arnd Bergmann <[email protected]> Suggested-by: Tomasz Figa <[email protected]> Tested-by: Vivek Gautam <[email protected]> Tested-by: Javier Martinez Canillas <[email protected]> Signed-off-by: Pankaj Dubey <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Tested-by: Heiko Stuebner <[email protected]> Reviewed-by: Heiko Stuebner <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit ee828d02611f325a70ef6a3c1fd2dd1eb3bc9704 Author: Jaewon Kim <[email protected]> Date: Thu Sep 18 01:40:26 2014 +0900 mfd: max77693: Update DT binding to support haptic This patch add haptic DT binding documentation and example to support haptic driver in max77693 Multifunction device. Signed-off-by: Jaewon Kim <[email protected]> Acked-by: Chanwoo Choi <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit d1bafd78fce5f07bfc9c2c8a77146f74528a469e Author: Jaewon Kim <[email protected]> Date: Thu Sep 18 01:40:25 2014 +0900 mfd: max77693: Add haptic of_compatible in mfd_cell This patch add haptic of_compatible in order to use the haptic device driver using Devicetree. Signed-off-by: Jaewon Kim <[email protected]> Acked-by: Chanwoo Choi <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit efbf49224acc8ba5ac4d7ac93b5035836aebf400 Author: Jaewon Kim <[email protected]> Date: Thu Sep 18 01:40:24 2014 +0900 mfd: max77693: Initialize haptic register map This patch add regmap_haptic initialization to use haptic register map in haptic device driver. Signed-off-by: Jaewon Kim <[email protected]> Acked-by: Chanwoo Choi <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 11d0d30093301169833aedfc130d9e4abe621be1 Author: Johannes Pointner <[email protected]> Date: Thu Sep 25 08:31:42 2014 +0200 mfd: tps65217: Add compatible string for subdevices Adds of_compatible strings to mfd_cells for sub devices of the tps65217. Signed-off-by: Johannes Pointner <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 65fa5f773196ccc88c4aae43ece9119f092c20a6 Author: Damien Lespiau <[email protected]> Date: Tue Nov 25 13:45:41 2014 +0000 drm/i915: Fix short description of intel_display_power_is_enabled() That's the version actually taking the dev_priv->power_domains lock. Signed-off-by: Damien Lespiau <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> commit 07c802bd7c3951410b105356787d655e273ab537 Author: Will Deacon <[email protected]> Date: Tue Nov 25 15:26:13 2014 +0000 arm64: vmlinux.lds.S: don't discard .exit.* sections at link-time .exit.* sections may be subject to patching by the new alternatives framework and so shouldn't be discarded at link-time. Without this patch, such a section will result in the following linker error: `.exit.text' referenced in section `.altinstructions' of drivers/built-in.o: defined in discarded section `.exit.text' of drivers/built-in.o Signed-off-by: Will Deacon <[email protected]> commit af86e5974d3069bd26ebcf7c046c6e59726acaaa Author: Laura Abbott <[email protected]> Date: Fri Nov 21 21:50:42 2014 +0000 arm64: Factor out fixmap initialization from ioremap The fixmap API was originally added for arm64 for early_ioremap purposes. It can be used for other purposes too so move the initialization from ioremap to somewhere more generic. This makes it obvious where the fixmap is being set up and allows for a cleaner implementation of __set_fixmap. Reviewed-by: Kees Cook <[email protected]> Acked-by: Mark Rutland <[email protected]> Tested-by: Mark Rutland <[email protected]> Tested-by: Kees Cook <[email protected]> Signed-off-by: Laura Abbott <[email protected]> Signed-off-by: Will Deacon <[email protected]> commit c3684fbb446501b48dec6677a6a9f61c215053de Author: Laura Abbott <[email protected]> Date: Fri Nov 21 21:50:40 2014 +0000 arm64: Move cpu_resume into the text section The function cpu_resume currently lives in the .data section. There's no reason for it to be there since we can use relative instructions without a problem. Move a few cpu_resume data structures out of the assembly file so the .data annotation can be dropped completely and cpu_resume ends up in the read only text section. Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Mark Rutland <[email protected]> Reviewed-by: Lorenzo Pieralisi <[email protected]> Tested-by: Mark Rutland <[email protected]> Tested-by: Lorenzo Pieralisi <[email protected]> Tested-by: Kees Cook <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Signed-off-by: Laura Abbott <[email protected]> Signed-off-by: Will Deacon <[email protected]> commit ac2dec5f6c27a581f8571da605d9ba04df18330d Author: Laura Abbott <[email protected]> Date: Fri Nov 21 21:50:39 2014 +0000 arm64: Switch to adrp for loading the stub vectors The hyp stub vectors are currently loaded using adr. This instruction has a +/- 1MB range for the loading address. If the alignment for sections is changed the address may be more than 1MB away, resulting in reclocation errors. Switch to using adrp for getting the address to ensure we aren't affected by the location of the __hyp_stub_vectors. Acked-by: Ard Biesheuvel <[email protected]> Acked-by: Marc Zyngier <[email protected]> Tested-by: Mark Rutland <[email protected]> Tested-by: Kees Cook <[email protected]> Signed-off-by: Laura Abbott <[email protected]> Signed-off-by: Will Deacon <[email protected]> commit fcff588633e848aa728a4437ef96d437299ba03d Author: Laura Abbott <[email protected]> Date: Fri Nov 21 21:50:38 2014 +0000 arm64: Treat handle_arch_irq as a function pointer handle_arch_irq isn't actually text, it's just a function pointer. It doesn't need to be stored in the text section and doing so causes problesm if we ever want to make the kernel text read only. Declare handle_arch_irq as a proper function pointer stored in the data section. Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Mark Rutland <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Tested-by: Mark Rutland <[email protected]> Tested-by: Kees Cook <[email protected]> Signed-off-by: Laura Abbott <[email protected]> Signed-off-by: Will Deacon <[email protected]> commit 3eebdbe5fc7d64c7a6ef14cc5b8be518ffd563fa Author: Mark Rutland <[email protected]> Date: Tue Nov 25 13:27:43 2014 +0000 arm64: sanity checks: add ID_AA64DFR{0,1}_EL1 While we currently expect self-hosted debug support to be identical across CPUs, we don't currently sanity check this. This patch adds logging of the ID_AA64DFR{0,1}_EL1 values and associated sanity checking code. It's not clear to me whether we need to check PMUVer, TraceVer, and DebugVer, as we don't currently rely on these fields at all. Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Acked-by: Will Deacon <[email protected]> Signed-off-by: Will Deacon <[email protected]> commit efdf4211d5b103535ae22972acadf57c9fc38b30 Author: Mark Rutland <[email protected]> Date: Tue Nov 25 13:27:42 2014 +0000 arm64: sanity checks: add missing newline to print A missing newline in the WARN_TAINT_ONCE string results in ugly and somewhat difficult to read output in the case of a sanity check failure, as the next print does not appear on a new line: Unsupported CPU feature variation.Modules linked in: This patch adds the missing newline, fixing the output formatting. Signed-off-by: Mark Rutland <[email protected]> Cc: Catalin Marinas <[email protected]> Acked-by: Will Deacon <[email protected]> Signed-off-by: Will Deacon <[email protected]> commit 9760270c36a4d2ac640ea6294f4f8634a8b27121 Author: Mark Rutland <[email protected]> Date: Tue Nov 25 13:27:41 2014 +0000 arm64: sanity checks: ignore ID_MMFR0.AuxReg It seems that Cortex-A53 r0p4 added support for AIFSR and ADFSR, and ID_MMFR0.AuxReg has been updated accordingly to report this fact. As Cortex-A53 could be paired with CPUs which do not implement these registers (e.g. all current revisions of Cortex-A57), this may trigger a sanity check failure at boot. The AuxReg value describes the availability of the ACTLR, AIFSR, and ADFSR registers, which are only of use to 32-bit guest OSs, and have IMPLEMENTATION DEFINED contents. Given the nature of these registers it is likely that KVM will need to trap accesses regardless of whether the CPUs are heterogeneous. This patch masks out the ID_MMFR0.AuxReg value from the sanity checks, preventing spurious warnings at boot time. Signed-off-by: Mark Rutland <[email protected]> Reported-by: Andre Przywara <[email protected]> Cc: Catalin Marinas <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Peter Maydell <[email protected]> Signed-off-by: Will Deacon <[email protected]> commit 1cefdaea613bcd247b2235079fcbb0ca4687542d Author: Mark Brown <[email protected]> Date: Fri Nov 21 00:36:49 2014 +0000 arm64: topology: Fix handling of multi-level cluster MPIDR-based detection The only requirement the scheduler has on cluster IDs is that they must be unique. When enumerating the topology based on MPIDR information the kernel currently generates cluster IDs by using the first level of affinity above the core ID (either level one or two depending on if the core has multiple threads) however the ARMv8 architecture allows for up to three levels of affinity. This means that an ARMv8 system may contain cores which have MPIDRs identical other than affinity level three which with current code will cause us to report multiple cores with the same identification to the scheduler in violation of its uniqueness requirement. Ensure that we do not violate the scheduler requirements on systems that uses all the affinity levels by incorporating both affinity levels two and three into the cluser ID when the cores are not threaded. While no currently known hardware uses multi-level clusters it is better to program defensively, this will help ease bringup of systems that have them and will ensure that things like distribution install media do not need to be respun to replace kernels in order to deploy such systems. In the worst case the system will work but perform suboptimally until a kernel modified to handle the new topology better is installed, in the best case this will be an adequate description of such topologies for the scheduler to perform well. Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Will Deacon <[email protected]> commit c0a01b84b1fdbd98bff5bca5b201fe73fda7e9d9 Author: Andre Przywara <[email protected]> Date: Fri Nov 14 15:54:12 2014 +0000 arm64: protect alternatives workarounds with Kconfig options Not all of the errata we have workarounds for apply necessarily to all SoCs, so people compiling a kernel for one very specific SoC may not need to patch the kernel. Introduce a new submenu in the "Platform selection" menu to allow people to turn off certain bugs if they are not affected. By default all of them are enabled. Normal users or distribution kernels shouldn't bother to deselect any bugs here, since the alternatives framework will take care of patching them in only if needed. Signed-off-by: Andre Przywara <[email protected]> [will: moved kconfig menu under `Kernel Features'] Signed-off-by: Will Deacon <[email protected]> commit 5afaa1fc1b320cec48affa7e6949f2493f875c12 Author: Andre Przywara <[email protected]> Date: Fri Nov 14 15:54:11 2014 +0000 arm64: add Cortex-A57 erratum 832075 workaround The ARM erratum 832075 applies to certain revisions of Cortex-A57, one of the workarounds is to change device loads into using load-aquire semantics. This is achieved using the alternatives framework. Signed-off-by: Andre Przywara <[email protected]> Signed-off-by: Will Deacon <[email protected]> commit 301bcfac42897dbd1b0b3c1be49f24654a1bc49e Author: Andre Przywara <[email protected]> Date: Fri Nov 14 15:54:10 2014 +0000 arm64: add Cortex-A53 cache errata workaround The ARM errata 819472, 826319, 827319 and 824069 define the same workaround for these hardware issues in certain Cortex-A53 parts. Use the new alternatives framework and the CPU MIDR detection to patch "cache clean" into "cache clean and invalidate" instructions if an affected CPU is detected at runtime. Signed-off-by: Andre Przywara <[email protected]> [will: add __maybe_unused to squash gcc warning] Signed-off-by: Will Deacon <[email protected]> commit 159a5e920446aed12fe373ecc3c7b3dc667091ae Author: Chanwoo Choi <[email protected]> Date: Tue Nov 18 17:59:43 2014 +0900 mfd: s2mps11: Add binding documentation for Samsung S2MPS13 PMIC This patch adds the binding documentation for Samsung S2MPS13 PMIC which is similiar with existing S2MPS14 PMIC. S2MPS13 has the different number of regulators from S2MPS14 and RTC/Clock is the same with the S2MPS14. Signed-off-by: Chanwoo Choi <[email protected]> Acked-by: Sangbeom Kim <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit f928b53d749bac4514c2ed1ea4d553fa581578b5 Author: Chanwoo Choi <[email protected]> Date: Tue Nov 18 17:59:41 2014 +0900 clk: s2mps11: Add the support for S2MPS13 PMIC clock This patch adds the support for S2MPS13 PMIC clock which is same with existing S2MPS14 RTC IP. But, S2MPS13 uses all of clocks (32khz_{ap|bt|cp}). Signed-off-by: Chanwoo Choi <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]> Acked-by: Michael Turquette <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 76b9840b24ae049b39f1b3cf0e49f21b7c41748f Author: Chanwoo Choi <[email protected]> Date: Tue Nov 18 17:59:40 2014 +0900 regulator: s2mps11: Add support S2MPS13 regulator device This patch adds S2MPS13 regulator device to existing S2MPS11 device driver. The S2MPS13 has just different number of regulators from S2MPS14. The S2MPS13 regulator device includes LDO[1-40] and BUCK[1-10]. Signed-off-by: Chanwoo Choi <[email protected]> Acked-by: Sangbeom Kim <[email protected]> Acked-by: Mark Brown <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit 3bc2ee91a470c52fb3979c23c12d43283455f10d Author: Chanwoo Choi <[email protected]> Date: Tue Nov 18 17:59:39 2014 +0900 mfd: sec-core: Add support for S2MPS13 device This patch adds the support for Samsung S2MPS13 PMIC device to the sec-core MFD driver. The S2MPS13 is very similar with existing S2MPS14 and includes PMIC/ RTC/CLOCK devices. Signed-off-by: Chanwoo Choi <[email protected]> Acked-by: Sangbeom Kim <[email protected]> Signed-off-by: Lee Jones <[email protected]> commit ce79d54ae447d65117303ca9f61ffe9dcbc2465d Author: Pantelis Antoniou <[email protected]> Date: Tue Oct 28 22:36:05 2014 +0200 spi/of: Add OF notifier handler Add OF notifier handler needed for creating/destroying spi devices according to dynamic runtime changes in the DT live tree. This code is enabled when CONFIG_OF_DYNAMIC is selected. Signed-off-by: Pantelis Antoniou <[email protected]> Signed-off-by: Grant Likely <[email protected]> Reviewed-by: Mark Brown <[email protected]> Cc: <[email protected]> commit aff5e3f89a0b07e7edf9243f913378cc27e4110d Author: Pantelis Antoniou <[email protected]> Date: Wed Oct 29 10:40:37 2014 +0200 spi/of: Create new device registration method and accessors Dynamically inserting spi device nodes requires the use of a single device registration method. Refactor the existing of_register_spi_devices() to split out the core functionality for a single device into a separate function; of_register_spi_device(). This function will be used by the OF_DYNAMIC overlay code to make live modifications to the tree. Methods to lookup a device/master using a device node are added as well, of_find_spi_master_by_node() & of_find_spi_device_by_node(). Signed-off-by: Pantelis Antoniou <[email protected]> [grant.likely] Split patch into two pieces for clarity Signed-off-by: Grant Likely <[email protected]> Reviewed-by: Mark Brown <[email protected]> Cc: <[email protected]> commit ea7513bbc04170f1cbf42953187a4d8b731c71c4 Author: Pantelis Antoniou <[email protected]> Date: Tue Oct 28 22:36:03 2014 +0200 i2c/of: Add OF_RECONFIG notifier handler CONFIG_OF_DYNAMIC enables runtime changes to the device tree which in turn may trigger addition or removal of devices from Linux. Add an OF_RECONFIG notifier handler to receive tree change events and to creating or destroy i2c devices as required. Signed-off-by: Pantelis Antoniou <[email protected]> [grant.likely: clean up #ifdefs and drop unneeded error handling] Signed-off-by: Grant Likely <[email protected]> Reviewed-by: Wolfram Sang <[email protected]> Cc: Rob Herring <[email protected]> Cc: [email protected] commit a430a3455f2c48995e06b359a82a1109a419e9ef Author: Pantelis Antoniou <[email protected]> Date: Tue Oct 28 22:36:02 2014 +0200 i2c/of: Factor out Devicetree registration code Dynamically inserting i2c client device nodes requires the use of a single device registration method. Factor out the loop body of of_i2c_register_devices() so that it can be called for individual device_nodes instead of for all the children of a node. Note: The diff of this commit looks far more complicated than it actually is due the indentation being changed for a large block of code. When viewed using the diff -w flag to ignore whitespace changes it can be seen that the change is actually quite simple. Signed-off-by: Pantelis Antoniou <[email protected]> [grant.likely: Made new function static and removed changes to header] Signed-off-by: Grant Likely <[email protected]> Reviewed-by: Wolfram Sang <[email protected]> Cc: Rob Herring <[email protected]> Cc: [email protected] commit ebf3992061db1f7b3aa093f37fb308acc74fbc82 Author: Tony Lindgren <[email protected]> Date: Mon Nov 24 11:05:06 2014 -0800 usb: musb: Use IS_ENABLED for tusb6010 This removes the ifdef clutter a bit and saves few lines. It also makes it easier to detect the remaining places where we have conditional building of code done based on if defined for things like DMA. Signed-off-by: Tony Lindgren <[email protected]> Signed-off-by: Felipe Balbi <[email protected]> commit 82c02f58ba3a1ee0a067c0f90513e826d6152ba6 Author: Tony Lindgren <[email protected]> Date: Mon Nov 24 11:05:05 2014 -0800 usb: musb: Allow multiple glue layers to be built in There's no reason any longer to keep it as a choice now that the IO access has been fixed. Signed-off-by: Tony Lindgren <[email protected]> Signed-off-by: Felipe Balbi <[email protected]> commit 8a77f05aa39be879535f22a9757e703581fa1392 Author: Tony Lindgren <[email protected]> Date: Mon Nov 24 11:05:04 2014 -0800 usb: musb: Pass fifo_mode in platform data This allows setting the correct fifo_mode when multiple MUSB glue layers are built-in. Cc: Fabio Baltieri <[email protected]> Cc: Lee Jones <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Lars-Peter Clausen <[email protected]> Acked-by: Apelete Seketeli <[email protected]> Signed-off-by: Tony Lindgren <[email protected]> Signed-off-by: Felipe Balbi <[email protected]> commit d026e9c76aac3632af174cf02d5c94defa5e6026 Author: Tony Lindgren <[email protected]> Date: Mon Nov 24 11:05:03 2014 -0800 usb: musb: Change end point selection to use new IO access This allows the endpoints to work when multiple MUSB glue layers are built in. Cc: Fabio Baltieri <[email protected]> Cc: Lee Jones <[email protected]> Cc: Lars-Peter Clausen <[email protected]> Acked-by: Linus Walleij <[email protected]> Acked-by: Apelete Seketeli <[email protected]> Signed-off-by: Tony Lindgren <[email protected]> Signed-off-by: Felipe Balbi <[email protected]> commit 1b40fc57a517878cf4c2e16ce29cc9a066dc1064 Author: Tony Lindgren <[email protected]> Date: Mon Nov 24 11:05:02 2014 -0800 usb: musb: Change to use new IO access Change to use new IO access. This allows us to build in multiple …

…robe commit b5b2ffc upstream. While working on a different issue, I noticed an annoying use after free bug on my machine when unloading the ixgbe driver: [ 8642.318797] ixgbe 0000:02:00.1: removed PHC on p2p2 [ 8642.742716] ixgbe 0000:02:00.1: complete [ 8642.743784] BUG: unable to handle kernel paging request at ffff8807d3740a90 [ 8642.744828] IP: [<ffffffffa01c77dc>] ixgbe_remove+0xfc/0x1b0 [ixgbe] [ 8642.745886] PGD 20c6067 PUD 81c1f6067 PMD 81c15a067 PTE 80000007d3740060 [ 8642.746956] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC [ 8642.748039] Modules linked in: [...] [ 8642.752929] CPU: 1 PID: 1225 Comm: rmmod Not tainted 3.18.0-rc2+ #49 [ 8642.754203] Hardware name: Supermicro X10SLM-F/X10SLM-F, BIOS 1.1b 11/01/2013 [ 8642.755505] task: ffff8807e34d3fe0 ti: ffff8807b7204000 task.ti: ffff8807b7204000 [ 8642.756831] RIP: 0010:[<ffffffffa01c77dc>] [<ffffffffa01c77dc>] ixgbe_remove+0xfc/0x1b0 [ixgbe] [...] [ 8642.774335] Stack: [ 8642.775805] ffff8807ee824098 ffff8807ee824098 ffffffffa01f3000 ffff8807ee824000 [ 8642.777326] ffff8807b7207e18 ffffffff8137720f ffff8807ee824098 ffff8807ee824098 [ 8642.778848] ffffffffa01f3068 ffff8807ee8240f8 ffff8807b7207e38 ffffffff8144180f [ 8642.780365] Call Trace: [ 8642.781869] [<ffffffff8137720f>] pci_device_remove+0x3f/0xc0 [ 8642.783395] [<ffffffff8144180f>] __device_release_driver+0x7f/0xf0 [ 8642.784876] [<ffffffff814421f8>] driver_detach+0xb8/0xc0 [ 8642.786352] [<ffffffff814414a9>] bus_remove_driver+0x59/0xe0 [ 8642.787783] [<ffffffff814429d0>] driver_unregister+0x30/0x70 [ 8642.789202] [<ffffffff81375c65>] pci_unregister_driver+0x25/0xa0 [ 8642.790657] [<ffffffffa01eb38e>] ixgbe_exit_module+0x1c/0xc8e [ixgbe] [ 8642.792064] [<ffffffff810f93a2>] SyS_delete_module+0x132/0x1c0 [ 8642.793450] [<ffffffff81012c61>] ? do_notify_resume+0x61/0xa0 [ 8642.794837] [<ffffffff816d2029>] system_call_fastpath+0x12/0x17 The issue is that test_and_set_bit() done on adapter->state is being performed *after* the netdevice has been freed via free_netdev(). When netdev is being allocated on initialization time, it allocates a private area, here struct ixgbe_adapter, that resides after the net_device structure. In ixgbe_probe(), the device init routine, we set up the adapter after alloc_etherdev_mq() on the private area and add a reference for the pci_dev as well via pci_set_drvdata(). Both in the error path of ixgbe_probe(), but also on module unload when ixgbe_remove() is being called, commit 41c6284 ("ixgbe: Fix rcu warnings induced by LER") accesses adapter after free_netdev(). The patch stores the result in a bool and thus fixes above oops on my side. Fixes: 41c6284 ("ixgbe: Fix rcu warnings induced by LER") Cc: Mark Rustad <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Active pages should not be freed to the page allocator as it triggers a bad page state warning. Fengguang Wu reported the following bug and bisected it to the patch "mm: remove lru parameter from __lru_cache_add and lru_cache_add_lru" which is currently in mmotm as mm-remove-lru-parameter-from-__lru_cache_add-and-lru_cache_add_lru.patch [ 84.212960] BUG: Bad page state in process rm pfn:0b0c9 [ 84.214682] page:ffff88000d646240 count:0 mapcount:0 mapping: (null) index:0x0 [ 84.216883] page flags: 0x20000000004c(referenced|uptodate|active) [ 84.218697] CPU: 1 PID: 283 Comm: rm Not tainted 3.10.0-rc4-04361-geeb9bfc torvalds#49 [ 84.220729] ffff88000d646240 ffff88000d179bb8 ffffffff82562956 ffff88000d179bd8 [ 84.223242] ffffffff811333f1 000020000000004c ffff88000d646240 ffff88000d179c28 [ 84.225387] ffffffff811346a4 ffff880000270000 0000000000000000 0000000000000006 [ 84.227294] Call Trace: [ 84.227867] [<ffffffff82562956>] dump_stack+0x27/0x30 [ 84.229045] [<ffffffff811333f1>] bad_page+0x130/0x158 [ 84.230261] [<ffffffff811346a4>] free_pages_prepare+0x8b/0x1e3 [ 84.231765] [<ffffffff8113542a>] free_hot_cold_page+0x28/0x1cf [ 84.233171] [<ffffffff82585830>] ? _raw_spin_unlock_irqrestore+0x6b/0xc6 [ 84.234822] [<ffffffff81135b59>] free_hot_cold_page_list+0x30/0x5a [ 84.236311] [<ffffffff8113a4ed>] release_pages+0x251/0x267 [ 84.237653] [<ffffffff8112a88d>] ? delete_from_page_cache+0x48/0x9e [ 84.239142] [<ffffffff8113ad93>] __pagevec_release+0x2b/0x3d [ 84.240473] [<ffffffff8113b45a>] truncate_inode_pages_range+0x1b0/0x7ce [ 84.242032] [<ffffffff810e76ab>] ? put_lock_stats.isra.20+0x1c/0x53 [ 84.243480] [<ffffffff810e77f5>] ? lock_release_holdtime+0x113/0x11f [ 84.244935] [<ffffffff8113ba8c>] truncate_inode_pages+0x14/0x1d [ 84.246337] [<ffffffff8119b3ef>] evict+0x11f/0x232 [ 84.247501] [<ffffffff8119c527>] iput+0x1a5/0x218 [ 84.248607] [<ffffffff8118f015>] do_unlinkat+0x19b/0x25a [ 84.249828] [<ffffffff810ea993>] ? trace_hardirqs_on_caller+0x210/0x2ce [ 84.251382] [<ffffffff8144372e>] ? trace_hardirqs_on_thunk+0x3a/0x3f [ 84.252879] [<ffffffff8118f10d>] SyS_unlinkat+0x39/0x4c [ 84.254174] [<ffffffff825874d6>] system_call_fastpath+0x1a/0x1f [ 84.255596] Disabling lock debugging due to kernel taint The problem was that a page marked for activation was released via pagevec. This patch clears the active bit before freeing in this case. Signed-off-by: Mel Gorman <[email protected]> Reported-and-Tested-by: Fengguang Wu <[email protected]> Signed-off-by: Paul Reioux <[email protected]>

MIPS: Ci20: Change DT node for LED triggers to match silkscreen

This fixes a bug which results in stale vcore pointers being left in the per-cpu preempted vcore lists when a VM is destroyed. The result of the stale vcore pointers is usually either a crash or a lockup inside collect_piggybacks() when another VM is run. A typical lockup message looks like: [ 472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [qemu-system-ppc:7039] [ 472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio dm_multipath [last unloaded: kvm_hv] [ 472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ #49 [ 472.162187] task: c000001e38512750 ti: c000001e41bfc000 task.ti: c000001e41bfc000 [ 472.162262] NIP: c00000000096b094 LR: c00000000096b08c CTR: c000000000111130 [ 472.162337] REGS: c000001e41bff520 TRAP: 0901 Not tainted (4.2.0-kvm+) [ 472.162399] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24848844 XER: 00000000 [ 472.162588] CFAR: c00000000096b0ac SOFTE: 1 GPR00: c000000000111170 c000001e41bff7a0 c00000000127df00 0000000000000001 GPR04: 0000000000000003 0000000000000001 0000000000000000 0000000000874821 GPR08: c000001e41bff8e0 0000000000000001 0000000000000000 d00000000efde740 GPR12: c000000000111130 c00000000fdae400 [ 472.163053] NIP [c00000000096b094] _raw_spin_lock_irqsave+0xa4/0x130 [ 472.163117] LR [c00000000096b08c] _raw_spin_lock_irqsave+0x9c/0x130 [ 472.163179] Call Trace: [ 472.163206] [c000001e41bff7a0] [c000001e41bff7f0] 0xc000001e41bff7f0 (unreliable) [ 472.163295] [c000001e41bff7e0] [c000000000111170] __wake_up+0x40/0x90 [ 472.163375] [c000001e41bff830] [d00000000efd6fc0] kvmppc_run_core+0x1240/0x1950 [kvm_hv] [ 472.163465] [c000001e41bffa30] [d00000000efd8510] kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv] [ 472.163559] [c000001e41bffb70] [d00000000e9318a4] kvmppc_vcpu_run+0x44/0x60 [kvm] [ 472.163653] [c000001e41bffba0] [d00000000e92e674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm] [ 472.163745] [c000001e41bffbe0] [d00000000e9263a8] kvm_vcpu_ioctl+0x538/0x7b0 [kvm] [ 472.163834] [c000001e41bffd40] [c0000000002d0f50] do_vfs_ioctl+0x480/0x7c0 [ 472.163910] [c000001e41bffde0] [c0000000002d1364] SyS_ioctl+0xd4/0xf0 [ 472.163986] [c000001e41bffe30] [c000000000009260] system_call+0x38/0xd0 [ 472.164060] Instruction dump: [ 472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 60000000 60000000 60420000 8bad02e2 [ 472.164224] 7fc3f378 4b6a57c1 60000000 7c210b78 <e92d0000> 89290009 792affe3 40820070 The bug is that kvmppc_run_vcpu does not correctly handle the case where a vcpu task receives a signal while its guest vcpu is executing in the guest as a result of being piggy-backed onto the execution of another vcore. In that case we need to wait for the vcpu to finish executing inside the guest, and then remove this vcore from the preempted vcores list. That way, we avoid leaving this vcpu's vcore on the preempted vcores list when the vcpu gets interrupted. Fixes: ec25716 Reported-by: Thomas Huth <[email protected]> Tested-by: Thomas Huth <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>

If a user key gets negatively instantiated, an error code is cached in the payload area. A negatively instantiated key may be then be positively instantiated by updating it with valid data. However, the ->update key type method must be aware that the error code may be there. The following may be used to trigger the bug in the user key type: keyctl request2 user user "" @U keyctl add user user "a" @U which manifests itself as: BUG: unable to handle kernel paging request at 00000000ffffff8a IP: [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046 PGD 7cc30067 PUD 0 Oops: 0002 [#1] SMP Modules linked in: CPU: 3 PID: 2644 Comm: a.out Not tainted 4.3.0+ torvalds#49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88003ddea700 ti: ffff88003dd88000 task.ti: ffff88003dd88000 RIP: 0010:[<ffffffff810a376f>] [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046 RSP: 0018:ffff88003dd8bdb0 EFLAGS: 00010246 RAX: 00000000ffffff82 RBX: 0000000000000000 RCX: 0000000000000001 RDX: ffffffff81e3fe40 RSI: 0000000000000000 RDI: 00000000ffffff82 RBP: ffff88003dd8bde0 R08: ffff88007d2d2da0 R09: 0000000000000000 R10: 0000000000000000 R11: ffff88003e8073c0 R12: 00000000ffffff82 R13: ffff88003dd8be68 R14: ffff88007d027600 R15: ffff88003ddea700 FS: 0000000000b92880(0063) GS:ffff88007fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000ffffff8a CR3: 000000007cc5f000 CR4: 00000000000006e0 Stack: ffff88003dd8bdf0 ffffffff81160a8a 0000000000000000 00000000ffffff82 ffff88003dd8be68 ffff88007d027600 ffff88003dd8bdf0 ffffffff810a39e5 ffff88003dd8be20 ffffffff812a31ab ffff88007d027600 ffff88007d027620 Call Trace: [<ffffffff810a39e5>] kfree_call_rcu+0x15/0x20 kernel/rcu/tree.c:3136 [<ffffffff812a31ab>] user_update+0x8b/0xb0 security/keys/user_defined.c:129 [< inline >] __key_update security/keys/key.c:730 [<ffffffff8129e5c1>] key_create_or_update+0x291/0x440 security/keys/key.c:908 [< inline >] SYSC_add_key security/keys/keyctl.c:125 [<ffffffff8129fc21>] SyS_add_key+0x101/0x1e0 security/keys/keyctl.c:60 [<ffffffff8185f617>] entry_SYSCALL_64_fastpath+0x12/0x6a arch/x86/entry/entry_64.S:185 Note the error code (-ENOKEY) in EDX. A similar bug can be tripped by: keyctl request2 trusted user "" @U keyctl add trusted user "a" @U This should also affect encrypted keys - but that has to be correctly parameterised or it will fail with EINVAL before getting to the bit that will crashes. Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: David Howells <[email protected]> Acked-by: Mimi Zohar <[email protected]>

If a user key gets negatively instantiated, an error code is cached in the payload area. A negatively instantiated key may be then be positively instantiated by updating it with valid data. However, the ->update key type method must be aware that the error code may be there. The following may be used to trigger the bug in the user key type: keyctl request2 user user "" @U keyctl add user user "a" @U which manifests itself as: BUG: unable to handle kernel paging request at 00000000ffffff8a IP: [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046 PGD 7cc30067 PUD 0 Oops: 0002 [#1] SMP Modules linked in: CPU: 3 PID: 2644 Comm: a.out Not tainted 4.3.0+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff88003ddea700 ti: ffff88003dd88000 task.ti: ffff88003dd88000 RIP: 0010:[<ffffffff810a376f>] [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 [<ffffffff810a376f>] __call_rcu.constprop.76+0x1f/0x280 kernel/rcu/tree.c:3046 RSP: 0018:ffff88003dd8bdb0 EFLAGS: 00010246 RAX: 00000000ffffff82 RBX: 0000000000000000 RCX: 0000000000000001 RDX: ffffffff81e3fe40 RSI: 0000000000000000 RDI: 00000000ffffff82 RBP: ffff88003dd8bde0 R08: ffff88007d2d2da0 R09: 0000000000000000 R10: 0000000000000000 R11: ffff88003e8073c0 R12: 00000000ffffff82 R13: ffff88003dd8be68 R14: ffff88007d027600 R15: ffff88003ddea700 FS: 0000000000b92880(0063) GS:ffff88007fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000ffffff8a CR3: 000000007cc5f000 CR4: 00000000000006e0 Stack: ffff88003dd8bdf0 ffffffff81160a8a 0000000000000000 00000000ffffff82 ffff88003dd8be68 ffff88007d027600 ffff88003dd8bdf0 ffffffff810a39e5 ffff88003dd8be20 ffffffff812a31ab ffff88007d027600 ffff88007d027620 Call Trace: [<ffffffff810a39e5>] kfree_call_rcu+0x15/0x20 kernel/rcu/tree.c:3136 [<ffffffff812a31ab>] user_update+0x8b/0xb0 security/keys/user_defined.c:129 [< inline >] __key_update security/keys/key.c:730 [<ffffffff8129e5c1>] key_create_or_update+0x291/0x440 security/keys/key.c:908 [< inline >] SYSC_add_key security/keys/keyctl.c:125 [<ffffffff8129fc21>] SyS_add_key+0x101/0x1e0 security/keys/keyctl.c:60 [<ffffffff8185f617>] entry_SYSCALL_64_fastpath+0x12/0x6a arch/x86/entry/entry_64.S:185 Note the error code (-ENOKEY) in EDX. A similar bug can be tripped by: keyctl request2 trusted user "" @U keyctl add trusted user "a" @U This should also affect encrypted keys - but that has to be correctly parameterised or it will fail with EINVAL before getting to the bit that will crashes. Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: David Howells <[email protected]> Acked-by: Mimi Zohar <[email protected]> Signed-off-by: James Morris <[email protected]>

…11-kernels Correct the KERNEL_VERSION logic regarding whether or not

This fixes a bug which results in stale vcore pointers being left in the per-cpu preempted vcore lists when a VM is destroyed. The result of the stale vcore pointers is usually either a crash or a lockup inside collect_piggybacks() when another VM is run. A typical lockup message looks like: [ 472.161074] NMI watchdog: BUG: soft lockup - CPU#24 stuck for 22s! [qemu-system-ppc:7039] [ 472.161204] Modules linked in: kvm_hv kvm_pr kvm xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ses enclosure shpchp rtc_opal i2c_opal powernv_rng binfmt_misc dm_service_time scsi_dh_alua radeon i2c_algo_bit drm_kms_helper ttm drm tg3 ptp pps_core cxgb3 ipr i2c_core mdio dm_multipath [last unloaded: kvm_hv] [ 472.162111] CPU: 24 PID: 7039 Comm: qemu-system-ppc Not tainted 4.2.0-kvm+ openbmc#49 [ 472.162187] task: c000001e38512750 ti: c000001e41bfc000 task.ti: c000001e41bfc000 [ 472.162262] NIP: c00000000096b094 LR: c00000000096b08c CTR: c000000000111130 [ 472.162337] REGS: c000001e41bff520 TRAP: 0901 Not tainted (4.2.0-kvm+) [ 472.162399] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24848844 XER: 00000000 [ 472.162588] CFAR: c00000000096b0ac SOFTE: 1 GPR00: c000000000111170 c000001e41bff7a0 c00000000127df00 0000000000000001 GPR04: 0000000000000003 0000000000000001 0000000000000000 0000000000874821 GPR08: c000001e41bff8e0 0000000000000001 0000000000000000 d00000000efde740 GPR12: c000000000111130 c00000000fdae400 [ 472.163053] NIP [c00000000096b094] _raw_spin_lock_irqsave+0xa4/0x130 [ 472.163117] LR [c00000000096b08c] _raw_spin_lock_irqsave+0x9c/0x130 [ 472.163179] Call Trace: [ 472.163206] [c000001e41bff7a0] [c000001e41bff7f0] 0xc000001e41bff7f0 (unreliable) [ 472.163295] [c000001e41bff7e0] [c000000000111170] __wake_up+0x40/0x90 [ 472.163375] [c000001e41bff830] [d00000000efd6fc0] kvmppc_run_core+0x1240/0x1950 [kvm_hv] [ 472.163465] [c000001e41bffa30] [d00000000efd8510] kvmppc_vcpu_run_hv+0x5a0/0xd90 [kvm_hv] [ 472.163559] [c000001e41bffb70] [d00000000e9318a4] kvmppc_vcpu_run+0x44/0x60 [kvm] [ 472.163653] [c000001e41bffba0] [d00000000e92e674] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm] [ 472.163745] [c000001e41bffbe0] [d00000000e9263a8] kvm_vcpu_ioctl+0x538/0x7b0 [kvm] [ 472.163834] [c000001e41bffd40] [c0000000002d0f50] do_vfs_ioctl+0x480/0x7c0 [ 472.163910] [c000001e41bffde0] [c0000000002d1364] SyS_ioctl+0xd4/0xf0 [ 472.163986] [c000001e41bffe30] [c000000000009260] system_call+0x38/0xd0 [ 472.164060] Instruction dump: [ 472.164098] ebc1fff0 ebe1fff8 7c0803a6 4e800020 60000000 60000000 60420000 8bad02e2 [ 472.164224] 7fc3f378 4b6a57c1 60000000 7c210b78 <e92d0000> 89290009 792affe3 40820070 The bug is that kvmppc_run_vcpu does not correctly handle the case where a vcpu task receives a signal while its guest vcpu is executing in the guest as a result of being piggy-backed onto the execution of another vcore. In that case we need to wait for the vcpu to finish executing inside the guest, and then remove this vcore from the preempted vcores list. That way, we avoid leaving this vcpu's vcore on the preempted vcores list when the vcpu gets interrupted. Fixes: ec25716 Reported-by: Thomas Huth <[email protected]> Tested-by: Thomas Huth <[email protected]> Signed-off-by: Paul Mackerras <[email protected]>

…place functions In the mac and vlan register/unregister/replace functions, the driver locks the mac table mutex (or vlan table mutex) on both ports. Use mutex_lock_nested() to prevent warnings, such as the one below. [ 101.828445] ============================================= [ 101.834820] [ INFO: possible recursive locking detected ] [ 101.841199] 4.5.0-rc2-cmpn_stream_dbg-vm_2016_02_22-14_48_g759ca5d torvalds#49 Not tainted [ 101.850251] --------------------------------------------- [ 101.856621] modprobe/3054 is trying to acquire lock: [ 101.862514] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c10e>] __mlx4_register_mac+0x87e/0xa90 [mlx4_core] [ 101.874598] [ 101.874598] but task is already holding lock: [ 101.881703] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c0f0>] __mlx4_register_mac+0x860/0xa90 [mlx4_core] [ 101.893776] [ 101.893776] other info that might help us debug this: [ 101.901658] Possible unsafe locking scenario: [ 101.901658] [ 101.908859] CPU0 [ 101.911923] ---- [ 101.914985] lock(&table->mutex#2); [ 101.919595] lock(&table->mutex#2); [ 101.924199] [ 101.924199] * DEADLOCK * [ 101.924199] [ 101.931643] May be due to missing lock nesting notation This fix used here is the same as the one used by Haggai Eran in commit 31b57b8 ("IB/ucma: Fix lockdep warning in ucma_lock_files"). issue: 736271 Change-Id: Ic74edc755ef8c7d15bd96a122aa484f9b442cce4 Fixes: 5f61385 ("net/mlx4_core: Keep VLAN/MAC tables mirrored in multifunc HA mode") Suggested-by: Doron Tsur <[email protected]> Signed-off-by: Jack Morgenstein <[email protected]>

In the mac and vlan register/unregister/replace functions, the driver locks the mac table mutex (or vlan table mutex) on both ports. We move to use mutex_lock_nested() to prevent warnings, such as the one below. [ 101.828445] ============================================= [ 101.834820] [ INFO: possible recursive locking detected ] [ 101.841199] 4.5.0-rc2+ torvalds#49 Not tainted [ 101.850251] --------------------------------------------- [ 101.856621] modprobe/3054 is trying to acquire lock: [ 101.862514] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c10e>] __mlx4_register_mac+0x87e/0xa90 [mlx4_core] [ 101.874598] [ 101.874598] but task is already holding lock: [ 101.881703] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c0f0>] __mlx4_register_mac+0x860/0xa90 [mlx4_core] [ 101.893776] [ 101.893776] other info that might help us debug this: [ 101.901658] Possible unsafe locking scenario: [ 101.901658] [ 101.908859] CPU0 [ 101.911923] ---- [ 101.914985] lock(&table->mutex#2); [ 101.919595] lock(&table->mutex#2); [ 101.924199] [ 101.924199] * DEADLOCK * [ 101.924199] [ 101.931643] May be due to missing lock nesting notation Fixes: 5f61385 ('net/mlx4_core: Keep VLAN/MAC tables mirrored in multifunc HA mode') Signed-off-by: Jack Morgenstein <[email protected]> Suggested-by: Doron Tsur <[email protected]> Signed-off-by: David S. Miller <[email protected]>

…place functions In the mac and vlan register/unregister/replace functions, the driver locks the mac table mutex (or vlan table mutex) on both ports. Use mutex_lock_nested() to prevent warnings, such as the one below. [ 101.828445] ============================================= [ 101.834820] [ INFO: possible recursive locking detected ] [ 101.841199] 4.5.0-rc2-cmpn_stream_dbg-vm_2016_02_22-14_48_g759ca5d torvalds#49 Not tainted [ 101.850251] --------------------------------------------- [ 101.856621] modprobe/3054 is trying to acquire lock: [ 101.862514] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c10e>] __mlx4_register_mac+0x87e/0xa90 [mlx4_core] [ 101.874598] [ 101.874598] but task is already holding lock: [ 101.881703] (&table->mutex#2){+.+.+.}, at: [<ffffffffa079c0f0>] __mlx4_register_mac+0x860/0xa90 [mlx4_core] [ 101.893776] [ 101.893776] other info that might help us debug this: [ 101.901658] Possible unsafe locking scenario: [ 101.901658] [ 101.908859] CPU0 [ 101.911923] ---- [ 101.914985] lock(&table->mutex#2); [ 101.919595] lock(&table->mutex#2); [ 101.924199] [ 101.924199] * DEADLOCK * [ 101.924199] [ 101.931643] May be due to missing lock nesting notation This fix used here is the same as the one used by Haggai Eran in commit 31b57b8 ("IB/ucma: Fix lockdep warning in ucma_lock_files"). issue: 736271 Change-Id: Ic74edc755ef8c7d15bd96a122aa484f9b442cce4 Fixes: 5f61385 ("net/mlx4_core: Keep VLAN/MAC tables mirrored in multifunc HA mode") Suggested-by: Doron Tsur <[email protected]> Signed-off-by: Jack Morgenstein <[email protected]>

jamen · 2016-03-13T07:39:45Z

Huh... Lots of references.

bnxt_alloc_mem() dereferences ::vnic_info in the variable declaration block, but allocates it much later. As a result, the following crash happens on my setup: BUG: kernel NULL pointer dereference, address: 0000000000000090 fbcon: Taking over console #PF: supervisor write access in kernel mode #PF: error_code (0x0002) - not-present page PGD 12f382067 P4D 0 Oops: 8002 [#1] PREEMPT SMP NOPTI CPU: 47 PID: 2516 Comm: NetworkManager Not tainted 6.8.0-rc5-libeth+ torvalds#49 Hardware name: Intel Corporation M50CYP2SBSTD/M58CYP2SBSTD, BIOS SE5C620.86B.01.01.0088.2305172341 05/17/2023 RIP: 0010:bnxt_alloc_mem+0x1609/0x1910 [bnxt_en] Code: 81 c8 48 83 c8 08 31 c9 e9 d7 fe ff ff c7 44 24 Oc 00 00 00 00 49 89 d5 e9 2d fe ff ff 41 89 c6 e9 88 00 00 00 48 8b 44 24 50 <80> 88 90 00 00 00 Od 8b 43 74 a8 02 75 1e f6 83 14 02 00 00 80 74 RSP: 0018:ff3f25580f3432c8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ff15a5cfc45249e0 RCX: 0000002079777000 RDX: ff15a5dfb9767000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ff15a5dfb9777000 R11: ffffff8000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000020 R15: ff15a5cfce34f540 FS: 000007fb9a160500(0000) GS:ff15a5dfbefc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CRO: 0000000080050033 CR2: 0000000000000090 CR3: 0000000109efc00Z CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DRZ: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __die_body+0x68/0xb0 ? page_fault_oops+0x3a6/0x400 ? exc_page_fault+0x7a/0x1b0 ? asm_exc_page_fault+0x26/8x30 ? bnxt_alloc_mem+0x1609/0x1910 [bnxt_en] ? bnxt_alloc_mem+0x1389/8x1918 [bnxt_en] _bnxt_open_nic+0x198/0xa50 [bnxt_en] ? bnxt_hurm_if_change+0x287/0x3d0 [bnxt_en] bnxt_open+0xeb/0x1b0 [bnxt_en] _dev_open+0x12e/0x1f0 _dev_change_flags+0xb0/0x200 dev_change_flags+0x25/0x60 do_setlink+0x463/0x1260 ? sock_def_readable+0x14/0xc0 ? rtnl_getlink+0x4b9/0x590 ? _nla_validate_parse+0x91/0xfa0 rtnl_newlink+0xbac/0xe40 <...> Don't create a variable and dereference the first array member directly since it's used only once in the code. Fixes: ef4ee64 ("bnxt_en: Define BNXT_VNIC_DEFAULT for the default vnic index") Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Michael Chan <[email protected]> Signed-off-by: NipaLocal <nipa@local>

bnxt_alloc_mem() dereferences ::vnic_info in the variable declaration block, but allocates it much later. As a result, the following crash happens on my setup: BUG: kernel NULL pointer dereference, address: 0000000000000090 fbcon: Taking over console #PF: supervisor write access in kernel mode #PF: error_code (0x0002) - not-present page PGD 12f382067 P4D 0 Oops: 8002 [#1] PREEMPT SMP NOPTI CPU: 47 PID: 2516 Comm: NetworkManager Not tainted 6.8.0-rc5-libeth+ torvalds#49 Hardware name: Intel Corporation M50CYP2SBSTD/M58CYP2SBSTD, BIOS SE5C620.86B.01.01.0088.2305172341 05/17/2023 RIP: 0010:bnxt_alloc_mem+0x1609/0x1910 [bnxt_en] Code: 81 c8 48 83 c8 08 31 c9 e9 d7 fe ff ff c7 44 24 Oc 00 00 00 00 49 89 d5 e9 2d fe ff ff 41 89 c6 e9 88 00 00 00 48 8b 44 24 50 <80> 88 90 00 00 00 Od 8b 43 74 a8 02 75 1e f6 83 14 02 00 00 80 74 RSP: 0018:ff3f25580f3432c8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ff15a5cfc45249e0 RCX: 0000002079777000 RDX: ff15a5dfb9767000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ff15a5dfb9777000 R11: ffffff8000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000020 R15: ff15a5cfce34f540 FS: 000007fb9a160500(0000) GS:ff15a5dfbefc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CRO: 0000000080050033 CR2: 0000000000000090 CR3: 0000000109efc00Z CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DRZ: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __die_body+0x68/0xb0 ? page_fault_oops+0x3a6/0x400 ? exc_page_fault+0x7a/0x1b0 ? asm_exc_page_fault+0x26/8x30 ? bnxt_alloc_mem+0x1609/0x1910 [bnxt_en] ? bnxt_alloc_mem+0x1389/8x1918 [bnxt_en] _bnxt_open_nic+0x198/0xa50 [bnxt_en] ? bnxt_hurm_if_change+0x287/0x3d0 [bnxt_en] bnxt_open+0xeb/0x1b0 [bnxt_en] _dev_open+0x12e/0x1f0 _dev_change_flags+0xb0/0x200 dev_change_flags+0x25/0x60 do_setlink+0x463/0x1260 ? sock_def_readable+0x14/0xc0 ? rtnl_getlink+0x4b9/0x590 ? _nla_validate_parse+0x91/0xfa0 rtnl_newlink+0xbac/0xe40 <...> Don't create a variable and dereference the first array member directly since it's used only once in the code. Fixes: ef4ee64 ("bnxt_en: Define BNXT_VNIC_DEFAULT for the default vnic index") Signed-off-by: Alexander Lobakin <[email protected]>

bnxt_alloc_mem() dereferences ::vnic_info in the variable declaration block, but allocates it much later. As a result, the following crash happens on my setup: BUG: kernel NULL pointer dereference, address: 0000000000000090 fbcon: Taking over console #PF: supervisor write access in kernel mode #PF: error_code (0x0002) - not-present page PGD 12f382067 P4D 0 Oops: 8002 [#1] PREEMPT SMP NOPTI CPU: 47 PID: 2516 Comm: NetworkManager Not tainted 6.8.0-rc5-libeth+ torvalds#49 Hardware name: Intel Corporation M50CYP2SBSTD/M58CYP2SBSTD, BIOS SE5C620.86B.01.01.0088.2305172341 05/17/2023 RIP: 0010:bnxt_alloc_mem+0x1609/0x1910 [bnxt_en] Code: 81 c8 48 83 c8 08 31 c9 e9 d7 fe ff ff c7 44 24 Oc 00 00 00 00 49 89 d5 e9 2d fe ff ff 41 89 c6 e9 88 00 00 00 48 8b 44 24 50 <80> 88 90 00 00 00 Od 8b 43 74 a8 02 75 1e f6 83 14 02 00 00 80 74 RSP: 0018:ff3f25580f3432c8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ff15a5cfc45249e0 RCX: 0000002079777000 RDX: ff15a5dfb9767000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ff15a5dfb9777000 R11: ffffff8000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000020 R15: ff15a5cfce34f540 FS: 000007fb9a160500(0000) GS:ff15a5dfbefc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CRO: 0000000080050033 CR2: 0000000000000090 CR3: 0000000109efc00Z CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DRZ: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __die_body+0x68/0xb0 ? page_fault_oops+0x3a6/0x400 ? exc_page_fault+0x7a/0x1b0 ? asm_exc_page_fault+0x26/8x30 ? bnxt_alloc_mem+0x1609/0x1910 [bnxt_en] ? bnxt_alloc_mem+0x1389/8x1918 [bnxt_en] _bnxt_open_nic+0x198/0xa50 [bnxt_en] ? bnxt_hurm_if_change+0x287/0x3d0 [bnxt_en] bnxt_open+0xeb/0x1b0 [bnxt_en] _dev_open+0x12e/0x1f0 _dev_change_flags+0xb0/0x200 dev_change_flags+0x25/0x60 do_setlink+0x463/0x1260 ? sock_def_readable+0x14/0xc0 ? rtnl_getlink+0x4b9/0x590 ? _nla_validate_parse+0x91/0xfa0 rtnl_newlink+0xbac/0xe40 <...> Don't create a variable and dereference the first array member directly since it's used only once in the code. Fixes: ef4ee64 ("bnxt_en: Define BNXT_VNIC_DEFAULT for the default vnic index") Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Michael Chan <[email protected]> Signed-off-by: NipaLocal <nipa@local>

bnxt_alloc_mem() dereferences ::vnic_info in the variable declaration block, but allocates it much later. As a result, the following crash happens on my setup: BUG: kernel NULL pointer dereference, address: 0000000000000090 fbcon: Taking over console #PF: supervisor write access in kernel mode #PF: error_code (0x0002) - not-present page PGD 12f382067 P4D 0 Oops: 8002 [#1] PREEMPT SMP NOPTI CPU: 47 PID: 2516 Comm: NetworkManager Not tainted 6.8.0-rc5-libeth+ torvalds#49 Hardware name: Intel Corporation M50CYP2SBSTD/M58CYP2SBSTD, BIOS SE5C620.86B.01.01.0088.2305172341 05/17/2023 RIP: 0010:bnxt_alloc_mem+0x1609/0x1910 [bnxt_en] Code: 81 c8 48 83 c8 08 31 c9 e9 d7 fe ff ff c7 44 24 Oc 00 00 00 00 49 89 d5 e9 2d fe ff ff 41 89 c6 e9 88 00 00 00 48 8b 44 24 50 <80> 88 90 00 00 00 Od 8b 43 74 a8 02 75 1e f6 83 14 02 00 00 80 74 RSP: 0018:ff3f25580f3432c8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ff15a5cfc45249e0 RCX: 0000002079777000 RDX: ff15a5dfb9767000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: ff15a5dfb9777000 R11: ffffff8000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000020 R15: ff15a5cfce34f540 FS: 000007fb9a160500(0000) GS:ff15a5dfbefc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CRO: 0000000080050033 CR2: 0000000000000090 CR3: 0000000109efc00Z CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DRZ: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __die_body+0x68/0xb0 ? page_fault_oops+0x3a6/0x400 ? exc_page_fault+0x7a/0x1b0 ? asm_exc_page_fault+0x26/8x30 ? bnxt_alloc_mem+0x1609/0x1910 [bnxt_en] ? bnxt_alloc_mem+0x1389/8x1918 [bnxt_en] _bnxt_open_nic+0x198/0xa50 [bnxt_en] ? bnxt_hurm_if_change+0x287/0x3d0 [bnxt_en] bnxt_open+0xeb/0x1b0 [bnxt_en] _dev_open+0x12e/0x1f0 _dev_change_flags+0xb0/0x200 dev_change_flags+0x25/0x60 do_setlink+0x463/0x1260 ? sock_def_readable+0x14/0xc0 ? rtnl_getlink+0x4b9/0x590 ? _nla_validate_parse+0x91/0xfa0 rtnl_newlink+0xbac/0xe40 <...> Don't create a variable and dereference the first array member directly since it's used only once in the code. Fixes: ef4ee64 ("bnxt_en: Define BNXT_VNIC_DEFAULT for the default vnic index") Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Michael Chan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>

If stack_depot_save_flags() allocates memory it always drops __GFP_NOLOCKDEP flag. So when KASAN tries to track __GFP_NOLOCKDEP allocation we may end up with lockdep splat like bellow: ====================================================== WARNING: possible circular locking dependency detected 6.9.0-rc3+ torvalds#49 Not tainted ------------------------------------------------------ kswapd0/149 is trying to acquire lock: ffff88811346a920 (&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590 [xfs] but task is already holding lock: ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x5d9/0xad0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: __lock_acquire+0x7da/0x1030 lock_acquire+0x15d/0x400 fs_reclaim_acquire+0xb5/0x100 prepare_alloc_pages.constprop.0+0xc5/0x230 __alloc_pages+0x12a/0x3f0 alloc_pages_mpol+0x175/0x340 stack_depot_save_flags+0x4c5/0x510 kasan_save_stack+0x30/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x83/0x90 kmem_cache_alloc+0x15e/0x4a0 __alloc_object+0x35/0x370 __create_object+0x22/0x90 __kmalloc_node_track_caller+0x477/0x5b0 krealloc+0x5f/0x110 xfs_iext_insert_raw+0x4b2/0x6e0 [xfs] xfs_iext_insert+0x2e/0x130 [xfs] xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs] xfs_btree_visit_block+0xfb/0x290 [xfs] xfs_btree_visit_blocks+0x215/0x2c0 [xfs] xfs_iread_extents+0x1a2/0x2e0 [xfs] xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs] iomap_iter+0x1d1/0x2d0 iomap_file_buffered_write+0x120/0x1a0 xfs_file_buffered_write+0x128/0x4b0 [xfs] vfs_write+0x675/0x890 ksys_write+0xc3/0x160 do_syscall_64+0x94/0x170 entry_SYSCALL_64_after_hwframe+0x71/0x79 Always preserve __GFP_NOLOCKDEP to fix this. Link: https://lkml.kernel.org/r/[email protected] Fixes: cd11016 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Andrey Ryabinin <[email protected]> Reported-by: Xiubo Li <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Reported-by: Damien Le Moal <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Suggested-by: Dave Chinner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

If stack_depot_save_flags() allocates memory it always drops __GFP_NOLOCKDEP flag. So when KASAN tries to track __GFP_NOLOCKDEP allocation we may end up with lockdep splat like bellow: ====================================================== WARNING: possible circular locking dependency detected 6.9.0-rc3+ torvalds#49 Not tainted ------------------------------------------------------ kswapd0/149 is trying to acquire lock: ffff88811346a920 (&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590 [xfs] but task is already holding lock: ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x5d9/0xad0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: __lock_acquire+0x7da/0x1030 lock_acquire+0x15d/0x400 fs_reclaim_acquire+0xb5/0x100 prepare_alloc_pages.constprop.0+0xc5/0x230 __alloc_pages+0x12a/0x3f0 alloc_pages_mpol+0x175/0x340 stack_depot_save_flags+0x4c5/0x510 kasan_save_stack+0x30/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x83/0x90 kmem_cache_alloc+0x15e/0x4a0 __alloc_object+0x35/0x370 __create_object+0x22/0x90 __kmalloc_node_track_caller+0x477/0x5b0 krealloc+0x5f/0x110 xfs_iext_insert_raw+0x4b2/0x6e0 [xfs] xfs_iext_insert+0x2e/0x130 [xfs] xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs] xfs_btree_visit_block+0xfb/0x290 [xfs] xfs_btree_visit_blocks+0x215/0x2c0 [xfs] xfs_iread_extents+0x1a2/0x2e0 [xfs] xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs] iomap_iter+0x1d1/0x2d0 iomap_file_buffered_write+0x120/0x1a0 xfs_file_buffered_write+0x128/0x4b0 [xfs] vfs_write+0x675/0x890 ksys_write+0xc3/0x160 do_syscall_64+0x94/0x170 entry_SYSCALL_64_after_hwframe+0x71/0x79 Always preserve __GFP_NOLOCKDEP to fix this. Link: https://lkml.kernel.org/r/[email protected] Fixes: cd11016 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Andrey Ryabinin <[email protected]> Reported-by: Xiubo Li <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Reported-by: Damien Le Moal <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Suggested-by: Dave Chinner <[email protected]> Tested-by: Xiubo Li <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

commit 6fe6046 upstream. If stack_depot_save_flags() allocates memory it always drops __GFP_NOLOCKDEP flag. So when KASAN tries to track __GFP_NOLOCKDEP allocation we may end up with lockdep splat like bellow: ====================================================== WARNING: possible circular locking dependency detected 6.9.0-rc3+ torvalds#49 Not tainted ------------------------------------------------------ kswapd0/149 is trying to acquire lock: ffff88811346a920 (&xfs_nondir_ilock_class){++++}-{4:4}, at: xfs_reclaim_inode+0x3ac/0x590 [xfs] but task is already holding lock: ffffffff8bb33100 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x5d9/0xad0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (fs_reclaim){+.+.}-{0:0}: __lock_acquire+0x7da/0x1030 lock_acquire+0x15d/0x400 fs_reclaim_acquire+0xb5/0x100 prepare_alloc_pages.constprop.0+0xc5/0x230 __alloc_pages+0x12a/0x3f0 alloc_pages_mpol+0x175/0x340 stack_depot_save_flags+0x4c5/0x510 kasan_save_stack+0x30/0x40 kasan_save_track+0x10/0x30 __kasan_slab_alloc+0x83/0x90 kmem_cache_alloc+0x15e/0x4a0 __alloc_object+0x35/0x370 __create_object+0x22/0x90 __kmalloc_node_track_caller+0x477/0x5b0 krealloc+0x5f/0x110 xfs_iext_insert_raw+0x4b2/0x6e0 [xfs] xfs_iext_insert+0x2e/0x130 [xfs] xfs_iread_bmbt_block+0x1a9/0x4d0 [xfs] xfs_btree_visit_block+0xfb/0x290 [xfs] xfs_btree_visit_blocks+0x215/0x2c0 [xfs] xfs_iread_extents+0x1a2/0x2e0 [xfs] xfs_buffered_write_iomap_begin+0x376/0x10a0 [xfs] iomap_iter+0x1d1/0x2d0 iomap_file_buffered_write+0x120/0x1a0 xfs_file_buffered_write+0x128/0x4b0 [xfs] vfs_write+0x675/0x890 ksys_write+0xc3/0x160 do_syscall_64+0x94/0x170 entry_SYSCALL_64_after_hwframe+0x71/0x79 Always preserve __GFP_NOLOCKDEP to fix this. Link: https://lkml.kernel.org/r/[email protected] Fixes: cd11016 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Andrey Ryabinin <[email protected]> Reported-by: Xiubo Li <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Reported-by: Damien Le Moal <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Suggested-by: Dave Chinner <[email protected]> Tested-by: Xiubo Li <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

For htab of maps, when the map is removed from the htab, it may hold the last reference of the map. bpf_map_fd_put_ptr() will invoke bpf_map_free_id() to free the id of the removed map element. However, bpf_map_fd_put_ptr() is invoked while holding a bucket lock (raw_spin_lock_t), and bpf_map_free_id() attempts to acquire map_idr_lock (spinlock_t), triggering the following lockdep warning: ============================= [ BUG: Invalid wait context ] 6.11.0-rc4+ torvalds#49 Not tainted ----------------------------- test_maps/4881 is trying to lock: ffffffff84884578 (map_idr_lock){+...}-{3:3}, at: bpf_map_free_id.part.0+0x21/0x70 other info that might help us debug this: context-{5:5} 2 locks held by test_maps/4881: #0: ffffffff846caf60 (rcu_read_lock){....}-{1:3}, at: bpf_fd_htab_map_update_elem+0xf9/0x270 #1: ffff888149ced148 (&htab->lockdep_key#2){....}-{2:2}, at: htab_map_update_elem+0x178/0xa80 stack backtrace: CPU: 0 UID: 0 PID: 4881 Comm: test_maps Not tainted 6.11.0-rc4+ torvalds#49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... Call Trace: <TASK> dump_stack_lvl+0x6e/0xb0 dump_stack+0x10/0x20 __lock_acquire+0x73e/0x36c0 lock_acquire+0x182/0x450 _raw_spin_lock_irqsave+0x43/0x70 bpf_map_free_id.part.0+0x21/0x70 bpf_map_put+0xcf/0x110 bpf_map_fd_put_ptr+0x9a/0xb0 free_htab_elem+0x69/0xe0 htab_map_update_elem+0x50f/0xa80 bpf_fd_htab_map_update_elem+0x131/0x270 htab_map_update_elem+0x50f/0xa80 bpf_fd_htab_map_update_elem+0x131/0x270 bpf_map_update_value+0x266/0x380 __sys_bpf+0x21bb/0x36b0 __x64_sys_bpf+0x45/0x60 x64_sys_call+0x1b2a/0x20d0 do_syscall_64+0x5d/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e One way to fix the lockdep warning is using raw_spinlock_t for map_idr_lock as well. However, bpf_map_alloc_id() invokes idr_alloc_cyclic() after acquiring map_idr_lock, it will trigger a similar lockdep warning because the slab's lock (s->cpu_slab->lock) is still a spinlock. Instead of changing map_idr_lock's type, fix the issue by invoking htab_put_fd_value() after htab_unlock_bucket(). However, only deferring the invocation of htab_put_fd_value() is not enough, because the old map pointers in htab of maps can not be saved during batched deletion. Therefore, also defer the invocation of free_htab_elem(), so these to-be-freed elements could be linked together similar to lru map. There are four callers for ->map_fd_put_ptr: (1) alloc_htab_elem() (through htab_put_fd_value()) It invokes ->map_fd_put_ptr() under a raw_spinlock_t. The invocation of htab_put_fd_value() can not simply move after htab_unlock_bucket(), because the old element has already been stashed in htab->extra_elems. It may be reused immediately after htab_unlock_bucket() and the invocation of htab_put_fd_value() after htab_unlock_bucket() may release the newly-added element incorrectly. Therefore, saving the map pointer of the old element for htab of maps before unlocking the bucket and releasing the map_ptr after unlock. Beside the map pointer in the old element, should do the same thing for the special fields in the old element as well. (2) free_htab_elem() (through htab_put_fd_value()) Its caller includes __htab_map_lookup_and_delete_elem(), htab_map_delete_elem() and __htab_map_lookup_and_delete_batch(). For htab_map_delete_elem(), simply invoke free_htab_elem() after htab_unlock_bucket(). For __htab_map_lookup_and_delete_batch(), just like lru map, linking the to-be-freed element into node_to_free list and invoking free_htab_elem() for these element after unlock. It is safe to reuse batch_flink as the link for node_to_free, because these elements have been removed from the hash llist. Because htab of maps doesn't support lookup_and_delete operation, __htab_map_lookup_and_delete_elem() doesn't have the problem, so kept it as is. (3) fd_htab_map_free() It invokes ->map_fd_put_ptr without raw_spinlock_t. (4) bpf_fd_htab_map_update_elem() It invokes ->map_fd_put_ptr without raw_spinlock_t. After moving free_htab_elem() outside htab bucket lock scope, using pcpu_freelist_push() instead of __pcpu_freelist_push() to disable the irq before freeing elements, and protecting the invocations of bpf_mem_cache_free() with migrate_{disable|enable} pair. Signed-off-by: Hou Tao <[email protected]>

For htab of maps, when the map is removed from the htab, it may hold the last reference of the map. bpf_map_fd_put_ptr() will invoke bpf_map_free_id() to free the id of the removed map element. However, bpf_map_fd_put_ptr() is invoked while holding a bucket lock (raw_spin_lock_t), and bpf_map_free_id() attempts to acquire map_idr_lock (spinlock_t), triggering the following lockdep warning: ============================= [ BUG: Invalid wait context ] 6.11.0-rc4+ #49 Not tainted ----------------------------- test_maps/4881 is trying to lock: ffffffff84884578 (map_idr_lock){+...}-{3:3}, at: bpf_map_free_id.part.0+0x21/0x70 other info that might help us debug this: context-{5:5} 2 locks held by test_maps/4881: #0: ffffffff846caf60 (rcu_read_lock){....}-{1:3}, at: bpf_fd_htab_map_update_elem+0xf9/0x270 #1: ffff888149ced148 (&htab->lockdep_key#2){....}-{2:2}, at: htab_map_update_elem+0x178/0xa80 stack backtrace: CPU: 0 UID: 0 PID: 4881 Comm: test_maps Not tainted 6.11.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... Call Trace: <TASK> dump_stack_lvl+0x6e/0xb0 dump_stack+0x10/0x20 __lock_acquire+0x73e/0x36c0 lock_acquire+0x182/0x450 _raw_spin_lock_irqsave+0x43/0x70 bpf_map_free_id.part.0+0x21/0x70 bpf_map_put+0xcf/0x110 bpf_map_fd_put_ptr+0x9a/0xb0 free_htab_elem+0x69/0xe0 htab_map_update_elem+0x50f/0xa80 bpf_fd_htab_map_update_elem+0x131/0x270 htab_map_update_elem+0x50f/0xa80 bpf_fd_htab_map_update_elem+0x131/0x270 bpf_map_update_value+0x266/0x380 __sys_bpf+0x21bb/0x36b0 __x64_sys_bpf+0x45/0x60 x64_sys_call+0x1b2a/0x20d0 do_syscall_64+0x5d/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e One way to fix the lockdep warning is using raw_spinlock_t for map_idr_lock as well. However, bpf_map_alloc_id() invokes idr_alloc_cyclic() after acquiring map_idr_lock, it will trigger a similar lockdep warning because the slab's lock (s->cpu_slab->lock) is still a spinlock. Instead of changing map_idr_lock's type, fix the issue by invoking htab_put_fd_value() after htab_unlock_bucket(). However, only deferring the invocation of htab_put_fd_value() is not enough, because the old map pointers in htab of maps can not be saved during batched deletion. Therefore, also defer the invocation of free_htab_elem(), so these to-be-freed elements could be linked together similar to lru map. There are four callers for ->map_fd_put_ptr: (1) alloc_htab_elem() (through htab_put_fd_value()) It invokes ->map_fd_put_ptr() under a raw_spinlock_t. The invocation of htab_put_fd_value() can not simply move after htab_unlock_bucket(), because the old element has already been stashed in htab->extra_elems. It may be reused immediately after htab_unlock_bucket() and the invocation of htab_put_fd_value() after htab_unlock_bucket() may release the newly-added element incorrectly. Therefore, saving the map pointer of the old element for htab of maps before unlocking the bucket and releasing the map_ptr after unlock. Beside the map pointer in the old element, should do the same thing for the special fields in the old element as well. (2) free_htab_elem() (through htab_put_fd_value()) Its caller includes __htab_map_lookup_and_delete_elem(), htab_map_delete_elem() and __htab_map_lookup_and_delete_batch(). For htab_map_delete_elem(), simply invoke free_htab_elem() after htab_unlock_bucket(). For __htab_map_lookup_and_delete_batch(), just like lru map, linking the to-be-freed element into node_to_free list and invoking free_htab_elem() for these element after unlock. It is safe to reuse batch_flink as the link for node_to_free, because these elements have been removed from the hash llist. Because htab of maps doesn't support lookup_and_delete operation, __htab_map_lookup_and_delete_elem() doesn't have the problem, so kept it as is. (3) fd_htab_map_free() It invokes ->map_fd_put_ptr without raw_spinlock_t. (4) bpf_fd_htab_map_update_elem() It invokes ->map_fd_put_ptr without raw_spinlock_t. After moving free_htab_elem() outside htab bucket lock scope, using pcpu_freelist_push() instead of __pcpu_freelist_push() to disable the irq before freeing elements, and protecting the invocations of bpf_mem_cache_free() with migrate_{disable|enable} pair. Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>

open_cached_dir() may either race with the tcon reconnection even before compound_send_recv() or directly trigger a reconnection via SMB2_open_init() or SMB_query_info_init(). The reconnection process invokes invalidate_all_cached_dirs() via cifs_mark_open_files_invalid(), which removes all cfids from the cfids->entries list but doesn't drop a ref if has_lease isn't true. This results in the currently-being-constructed cfid not being on the list, but still having a refcount of 2. It leaks if returned from open_cached_dir(). Fix this by setting cfid->has_lease when the ref is actually taken; the cfid will not be used by other threads until it has a valid time. Addresses these kmemleaks: unreferenced object 0xffff8881090c4000 (size 1024): comm "bash", pid 1860, jiffies 4295126592 hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........"....... 00 ca 45 22 81 88 ff ff f8 dc 4f 04 81 88 ff ff ..E"......O..... backtrace (crc 6f58c20f): [<ffffffff8b895a1e>] __kmalloc_cache_noprof+0x2be/0x350 [<ffffffff8bda06e3>] open_cached_dir+0x993/0x1fb0 [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50 [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0 [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200 [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0 [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e unreferenced object 0xffff8881044fdcf8 (size 8): comm "bash", pid 1860, jiffies 4295126592 hex dump (first 8 bytes): 00 cc cc cc cc cc cc cc ........ backtrace (crc 10c106a9): [<ffffffff8b89a3d3>] __kmalloc_node_track_caller_noprof+0x363/0x480 [<ffffffff8b7d7256>] kstrdup+0x36/0x60 [<ffffffff8bda0700>] open_cached_dir+0x9b0/0x1fb0 [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50 [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0 [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200 [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0 [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e And addresses these BUG splats when unmounting the SMB filesystem: BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs] WARNING: CPU: 3 PID: 3433 at fs/dcache.c:1536 umount_check+0xd0/0x100 Modules linked in: CPU: 3 UID: 0 PID: 3433 Comm: bash Not tainted 6.12.0-rc4-g850925a8133c-dirty torvalds#49 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:umount_check+0xd0/0x100 Code: 8d 7c 24 40 e8 31 5a f4 ff 49 8b 54 24 40 41 56 49 89 e9 45 89 e8 48 89 d9 41 57 48 89 de 48 c7 c7 80 e7 db ac e8 f0 72 9a ff <0f> 0b 58 31 c0 5a 5b 5d 41 5c 41 5d 41 5e 41 5f e9 2b e5 5d 01 41 RSP: 0018:ffff88811cc27978 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff888140590ba0 RCX: ffffffffaaf20bae RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881f6fb6f40 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed1023984ee3 R10: ffff88811cc2771f R11: 00000000016cfcc0 R12: ffff888134383e08 R13: 0000000000000002 R14: ffff8881462ec668 R15: ffffffffaceab4c0 FS: 00007f23bfa98740(0000) GS:ffff8881f6f80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556de4a6f808 CR3: 0000000123c80000 CR4: 0000000000350ef0 Call Trace: <TASK> d_walk+0x6a/0x530 shrink_dcache_for_umount+0x6a/0x200 generic_shutdown_super+0x52/0x2a0 kill_anon_super+0x22/0x40 cifs_kill_sb+0x159/0x1e0 deactivate_locked_super+0x66/0xe0 cleanup_mnt+0x140/0x210 task_work_run+0xfb/0x170 syscall_exit_to_user_mode+0x29f/0x2b0 do_syscall_64+0xa1/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f23bfb93ae7 Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 0d 11 93 0d 00 f7 d8 64 89 01 b8 ff ff ff ff eb bf 0f 1f 44 00 00 b8 50 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 92 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007ffee9138598 EFLAGS: 00000246 ORIG_RAX: 0000000000000050 RAX: 0000000000000000 RBX: 0000558f1803e9a0 RCX: 00007f23bfb93ae7 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000558f1803e9a0 RBP: 0000558f1803e600 R08: 0000000000000007 R09: 0000558f17fab610 R10: d91d5ec34ab757b0 R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000000015 R15: 0000000000000000 </TASK> irq event stamp: 1163486 hardirqs last enabled at (1163485): [<ffffffffac98d344>] _raw_spin_unlock_irqrestore+0x34/0x60 hardirqs last disabled at (1163486): [<ffffffffac97dcfc>] __schedule+0xc7c/0x19a0 softirqs last enabled at (1163482): [<ffffffffab79a3ee>] __smb_send_rqst+0x3de/0x990 softirqs last disabled at (1163480): [<ffffffffac2314f1>] release_sock+0x21/0xf0 ---[ end trace 0000000000000000 ]--- VFS: Busy inodes after unmount of cifs (cifs) ------------[ cut here ]------------ kernel BUG at fs/super.c:661! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 1 UID: 0 PID: 3433 Comm: bash Tainted: G W 6.12.0-rc4-g850925a8133c-dirty torvalds#49 Tainted: [W]=WARN Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:generic_shutdown_super+0x290/0x2a0 Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246 RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319 R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0 R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0 Call Trace: <TASK> kill_anon_super+0x22/0x40 cifs_kill_sb+0x159/0x1e0 deactivate_locked_super+0x66/0xe0 cleanup_mnt+0x140/0x210 task_work_run+0xfb/0x170 syscall_exit_to_user_mode+0x29f/0x2b0 do_syscall_64+0xa1/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f23bfb93ae7 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:generic_shutdown_super+0x290/0x2a0 Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246 RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319 R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0 R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0 This reproduces eventually with an SMB mount and two shells running these loops concurrently - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10 echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20 echo "recovering"; iptables -F OUTPUT; sleep 10; done Fixes: ebe98f1 ("cifs: enable caching of directories for which a lease is held") Fixes: 5c86919 ("smb: client: fix use-after-free in smb2_query_info_compound()") Cc: [email protected] Signed-off-by: Paul Aurich <[email protected]>

open_cached_dir() may either race with the tcon reconnection even before compound_send_recv() or directly trigger a reconnection via SMB2_open_init() or SMB_query_info_init(). The reconnection process invokes invalidate_all_cached_dirs() via cifs_mark_open_files_invalid(), which removes all cfids from the cfids->entries list but doesn't drop a ref if has_lease isn't true. This results in the currently-being-constructed cfid not being on the list, but still having a refcount of 2. It leaks if returned from open_cached_dir(). Fix this by setting cfid->has_lease when the ref is actually taken; the cfid will not be used by other threads until it has a valid time. Addresses these kmemleaks: unreferenced object 0xffff8881090c4000 (size 1024): comm "bash", pid 1860, jiffies 4295126592 hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........"....... 00 ca 45 22 81 88 ff ff f8 dc 4f 04 81 88 ff ff ..E"......O..... backtrace (crc 6f58c20f): [<ffffffff8b895a1e>] __kmalloc_cache_noprof+0x2be/0x350 [<ffffffff8bda06e3>] open_cached_dir+0x993/0x1fb0 [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50 [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0 [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200 [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0 [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e unreferenced object 0xffff8881044fdcf8 (size 8): comm "bash", pid 1860, jiffies 4295126592 hex dump (first 8 bytes): 00 cc cc cc cc cc cc cc ........ backtrace (crc 10c106a9): [<ffffffff8b89a3d3>] __kmalloc_node_track_caller_noprof+0x363/0x480 [<ffffffff8b7d7256>] kstrdup+0x36/0x60 [<ffffffff8bda0700>] open_cached_dir+0x9b0/0x1fb0 [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50 [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0 [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200 [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0 [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e And addresses these BUG splats when unmounting the SMB filesystem: BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs] WARNING: CPU: 3 PID: 3433 at fs/dcache.c:1536 umount_check+0xd0/0x100 Modules linked in: CPU: 3 UID: 0 PID: 3433 Comm: bash Not tainted 6.12.0-rc4-g850925a8133c-dirty torvalds#49 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:umount_check+0xd0/0x100 Code: 8d 7c 24 40 e8 31 5a f4 ff 49 8b 54 24 40 41 56 49 89 e9 45 89 e8 48 89 d9 41 57 48 89 de 48 c7 c7 80 e7 db ac e8 f0 72 9a ff <0f> 0b 58 31 c0 5a 5b 5d 41 5c 41 5d 41 5e 41 5f e9 2b e5 5d 01 41 RSP: 0018:ffff88811cc27978 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff888140590ba0 RCX: ffffffffaaf20bae RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881f6fb6f40 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed1023984ee3 R10: ffff88811cc2771f R11: 00000000016cfcc0 R12: ffff888134383e08 R13: 0000000000000002 R14: ffff8881462ec668 R15: ffffffffaceab4c0 FS: 00007f23bfa98740(0000) GS:ffff8881f6f80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556de4a6f808 CR3: 0000000123c80000 CR4: 0000000000350ef0 Call Trace: <TASK> d_walk+0x6a/0x530 shrink_dcache_for_umount+0x6a/0x200 generic_shutdown_super+0x52/0x2a0 kill_anon_super+0x22/0x40 cifs_kill_sb+0x159/0x1e0 deactivate_locked_super+0x66/0xe0 cleanup_mnt+0x140/0x210 task_work_run+0xfb/0x170 syscall_exit_to_user_mode+0x29f/0x2b0 do_syscall_64+0xa1/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f23bfb93ae7 Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 0d 11 93 0d 00 f7 d8 64 89 01 b8 ff ff ff ff eb bf 0f 1f 44 00 00 b8 50 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 92 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007ffee9138598 EFLAGS: 00000246 ORIG_RAX: 0000000000000050 RAX: 0000000000000000 RBX: 0000558f1803e9a0 RCX: 00007f23bfb93ae7 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000558f1803e9a0 RBP: 0000558f1803e600 R08: 0000000000000007 R09: 0000558f17fab610 R10: d91d5ec34ab757b0 R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000000015 R15: 0000000000000000 </TASK> irq event stamp: 1163486 hardirqs last enabled at (1163485): [<ffffffffac98d344>] _raw_spin_unlock_irqrestore+0x34/0x60 hardirqs last disabled at (1163486): [<ffffffffac97dcfc>] __schedule+0xc7c/0x19a0 softirqs last enabled at (1163482): [<ffffffffab79a3ee>] __smb_send_rqst+0x3de/0x990 softirqs last disabled at (1163480): [<ffffffffac2314f1>] release_sock+0x21/0xf0 ---[ end trace 0000000000000000 ]--- VFS: Busy inodes after unmount of cifs (cifs) ------------[ cut here ]------------ kernel BUG at fs/super.c:661! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 1 UID: 0 PID: 3433 Comm: bash Tainted: G W 6.12.0-rc4-g850925a8133c-dirty torvalds#49 Tainted: [W]=WARN Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:generic_shutdown_super+0x290/0x2a0 Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246 RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319 R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0 R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0 Call Trace: <TASK> kill_anon_super+0x22/0x40 cifs_kill_sb+0x159/0x1e0 deactivate_locked_super+0x66/0xe0 cleanup_mnt+0x140/0x210 task_work_run+0xfb/0x170 syscall_exit_to_user_mode+0x29f/0x2b0 do_syscall_64+0xa1/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f23bfb93ae7 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:generic_shutdown_super+0x290/0x2a0 Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246 RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319 R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0 R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0 This reproduces eventually with an SMB mount and two shells running these loops concurrently - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10 echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20 echo "recovering"; iptables -F OUTPUT; sleep 10; done Fixes: ebe98f1 ("cifs: enable caching of directories for which a lease is held") Fixes: 5c86919 ("smb: client: fix use-after-free in smb2_query_info_compound()") Cc: [email protected] Signed-off-by: Paul Aurich <[email protected]> Signed-off-by: Steve French <[email protected]>

For htab of maps, when the map is removed from the htab, it may hold the last reference of the map. bpf_map_fd_put_ptr() will invoke bpf_map_free_id() to free the id of the removed map element. However, bpf_map_fd_put_ptr() is invoked while holding a bucket lock (raw_spin_lock_t), and bpf_map_free_id() attempts to acquire map_idr_lock (spinlock_t), triggering the following lockdep warning: ============================= [ BUG: Invalid wait context ] 6.11.0-rc4+ #49 Not tainted ----------------------------- test_maps/4881 is trying to lock: ffffffff84884578 (map_idr_lock){+...}-{3:3}, at: bpf_map_free_id.part.0+0x21/0x70 other info that might help us debug this: context-{5:5} 2 locks held by test_maps/4881: #0: ffffffff846caf60 (rcu_read_lock){....}-{1:3}, at: bpf_fd_htab_map_update_elem+0xf9/0x270 #1: ffff888149ced148 (&htab->lockdep_key#2){....}-{2:2}, at: htab_map_update_elem+0x178/0xa80 stack backtrace: CPU: 0 UID: 0 PID: 4881 Comm: test_maps Not tainted 6.11.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... Call Trace: <TASK> dump_stack_lvl+0x6e/0xb0 dump_stack+0x10/0x20 __lock_acquire+0x73e/0x36c0 lock_acquire+0x182/0x450 _raw_spin_lock_irqsave+0x43/0x70 bpf_map_free_id.part.0+0x21/0x70 bpf_map_put+0xcf/0x110 bpf_map_fd_put_ptr+0x9a/0xb0 free_htab_elem+0x69/0xe0 htab_map_update_elem+0x50f/0xa80 bpf_fd_htab_map_update_elem+0x131/0x270 htab_map_update_elem+0x50f/0xa80 bpf_fd_htab_map_update_elem+0x131/0x270 bpf_map_update_value+0x266/0x380 __sys_bpf+0x21bb/0x36b0 __x64_sys_bpf+0x45/0x60 x64_sys_call+0x1b2a/0x20d0 do_syscall_64+0x5d/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e One way to fix the lockdep warning is using raw_spinlock_t for map_idr_lock as well. However, bpf_map_alloc_id() invokes idr_alloc_cyclic() after acquiring map_idr_lock, it will trigger a similar lockdep warning because the slab's lock (s->cpu_slab->lock) is still a spinlock. Instead of changing map_idr_lock's type, fix the issue by invoking htab_put_fd_value() after htab_unlock_bucket(). However, only deferring the invocation of htab_put_fd_value() is not enough, because the old map pointers in htab of maps can not be saved during batched deletion. Therefore, also defer the invocation of free_htab_elem(), so these to-be-freed elements could be linked together similar to lru map. There are four callers for ->map_fd_put_ptr: (1) alloc_htab_elem() (through htab_put_fd_value()) It invokes ->map_fd_put_ptr() under a raw_spinlock_t. The invocation of htab_put_fd_value() can not simply move after htab_unlock_bucket(), because the old element has already been stashed in htab->extra_elems. It may be reused immediately after htab_unlock_bucket() and the invocation of htab_put_fd_value() after htab_unlock_bucket() may release the newly-added element incorrectly. Therefore, saving the map pointer of the old element for htab of maps before unlocking the bucket and releasing the map_ptr after unlock. Beside the map pointer in the old element, should do the same thing for the special fields in the old element as well. (2) free_htab_elem() (through htab_put_fd_value()) Its caller includes __htab_map_lookup_and_delete_elem(), htab_map_delete_elem() and __htab_map_lookup_and_delete_batch(). For htab_map_delete_elem(), simply invoke free_htab_elem() after htab_unlock_bucket(). For __htab_map_lookup_and_delete_batch(), just like lru map, linking the to-be-freed element into node_to_free list and invoking free_htab_elem() for these element after unlock. It is safe to reuse batch_flink as the link for node_to_free, because these elements have been removed from the hash llist. Because htab of maps doesn't support lookup_and_delete operation, __htab_map_lookup_and_delete_elem() doesn't have the problem, so kept it as is. (3) fd_htab_map_free() It invokes ->map_fd_put_ptr without raw_spinlock_t. (4) bpf_fd_htab_map_update_elem() It invokes ->map_fd_put_ptr without raw_spinlock_t. After moving free_htab_elem() outside htab bucket lock scope, using pcpu_freelist_push() instead of __pcpu_freelist_push() to disable the irq before freeing elements, and protecting the invocations of bpf_mem_cache_free() with migrate_{disable|enable} pair. Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]>

open_cached_dir() may either race with the tcon reconnection even before compound_send_recv() or directly trigger a reconnection via SMB2_open_init() or SMB_query_info_init(). The reconnection process invokes invalidate_all_cached_dirs() via cifs_mark_open_files_invalid(), which removes all cfids from the cfids->entries list but doesn't drop a ref if has_lease isn't true. This results in the currently-being-constructed cfid not being on the list, but still having a refcount of 2. It leaks if returned from open_cached_dir(). Fix this by setting cfid->has_lease when the ref is actually taken; the cfid will not be used by other threads until it has a valid time. Addresses these kmemleaks: unreferenced object 0xffff8881090c4000 (size 1024): comm "bash", pid 1860, jiffies 4295126592 hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........"....... 00 ca 45 22 81 88 ff ff f8 dc 4f 04 81 88 ff ff ..E"......O..... backtrace (crc 6f58c20f): [<ffffffff8b895a1e>] __kmalloc_cache_noprof+0x2be/0x350 [<ffffffff8bda06e3>] open_cached_dir+0x993/0x1fb0 [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50 [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0 [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200 [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0 [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e unreferenced object 0xffff8881044fdcf8 (size 8): comm "bash", pid 1860, jiffies 4295126592 hex dump (first 8 bytes): 00 cc cc cc cc cc cc cc ........ backtrace (crc 10c106a9): [<ffffffff8b89a3d3>] __kmalloc_node_track_caller_noprof+0x363/0x480 [<ffffffff8b7d7256>] kstrdup+0x36/0x60 [<ffffffff8bda0700>] open_cached_dir+0x9b0/0x1fb0 [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50 [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0 [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200 [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0 [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e And addresses these BUG splats when unmounting the SMB filesystem: BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs] WARNING: CPU: 3 PID: 3433 at fs/dcache.c:1536 umount_check+0xd0/0x100 Modules linked in: CPU: 3 UID: 0 PID: 3433 Comm: bash Not tainted 6.12.0-rc4-g850925a8133c-dirty torvalds#49 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:umount_check+0xd0/0x100 Code: 8d 7c 24 40 e8 31 5a f4 ff 49 8b 54 24 40 41 56 49 89 e9 45 89 e8 48 89 d9 41 57 48 89 de 48 c7 c7 80 e7 db ac e8 f0 72 9a ff <0f> 0b 58 31 c0 5a 5b 5d 41 5c 41 5d 41 5e 41 5f e9 2b e5 5d 01 41 RSP: 0018:ffff88811cc27978 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff888140590ba0 RCX: ffffffffaaf20bae RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881f6fb6f40 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed1023984ee3 R10: ffff88811cc2771f R11: 00000000016cfcc0 R12: ffff888134383e08 R13: 0000000000000002 R14: ffff8881462ec668 R15: ffffffffaceab4c0 FS: 00007f23bfa98740(0000) GS:ffff8881f6f80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556de4a6f808 CR3: 0000000123c80000 CR4: 0000000000350ef0 Call Trace: <TASK> d_walk+0x6a/0x530 shrink_dcache_for_umount+0x6a/0x200 generic_shutdown_super+0x52/0x2a0 kill_anon_super+0x22/0x40 cifs_kill_sb+0x159/0x1e0 deactivate_locked_super+0x66/0xe0 cleanup_mnt+0x140/0x210 task_work_run+0xfb/0x170 syscall_exit_to_user_mode+0x29f/0x2b0 do_syscall_64+0xa1/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f23bfb93ae7 Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 0d 11 93 0d 00 f7 d8 64 89 01 b8 ff ff ff ff eb bf 0f 1f 44 00 00 b8 50 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 92 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007ffee9138598 EFLAGS: 00000246 ORIG_RAX: 0000000000000050 RAX: 0000000000000000 RBX: 0000558f1803e9a0 RCX: 00007f23bfb93ae7 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000558f1803e9a0 RBP: 0000558f1803e600 R08: 0000000000000007 R09: 0000558f17fab610 R10: d91d5ec34ab757b0 R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000000015 R15: 0000000000000000 </TASK> irq event stamp: 1163486 hardirqs last enabled at (1163485): [<ffffffffac98d344>] _raw_spin_unlock_irqrestore+0x34/0x60 hardirqs last disabled at (1163486): [<ffffffffac97dcfc>] __schedule+0xc7c/0x19a0 softirqs last enabled at (1163482): [<ffffffffab79a3ee>] __smb_send_rqst+0x3de/0x990 softirqs last disabled at (1163480): [<ffffffffac2314f1>] release_sock+0x21/0xf0 ---[ end trace 0000000000000000 ]--- VFS: Busy inodes after unmount of cifs (cifs) ------------[ cut here ]------------ kernel BUG at fs/super.c:661! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 1 UID: 0 PID: 3433 Comm: bash Tainted: G W 6.12.0-rc4-g850925a8133c-dirty torvalds#49 Tainted: [W]=WARN Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:generic_shutdown_super+0x290/0x2a0 Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246 RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319 R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0 R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0 Call Trace: <TASK> kill_anon_super+0x22/0x40 cifs_kill_sb+0x159/0x1e0 deactivate_locked_super+0x66/0xe0 cleanup_mnt+0x140/0x210 task_work_run+0xfb/0x170 syscall_exit_to_user_mode+0x29f/0x2b0 do_syscall_64+0xa1/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f23bfb93ae7 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:generic_shutdown_super+0x290/0x2a0 Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246 RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319 R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0 R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0 This reproduces eventually with an SMB mount and two shells running these loops concurrently - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10 echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20 echo "recovering"; iptables -F OUTPUT; sleep 10; done Fixes: ebe98f1 ("cifs: enable caching of directories for which a lease is held") Fixes: 5c86919 ("smb: client: fix use-after-free in smb2_query_info_compound()") Cc: [email protected] Signed-off-by: Paul Aurich <[email protected]> Signed-off-by: Steve French <[email protected]>

open_cached_dir() may either race with the tcon reconnection even before compound_send_recv() or directly trigger a reconnection via SMB2_open_init() or SMB_query_info_init(). The reconnection process invokes invalidate_all_cached_dirs() via cifs_mark_open_files_invalid(), which removes all cfids from the cfids->entries list but doesn't drop a ref if has_lease isn't true. This results in the currently-being-constructed cfid not being on the list, but still having a refcount of 2. It leaks if returned from open_cached_dir(). Fix this by setting cfid->has_lease when the ref is actually taken; the cfid will not be used by other threads until it has a valid time. Addresses these kmemleaks: unreferenced object 0xffff8881090c4000 (size 1024): comm "bash", pid 1860, jiffies 4295126592 hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........"....... 00 ca 45 22 81 88 ff ff f8 dc 4f 04 81 88 ff ff ..E"......O..... backtrace (crc 6f58c20f): [<ffffffff8b895a1e>] __kmalloc_cache_noprof+0x2be/0x350 [<ffffffff8bda06e3>] open_cached_dir+0x993/0x1fb0 [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50 [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0 [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200 [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0 [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e unreferenced object 0xffff8881044fdcf8 (size 8): comm "bash", pid 1860, jiffies 4295126592 hex dump (first 8 bytes): 00 cc cc cc cc cc cc cc ........ backtrace (crc 10c106a9): [<ffffffff8b89a3d3>] __kmalloc_node_track_caller_noprof+0x363/0x480 [<ffffffff8b7d7256>] kstrdup+0x36/0x60 [<ffffffff8bda0700>] open_cached_dir+0x9b0/0x1fb0 [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50 [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0 [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200 [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0 [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e And addresses these BUG splats when unmounting the SMB filesystem: BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs] WARNING: CPU: 3 PID: 3433 at fs/dcache.c:1536 umount_check+0xd0/0x100 Modules linked in: CPU: 3 UID: 0 PID: 3433 Comm: bash Not tainted 6.12.0-rc4-g850925a8133c-dirty torvalds#49 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:umount_check+0xd0/0x100 Code: 8d 7c 24 40 e8 31 5a f4 ff 49 8b 54 24 40 41 56 49 89 e9 45 89 e8 48 89 d9 41 57 48 89 de 48 c7 c7 80 e7 db ac e8 f0 72 9a ff <0f> 0b 58 31 c0 5a 5b 5d 41 5c 41 5d 41 5e 41 5f e9 2b e5 5d 01 41 RSP: 0018:ffff88811cc27978 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff888140590ba0 RCX: ffffffffaaf20bae RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881f6fb6f40 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed1023984ee3 R10: ffff88811cc2771f R11: 00000000016cfcc0 R12: ffff888134383e08 R13: 0000000000000002 R14: ffff8881462ec668 R15: ffffffffaceab4c0 FS: 00007f23bfa98740(0000) GS:ffff8881f6f80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556de4a6f808 CR3: 0000000123c80000 CR4: 0000000000350ef0 Call Trace: <TASK> d_walk+0x6a/0x530 shrink_dcache_for_umount+0x6a/0x200 generic_shutdown_super+0x52/0x2a0 kill_anon_super+0x22/0x40 cifs_kill_sb+0x159/0x1e0 deactivate_locked_super+0x66/0xe0 cleanup_mnt+0x140/0x210 task_work_run+0xfb/0x170 syscall_exit_to_user_mode+0x29f/0x2b0 do_syscall_64+0xa1/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f23bfb93ae7 Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 0d 11 93 0d 00 f7 d8 64 89 01 b8 ff ff ff ff eb bf 0f 1f 44 00 00 b8 50 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 92 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007ffee9138598 EFLAGS: 00000246 ORIG_RAX: 0000000000000050 RAX: 0000000000000000 RBX: 0000558f1803e9a0 RCX: 00007f23bfb93ae7 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000558f1803e9a0 RBP: 0000558f1803e600 R08: 0000000000000007 R09: 0000558f17fab610 R10: d91d5ec34ab757b0 R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000000015 R15: 0000000000000000 </TASK> irq event stamp: 1163486 hardirqs last enabled at (1163485): [<ffffffffac98d344>] _raw_spin_unlock_irqrestore+0x34/0x60 hardirqs last disabled at (1163486): [<ffffffffac97dcfc>] __schedule+0xc7c/0x19a0 softirqs last enabled at (1163482): [<ffffffffab79a3ee>] __smb_send_rqst+0x3de/0x990 softirqs last disabled at (1163480): [<ffffffffac2314f1>] release_sock+0x21/0xf0 ---[ end trace 0000000000000000 ]--- VFS: Busy inodes after unmount of cifs (cifs) ------------[ cut here ]------------ kernel BUG at fs/super.c:661! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 1 UID: 0 PID: 3433 Comm: bash Tainted: G W 6.12.0-rc4-g850925a8133c-dirty torvalds#49 Tainted: [W]=WARN Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:generic_shutdown_super+0x290/0x2a0 Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246 RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319 R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0 R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0 Call Trace: <TASK> kill_anon_super+0x22/0x40 cifs_kill_sb+0x159/0x1e0 deactivate_locked_super+0x66/0xe0 cleanup_mnt+0x140/0x210 task_work_run+0xfb/0x170 syscall_exit_to_user_mode+0x29f/0x2b0 do_syscall_64+0xa1/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f23bfb93ae7 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:generic_shutdown_super+0x290/0x2a0 Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246 RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319 R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0 R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0 This reproduces eventually with an SMB mount and two shells running these loops concurrently - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10 echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20 echo "recovering"; iptables -F OUTPUT; sleep 10; done Fixes: ebe98f1 ("cifs: enable caching of directories for which a lease is held") Fixes: 5c86919 ("smb: client: fix use-after-free in smb2_query_info_compound()") Cc: [email protected] Signed-off-by: Paul Aurich <[email protected]>

open_cached_dir() may either race with the tcon reconnection even before compound_send_recv() or directly trigger a reconnection via SMB2_open_init() or SMB_query_info_init(). The reconnection process invokes invalidate_all_cached_dirs() via cifs_mark_open_files_invalid(), which removes all cfids from the cfids->entries list but doesn't drop a ref if has_lease isn't true. This results in the currently-being-constructed cfid not being on the list, but still having a refcount of 2. It leaks if returned from open_cached_dir(). Fix this by setting cfid->has_lease when the ref is actually taken; the cfid will not be used by other threads until it has a valid time. Addresses these kmemleaks: unreferenced object 0xffff8881090c4000 (size 1024): comm "bash", pid 1860, jiffies 4295126592 hex dump (first 32 bytes): 00 01 00 00 00 00 ad de 22 01 00 00 00 00 ad de ........"....... 00 ca 45 22 81 88 ff ff f8 dc 4f 04 81 88 ff ff ..E"......O..... backtrace (crc 6f58c20f): [<ffffffff8b895a1e>] __kmalloc_cache_noprof+0x2be/0x350 [<ffffffff8bda06e3>] open_cached_dir+0x993/0x1fb0 [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50 [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0 [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200 [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0 [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e unreferenced object 0xffff8881044fdcf8 (size 8): comm "bash", pid 1860, jiffies 4295126592 hex dump (first 8 bytes): 00 cc cc cc cc cc cc cc ........ backtrace (crc 10c106a9): [<ffffffff8b89a3d3>] __kmalloc_node_track_caller_noprof+0x363/0x480 [<ffffffff8b7d7256>] kstrdup+0x36/0x60 [<ffffffff8bda0700>] open_cached_dir+0x9b0/0x1fb0 [<ffffffff8bdaa750>] cifs_readdir+0x15a0/0x1d50 [<ffffffff8b9a853f>] iterate_dir+0x28f/0x4b0 [<ffffffff8b9a9aed>] __x64_sys_getdents64+0xfd/0x200 [<ffffffff8cf6da05>] do_syscall_64+0x95/0x1a0 [<ffffffff8d00012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e And addresses these BUG splats when unmounting the SMB filesystem: BUG: Dentry ffff888140590ba0{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs] WARNING: CPU: 3 PID: 3433 at fs/dcache.c:1536 umount_check+0xd0/0x100 Modules linked in: CPU: 3 UID: 0 PID: 3433 Comm: bash Not tainted 6.12.0-rc4-g850925a8133c-dirty torvalds#49 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:umount_check+0xd0/0x100 Code: 8d 7c 24 40 e8 31 5a f4 ff 49 8b 54 24 40 41 56 49 89 e9 45 89 e8 48 89 d9 41 57 48 89 de 48 c7 c7 80 e7 db ac e8 f0 72 9a ff <0f> 0b 58 31 c0 5a 5b 5d 41 5c 41 5d 41 5e 41 5f e9 2b e5 5d 01 41 RSP: 0018:ffff88811cc27978 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff888140590ba0 RCX: ffffffffaaf20bae RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff8881f6fb6f40 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed1023984ee3 R10: ffff88811cc2771f R11: 00000000016cfcc0 R12: ffff888134383e08 R13: 0000000000000002 R14: ffff8881462ec668 R15: ffffffffaceab4c0 FS: 00007f23bfa98740(0000) GS:ffff8881f6f80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556de4a6f808 CR3: 0000000123c80000 CR4: 0000000000350ef0 Call Trace: <TASK> d_walk+0x6a/0x530 shrink_dcache_for_umount+0x6a/0x200 generic_shutdown_super+0x52/0x2a0 kill_anon_super+0x22/0x40 cifs_kill_sb+0x159/0x1e0 deactivate_locked_super+0x66/0xe0 cleanup_mnt+0x140/0x210 task_work_run+0xfb/0x170 syscall_exit_to_user_mode+0x29f/0x2b0 do_syscall_64+0xa1/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f23bfb93ae7 Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 0d 11 93 0d 00 f7 d8 64 89 01 b8 ff ff ff ff eb bf 0f 1f 44 00 00 b8 50 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 92 0d 00 f7 d8 64 89 01 48 RSP: 002b:00007ffee9138598 EFLAGS: 00000246 ORIG_RAX: 0000000000000050 RAX: 0000000000000000 RBX: 0000558f1803e9a0 RCX: 00007f23bfb93ae7 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000558f1803e9a0 RBP: 0000558f1803e600 R08: 0000000000000007 R09: 0000558f17fab610 R10: d91d5ec34ab757b0 R11: 0000000000000246 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000000015 R15: 0000000000000000 </TASK> irq event stamp: 1163486 hardirqs last enabled at (1163485): [<ffffffffac98d344>] _raw_spin_unlock_irqrestore+0x34/0x60 hardirqs last disabled at (1163486): [<ffffffffac97dcfc>] __schedule+0xc7c/0x19a0 softirqs last enabled at (1163482): [<ffffffffab79a3ee>] __smb_send_rqst+0x3de/0x990 softirqs last disabled at (1163480): [<ffffffffac2314f1>] release_sock+0x21/0xf0 ---[ end trace 0000000000000000 ]--- VFS: Busy inodes after unmount of cifs (cifs) ------------[ cut here ]------------ kernel BUG at fs/super.c:661! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 1 UID: 0 PID: 3433 Comm: bash Tainted: G W 6.12.0-rc4-g850925a8133c-dirty torvalds#49 Tainted: [W]=WARN Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 RIP: 0010:generic_shutdown_super+0x290/0x2a0 Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246 RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319 R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0 R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0 Call Trace: <TASK> kill_anon_super+0x22/0x40 cifs_kill_sb+0x159/0x1e0 deactivate_locked_super+0x66/0xe0 cleanup_mnt+0x140/0x210 task_work_run+0xfb/0x170 syscall_exit_to_user_mode+0x29f/0x2b0 do_syscall_64+0xa1/0x1a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f23bfb93ae7 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:generic_shutdown_super+0x290/0x2a0 Code: e8 15 7c f7 ff 48 8b 5d 28 48 89 df e8 09 7c f7 ff 48 8b 0b 48 89 ee 48 8d 95 68 06 00 00 48 c7 c7 80 7f db ac e8 00 69 af ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 RSP: 0018:ffff88811cc27a50 EFLAGS: 00010246 RAX: 000000000000003e RBX: ffffffffae994420 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffffffffab06180e RDI: ffff8881f6eb18c8 RBP: ffff8881462ec000 R08: 0000000000000001 R09: ffffed103edd6319 R10: ffff8881f6eb18cb R11: 00000000016d3158 R12: ffff8881462ec9c0 R13: ffff8881462ec050 R14: 0000000000000001 R15: 0000000000000000 FS: 00007f23bfa98740(0000) GS:ffff8881f6e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8364005d68 CR3: 0000000123c80000 CR4: 0000000000350ef0 This reproduces eventually with an SMB mount and two shells running these loops concurrently - while true; do cd ~; sleep 1; for i in {1..3}; do cd /mnt/test/subdir; echo $PWD; sleep 1; cd ..; echo $PWD; sleep 1; done; echo ...; done - while true; do iptables -F OUTPUT; mount -t cifs -a; for _ in {0..2}; do ls /mnt/test/subdir/ | wc -l; done; iptables -I OUTPUT -p tcp --dport 445 -j DROP; sleep 10 echo "unmounting"; umount -l -t cifs -a; echo "done unmounting"; sleep 20 echo "recovering"; iptables -F OUTPUT; sleep 10; done Fixes: ebe98f1 ("cifs: enable caching of directories for which a lease is held") Fixes: 5c86919 ("smb: client: fix use-after-free in smb2_query_info_compound()") Cc: [email protected] Signed-off-by: Paul Aurich <[email protected]> Signed-off-by: Steve French <[email protected]>

rm redundant empty line

5a4d9e4

a completely trivial patch to remove a completely redundant blank line.

arjun024 closed this Oct 2, 2013

aejsmith pushed a commit to aejsmith/linux that referenced this pull request Jul 20, 2015

Merge pull request torvalds#49 from HarveyHunt/ci20-v3.18-led-rearrange

80b3133

MIPS: Ci20: Change DT node for LED triggers to match silkscreen

diaevd referenced this pull request in diaevd/zen-kernel Dec 30, 2015

Merge pull request zen-kernel#49 from torstehu/inode-apply-to-post-3.…

4a12733

…11-kernels Correct the KERNEL_VERSION logic regarding whether or not

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rm redundant empty line #49

rm redundant empty line #49

arjun024 commented Oct 2, 2013

arjun024 commented Oct 2, 2013

jamen commented Mar 13, 2016

rm redundant empty line #49

rm redundant empty line #49

Conversation

arjun024 commented Oct 2, 2013

arjun024 commented Oct 2, 2013

jamen commented Mar 13, 2016