-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 32 bit support - Done, but needs testing! #4
Comments
Please test it with the current kernel |
Recompiled the 06 May Version. Left-Right Channels were swapped. Gain has increased and is normal |
I think the channels were swapped before and it is now right. At least for my codec this is the case. For testing 32 bit support and higher bitrates, I will upload the ess9018 driver soon. |
Are you thinking of providing a driver set of supported DACs and we can just modprobe the corresponding dac on startup ? |
Yes? |
As steveha8838 reported in issue #1 32 bit runs smoothly (or at least 24 bit files via 32 bit mode) |
-Top level ARC makefile removes -I for platform headers -asm/irq.h no longer includes plat/irq.h -platform makefile adds -I for it's specfic platform headers -platform code to directly include it's plat/irq.h -Linker script needed plat/memmap.h for CCM info, already in .config Signed-off-by: Vineet Gupta <[email protected]> Cc: Arnd Bergmann <[email protected]> Acked-by: Arnd Bergmann <[email protected]>
This fixes an oops where a LAYOUTGET is in still in the rpciod queue, but the requesting processes has been killed. Without this, killing the process does the final pnfs_put_layout_hdr() and sets NFS_I(inode)->layout to NULL while the LAYOUTGET rpc task still references it. Example oops: BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 IP: [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4] PGD 7365b067 PUD 7365d067 PMD 0 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: nfs_layout_nfsv41_files nfsv4 auth_rpcgss nfs lockd sunrpc ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ip6table_filter ip6_tables ppdev e1000 i2c_piix4 i2c_core shpchp parport_pc parport crc32c_intel aesni_intel xts aes_x86_64 lrw gf128mul ablk_helper cryptd mptspi scsi_transport_spi mptscsih mptbase floppy autofs4 CPU 0 Pid: 27, comm: kworker/0:1 Not tainted 3.8.0-dros_cthon2013+ #4 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffffa01bd586>] [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4] RSP: 0018:ffff88007b0c1c88 EFLAGS: 00010246 RAX: ffff88006ed36678 RBX: 0000000000000000 RCX: 0000000ea877e3bc RDX: ffff88007a729da8 RSI: 0000000000000000 RDI: ffff88007a72b958 RBP: ffff88007b0c1ca8 R08: 0000000000000002 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007a72b958 R13: ffff88007a729da8 R14: 0000000000000000 R15: ffffffffa011077e FS: 0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000080 CR3: 00000000735f8000 CR4: 00000000001407f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/0:1 (pid: 27, threadinfo ffff88007b0c0000, task ffff88007c2fa0c0) Stack: ffff88006fc05388 ffff88007a72b908 ffff88007b240900 ffff88006fc05388 ffff88007b0c1cd8 ffffffffa01a2170 ffff88007b240900 ffff88007b240900 ffff88007b240970 ffffffffa011077e ffff88007b0c1ce8 ffffffffa0110791 Call Trace: [<ffffffffa01a2170>] nfs4_layoutget_prepare+0x7b/0x92 [nfsv4] [<ffffffffa011077e>] ? __rpc_atrun+0x15/0x15 [sunrpc] [<ffffffffa0110791>] rpc_prepare_task+0x13/0x15 [sunrpc] Reported-by: Tigran Mkrtchyan <[email protected]> Signed-off-by: Weston Andros Adamson <[email protected]> Cc: [email protected] Signed-off-by: Trond Myklebust <[email protected]>
…ernel/git/vgupta/arc Pull new ARC architecture from Vineet Gupta: "Initial ARC Linux port with some fixes on top for 3.9-rc1: I would like to introduce the Linux port to ARC Processors (from Synopsys) for 3.9-rc1. The patch-set has been discussed on the public lists since Nov and has received a fair bit of review, specially from Arnd, tglx, Al and other subsystem maintainers for DeviceTree, kgdb... The arch bits are in arch/arc, some asm-generic changes (acked by Arnd), a minor change to PARISC (acked by Helge). The series is a touch bigger for a new port for 2 main reasons: 1. It enables a basic kernel in first sub-series and adds ptrace/kgdb/.. later 2. Some of the fallout of review (DeviceTree support, multi-platform- image support) were added on top of orig series, primarily to record the revision history. This updated pull request additionally contains - fixes due to our GNU tools catching up with the new syscall/ptrace ABI - some (minor) cross-arch Kconfig updates." * tag 'arc-v3.9-rc1-late' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (82 commits) ARC: split elf.h into uapi and export it for userspace ARC: Fixup the current ABI version ARC: gdbserver using regset interface possibly broken ARC: Kconfig cleanup tracking cross-arch Kconfig pruning in merge window ARC: make a copy of flat DT ARC: [plat-arcfpga] DT arc-uart bindings change: "baud" => "current-speed" ARC: Ensure CONFIG_VIRT_TO_BUS is not enabled ARC: Fix pt_orig_r8 access ARC: [3.9] Fallout of hlist iterator update ARC: 64bit RTSC timestamp hardware issue ARC: Don't fiddle with non-existent caches ARC: Add self to MAINTAINERS ARC: Provide a default serial.h for uart drivers needing BASE_BAUD ARC: [plat-arcfpga] defconfig for fully loaded ARC Linux ARC: [Review] Multi-platform image #8: platform registers SMP callbacks ARC: [Review] Multi-platform image #7: SMP common code to use callbacks ARC: [Review] Multi-platform image #6: cpu-to-dma-addr optional ARC: [Review] Multi-platform image #5: NR_IRQS defined by ARC core ARC: [Review] Multi-platform image #4: Isolate platform headers ARC: [Review] Multi-platform image #3: switch to board callback ...
…optimizations Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on assumptions about the implementation of memset and similar functions. The current ARM optimized memset code does not return the value of its first argument, as is usually expected from standard implementations. For instance in the following function: void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter) { memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter)); waiter->magic = waiter; INIT_LIST_HEAD(&waiter->list); } compiled as: 800554d0 <debug_mutex_lock_common>: 800554d0: e92d4008 push {r3, lr} 800554d4: e1a00001 mov r0, r1 800554d8: e3a02010 mov r2, #16 ; 0x10 800554dc: e3a01011 mov r1, #17 ; 0x11 800554e0: eb04426e bl 80165ea0 <memset> 800554e4: e1a03000 mov r3, r0 800554e8: e583000c str r0, [r3, #12] 800554ec: e5830000 str r0, [r3] 800554f0: e5830004 str r0, [r3, #4] 800554f4: e8bd8008 pop {r3, pc} GCC assumes memset returns the value of pointer 'waiter' in register r0; causing register/memory corruptions. This patch fixes the return value of the assembly version of memset. It adds a 'mov' instruction and merges an additional load+store into existing load/store instructions. For ease of review, here is a breakdown of the patch into 4 simple steps: Step 1 ====== Perform the following substitutions: ip -> r8, then r0 -> ip, and insert 'mov ip, r0' as the first statement of the function. At this point, we have a memset() implementation returning the proper result, but corrupting r8 on some paths (the ones that were using ip). Step 2 ====== Make sure r8 is saved and restored when (! CALGN(1)+0) == 1: save r8: - str lr, [sp, #-4]! + stmfd sp!, {r8, lr} and restore r8 on both exit paths: - ldmeqfd sp!, {pc} @ Now <64 bytes to go. + ldmeqfd sp!, {r8, pc} @ Now <64 bytes to go. (...) tst r2, #16 stmneia ip!, {r1, r3, r8, lr} - ldr lr, [sp], #4 + ldmfd sp!, {r8, lr} Step 3 ====== Make sure r8 is saved and restored when (! CALGN(1)+0) == 0: save r8: - stmfd sp!, {r4-r7, lr} + stmfd sp!, {r4-r8, lr} and restore r8 on both exit paths: bgt 3b - ldmeqfd sp!, {r4-r7, pc} + ldmeqfd sp!, {r4-r8, pc} (...) tst r2, #16 stmneia ip!, {r4-r7} - ldmfd sp!, {r4-r7, lr} + ldmfd sp!, {r4-r8, lr} Step 4 ====== Rewrite register list "r4-r7, r8" as "r4-r8". Signed-off-by: Ivan Djelic <[email protected]> Reviewed-by: Nicolas Pitre <[email protected]> Signed-off-by: Dirk Behme <[email protected]> Signed-off-by: Russell King <[email protected]>
The following script will produce a kernel oops: sudo ip netns add v sudo ip netns exec v ip ad add 127.0.0.1/8 dev lo sudo ip netns exec v ip link set lo up sudo ip netns exec v ip ro add 224.0.0.0/4 dev lo sudo ip netns exec v ip li add vxlan0 type vxlan id 42 group 239.1.1.1 dev lo sudo ip netns exec v ip link set vxlan0 up sudo ip netns del v where inspect by gdb: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 107] 0xffffffffa0289e33 in ?? () (gdb) bt #0 vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533 #1 vxlan_stop (dev=0xffff88001bafa000) at drivers/net/vxlan.c:1087 #2 0xffffffff812cc498 in __dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1299 #3 0xffffffff812cd920 in dev_close_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:1335 #4 0xffffffff812cef31 in rollback_registered_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:4851 #5 0xffffffff812cf040 in unregister_netdevice_many (head=head@entry=0xffff88001f2e7dc8) at net/core/dev.c:5752 #6 0xffffffff812cf1ba in default_device_exit_batch (net_list=0xffff88001f2e7e18) at net/core/dev.c:6170 #7 0xffffffff812cab27 in cleanup_net (work=<optimized out>) at net/core/net_namespace.c:302 #8 0xffffffff810540ef in process_one_work (worker=0xffff88001ba9ed40, work=0xffffffff8167d020) at kernel/workqueue.c:2157 #9 0xffffffff810549d0 in worker_thread (__worker=__worker@entry=0xffff88001ba9ed40) at kernel/workqueue.c:2276 #10 0xffffffff8105870c in kthread (_create=0xffff88001f2e5d68) at kernel/kthread.c:168 #11 <signal handler called> #12 0x0000000000000000 in ?? () #13 0x0000000000000000 in ?? () (gdb) fr 0 #0 vxlan_leave_group (dev=0xffff88001bafa000) at drivers/net/vxlan.c:533 533 struct sock *sk = vn->sock->sk; (gdb) l 528 static int vxlan_leave_group(struct net_device *dev) 529 { 530 struct vxlan_dev *vxlan = netdev_priv(dev); 531 struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); 532 int err = 0; 533 struct sock *sk = vn->sock->sk; 534 struct ip_mreqn mreq = { 535 .imr_multiaddr.s_addr = vxlan->gaddr, 536 .imr_ifindex = vxlan->link, 537 }; (gdb) p vn->sock $4 = (struct socket *) 0x0 The kernel calls `vxlan_exit_net` when deleting the netns before shutting down vxlan interfaces. Later the removal of all vxlan interfaces, where `vn->sock` is already gone causes the oops. so we should manually shutdown all interfaces before deleting `vn->sock` as the patch does. Signed-off-by: Zang MingJie <[email protected]> Signed-off-by: David S. Miller <[email protected]>
snd_seq_timer_open() didn't catch the whole error path but let through if the timer id is a slave. This may lead to Oops by accessing the uninitialized pointer. BUG: unable to handle kernel NULL pointer dereference at 00000000000002ae IP: [<ffffffff819b3477>] snd_seq_timer_open+0xe7/0x130 PGD 785cd067 PUD 76964067 PMD 0 Oops: 0002 [#4] SMP CPU 0 Pid: 4288, comm: trinity-child7 Tainted: G D W 3.9.0-rc1+ raspberrypi#100 Bochs Bochs RIP: 0010:[<ffffffff819b3477>] [<ffffffff819b3477>] snd_seq_timer_open+0xe7/0x130 RSP: 0018:ffff88006ece7d38 EFLAGS: 00010246 RAX: 0000000000000286 RBX: ffff88007851b400 RCX: 0000000000000000 RDX: 000000000000ffff RSI: ffff88006ece7d58 RDI: ffff88006ece7d38 RBP: ffff88006ece7d98 R08: 000000000000000a R09: 000000000000fffe R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8800792c5400 R14: 0000000000e8f000 R15: 0000000000000007 FS: 00007f7aaa650700(0000) GS:ffff88007f800000(0000) GS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000002ae CR3: 000000006efec000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process trinity-child7 (pid: 4288, threadinfo ffff88006ece6000, task ffff880076a8a290) Stack: 0000000000000286 ffffffff828f2be0 ffff88006ece7d58 ffffffff810f354d 65636e6575716573 2065756575712072 ffff8800792c0030 0000000000000000 ffff88006ece7d98 ffff8800792c5400 ffff88007851b400 ffff8800792c5520 Call Trace: [<ffffffff810f354d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff819b17e9>] snd_seq_queue_timer_open+0x29/0x70 [<ffffffff819ae01a>] snd_seq_ioctl_set_queue_timer+0xda/0x120 [<ffffffff819acb9b>] snd_seq_do_ioctl+0x9b/0xd0 [<ffffffff819acbe0>] snd_seq_ioctl+0x10/0x20 [<ffffffff811b9542>] do_vfs_ioctl+0x522/0x570 [<ffffffff8130a4b3>] ? file_has_perm+0x83/0xa0 [<ffffffff810f354d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff811b95ed>] sys_ioctl+0x5d/0xa0 [<ffffffff813663fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81faed69>] system_call_fastpath+0x16/0x1b Reported-and-tested-by: Tommi Rantala <[email protected]> Cc: <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
ca22e56 (driver-core: implement 'sysdev' functionality for regular devices and buses) has introduced bus_register macro with a static key to distinguish different subsys mutex classes. This however doesn't work for different subsys which use a common registering function. One example is subsys_system_register (and mce_device and cpu_device). In the end this leads to the following lockdep splat: [ 207.271924] ====================================================== [ 207.271932] [ INFO: possible circular locking dependency detected ] [ 207.271942] 3.9.0-rc1-0.7-default+ #34 Not tainted [ 207.271948] ------------------------------------------------------- [ 207.271957] bash/10493 is trying to acquire lock: [ 207.271963] (subsys mutex){+.+.+.}, at: [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0 [ 207.271987] [ 207.271987] but task is already holding lock: [ 207.271995] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81046ccf>] cpu_hotplug_begin+0x2f/0x60 [ 207.272012] [ 207.272012] which lock already depends on the new lock. [ 207.272012] [ 207.272023] [ 207.272023] the existing dependency chain (in reverse order) is: [ 207.272033] [ 207.272033] -> #4 (cpu_hotplug.lock){+.+.+.}: [ 207.272044] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272056] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360 [ 207.272069] [<ffffffff81046ba9>] get_online_cpus+0x29/0x40 [ 207.272082] [<ffffffff81185210>] drain_all_stock+0x30/0x150 [ 207.272094] [<ffffffff811853da>] mem_cgroup_reclaim+0xaa/0xe0 [ 207.272104] [<ffffffff8118775e>] __mem_cgroup_try_charge+0x51e/0xcf0 [ 207.272114] [<ffffffff81188486>] mem_cgroup_charge_common+0x36/0x60 [ 207.272125] [<ffffffff811884da>] mem_cgroup_newpage_charge+0x2a/0x30 [ 207.272135] [<ffffffff81150531>] do_wp_page+0x231/0x830 [ 207.272147] [<ffffffff8115151e>] handle_pte_fault+0x19e/0x8d0 [ 207.272157] [<ffffffff81151da8>] handle_mm_fault+0x158/0x1e0 [ 207.272166] [<ffffffff814b6153>] do_page_fault+0x2a3/0x4e0 [ 207.272178] [<ffffffff814b2578>] page_fault+0x28/0x30 [ 207.272189] [ 207.272189] -> #3 (&mm->mmap_sem){++++++}: [ 207.272199] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272208] [<ffffffff8114c5ad>] might_fault+0x6d/0x90 [ 207.272218] [<ffffffff811a11e3>] filldir64+0xb3/0x120 [ 207.272229] [<ffffffffa013fc19>] call_filldir+0x89/0x130 [ext3] [ 207.272248] [<ffffffffa0140377>] ext3_readdir+0x6b7/0x7e0 [ext3] [ 207.272263] [<ffffffff811a1519>] vfs_readdir+0xa9/0xc0 [ 207.272273] [<ffffffff811a15cb>] sys_getdents64+0x9b/0x110 [ 207.272284] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b [ 207.272296] [ 207.272296] -> #2 (&type->i_mutex_dir_key#3){+.+.+.}: [ 207.272309] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272319] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360 [ 207.272329] [<ffffffff8119c254>] link_path_walk+0x6f4/0x9a0 [ 207.272339] [<ffffffff8119e7fa>] path_openat+0xba/0x470 [ 207.272349] [<ffffffff8119ecf8>] do_filp_open+0x48/0xa0 [ 207.272358] [<ffffffff8118d81c>] file_open_name+0xdc/0x110 [ 207.272369] [<ffffffff8118d885>] filp_open+0x35/0x40 [ 207.272378] [<ffffffff8135c76e>] _request_firmware+0x52e/0xb20 [ 207.272389] [<ffffffff8135cdd6>] request_firmware+0x16/0x20 [ 207.272399] [<ffffffffa03bdb91>] request_microcode_fw+0x61/0xd0 [microcode] [ 207.272416] [<ffffffffa03bd554>] microcode_init_cpu+0x104/0x150 [microcode] [ 207.272431] [<ffffffffa03bd61c>] mc_device_add+0x7c/0xb0 [microcode] [ 207.272444] [<ffffffff8134a419>] subsys_interface_register+0xc9/0x100 [ 207.272457] [<ffffffffa04fc0f4>] 0xffffffffa04fc0f4 [ 207.272472] [<ffffffff81000202>] do_one_initcall+0x42/0x180 [ 207.272485] [<ffffffff810bbeff>] load_module+0x19df/0x1b70 [ 207.272499] [<ffffffff810bc376>] sys_init_module+0xe6/0x130 [ 207.272511] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b [ 207.272523] [ 207.272523] -> #1 (umhelper_sem){++++.+}: [ 207.272537] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272548] [<ffffffff814ae9c4>] down_read+0x34/0x50 [ 207.272559] [<ffffffff81062bff>] usermodehelper_read_trylock+0x4f/0x100 [ 207.272575] [<ffffffff8135c7dd>] _request_firmware+0x59d/0xb20 [ 207.272587] [<ffffffff8135cdd6>] request_firmware+0x16/0x20 [ 207.272599] [<ffffffffa03bdb91>] request_microcode_fw+0x61/0xd0 [microcode] [ 207.272613] [<ffffffffa03bd554>] microcode_init_cpu+0x104/0x150 [microcode] [ 207.272627] [<ffffffffa03bd61c>] mc_device_add+0x7c/0xb0 [microcode] [ 207.272641] [<ffffffff8134a419>] subsys_interface_register+0xc9/0x100 [ 207.272654] [<ffffffffa04fc0f4>] 0xffffffffa04fc0f4 [ 207.272666] [<ffffffff81000202>] do_one_initcall+0x42/0x180 [ 207.272678] [<ffffffff810bbeff>] load_module+0x19df/0x1b70 [ 207.272690] [<ffffffff810bc376>] sys_init_module+0xe6/0x130 [ 207.272702] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b [ 207.272715] [ 207.272715] -> #0 (subsys mutex){+.+.+.}: [ 207.272729] [<ffffffff810ae002>] __lock_acquire+0x13b2/0x15f0 [ 207.272740] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.272751] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360 [ 207.272763] [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0 [ 207.272775] [<ffffffff81349114>] device_del+0x134/0x1f0 [ 207.272786] [<ffffffff813491f2>] device_unregister+0x22/0x60 [ 207.272798] [<ffffffff814a24ea>] mce_cpu_callback+0x15e/0x1ad [ 207.272812] [<ffffffff814b6402>] notifier_call_chain+0x72/0x130 [ 207.272824] [<ffffffff81073d6e>] __raw_notifier_call_chain+0xe/0x10 [ 207.272839] [<ffffffff81498f76>] _cpu_down+0x1d6/0x350 [ 207.272851] [<ffffffff81499130>] cpu_down+0x40/0x60 [ 207.272862] [<ffffffff8149cc55>] store_online+0x75/0xe0 [ 207.272874] [<ffffffff813474a0>] dev_attr_store+0x20/0x30 [ 207.272886] [<ffffffff812090d9>] sysfs_write_file+0xd9/0x150 [ 207.272900] [<ffffffff8118e10b>] vfs_write+0xcb/0x130 [ 207.272911] [<ffffffff8118e924>] sys_write+0x64/0xa0 [ 207.272923] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b [ 207.272936] [ 207.272936] other info that might help us debug this: [ 207.272936] [ 207.272952] Chain exists of: [ 207.272952] subsys mutex --> &mm->mmap_sem --> cpu_hotplug.lock [ 207.272952] [ 207.272973] Possible unsafe locking scenario: [ 207.272973] [ 207.272984] CPU0 CPU1 [ 207.272992] ---- ---- [ 207.273000] lock(cpu_hotplug.lock); [ 207.273009] lock(&mm->mmap_sem); [ 207.273020] lock(cpu_hotplug.lock); [ 207.273031] lock(subsys mutex); [ 207.273040] [ 207.273040] *** DEADLOCK *** [ 207.273040] [ 207.273055] 5 locks held by bash/10493: [ 207.273062] #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff81209049>] sysfs_write_file+0x49/0x150 [ 207.273080] #1: (s_active#150){.+.+.+}, at: [<ffffffff812090c2>] sysfs_write_file+0xc2/0x150 [ 207.273099] #2: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81027557>] cpu_hotplug_driver_lock+0x17/0x20 [ 207.273121] #3: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8149911c>] cpu_down+0x2c/0x60 [ 207.273140] #4: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81046ccf>] cpu_hotplug_begin+0x2f/0x60 [ 207.273158] [ 207.273158] stack backtrace: [ 207.273170] Pid: 10493, comm: bash Not tainted 3.9.0-rc1-0.7-default+ #34 [ 207.273180] Call Trace: [ 207.273192] [<ffffffff810ab373>] print_circular_bug+0x223/0x310 [ 207.273204] [<ffffffff810ae002>] __lock_acquire+0x13b2/0x15f0 [ 207.273216] [<ffffffff812086b0>] ? sysfs_hash_and_remove+0x60/0xc0 [ 207.273227] [<ffffffff810ae329>] lock_acquire+0xe9/0x120 [ 207.273239] [<ffffffff8134af27>] ? bus_remove_device+0x37/0x1c0 [ 207.273251] [<ffffffff814ad807>] mutex_lock_nested+0x37/0x360 [ 207.273263] [<ffffffff8134af27>] ? bus_remove_device+0x37/0x1c0 [ 207.273274] [<ffffffff812086b0>] ? sysfs_hash_and_remove+0x60/0xc0 [ 207.273286] [<ffffffff8134af27>] bus_remove_device+0x37/0x1c0 [ 207.273298] [<ffffffff81349114>] device_del+0x134/0x1f0 [ 207.273309] [<ffffffff813491f2>] device_unregister+0x22/0x60 [ 207.273321] [<ffffffff814a24ea>] mce_cpu_callback+0x15e/0x1ad [ 207.273332] [<ffffffff814b6402>] notifier_call_chain+0x72/0x130 [ 207.273344] [<ffffffff81073d6e>] __raw_notifier_call_chain+0xe/0x10 [ 207.273356] [<ffffffff81498f76>] _cpu_down+0x1d6/0x350 [ 207.273368] [<ffffffff81027557>] ? cpu_hotplug_driver_lock+0x17/0x20 [ 207.273380] [<ffffffff81499130>] cpu_down+0x40/0x60 [ 207.273391] [<ffffffff8149cc55>] store_online+0x75/0xe0 [ 207.273402] [<ffffffff813474a0>] dev_attr_store+0x20/0x30 [ 207.273413] [<ffffffff812090d9>] sysfs_write_file+0xd9/0x150 [ 207.273425] [<ffffffff8118e10b>] vfs_write+0xcb/0x130 [ 207.273436] [<ffffffff8118e924>] sys_write+0x64/0xa0 [ 207.273447] [<ffffffff814bb599>] system_call_fastpath+0x16/0x1b Which reports a false possitive deadlock because it sees: 1) load_module -> subsys_interface_register -> mc_deveice_add (*) -> subsys->p->mutex -> link_path_walk -> lookup_slow -> i_mutex 2) sys_write -> _cpu_down -> cpu_hotplug_begin -> cpu_hotplug.lock -> mce_cpu_callback -> mce_device_remove(**) -> device_unregister -> bus_remove_device -> subsys mutex 3) vfs_readdir -> i_mutex -> filldir64 -> might_fault -> might_lock_read(mmap_sem) -> page_fault -> mmap_sem -> drain_all_stock -> cpu_hotplug.lock but 1) takes cpu_subsys subsys (*) but 2) takes mce_device subsys (**) so the deadlock is not possible AFAICS. The fix is quite simple. We can pull the key inside bus_type structure because they are defined per device so the pointer will be unique as well. bus_register doesn't need to be a macro anymore so change it to the inline. We could get rid of __bus_register as there is no other caller but maybe somebody will want to use a different key so keep it around for now. Reported-by: Li Zefan <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Jiri Kosina <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
The first backtrace appears on tx path with DMA mapping operations debug enabled. [ 345.637919] ------------[ cut here ]------------ [ 345.637971] WARNING: at lib/dma-debug.c:937 check_unmap+0x4df/0x910() [ 345.637977] Hardware name: System Name [ 345.637987] sis900 0000:00:01.1: DMA-API: device driver failed to check map error[device address=0x000000000d4aed02] [si ze=60 bytes] [mapped as single] [ 345.637993] Modules linked in: bridge stp llc dmfe sundance 3c59x sis900 [ 345.638022] Pid: 0, comm: swapper Not tainted 3.9.0-rc6+ #4 [ 345.638028] Call Trace: [ 345.638042] [<c122097f>] ? check_unmap+0x4df/0x910 [ 345.638059] [<c102b19c>] warn_slowpath_common+0x7c/0xa0 [ 345.638070] [<c122097f>] ? check_unmap+0x4df/0x910 [ 345.638081] [<c102b23e>] warn_slowpath_fmt+0x2e/0x30 [ 345.638092] [<c122097f>] check_unmap+0x4df/0x910 [ 345.638107] [<c100bfeb>] ? save_stack_trace+0x2b/0x50 [ 345.638120] [<c107238e>] ? mark_lock+0x31e/0x5d0 [ 345.638132] [<c1072b2c>] ? __lock_acquire+0x4ec/0x7d0 [ 345.638143] [<c1220f6d>] debug_dma_unmap_page+0x6d/0x80 [ 345.638166] [<cf834dec>] sis900_interrupt+0x49c/0x860 [sis900] [ 345.638195] [<c1094b73>] handle_irq_event_percpu+0x43/0x1c0 [ 345.638206] [<c1094d1e>] ? handle_irq_event+0x2e/0x60 [ 345.638217] [<c1094d27>] handle_irq_event+0x37/0x60 [ 345.638235] [<c10973f0>] ? irq_set_chip_data+0x40/0x40 [ 345.638246] [<c1097442>] handle_level_irq+0x52/0xa0 [ 345.638251] <IRQ> [<c1003629>] ? do_IRQ+0x39/0xa0 [ 345.638293] [<c1484631>] ? common_interrupt+0x31/0x36 [ 345.638347] [<d08c2c52>] ? br_flood_forward+0x12/0x20 [bridge] [ 345.638364] [<d08c2d40>] ? br_dev_queue_push_xmit+0x60/0x60 [bridge] [ 345.638381] [<d08c3b2b>] ? br_handle_frame_finish+0x25b/0x280 [bridge] [ 345.638399] [<d08c3ce3>] ? br_handle_frame+0x193/0x290 [bridge] [ 345.638416] [<d08c3b50>] ? br_handle_frame_finish+0x280/0x280 [bridge] [ 345.638431] [<c13b3c87>] ? __netif_receive_skb_core+0x1d7/0x710 [ 345.638442] [<c13b3b19>] ? __netif_receive_skb_core+0x69/0x710 [ 345.638454] [<c13b41e1>] ? __netif_receive_skb+0x21/0x70 [ 345.638464] [<c13b42b5>] ? process_backlog+0x85/0x130 [ 345.638476] [<c13b4bbb>] ? net_rx_action+0xfb/0x1d0 [ 345.638497] [<c1032768>] ? __do_softirq+0xa8/0x1f0 [ 345.638527] [<c147daad>] ? _raw_spin_unlock+0x1d/0x20 [ 345.638538] [<c10038c0>] ? handle_irq+0x20/0xd0 [ 345.638550] [<c1032f27>] ? irq_exit+0x97/0xa0 [ 345.638560] [<c1003632>] ? do_IRQ+0x42/0xa0 [ 345.638580] [<c104d003>] ? hrtimer_start+0x23/0x30 [ 345.638580] [<c1484631>] ? common_interrupt+0x31/0x36 [ 345.638580] [<c1008703>] ? default_idle+0x33/0xc0 [ 345.638580] [<c10086ac>] ? cpu_idle+0x4c/0x70 [ 345.638580] [<c14787e0>] ? rest_init+0xa0/0xb0 [ 345.638580] [<c1478740>] ? reciprocal_value+0x50/0x50 [ 345.638580] [<c16b5bcf>] ? start_kernel+0x28f/0x320 [ 345.638580] [<c16b54e0>] ? repair_env_string+0x60/0x60 [ 345.638580] [<c16b5269>] ? i386_start_kernel+0x39/0xa0 [ 345.638580] ---[ end trace a244264b69b8a7ae ]--- [ 345.638580] Mapped at: [ 345.638580] [<c1221c65>] debug_dma_map_page+0x65/0x110 [ 345.638580] [<cf8355a9>] sis900_start_xmit+0x129/0x210 [sis900] [ 345.638580] [<c13b2527>] dev_hard_start_xmit+0x1b7/0x530 [ 345.638580] [<c13cc32e>] sch_direct_xmit+0x8e/0x280 [ 345.638580] [<c13b4e39>] dev_queue_xmit+0x1a9/0x5b0 Signed-off-by: Denis Kirjanov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
This fixes the following lockdep splat: [ 66.460000] ================================= [ 66.460000] [ INFO: inconsistent lock state ] [ 66.460000] 3.9.0-rc5-00161-ga48dd49 #4 Not tainted [ 66.460000] --------------------------------- [ 66.460000] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 66.460000] swapper/1 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 66.460000] (timer_lock){+.?...}, at: [<d0006cde>] rs_poll+0x12/0xdc [ 66.460000] {SOFTIRQ-ON-W} state was registered at: [ 66.460000] [<d00421f0>] lock_acquire+0xec/0x13c [ 66.460000] [<d01ea036>] _raw_spin_lock+0x3a/0x84 [ 66.460000] [<d0006c8c>] rs_open+0x18/0x58 [ 66.460000] [<d0139ea2>] tty_open+0x262/0x3cc [ 66.460000] [<d00942e0>] chrdev_open+0x8c/0xe0 [ 66.460000] [<d00907b2>] do_dentry_open$isra$16+0x10e/0x190 [ 66.460000] [<d0091141>] finish_open+0x39/0x48 [ 66.460000] [<d009a0b4>] do_last$isra$34+0x6c4/0x824 [ 66.460000] [<d009a27a>] path_openat+0x66/0x310 [ 66.460000] [<d009a53a>] do_filp_open+0x16/0x44 [ 66.460000] [<d0091445>] do_sys_open+0xd5/0x13c [ 66.460000] [<d00914be>] sys_open+0x12/0x18 [ 66.460000] [<d0413ffc>] kernel_init_freeable+0xe4/0x12c [ 66.460000] [<d01e2a9c>] kernel_init+0xc/0x9c [ 66.460000] [<d00044fc>] ret_from_kernel_thread+0x8/0xc [ 66.460000] irq event stamp: 132542 [ 66.460000] hardirqs last enabled at (132542): [<d01ea2ec>] _raw_spin_unlock_irq+0x30/0x44 [ 66.460000] hardirqs last disabled at (132541): [<d01ea11e>] _raw_spin_lock_irq+0xe/0x8c [ 66.460000] softirqs last enabled at (132234): [<d0017d32>] __do_softirq+0x216/0x2a4 [ 66.460000] softirqs last disabled at (132539): [<d0018024>] irq_exit+0x38/0x40 [ 66.460000] [ 66.460000] other info that might help us debug this: [ 66.460000] Possible unsafe locking scenario: [ 66.460000] [ 66.460000] CPU0 [ 66.460000] ---- [ 66.460000] lock(timer_lock); [ 66.460000] <Interrupt> [ 66.460000] lock(timer_lock); [ 66.460000] [ 66.460000] *** DEADLOCK *** [ 66.460000] [ 66.460000] 1 lock held by swapper/1: [ 66.460000] #0: (((&serial_timer))){+.-...}, at: [<d001c65c>] call_timer_fn+0x0/0x1f0 [ 66.460000] Stack: d7c2fac0 00000018 00000004 00000001 d7c2faa0 00000004 00000006 d7c2fa90 9003e87c d7c2fae0 d7c30000 d025a87c 00000001 0000000f 00000000 d7c2fac0 9004005d d7c2fb10 d7c30000 d7c30338 00000001 00000001 00000000 d7c30338 [ 66.460000] Call Trace: [ 66.460000] [<d01e4f93>] print_usage_bug$part$26+0x1c3/0x1c8 [ 66.460000] [<d003e87c>] mark_lock+0x2b4/0x440 [ 66.460000] [<d004005d>] __lock_acquire+0x54d/0x16c4 [ 66.460000] [<d00421f0>] lock_acquire+0xec/0x13c [ 66.460000] [<d01ea036>] _raw_spin_lock+0x3a/0x84 [ 66.460000] [<d0006cde>] rs_poll+0x12/0xdc [ 66.460000] [<d001c71a>] call_timer_fn+0xbe/0x1f0 [ 66.460000] [<d001cd90>] run_timer_softirq+0x198/0x1f4 [ 66.460000] [<d0017c30>] __do_softirq+0x114/0x2a4 [ 66.460000] [<d0018024>] irq_exit+0x38/0x40 [ 66.460000] [<d00046c0>] do_IRQ+0x44/0x48 [ 66.460000] [<d0005c58>] do_interrupt+0x4c/0x54 [ 66.460000] [<d0003c80>] common_exception_return+0x0/0x5c [ 66.460000] [<d006682c>] free_pcppages_bulk+0x254/0x308 [ 66.460000] Signed-off-by: Max Filippov <[email protected]> Signed-off-by: Chris Zankel <[email protected]>
This can easily be triggered if a new CPU is added (via ACPI hotplug mechanism) and from user-space you do: echo 1 > /sys/devices/system/cpu/cpu3/online (or wait for UDEV to do it) on a newly appeared physical CPU. The deadlock is that the "store_online" in drivers/base/cpu.c takes the cpu_hotplug_driver_lock() lock, then calls "cpu_up". "cpu_up" eventually ends up calling "save_mc_for_early" which also takes the cpu_hotplug_driver_lock() lock. And here is that lockdep thinks of it: smpboot: Stack at about ffff880075c39f44 smpboot: CPU3: has booted. microcode: CPU3 sig=0x206a7, pf=0x2, revision=0x25 ============================================= [ INFO: possible recursive locking detected ] 3.9.0upstream-10129-g167af0e #1 Not tainted --------------------------------------------- sh/2487 is trying to acquire lock: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20 but task is already holding lock: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(x86_cpu_hotplug_driver_mutex); lock(x86_cpu_hotplug_driver_mutex); *** DEADLOCK *** May be due to missing lock nesting notation 6 locks held by sh/2487: #0: (sb_writers#5){.+.+.+}, at: [<ffffffff811ca48d>] vfs_write+0x17d/0x190 #1: (&buffer->mutex){+.+.+.}, at: [<ffffffff812464ef>] sysfs_write_file+0x3f/0x160 #2: (s_active#20){.+.+.+}, at: [<ffffffff81246578>] sysfs_write_file+0xc8/0x160 #3: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff81075512>] cpu_hotplug_driver_lock+0x12/0x20 #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff810961c2>] cpu_maps_update_begin+0x12/0x20 #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810962a7>] cpu_hotplug_begin+0x27/0x60 Suggested-and-Acked-by: Borislav Petkov <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] # for v3.9 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
Cleanup regex_lock and ftrace_lock locking points around ftrace_ops hash update code. The new rule is that regex_lock protects ops->*_hash read-update-write code for each ftrace_ops. Usually, hash update is done by following sequence. 1. allocate a new local hash and copy the original hash. 2. update the local hash. 3. move(actually, copy) back the local hash to ftrace_ops. 4. update ftrace entries if needed. 5. release the local hash. This makes regex_lock protect #1-#4, and ftrace_lock to protect #3, #4 and adding and removing ftrace_ops from the ftrace_ops_list. The ftrace_lock protects #3 as well because the move functions update the entries too. Link: http://lkml.kernel.org/r/20130509054421.30398.83411.stgit@mhiramat-M0-7522 Cc: Srikar Dronamraju <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Tom Zanussi <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
… into node number df2d5ae ("workqueue: map an unbound workqueues to multiple per-node pool_workqueues") made unbound workqueues to map to multiple per-node pool_workqueues and accordingly updated workqueue_contested() so that, for unbound workqueues, it maps the specified @cpu to the NUMA node number to obtain the matching pool_workqueue to query the congested state. Before this change, workqueue_congested() ignored @cpu for unbound workqueues as there was only one pool_workqueue and some users (fscache) called it with WORK_CPU_UNBOUND. After the commit, this causes the following oops as WORK_CPU_UNBOUND gets translated to garbage by cpu_to_node(). BUG: unable to handle kernel paging request at ffff8803598d98b8 IP: [<ffffffff81043b7e>] unbound_pwq_by_node+0xa1/0xfa PGD 2421067 PUD 0 Oops: 0000 [#1] SMP CPU: 1 PID: 2689 Comm: cat Tainted: GF 3.9.0-fsdevel+ #4 task: ffff88003d801040 ti: ffff880025806000 task.ti: ffff880025806000 RIP: 0010:[<ffffffff81043b7e>] [<ffffffff81043b7e>] unbound_pwq_by_node+0xa1/0xfa RSP: 0018:ffff880025807ad8 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff8800388a2400 RCX: 0000000000000003 RDX: ffff880025807fd8 RSI: ffffffff81a31420 RDI: ffff88003d8016e0 RBP: ffff880025807ae8 R08: ffff88003d801730 R09: ffffffffa00b4898 R10: ffffffff81044217 R11: ffff88003d801040 R12: 0000000064206e97 R13: ffff880036059d98 R14: ffff880038cc8080 R15: ffff880038cc82d0 FS: 00007f21afd9c740(0000) GS:ffff88003d100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffff8803598d98b8 CR3: 000000003df49000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff8800388a2400 0000000000000002 ffff880025807b18 ffffffff810442ce ffffffff81044217 ffff880000000002 ffff8800371b4080 ffff88003d112ec0 ffff880025807b38 ffffffffa00810b0 ffff880036059d88 ffff880036059be8 Call Trace: [<ffffffff810442ce>] workqueue_congested+0xb7/0x12c [<ffffffffa00810b0>] fscache_enqueue_object+0xb2/0xe8 [fscache] [<ffffffffa007facd>] __fscache_acquire_cookie+0x3b9/0x56c [fscache] [<ffffffffa00ad8fe>] nfs_fscache_set_inode_cookie+0xee/0x132 [nfs] [<ffffffffa009e112>] do_open+0x9/0xd [nfs] [<ffffffff810e804a>] do_dentry_open+0x175/0x24b [<ffffffff810e8298>] finish_open+0x41/0x51 Fix it by using smp_processor_id() if @cpu is WORK_CPU_UNBOUND. Signed-off-by: Tejun Heo <[email protected]> Reported-by: David Howells <[email protected]> Tested-and-Acked-by: David Howells <[email protected]>
commit fb745e9 ("net/802/mrp: fix possible race condition when calling mrp_pdu_queue()") introduced a lockdep splat. [ 19.735147] ================================= [ 19.735235] [ INFO: inconsistent lock state ] [ 19.735324] 3.9.2-build-0063 #4 Not tainted [ 19.735412] --------------------------------- [ 19.735500] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 19.735592] rmmod/1840 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 19.735682] (&(&app->lock)->rlock#2){+.?...}, at: [<f862bb5b>] mrp_uninit_applicant+0x69/0xba [mrp] app->lock is normally taken under softirq context, so disable BH to avoid the splat. Reported-by: Denys Fedoryshchenko <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: David Ward <[email protected]> Cc: Cong Wang <[email protected]> Tested-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
It is required to enable respective clock-domain before enabling any clock/module inside that clock-domain. During common-clock migration, .clkdm_name field got missed for "clkdiv32k_ick" clock, which leaves "clk_24mhz_clkdm" unused; so it will be disabled even if childs of this clock-domain is enabled, which keeps child modules in idle mode. This fixes the kernel crash observed on AM335xEVM-SK platform, where clkdiv32_ick clock is being used as a gpio debounce clock and since clkdiv32k_ick is in idle mode it leads to below crash - Crash Log: ========== [ 2.598347] Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa1ac150 [ 2.606434] Internal error: : 1028 [#1] SMP ARM [ 2.611207] Modules linked in: [ 2.614449] CPU: 0 Not tainted (3.8.4-01382-g1f449cd-dirty #4) [ 2.620973] PC is at _set_gpio_debounce+0x60/0x104 [ 2.626025] LR is at clk_enable+0x30/0x3c Cc: [email protected] # v3.9 Signed-off-by: Vaibhav Hiremath <[email protected]> Cc: Rajendra Nayak <[email protected]> Acked-by: Paul Walmsley <[email protected]> Signed-off-by: Tony Lindgren <[email protected]>
Daniel Petre reported crashes in icmp_dst_unreach() with following call graph: #3 [ffff88003fc03938] __stack_chk_fail at ffffffff81037f77 #4 [ffff88003fc03948] icmp_send at ffffffff814d5fec #5 [ffff88003fc03ae8] ipv4_link_failure at ffffffff814a1795 #6 [ffff88003fc03af8] ipgre_tunnel_xmit at ffffffff814e7965 #7 [ffff88003fc03b78] dev_hard_start_xmit at ffffffff8146e032 #8 [ffff88003fc03bc8] sch_direct_xmit at ffffffff81487d66 #9 [ffff88003fc03c08] __qdisc_run at ffffffff81487efd #10 [ffff88003fc03c48] dev_queue_xmit at ffffffff8146e5a7 #11 [ffff88003fc03c88] ip_finish_output at ffffffff814ab596 Daniel found a similar problem mentioned in http://lkml.indiana.edu/hypermail/linux/kernel/1007.0/00961.html And indeed this is the root cause : skb->cb[] contains data fooling IP stack. We must clear IPCB in ip_tunnel_xmit() sooner in case dst_link_failure() is called. Or else skb->cb[] might contain garbage from GSO segmentation layer. A similar fix was tested on linux-3.9, but gre code was refactored in linux-3.10. I'll send patches for stable kernels as well. Many thanks to Daniel for providing reports, patches and testing ! Reported-by: Daniel Petre <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Commit be283b2 ( Btrfs: use helper to cleanup tree roots) introduced the following bug, BUG: unable to handle kernel NULL pointer dereference at 0000000000000034 IP: [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] [...] Pid: 2463, comm: btrfs-cache-1 Tainted: G O 3.9.0+ #4 innotek GmbH VirtualBox/VirtualBox RIP: 0010:[<ffffffffa039368c>] [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] Process btrfs-cache-1 (pid: 2463, threadinfo ffff880112d60000, task ffff880117679730) [...] Call Trace: [<ffffffffa0398a99>] btrfs_search_slot+0x104/0x64d [btrfs] [<ffffffffa039aea4>] btrfs_next_old_leaf+0xa7/0x334 [btrfs] [<ffffffffa039b141>] btrfs_next_leaf+0x10/0x12 [btrfs] [<ffffffffa039ea13>] caching_thread+0x1a3/0x2e0 [btrfs] [<ffffffffa03d8811>] worker_loop+0x14b/0x48e [btrfs] [<ffffffffa03d86c6>] ? btrfs_queue_worker+0x25c/0x25c [btrfs] [<ffffffff81068d3d>] kthread+0x8d/0x95 [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43 [<ffffffff8151e5ac>] ret_from_fork+0x7c/0xb0 [<ffffffff81068cb0>] ? kthread_freezable_should_stop+0x43/0x43 RIP [<ffffffffa039368c>] extent_buffer_get+0x4/0xa [btrfs] We've free'ed commit_root before actually getting to free block groups where caching thread needs valid extent_root->commit_root. Signed-off-by: Liu Bo <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>
This commit fixes a lockdep-detected deadlock by moving a wake_up() call out from a rnp->lock critical section. Please see below for the long version of this story. On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote: > [12572.705832] ====================================================== > [12572.750317] [ INFO: possible circular locking dependency detected ] > [12572.796978] 3.10.0-rc3+ raspberrypi#39 Not tainted > [12572.833381] ------------------------------------------------------- > [12572.862233] trinity-child17/31341 is trying to acquire lock: > [12572.870390] (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0 > [12572.878859] > but task is already holding lock: > [12572.894894] (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0 > [12572.903381] > which lock already depends on the new lock. > > [12572.927541] > the existing dependency chain (in reverse order) is: > [12572.943736] > -> #4 (&ctx->lock){-.-...}: > [12572.960032] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12572.968337] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12572.976633] [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0 > [12572.984969] [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0 > [12572.993326] [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0 > [12573.001652] [<ffffffff816eacfe>] schedule_user+0x2e/0x70 > [12573.009998] [<ffffffff816ecd64>] retint_careful+0x12/0x2e > [12573.018321] > -> #3 (&rq->lock){-.-.-.}: > [12573.034628] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.042930] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12573.051248] [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260 > [12573.059579] [<ffffffff810492f5>] do_fork+0x105/0x470 > [12573.067880] [<ffffffff81049686>] kernel_thread+0x26/0x30 > [12573.076202] [<ffffffff816cee63>] rest_init+0x23/0x140 > [12573.084508] [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe > [12573.092852] [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c > [12573.101233] [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf > [12573.109528] > -> #2 (&p->pi_lock){-.-.-.}: > [12573.125675] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.133829] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90 > [12573.141964] [<ffffffff8108e881>] try_to_wake_up+0x31/0x320 > [12573.150065] [<ffffffff8108ebe2>] default_wake_function+0x12/0x20 > [12573.158151] [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40 > [12573.166195] [<ffffffff81085398>] __wake_up_common+0x58/0x90 > [12573.174215] [<ffffffff81086909>] __wake_up+0x39/0x50 > [12573.182146] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50 > [12573.190119] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0 > [12573.198023] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930 > [12573.205860] [<ffffffff8107a91d>] kthread+0xed/0x100 > [12573.213656] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0 > [12573.221379] > -> #1 (&rsp->gp_wq){..-.-.}: > [12573.236329] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.243783] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90 > [12573.251178] [<ffffffff810868f3>] __wake_up+0x23/0x50 > [12573.258505] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50 > [12573.265891] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0 > [12573.273248] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930 > [12573.280564] [<ffffffff8107a91d>] kthread+0xed/0x100 > [12573.287807] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0 Notice the above call chain. rcu_start_future_gp() is called with the rnp->lock held. Then it calls rcu_start_gp_advance, which does a wakeup. You can't do wakeups while holding the rnp->lock, as that would mean that you could not do a rcu_read_unlock() while holding the rq lock, or any lock that was taken while holding the rq lock. This is because... (See below). > [12573.295067] > -> #0 (rcu_node_0){..-.-.}: > [12573.309293] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0 > [12573.316568] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.323825] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12573.331081] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0 > [12573.338377] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0 > [12573.345648] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0 > [12573.352942] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0 > [12573.360211] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0 > [12573.367514] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10 > [12573.374816] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2 Notice the above trace. perf took its own ctx->lock, which can be taken while holding the rq lock. While holding this lock, it did a rcu_read_unlock(). The perf_lock_task_context() basically looks like: rcu_read_lock(); raw_spin_lock(ctx->lock); rcu_read_unlock(); Now, what looks to have happened, is that we scheduled after taking that first rcu_read_lock() but before taking the spin lock. When we scheduled back in and took the ctx->lock, the following rcu_read_unlock() triggered the "special" code. The rcu_read_unlock_special() takes the rnp->lock, which gives us a possible deadlock scenario. CPU0 CPU1 CPU2 ---- ---- ---- rcu_nocb_kthread() lock(rq->lock); lock(ctx->lock); lock(rnp->lock); wake_up(); lock(rq->lock); rcu_read_unlock(); rcu_read_unlock_special(); lock(rnp->lock); lock(ctx->lock); **** DEADLOCK **** > [12573.382068] > other info that might help us debug this: > > [12573.403229] Chain exists of: > rcu_node_0 --> &rq->lock --> &ctx->lock > > [12573.424471] Possible unsafe locking scenario: > > [12573.438499] CPU0 CPU1 > [12573.445599] ---- ---- > [12573.452691] lock(&ctx->lock); > [12573.459799] lock(&rq->lock); > [12573.467010] lock(&ctx->lock); > [12573.474192] lock(rcu_node_0); > [12573.481262] > *** DEADLOCK *** > > [12573.501931] 1 lock held by trinity-child17/31341: > [12573.508990] #0: (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0 > [12573.516475] > stack backtrace: > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ raspberrypi#39 > [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00 > [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40 > [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0 > [12573.567856] Call Trace: > [12573.575011] [<ffffffff816e375b>] dump_stack+0x19/0x1b > [12573.582284] [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f > [12573.589637] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0 > [12573.596982] [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100 > [12573.604344] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.611652] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0 > [12573.619030] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12573.626331] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0 > [12573.633671] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0 > [12573.640992] [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0 > [12573.648330] [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40 > [12573.655662] [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0 > [12573.662964] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0 > [12573.670276] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0 > [12573.677622] [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370 > [12573.684981] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0 > [12573.692358] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0 > [12573.699753] [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50 > [12573.707135] [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0 > [12573.714599] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10 > [12573.721996] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2 This commit delays the wakeup via irq_work(), which is what perf and ftrace use to perform wakeups in critical sections. Reported-by: Dave Jones <[email protected]> Signed-off-by: Steven Rostedt <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
In Steven Rostedt's words: > I've been debugging the last couple of days why my tests have been > locking up. One of my tracing tests, runs all available tracers. The > lockup always happened with the mmiotrace, which is used to trace > interactions between priority drivers and the kernel. But to do this > easily, when the tracer gets registered, it disables all but the boot > CPUs. The lockup always happened after it got done disabling the CPUs. > > Then I decided to try this: > > while :; do > for i in 1 2 3; do > echo 0 > /sys/devices/system/cpu/cpu$i/online > done > for i in 1 2 3; do > echo 1 > /sys/devices/system/cpu/cpu$i/online > done > done > > Well, sure enough, that locked up too, with the same users. Doing a > sysrq-w (showing all blocked tasks): > > [ 2991.344562] task PC stack pid father > [ 2991.344562] rcu_preempt D ffff88007986fdf8 0 10 2 0x00000000 > [ 2991.344562] ffff88007986fc98 0000000000000002 ffff88007986fc48 0000000000000908 > [ 2991.344562] ffff88007986c280 ffff88007986ffd8 ffff88007986ffd8 00000000001d3c80 > [ 2991.344562] ffff880079248a40 ffff88007986c280 0000000000000000 00000000fffd4295 > [ 2991.344562] Call Trace: > [ 2991.344562] [<ffffffff815437ba>] schedule+0x64/0x66 > [ 2991.344562] [<ffffffff81541750>] schedule_timeout+0xbc/0xf9 > [ 2991.344562] [<ffffffff8154bec0>] ? ftrace_call+0x5/0x2f > [ 2991.344562] [<ffffffff81049513>] ? cascade+0xa8/0xa8 > [ 2991.344562] [<ffffffff815417ab>] schedule_timeout_uninterruptible+0x1e/0x20 > [ 2991.344562] [<ffffffff810c980c>] rcu_gp_kthread+0x502/0x94b > [ 2991.344562] [<ffffffff81062791>] ? __init_waitqueue_head+0x50/0x50 > [ 2991.344562] [<ffffffff810c930a>] ? rcu_gp_fqs+0x64/0x64 > [ 2991.344562] [<ffffffff81061cdb>] kthread+0xb1/0xb9 > [ 2991.344562] [<ffffffff81091e31>] ? lock_release_holdtime.part.23+0x4e/0x55 > [ 2991.344562] [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58 > [ 2991.344562] [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0 > [ 2991.344562] [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58 > [ 2991.344562] kworker/0:1 D ffffffff81a30680 0 47 2 0x00000000 > [ 2991.344562] Workqueue: events cpuset_hotplug_workfn > [ 2991.344562] ffff880078dbbb58 0000000000000002 0000000000000006 00000000000000d8 > [ 2991.344562] ffff880078db8100 ffff880078dbbfd8 ffff880078dbbfd8 00000000001d3c80 > [ 2991.344562] ffff8800779ca5c0 ffff880078db8100 ffffffff81541fcf 0000000000000000 > [ 2991.344562] Call Trace: > [ 2991.344562] [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609 > [ 2991.344562] [<ffffffff815437ba>] schedule+0x64/0x66 > [ 2991.344562] [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24 > [ 2991.344562] [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609 > [ 2991.344562] [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50 > [ 2991.344562] [<ffffffff8103d11b>] ? get_online_cpus+0x3c/0x50 > [ 2991.344562] [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40 > [ 2991.344562] [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50 > [ 2991.344562] [<ffffffff810af7e6>] rebuild_sched_domains_locked+0x6e/0x3a8 > [ 2991.344562] [<ffffffff810b0ec6>] rebuild_sched_domains+0x1c/0x2a > [ 2991.344562] [<ffffffff810b109b>] cpuset_hotplug_workfn+0x1c7/0x1d3 > [ 2991.344562] [<ffffffff810b0ed9>] ? cpuset_hotplug_workfn+0x5/0x1d3 > [ 2991.344562] [<ffffffff81058e07>] process_one_work+0x2d4/0x4d1 > [ 2991.344562] [<ffffffff81058d3a>] ? process_one_work+0x207/0x4d1 > [ 2991.344562] [<ffffffff8105964c>] worker_thread+0x2e7/0x3b5 > [ 2991.344562] [<ffffffff81059365>] ? rescuer_thread+0x332/0x332 > [ 2991.344562] [<ffffffff81061cdb>] kthread+0xb1/0xb9 > [ 2991.344562] [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58 > [ 2991.344562] [<ffffffff8154c1dc>] ret_from_fork+0x7c/0xb0 > [ 2991.344562] [<ffffffff81061c2a>] ? __init_kthread_worker+0x58/0x58 > [ 2991.344562] bash D ffffffff81a4aa80 0 2618 2612 0x10000000 > [ 2991.344562] ffff8800379abb58 0000000000000002 0000000000000006 0000000000000c2c > [ 2991.344562] ffff880077fea140 ffff8800379abfd8 ffff8800379abfd8 00000000001d3c80 > [ 2991.344562] ffff8800779ca5c0 ffff880077fea140 ffffffff81541fcf 0000000000000000 > [ 2991.344562] Call Trace: > [ 2991.344562] [<ffffffff81541fcf>] ? __mutex_lock_common+0x3d4/0x609 > [ 2991.344562] [<ffffffff815437ba>] schedule+0x64/0x66 > [ 2991.344562] [<ffffffff81543a39>] schedule_preempt_disabled+0x18/0x24 > [ 2991.344562] [<ffffffff81541fcf>] __mutex_lock_common+0x3d4/0x609 > [ 2991.344562] [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e > [ 2991.344562] [<ffffffff81530078>] ? rcu_cpu_notify+0x2f5/0x86e > [ 2991.344562] [<ffffffff815422ff>] mutex_lock_nested+0x3b/0x40 > [ 2991.344562] [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e > [ 2991.344562] [<ffffffff81091c99>] ? __lock_is_held+0x32/0x53 > [ 2991.344562] [<ffffffff81548912>] notifier_call_chain+0x6b/0x98 > [ 2991.344562] [<ffffffff810671fd>] __raw_notifier_call_chain+0xe/0x10 > [ 2991.344562] [<ffffffff8103cf64>] __cpu_notify+0x20/0x32 > [ 2991.344562] [<ffffffff8103cf8d>] cpu_notify_nofail+0x17/0x36 > [ 2991.344562] [<ffffffff815225de>] _cpu_down+0x154/0x259 > [ 2991.344562] [<ffffffff81522710>] cpu_down+0x2d/0x3a > [ 2991.344562] [<ffffffff81526351>] store_online+0x4e/0xe7 > [ 2991.344562] [<ffffffff8134d764>] dev_attr_store+0x20/0x22 > [ 2991.344562] [<ffffffff811b3c5f>] sysfs_write_file+0x108/0x144 > [ 2991.344562] [<ffffffff8114c5ef>] vfs_write+0xfd/0x158 > [ 2991.344562] [<ffffffff8114c928>] SyS_write+0x5c/0x83 > [ 2991.344562] [<ffffffff8154c494>] tracesys+0xdd/0xe2 > > As well as held locks: > > [ 3034.728033] Showing all locks held in the system: > [ 3034.728033] 1 lock held by rcu_preempt/10: > [ 3034.728033] #0: (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff810c9471>] rcu_gp_kthread+0x167/0x94b > [ 3034.728033] 4 locks held by kworker/0:1/47: > [ 3034.728033] #0: (events){.+.+.+}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1 > [ 3034.728033] #1: (cpuset_hotplug_work){+.+.+.}, at: [<ffffffff81058d3a>] process_one_work+0x207/0x4d1 > [ 3034.728033] #2: (cpuset_mutex){+.+.+.}, at: [<ffffffff810b0ec1>] rebuild_sched_domains+0x17/0x2a > [ 3034.728033] #3: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103d11b>] get_online_cpus+0x3c/0x50 > [ 3034.728033] 1 lock held by mingetty/2563: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 1 lock held by mingetty/2565: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 1 lock held by mingetty/2569: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 1 lock held by mingetty/2572: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 1 lock held by mingetty/2575: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > [ 3034.728033] 7 locks held by bash/2618: > [ 3034.728033] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff8114bc3f>] file_start_write+0x2a/0x2c > [ 3034.728033] #1: (&buffer->mutex#2){+.+.+.}, at: [<ffffffff811b3b93>] sysfs_write_file+0x3c/0x144 > [ 3034.728033] #2: (s_active#54){.+.+.+}, at: [<ffffffff811b3c3e>] sysfs_write_file+0xe7/0x144 > [ 3034.728033] #3: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff810217c2>] cpu_hotplug_driver_lock+0x17/0x19 > [ 3034.728033] #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8103d196>] cpu_maps_update_begin+0x17/0x19 > [ 3034.728033] #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8103cfd8>] cpu_hotplug_begin+0x2c/0x6d > [ 3034.728033] #6: (rcu_preempt_state.onoff_mutex){+.+...}, at: [<ffffffff81530078>] rcu_cpu_notify+0x2f5/0x86e > [ 3034.728033] 1 lock held by bash/2980: > [ 3034.728033] #0: (&ldata->atomic_read_lock){+.+...}, at: [<ffffffff8131e28a>] n_tty_read+0x252/0x7e8 > > Things looked a little weird. Also, this is a deadlock that lockdep did > not catch. But what we have here does not look like a circular lock > issue: > > Bash is blocked in rcu_cpu_notify(): > > 1961 /* Exclude any attempts to start a new grace period. */ > 1962 mutex_lock(&rsp->onoff_mutex); > > > kworker is blocked in get_online_cpus(), which makes sense as we are > currently taking down a CPU. > > But rcu_preempt is not blocked on anything. It is simply sleeping in > rcu_gp_kthread (really rcu_gp_init) here: > > 1453 #ifdef CONFIG_PROVE_RCU_DELAY > 1454 if ((prandom_u32() % (rcu_num_nodes * 8)) == 0 && > 1455 system_state == SYSTEM_RUNNING) > 1456 schedule_timeout_uninterruptible(2); > 1457 #endif /* #ifdef CONFIG_PROVE_RCU_DELAY */ > > And it does this while holding the onoff_mutex that bash is waiting for. > > Doing a function trace, it showed me where it happened: > > [ 125.940066] rcu_pree-10 3.... 28384115273: schedule_timeout_uninterruptible <-rcu_gp_kthread > [...] > [ 125.940066] rcu_pree-10 3d..3 28384202439: sched_switch: prev_comm=rcu_preempt prev_pid=10 prev_prio=120 prev_state=D ==> next_comm=watchdog/3 next_pid=38 next_prio=120 > > The watchdog ran, and then: > > [ 125.940066] watchdog-38 3d..3 28384692863: sched_switch: prev_comm=watchdog/3 prev_pid=38 prev_prio=120 prev_state=P ==> next_comm=modprobe next_pid=2848 next_prio=118 > > Not sure what modprobe was doing, but shortly after that: > > [ 125.940066] modprobe-2848 3d..3 28385041749: sched_switch: prev_comm=modprobe prev_pid=2848 prev_prio=118 prev_state=R+ ==> next_comm=migration/3 next_pid=40 next_prio=0 > > Where the migration thread took down the CPU: > > [ 125.940066] migratio-40 3d..3 28389148276: sched_switch: prev_comm=migration/3 prev_pid=40 prev_prio=0 prev_state=P ==> next_comm=swapper/3 next_pid=0 next_prio=120 > > which finally did: > > [ 125.940066] <idle>-0 3...1 28389282142: arch_cpu_idle_dead <-cpu_startup_entry > [ 125.940066] <idle>-0 3...1 28389282548: native_play_dead <-arch_cpu_idle_dead > [ 125.940066] <idle>-0 3...1 28389282924: play_dead_common <-native_play_dead > [ 125.940066] <idle>-0 3...1 28389283468: idle_task_exit <-play_dead_common > [ 125.940066] <idle>-0 3...1 28389284644: amd_e400_remove_cpu <-play_dead_common > > > CPU 3 is now offline, the rcu_preempt thread that ran on CPU 3 is still > doing a schedule_timeout_uninterruptible() and it registered it's > timeout to the timer base for CPU 3. You would think that it would get > migrated right? The issue here is that the timer migration happens at > the CPU notifier for CPU_DEAD. The problem is that the rcu notifier for > CPU_DOWN is blocked waiting for the onoff_mutex to be released, which is > held by the thread that just put itself into a uninterruptible sleep, > that wont wake up until the CPU_DEAD notifier of the timer > infrastructure is called, which wont happen until the rcu notifier > finishes. Here's our deadlock! This commit breaks this deadlock cycle by substituting a shorter udelay() for the previous schedule_timeout_uninterruptible(), while at the same time increasing the probability of the delay. This maintains the intensity of the testing. Reported-by: Steven Rostedt <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Tested-by: Steven Rostedt <[email protected]>
commit e8d0527 upstream. commit 2f7021a "cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work()" caused a regression in CPU hotplug, because it lead to a deadlock between cpufreq governor worker thread and the CPU hotplug writer task. Lockdep splat corresponding to this deadlock is shown below: [ 60.277396] ====================================================== [ 60.277400] [ INFO: possible circular locking dependency detected ] [ 60.277407] 3.10.0-rc7-dbg-01385-g241fd04-dirty raspberrypi#1744 Not tainted [ 60.277411] ------------------------------------------------------- [ 60.277417] bash/2225 is trying to acquire lock: [ 60.277422] ((&(&j_cdbs->work)->work)){+.+...}, at: [<ffffffff810621b5>] flush_work+0x5/0x280 [ 60.277444] but task is already holding lock: [ 60.277449] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60 [ 60.277465] which lock already depends on the new lock. [ 60.277472] the existing dependency chain (in reverse order) is: [ 60.277477] -> #2 (cpu_hotplug.lock){+.+.+.}: [ 60.277490] [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200 [ 60.277503] [<ffffffff815b6157>] mutex_lock_nested+0x67/0x410 [ 60.277514] [<ffffffff81042cbc>] get_online_cpus+0x3c/0x60 [ 60.277522] [<ffffffff814b842a>] gov_queue_work+0x2a/0xb0 [ 60.277532] [<ffffffff814b7891>] cs_dbs_timer+0xc1/0xe0 [ 60.277543] [<ffffffff8106302d>] process_one_work+0x1cd/0x6a0 [ 60.277552] [<ffffffff81063d31>] worker_thread+0x121/0x3a0 [ 60.277560] [<ffffffff8106ae2b>] kthread+0xdb/0xe0 [ 60.277569] [<ffffffff815bb96c>] ret_from_fork+0x7c/0xb0 [ 60.277580] -> #1 (&j_cdbs->timer_mutex){+.+...}: [ 60.277592] [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200 [ 60.277600] [<ffffffff815b6157>] mutex_lock_nested+0x67/0x410 [ 60.277608] [<ffffffff814b785d>] cs_dbs_timer+0x8d/0xe0 [ 60.277616] [<ffffffff8106302d>] process_one_work+0x1cd/0x6a0 [ 60.277624] [<ffffffff81063d31>] worker_thread+0x121/0x3a0 [ 60.277633] [<ffffffff8106ae2b>] kthread+0xdb/0xe0 [ 60.277640] [<ffffffff815bb96c>] ret_from_fork+0x7c/0xb0 [ 60.277649] -> #0 ((&(&j_cdbs->work)->work)){+.+...}: [ 60.277661] [<ffffffff810ab826>] __lock_acquire+0x1766/0x1d30 [ 60.277669] [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200 [ 60.277677] [<ffffffff810621ed>] flush_work+0x3d/0x280 [ 60.277685] [<ffffffff81062d8a>] __cancel_work_timer+0x8a/0x120 [ 60.277693] [<ffffffff81062e53>] cancel_delayed_work_sync+0x13/0x20 [ 60.277701] [<ffffffff814b89d9>] cpufreq_governor_dbs+0x529/0x6f0 [ 60.277709] [<ffffffff814b76a7>] cs_cpufreq_governor_dbs+0x17/0x20 [ 60.277719] [<ffffffff814b5df8>] __cpufreq_governor+0x48/0x100 [ 60.277728] [<ffffffff814b6b80>] __cpufreq_remove_dev.isra.14+0x80/0x3c0 [ 60.277737] [<ffffffff815adc0d>] cpufreq_cpu_callback+0x38/0x4c [ 60.277747] [<ffffffff81071a4d>] notifier_call_chain+0x5d/0x110 [ 60.277759] [<ffffffff81071b0e>] __raw_notifier_call_chain+0xe/0x10 [ 60.277768] [<ffffffff815a0a68>] _cpu_down+0x88/0x330 [ 60.277779] [<ffffffff815a0d46>] cpu_down+0x36/0x50 [ 60.277788] [<ffffffff815a2748>] store_online+0x98/0xd0 [ 60.277796] [<ffffffff81452a28>] dev_attr_store+0x18/0x30 [ 60.277806] [<ffffffff811d9edb>] sysfs_write_file+0xdb/0x150 [ 60.277818] [<ffffffff8116806d>] vfs_write+0xbd/0x1f0 [ 60.277826] [<ffffffff811686fc>] SyS_write+0x4c/0xa0 [ 60.277834] [<ffffffff815bbbbe>] tracesys+0xd0/0xd5 [ 60.277842] other info that might help us debug this: [ 60.277848] Chain exists of: (&(&j_cdbs->work)->work) --> &j_cdbs->timer_mutex --> cpu_hotplug.lock [ 60.277864] Possible unsafe locking scenario: [ 60.277869] CPU0 CPU1 [ 60.277873] ---- ---- [ 60.277877] lock(cpu_hotplug.lock); [ 60.277885] lock(&j_cdbs->timer_mutex); [ 60.277892] lock(cpu_hotplug.lock); [ 60.277900] lock((&(&j_cdbs->work)->work)); [ 60.277907] *** DEADLOCK *** [ 60.277915] 6 locks held by bash/2225: [ 60.277919] #0: (sb_writers#6){.+.+.+}, at: [<ffffffff81168173>] vfs_write+0x1c3/0x1f0 [ 60.277937] #1: (&buffer->mutex){+.+.+.}, at: [<ffffffff811d9e3c>] sysfs_write_file+0x3c/0x150 [ 60.277954] #2: (s_active#61){.+.+.+}, at: [<ffffffff811d9ec3>] sysfs_write_file+0xc3/0x150 [ 60.277972] #3: (x86_cpu_hotplug_driver_mutex){+.+...}, at: [<ffffffff81024cf7>] cpu_hotplug_driver_lock+0x17/0x20 [ 60.277990] #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff815a0d32>] cpu_down+0x22/0x50 [ 60.278007] #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60 [ 60.278023] stack backtrace: [ 60.278031] CPU: 3 PID: 2225 Comm: bash Not tainted 3.10.0-rc7-dbg-01385-g241fd04-dirty raspberrypi#1744 [ 60.278037] Hardware name: Acer Aspire 5741G /Aspire 5741G , BIOS V1.20 02/08/2011 [ 60.278042] ffffffff8204e110 ffff88014df6b9f8 ffffffff815b3d90 ffff88014df6ba38 [ 60.278055] ffffffff815b0a8d ffff880150ed3f60 ffff880150ed4770 3871c4002c8980b2 [ 60.278068] ffff880150ed4748 ffff880150ed4770 ffff880150ed3f60 ffff88014df6bb00 [ 60.278081] Call Trace: [ 60.278091] [<ffffffff815b3d90>] dump_stack+0x19/0x1b [ 60.278101] [<ffffffff815b0a8d>] print_circular_bug+0x2b6/0x2c5 [ 60.278111] [<ffffffff810ab826>] __lock_acquire+0x1766/0x1d30 [ 60.278123] [<ffffffff81067e08>] ? __kernel_text_address+0x58/0x80 [ 60.278134] [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200 [ 60.278142] [<ffffffff810621b5>] ? flush_work+0x5/0x280 [ 60.278151] [<ffffffff810621ed>] flush_work+0x3d/0x280 [ 60.278159] [<ffffffff810621b5>] ? flush_work+0x5/0x280 [ 60.278169] [<ffffffff810a9b14>] ? mark_held_locks+0x94/0x140 [ 60.278178] [<ffffffff81062d77>] ? __cancel_work_timer+0x77/0x120 [ 60.278188] [<ffffffff810a9cbd>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 60.278196] [<ffffffff81062d8a>] __cancel_work_timer+0x8a/0x120 [ 60.278206] [<ffffffff81062e53>] cancel_delayed_work_sync+0x13/0x20 [ 60.278214] [<ffffffff814b89d9>] cpufreq_governor_dbs+0x529/0x6f0 [ 60.278225] [<ffffffff814b76a7>] cs_cpufreq_governor_dbs+0x17/0x20 [ 60.278234] [<ffffffff814b5df8>] __cpufreq_governor+0x48/0x100 [ 60.278244] [<ffffffff814b6b80>] __cpufreq_remove_dev.isra.14+0x80/0x3c0 [ 60.278255] [<ffffffff815adc0d>] cpufreq_cpu_callback+0x38/0x4c [ 60.278265] [<ffffffff81071a4d>] notifier_call_chain+0x5d/0x110 [ 60.278275] [<ffffffff81071b0e>] __raw_notifier_call_chain+0xe/0x10 [ 60.278284] [<ffffffff815a0a68>] _cpu_down+0x88/0x330 [ 60.278292] [<ffffffff81024cf7>] ? cpu_hotplug_driver_lock+0x17/0x20 [ 60.278302] [<ffffffff815a0d46>] cpu_down+0x36/0x50 [ 60.278311] [<ffffffff815a2748>] store_online+0x98/0xd0 [ 60.278320] [<ffffffff81452a28>] dev_attr_store+0x18/0x30 [ 60.278329] [<ffffffff811d9edb>] sysfs_write_file+0xdb/0x150 [ 60.278337] [<ffffffff8116806d>] vfs_write+0xbd/0x1f0 [ 60.278347] [<ffffffff81185950>] ? fget_light+0x320/0x4b0 [ 60.278355] [<ffffffff811686fc>] SyS_write+0x4c/0xa0 [ 60.278364] [<ffffffff815bbbbe>] tracesys+0xd0/0xd5 [ 60.280582] smpboot: CPU 1 is now offline The intention of that commit was to avoid warnings during CPU hotplug, which indicated that offline CPUs were getting IPIs from the cpufreq governor's work items. But the real root-cause of that problem was commit a66b2e5 (cpufreq: Preserve sysfs files across suspend/resume) because it totally skipped all the cpufreq callbacks during CPU hotplug in the suspend/resume path, and hence it never actually shut down the cpufreq governor's worker threads during CPU offline in the suspend/resume path. Reflecting back, the reason why we never suspected that commit as the root-cause earlier, was that the original issue was reported with just the halt command and nobody had brought in suspend/resume to the equation. The reason for _that_ in turn, as it turns out, is that earlier halt/shutdown was being done by disabling non-boot CPUs while tasks were frozen, just like suspend/resume.... but commit cf7df37 (reboot: migrate shutdown/reboot to boot cpu) which came somewhere along that very same time changed that logic: shutdown/halt no longer takes CPUs offline. Thus, the test-cases for reproducing the bug were vastly different and thus we went totally off the trail. Overall, it was one hell of a confusion with so many commits affecting each other and also affecting the symptoms of the problems in subtle ways. Finally, now since the original problematic commit (a66b2e5) has been completely reverted, revert this intermediate fix too (2f7021a), to fix the CPU hotplug deadlock. Phew! Reported-by: Sergey Senozhatsky <[email protected]> Reported-by: Bartlomiej Zolnierkiewicz <[email protected]> Signed-off-by: Srivatsa S. Bhat <[email protected]> Tested-by: Peter Wu <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 058ebd0 upstream. Jiri managed to trigger this warning: [] ====================================================== [] [ INFO: possible circular locking dependency detected ] [] 3.10.0+ raspberrypi#228 Tainted: G W [] ------------------------------------------------------- [] p/6613 is trying to acquire lock: [] (rcu_node_0){..-...}, at: [<ffffffff810ca797>] rcu_read_unlock_special+0xa7/0x250 [] [] but task is already holding lock: [] (&ctx->lock){-.-...}, at: [<ffffffff810f2879>] perf_lock_task_context+0xd9/0x2c0 [] [] which lock already depends on the new lock. [] [] the existing dependency chain (in reverse order) is: [] [] -> #4 (&ctx->lock){-.-...}: [] -> #3 (&rq->lock){-.-.-.}: [] -> #2 (&p->pi_lock){-.-.-.}: [] -> #1 (&rnp->nocb_gp_wq[1]){......}: [] -> #0 (rcu_node_0){..-...}: Paul was quick to explain that due to preemptible RCU we cannot call rcu_read_unlock() while holding scheduler (or nested) locks when part of the read side critical section was preemptible. Therefore solve it by making the entire RCU read side non-preemptible. Also pull out the retry from under the non-preempt to play nice with RT. Reported-by: Jiri Olsa <[email protected]> Helped-out-by: Paul E. McKenney <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit 905a6f9 ] Otherwise we end up dereferencing the already freed net->ipv6.mrt pointer which leads to a panic (from Srivatsa S. Bhat): BUG: unable to handle kernel paging request at ffff882018552020 IP: [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] PGD 290a067 PUD 207ffe0067 PMD 207ff1d067 PTE 8000002018552060 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter +ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp ioatdma dca mlx4_core be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 7 Comm: kworker/u33:0 Not tainted 3.11.0-rc1-ea45e-a #4 Hardware name: IBM -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012 Workqueue: netns cleanup_net task: ffff8810393641c0 ti: ffff881039366000 task.ti: ffff881039366000 RIP: 0010:[<ffffffffa0366b02>] [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] RSP: 0018:ffff881039367bd8 EFLAGS: 00010286 RAX: ffff881039367fd8 RBX: ffff882018552000 RCX: dead000000200200 RDX: 0000000000000000 RSI: ffff881039367b68 RDI: ffff881039367b68 RBP: ffff881039367bf8 R08: ffff881039367b68 R09: 2222222222222222 R10: 2222222222222222 R11: 2222222222222222 R12: ffff882015a7a040 R13: ffff882014eb89c0 R14: ffff8820289e2800 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88103fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff882018552020 CR3: 0000000001c0b000 CR4: 00000000000407f0 Stack: ffff881039367c18 ffff882014eb89c0 ffff882015e28c00 0000000000000000 ffff881039367c18 ffffffffa034d9d1 ffff8820289e2800 ffff882014eb89c0 ffff881039367c58 ffffffff815bdecb ffffffff815bddf2 ffff882014eb89c0 Call Trace: [<ffffffffa034d9d1>] rawv6_close+0x21/0x40 [ipv6] [<ffffffff815bdecb>] inet_release+0xfb/0x220 [<ffffffff815bddf2>] ? inet_release+0x22/0x220 [<ffffffffa032686f>] inet6_release+0x3f/0x50 [ipv6] [<ffffffff8151c1d9>] sock_release+0x29/0xa0 [<ffffffff81525520>] sk_release_kernel+0x30/0x70 [<ffffffffa034f14b>] icmpv6_sk_exit+0x3b/0x80 [ipv6] [<ffffffff8152fff9>] ops_exit_list+0x39/0x60 [<ffffffff815306fb>] cleanup_net+0xfb/0x1a0 [<ffffffff81075e3a>] process_one_work+0x1da/0x610 [<ffffffff81075dc9>] ? process_one_work+0x169/0x610 [<ffffffff81076390>] worker_thread+0x120/0x3a0 [<ffffffff81076270>] ? process_one_work+0x610/0x610 [<ffffffff8107da2e>] kthread+0xee/0x100 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70 [<ffffffff8162a99c>] ret_from_fork+0x7c/0xb0 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70 Code: 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 4c 8b 67 30 49 89 fd e8 db 3c 1e e1 49 8b 9c 24 90 08 00 00 48 85 db 74 06 <4c> 39 6b 20 74 20 bb f3 ff ff ff e8 8e 3c 1e e1 89 d8 4c 8b 65 RIP [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] RSP <ffff881039367bd8> CR2: ffff882018552020 Reported-by: Srivatsa S. Bhat <[email protected]> Tested-by: Srivatsa S. Bhat <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit ea3768b upstream. We used to keep the port's char device structs and the /sys entries around till the last reference to the port was dropped. This is actually unnecessary, and resulted in buggy behaviour: 1. Open port in guest 2. Hot-unplug port 3. Hot-plug a port with the same 'name' property as the unplugged one This resulted in hot-plug being unsuccessful, as a port with the same name already exists (even though it was unplugged). This behaviour resulted in a warning message like this one: -------------------8<--------------------------------------- WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted) Hardware name: KVM sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:04.0/virtio0/virtio-ports/vport0p1' Call Trace: [<ffffffff8106b607>] ? warn_slowpath_common+0x87/0xc0 [<ffffffff8106b6f6>] ? warn_slowpath_fmt+0x46/0x50 [<ffffffff811f2319>] ? sysfs_add_one+0xc9/0x130 [<ffffffff811f23e8>] ? create_dir+0x68/0xb0 [<ffffffff811f2469>] ? sysfs_create_dir+0x39/0x50 [<ffffffff81273129>] ? kobject_add_internal+0xb9/0x260 [<ffffffff812733d8>] ? kobject_add_varg+0x38/0x60 [<ffffffff812734b4>] ? kobject_add+0x44/0x70 [<ffffffff81349de4>] ? get_device_parent+0xf4/0x1d0 [<ffffffff8134b389>] ? device_add+0xc9/0x650 -------------------8<--------------------------------------- Instead of relying on guest applications to release all references to the ports, we should go ahead and unregister the port from all the core layers. Any open/read calls on the port will then just return errors, and an unplug/plug operation on the host will succeed as expected. This also caused buggy behaviour in case of the device removal (not just a port): when the device was removed (which means all ports on that device are removed automatically as well), the ports with active users would clean up only when the last references were dropped -- and it would be too late then to be referencing char device pointers, resulting in oopses: -------------------8<--------------------------------------- PID: 6162 TASK: ffff8801147ad500 CPU: 0 COMMAND: "cat" #0 [ffff88011b9d5a90] machine_kexec at ffffffff8103232b #1 [ffff88011b9d5af0] crash_kexec at ffffffff810b9322 #2 [ffff88011b9d5bc0] oops_end at ffffffff814f4a50 #3 [ffff88011b9d5bf0] die at ffffffff8100f26b #4 [ffff88011b9d5c20] do_general_protection at ffffffff814f45e2 #5 [ffff88011b9d5c50] general_protection at ffffffff814f3db5 [exception RIP: strlen+2] RIP: ffffffff81272ae2 RSP: ffff88011b9d5d00 RFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880118901c18 RCX: 0000000000000000 RDX: ffff88011799982c RSI: 00000000000000d0 RDI: 3a303030302f3030 RBP: ffff88011b9d5d38 R8: 0000000000000006 R9: ffffffffa0134500 R10: 0000000000001000 R11: 0000000000001000 R12: ffff880117a1cc10 R13: 00000000000000d0 R14: 0000000000000017 R15: ffffffff81aff700 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #6 [ffff88011b9d5d00] kobject_get_path at ffffffff8126dc5d #7 [ffff88011b9d5d40] kobject_uevent_env at ffffffff8126e551 #8 [ffff88011b9d5dd0] kobject_uevent at ffffffff8126e9eb #9 [ffff88011b9d5de0] device_del at ffffffff813440c7 -------------------8<--------------------------------------- So clean up when we have all the context, and all that's left to do when the references to the port have dropped is to free up the port struct itself. Reported-by: chayang <[email protected]> Reported-by: YOGANANTH SUBRAMANIAN <[email protected]> Reported-by: FuXiangChun <[email protected]> Reported-by: Qunfang Zhang <[email protected]> Reported-by: Sibiao Luo <[email protected]> Signed-off-by: Amit Shah <[email protected]> Signed-off-by: Rusty Russell <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 057db84 upstream. Andrey reported the following report: ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3 ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3) Accessed by thread T13003: #0 ffffffff810dd2da (asan_report_error+0x32a/0x440) #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40) #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20) #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260) #4 ffffffff812a1065 (__fput+0x155/0x360) #5 ffffffff812a12de (____fput+0x1e/0x30) #6 ffffffff8111708d (task_work_run+0x10d/0x140) #7 ffffffff810ea043 (do_exit+0x433/0x11f0) #8 ffffffff810eaee4 (do_group_exit+0x84/0x130) #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30) #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b) Allocated by thread T5167: #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0) #1 ffffffff8128337c (__kmalloc+0xbc/0x500) #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90) #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0) #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40) #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430) #6 ffffffff8129b668 (finish_open+0x68/0xa0) #7 ffffffff812b66ac (do_last+0xb8c/0x1710) #8 ffffffff812b7350 (path_openat+0x120/0xb50) #9 ffffffff812b8884 (do_filp_open+0x54/0xb0) #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0) #11 ffffffff8129d4b7 (SyS_open+0x37/0x50) #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b) Shadow bytes around the buggy address: ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00 ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap redzone: fa Heap kmalloc redzone: fb Freed heap region: fd Shadow gap: fe The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;' Although the crash happened in ftrace_regex_open() the real bug occurred in trace_get_user() where there's an incrementation to parser->idx without a check against the size. The way it is triggered is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop that reads the last character stores it and then breaks out because there is no more characters. Then the last character is read to determine what to do next, and the index is incremented without checking size. Then the caller of trace_get_user() usually nulls out the last character with a zero, but since the index is equal to the size, it writes a nul character after the allocated space, which can corrupt memory. Luckily, only root user has write access to this file. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Andrey Konovalov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
(This is a VERY IMP change for low level interrupt/exception handling) ----------------------------------------------------------------------- WHAT ----------------------------------------------------------------------- * User 25 now saved in pt_regs->user_r25 (vs. tsk->thread_info.user_r25) * This allows Low level interrupt code to unconditionally save r25 (vs. the prev version which would only do it for U->K transition). Ofcourse for nested interrupts, only the pt_regs->user_r25 of bottom-most frame is useful. * simplifies the interrupt prologue/epilogue * Needed for ARCv2 ISA code and done here to keep design similar with ARCompact event handling ----------------------------------------------------------------------- WHY ------------------------------------------------------------------------- With CONFIG_ARC_CURR_IN_REG, r25 is used to cache "current" task pointer in kernel mode. So when entering kernel mode from User Mode - user r25 is specially safe-kept (it being a callee reg is NOT part of pt_regs which are saved by default on each interrupt/trap/exception) - r25 loaded with current task pointer. Further, if interrupt was taken in kernel mode, this is skipped since we know that r25 already has valid "current" pointer. With 2 level of interrupts in ARCompact ISA, detecting this is difficult but still possible, since we could be in kernel mode but r25 not already saved (in fact the stack itself might not have been switched). A. User mode B. L1 IRQ taken C. L2 IRQ taken (while on 1st line of L1 ISR) So in #C, although in kernel mode, r25 not saved (infact SP not switched at all) Given that ARcompact has manual stack switching, we could use a bit of trickey - The low level code would make sure that SP is only set to kernel mode value at the very end (after saving r25). So a non kernel mode SP, even if in kernel mode, meant r25 was NOT saved. The same paradigm won't work in ARCv2 ISA since SP is auto-switched so it's setting can't be delayed/constrained. Signed-off-by: Vineet Gupta <[email protected]>
…/kernel/git/vgupta/arc Pull first batch of ARC changes from Vineet Gupta: "There's a second bunch to follow next week - which depends on commits on other trees (irq/net). I'd have preferred the accompanying ARC change via respective trees, but it didn't workout somehow. Highlights of changes: - Continuation of ARC MM changes from 3.10 including zero page optimization Setting pagecache pages dirty by default Non executable stack by default Reducing dcache flushes for aliasing VIPT config - Long overdue rework of pt_regs machinery - removing the unused word gutters and adding ECR register to baseline (helps cleanup lot of low level code) - Support for ARC gcc 4.8 - Few other preventive fixes, cosmetics, usage of Kconfig helper.. The diffstat is larger than normal primarily because of arcregs.h header split as well as beautification of macros in entry.h" * tag 'arc-v3.11-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (32 commits) ARC: warn on improper stack unwind FDE entries arc: delete __cpuinit usage from all arc files ARC: [tlb-miss] Fix bug with CONFIG_ARC_DBG_TLB_MISS_COUNT ARC: [tlb-miss] Extraneous PTE bit testing/setting ARC: Adjustments for gcc 4.8 ARC: Setup Vector Table Base in early boot ARC: Remove explicit passing around of ECR ARC: pt_regs update #5: Use real ECR for pt_regs->event vs. synth values ARC: stop using pt_regs->orig_r8 ARC: pt_regs update #4: r25 saved/restored unconditionally ARC: K/U SP saved from one location in stack switching macro ARC: Entry Handler tweaks: Simplify branch for in-kernel preemption ARC: Entry Handler tweaks: Avoid hardcoded LIMMS for ECR values ARC: Increase readability of entry handlers ARC: pt_regs update #3: Remove unused gutter at start of callee_regs ARC: pt_regs update #2: Remove unused gutter at start of pt_regs ARC: pt_regs update #1: Align pt_regs end with end of kernel stack page ARC: pt_regs update #0: remove kernel stack canary ARC: [mm] Remove @Write argument to do_page_fault() ARC: [mm] Make stack/heap Non-executable by default ...
Jiri managed to trigger this warning: [] ====================================================== [] [ INFO: possible circular locking dependency detected ] [] 3.10.0+ raspberrypi#228 Tainted: G W [] ------------------------------------------------------- [] p/6613 is trying to acquire lock: [] (rcu_node_0){..-...}, at: [<ffffffff810ca797>] rcu_read_unlock_special+0xa7/0x250 [] [] but task is already holding lock: [] (&ctx->lock){-.-...}, at: [<ffffffff810f2879>] perf_lock_task_context+0xd9/0x2c0 [] [] which lock already depends on the new lock. [] [] the existing dependency chain (in reverse order) is: [] [] -> #4 (&ctx->lock){-.-...}: [] -> #3 (&rq->lock){-.-.-.}: [] -> #2 (&p->pi_lock){-.-.-.}: [] -> #1 (&rnp->nocb_gp_wq[1]){......}: [] -> #0 (rcu_node_0){..-...}: Paul was quick to explain that due to preemptible RCU we cannot call rcu_read_unlock() while holding scheduler (or nested) locks when part of the read side critical section was preemptible. Therefore solve it by making the entire RCU read side non-preemptible. Also pull out the retry from under the non-preempt to play nice with RT. Reported-by: Jiri Olsa <[email protected]> Helped-out-by: Paul E. McKenney <[email protected]> Cc: <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
commit 2f7021a "cpufreq: protect 'policy->cpus' from offlining during __gov_queue_work()" caused a regression in CPU hotplug, because it lead to a deadlock between cpufreq governor worker thread and the CPU hotplug writer task. Lockdep splat corresponding to this deadlock is shown below: [ 60.277396] ====================================================== [ 60.277400] [ INFO: possible circular locking dependency detected ] [ 60.277407] 3.10.0-rc7-dbg-01385-g241fd04-dirty raspberrypi#1744 Not tainted [ 60.277411] ------------------------------------------------------- [ 60.277417] bash/2225 is trying to acquire lock: [ 60.277422] ((&(&j_cdbs->work)->work)){+.+...}, at: [<ffffffff810621b5>] flush_work+0x5/0x280 [ 60.277444] but task is already holding lock: [ 60.277449] (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60 [ 60.277465] which lock already depends on the new lock. [ 60.277472] the existing dependency chain (in reverse order) is: [ 60.277477] -> #2 (cpu_hotplug.lock){+.+.+.}: [ 60.277490] [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200 [ 60.277503] [<ffffffff815b6157>] mutex_lock_nested+0x67/0x410 [ 60.277514] [<ffffffff81042cbc>] get_online_cpus+0x3c/0x60 [ 60.277522] [<ffffffff814b842a>] gov_queue_work+0x2a/0xb0 [ 60.277532] [<ffffffff814b7891>] cs_dbs_timer+0xc1/0xe0 [ 60.277543] [<ffffffff8106302d>] process_one_work+0x1cd/0x6a0 [ 60.277552] [<ffffffff81063d31>] worker_thread+0x121/0x3a0 [ 60.277560] [<ffffffff8106ae2b>] kthread+0xdb/0xe0 [ 60.277569] [<ffffffff815bb96c>] ret_from_fork+0x7c/0xb0 [ 60.277580] -> #1 (&j_cdbs->timer_mutex){+.+...}: [ 60.277592] [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200 [ 60.277600] [<ffffffff815b6157>] mutex_lock_nested+0x67/0x410 [ 60.277608] [<ffffffff814b785d>] cs_dbs_timer+0x8d/0xe0 [ 60.277616] [<ffffffff8106302d>] process_one_work+0x1cd/0x6a0 [ 60.277624] [<ffffffff81063d31>] worker_thread+0x121/0x3a0 [ 60.277633] [<ffffffff8106ae2b>] kthread+0xdb/0xe0 [ 60.277640] [<ffffffff815bb96c>] ret_from_fork+0x7c/0xb0 [ 60.277649] -> #0 ((&(&j_cdbs->work)->work)){+.+...}: [ 60.277661] [<ffffffff810ab826>] __lock_acquire+0x1766/0x1d30 [ 60.277669] [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200 [ 60.277677] [<ffffffff810621ed>] flush_work+0x3d/0x280 [ 60.277685] [<ffffffff81062d8a>] __cancel_work_timer+0x8a/0x120 [ 60.277693] [<ffffffff81062e53>] cancel_delayed_work_sync+0x13/0x20 [ 60.277701] [<ffffffff814b89d9>] cpufreq_governor_dbs+0x529/0x6f0 [ 60.277709] [<ffffffff814b76a7>] cs_cpufreq_governor_dbs+0x17/0x20 [ 60.277719] [<ffffffff814b5df8>] __cpufreq_governor+0x48/0x100 [ 60.277728] [<ffffffff814b6b80>] __cpufreq_remove_dev.isra.14+0x80/0x3c0 [ 60.277737] [<ffffffff815adc0d>] cpufreq_cpu_callback+0x38/0x4c [ 60.277747] [<ffffffff81071a4d>] notifier_call_chain+0x5d/0x110 [ 60.277759] [<ffffffff81071b0e>] __raw_notifier_call_chain+0xe/0x10 [ 60.277768] [<ffffffff815a0a68>] _cpu_down+0x88/0x330 [ 60.277779] [<ffffffff815a0d46>] cpu_down+0x36/0x50 [ 60.277788] [<ffffffff815a2748>] store_online+0x98/0xd0 [ 60.277796] [<ffffffff81452a28>] dev_attr_store+0x18/0x30 [ 60.277806] [<ffffffff811d9edb>] sysfs_write_file+0xdb/0x150 [ 60.277818] [<ffffffff8116806d>] vfs_write+0xbd/0x1f0 [ 60.277826] [<ffffffff811686fc>] SyS_write+0x4c/0xa0 [ 60.277834] [<ffffffff815bbbbe>] tracesys+0xd0/0xd5 [ 60.277842] other info that might help us debug this: [ 60.277848] Chain exists of: (&(&j_cdbs->work)->work) --> &j_cdbs->timer_mutex --> cpu_hotplug.lock [ 60.277864] Possible unsafe locking scenario: [ 60.277869] CPU0 CPU1 [ 60.277873] ---- ---- [ 60.277877] lock(cpu_hotplug.lock); [ 60.277885] lock(&j_cdbs->timer_mutex); [ 60.277892] lock(cpu_hotplug.lock); [ 60.277900] lock((&(&j_cdbs->work)->work)); [ 60.277907] *** DEADLOCK *** [ 60.277915] 6 locks held by bash/2225: [ 60.277919] #0: (sb_writers#6){.+.+.+}, at: [<ffffffff81168173>] vfs_write+0x1c3/0x1f0 [ 60.277937] #1: (&buffer->mutex){+.+.+.}, at: [<ffffffff811d9e3c>] sysfs_write_file+0x3c/0x150 [ 60.277954] #2: (s_active#61){.+.+.+}, at: [<ffffffff811d9ec3>] sysfs_write_file+0xc3/0x150 [ 60.277972] #3: (x86_cpu_hotplug_driver_mutex){+.+...}, at: [<ffffffff81024cf7>] cpu_hotplug_driver_lock+0x17/0x20 [ 60.277990] #4: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff815a0d32>] cpu_down+0x22/0x50 [ 60.278007] #5: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff81042d8b>] cpu_hotplug_begin+0x2b/0x60 [ 60.278023] stack backtrace: [ 60.278031] CPU: 3 PID: 2225 Comm: bash Not tainted 3.10.0-rc7-dbg-01385-g241fd04-dirty raspberrypi#1744 [ 60.278037] Hardware name: Acer Aspire 5741G /Aspire 5741G , BIOS V1.20 02/08/2011 [ 60.278042] ffffffff8204e110 ffff88014df6b9f8 ffffffff815b3d90 ffff88014df6ba38 [ 60.278055] ffffffff815b0a8d ffff880150ed3f60 ffff880150ed4770 3871c4002c8980b2 [ 60.278068] ffff880150ed4748 ffff880150ed4770 ffff880150ed3f60 ffff88014df6bb00 [ 60.278081] Call Trace: [ 60.278091] [<ffffffff815b3d90>] dump_stack+0x19/0x1b [ 60.278101] [<ffffffff815b0a8d>] print_circular_bug+0x2b6/0x2c5 [ 60.278111] [<ffffffff810ab826>] __lock_acquire+0x1766/0x1d30 [ 60.278123] [<ffffffff81067e08>] ? __kernel_text_address+0x58/0x80 [ 60.278134] [<ffffffff810ac6d4>] lock_acquire+0xa4/0x200 [ 60.278142] [<ffffffff810621b5>] ? flush_work+0x5/0x280 [ 60.278151] [<ffffffff810621ed>] flush_work+0x3d/0x280 [ 60.278159] [<ffffffff810621b5>] ? flush_work+0x5/0x280 [ 60.278169] [<ffffffff810a9b14>] ? mark_held_locks+0x94/0x140 [ 60.278178] [<ffffffff81062d77>] ? __cancel_work_timer+0x77/0x120 [ 60.278188] [<ffffffff810a9cbd>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 60.278196] [<ffffffff81062d8a>] __cancel_work_timer+0x8a/0x120 [ 60.278206] [<ffffffff81062e53>] cancel_delayed_work_sync+0x13/0x20 [ 60.278214] [<ffffffff814b89d9>] cpufreq_governor_dbs+0x529/0x6f0 [ 60.278225] [<ffffffff814b76a7>] cs_cpufreq_governor_dbs+0x17/0x20 [ 60.278234] [<ffffffff814b5df8>] __cpufreq_governor+0x48/0x100 [ 60.278244] [<ffffffff814b6b80>] __cpufreq_remove_dev.isra.14+0x80/0x3c0 [ 60.278255] [<ffffffff815adc0d>] cpufreq_cpu_callback+0x38/0x4c [ 60.278265] [<ffffffff81071a4d>] notifier_call_chain+0x5d/0x110 [ 60.278275] [<ffffffff81071b0e>] __raw_notifier_call_chain+0xe/0x10 [ 60.278284] [<ffffffff815a0a68>] _cpu_down+0x88/0x330 [ 60.278292] [<ffffffff81024cf7>] ? cpu_hotplug_driver_lock+0x17/0x20 [ 60.278302] [<ffffffff815a0d46>] cpu_down+0x36/0x50 [ 60.278311] [<ffffffff815a2748>] store_online+0x98/0xd0 [ 60.278320] [<ffffffff81452a28>] dev_attr_store+0x18/0x30 [ 60.278329] [<ffffffff811d9edb>] sysfs_write_file+0xdb/0x150 [ 60.278337] [<ffffffff8116806d>] vfs_write+0xbd/0x1f0 [ 60.278347] [<ffffffff81185950>] ? fget_light+0x320/0x4b0 [ 60.278355] [<ffffffff811686fc>] SyS_write+0x4c/0xa0 [ 60.278364] [<ffffffff815bbbbe>] tracesys+0xd0/0xd5 [ 60.280582] smpboot: CPU 1 is now offline The intention of that commit was to avoid warnings during CPU hotplug, which indicated that offline CPUs were getting IPIs from the cpufreq governor's work items. But the real root-cause of that problem was commit a66b2e5 (cpufreq: Preserve sysfs files across suspend/resume) because it totally skipped all the cpufreq callbacks during CPU hotplug in the suspend/resume path, and hence it never actually shut down the cpufreq governor's worker threads during CPU offline in the suspend/resume path. Reflecting back, the reason why we never suspected that commit as the root-cause earlier, was that the original issue was reported with just the halt command and nobody had brought in suspend/resume to the equation. The reason for _that_ in turn, as it turns out, is that earlier halt/shutdown was being done by disabling non-boot CPUs while tasks were frozen, just like suspend/resume.... but commit cf7df37 (reboot: migrate shutdown/reboot to boot cpu) which came somewhere along that very same time changed that logic: shutdown/halt no longer takes CPUs offline. Thus, the test-cases for reproducing the bug were vastly different and thus we went totally off the trail. Overall, it was one hell of a confusion with so many commits affecting each other and also affecting the symptoms of the problems in subtle ways. Finally, now since the original problematic commit (a66b2e5) has been completely reverted, revert this intermediate fix too (2f7021a), to fix the CPU hotplug deadlock. Phew! Reported-by: Sergey Senozhatsky <[email protected]> Reported-by: Bartlomiej Zolnierkiewicz <[email protected]> Signed-off-by: Srivatsa S. Bhat <[email protected]> Tested-by: Peter Wu <[email protected]> Cc: 3.10+ <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
Since commit 2025172 (spi/bitbang: Use core message pump), the following kernel crash is seen: Unable to handle kernel NULL pointer dereference at virtual address 0000000d pgd = 80004000 [0000000d] *pgd=00000000 Internal error: Oops: 5 [#1] SMP ARM Modules linked in: CPU: 1 PID: 48 Comm: spi32766 Not tainted 3.11.0-rc1+ #4 task: bfa3e580 ti: bfb90000 task.ti: bfb90000 PC is at spi_bitbang_transfer_one+0x50/0x248 LR is at spi_bitbang_transfer_one+0x20/0x248 ... ,and also the following build warning: drivers/spi/spi-bitbang.c: In function 'spi_bitbang_start': drivers/spi/spi-bitbang.c:436:31: warning: assignment from incompatible pointer type [enabled by default] In order to fix it, we need to change the first parameter of spi_bitbang_transfer_one() to 'struct spi_master *master'. Tested on a mx6qsabrelite by succesfully probing a SPI NOR flash. Signed-off-by: Fabio Estevam <[email protected]> Signed-off-by: Mark Brown <[email protected]>
When we try to open a file with O_TMPFILE flag, we will trigger a bug. The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and this check always fails because we set ->i_nlink = 1 in inode_init_always(). We can use the following program to trigger it: int main(int argc, char *argv[]) { int fd; fd = open(argv[1], O_TMPFILE, 0666); if (fd < 0) { perror("open "); return -1; } close(fd); return 0; } The oops message looks like this: kernel: kernel BUG at fs/ext3/namei.c:1992! kernel: invalid opcode: 0000 [#1] SMP kernel: Modules linked in: ext4 jbd2 crc16 cpufreq_ondemand ipv6 dm_mirror dm_region_hash dm_log dm_mod parport_pc parport serio_raw sg dcdbas pcspkr i2c_i801 ehci_pci ehci_hcd button acpi_cpufreq mperf e1000e ptp pps_core ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core ext3 jbd sd_mod ahci libahci libata scsi_mod uhci_hcd kernel: CPU: 0 PID: 2882 Comm: tst_tmpfile Not tainted 3.11.0-rc1+ #4 kernel: Hardware name: Dell Inc. OptiPlex 780 /0V4W66, BIOS A05 08/11/2010 kernel: task: ffff880112d30050 ti: ffff8801124d4000 task.ti: ffff8801124d4000 kernel: RIP: 0010:[<ffffffffa00db5ae>] [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3] kernel: RSP: 0018:ffff8801124d5cc8 EFLAGS: 00010202 kernel: RAX: 0000000000000000 RBX: ffff880111510128 RCX: ffff8801114683a0 kernel: RDX: 0000000000000000 RSI: ffff880111510128 RDI: ffff88010fcf65a8 kernel: RBP: ffff8801124d5d18 R08: 0080000000000000 R09: ffffffffa00d3b7f kernel: R10: ffff8801114683a0 R11: ffff8801032a2558 R12: 0000000000000000 kernel: R13: ffff88010fcf6800 R14: ffff8801032a2558 R15: ffff8801115100d8 kernel: FS: 00007f5d172b5700(0000) GS:ffff880117c00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b kernel: CR2: 00007f5d16df15d0 CR3: 0000000110b1d000 CR4: 00000000000407f0 kernel: Stack: kernel: 000000000000000c ffff8801048a7dc8 ffff8801114685a8 ffffffffa00b80d7 kernel: ffff8801124d5e38 ffff8801032a2558 ffff88010ce24d68 0000000000000000 kernel: ffff88011146b300 ffff8801124d5d44 ffff8801124d5d78 ffffffffa00db7e1 kernel: Call Trace: kernel: [<ffffffffa00b80d7>] ? journal_start+0x8c/0xbd [jbd] kernel: [<ffffffffa00db7e1>] ext3_tmpfile+0xb2/0x13b [ext3] kernel: [<ffffffff821076f8>] path_openat+0x11f/0x5e7 kernel: [<ffffffff821c86b4>] ? list_del+0x11/0x30 kernel: [<ffffffff82065fa2>] ? __dequeue_entity+0x33/0x38 kernel: [<ffffffff82107cd5>] do_filp_open+0x3f/0x8d kernel: [<ffffffff82112532>] ? __alloc_fd+0x50/0x102 kernel: [<ffffffff820f9296>] do_sys_open+0x13b/0x1cd kernel: [<ffffffff820f935c>] SyS_open+0x1e/0x20 kernel: [<ffffffff82398c02>] system_call_fastpath+0x16/0x1b kernel: Code: 39 c7 0f 85 67 01 00 00 0f b7 03 25 00 f0 00 00 3d 00 40 00 00 74 18 3d 00 80 00 00 74 11 3d 00 a0 00 00 74 0a 83 7b 48 00 74 04 <0f> 0b eb fe 49 8b 85 50 03 00 00 4c 89 f6 48 c7 c7 c0 99 0e a0 kernel: RIP [<ffffffffa00db5ae>] ext3_orphan_add+0x6a/0x1eb [ext3] kernel: RSP <ffff8801124d5cc8> Here we couldn't call clear_nlink() directly because in d_tmpfile() we will call inode_dec_link_count() to decrease ->i_nlink. So this commit tries to call d_tmpfile() before ext4_orphan_add() to fix this problem. Signed-off-by: Zheng Liu <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]> Cc: Jan Kara <[email protected]> Cc: Al Viro <[email protected]>
Commits 6a1c068 and 9356b53, respectively 'tty: Convert termios_mutex to termios_rwsem' and 'n_tty: Access termios values safely' introduced a circular lock dependency with console_lock and termios_rwsem. The lockdep report [1] shows that n_tty_write() will attempt to claim console_lock while holding the termios_rwsem, whereas tty_do_resize() may already hold the console_lock while claiming the termios_rwsem. Since n_tty_write() and tty_do_resize() do not contend over the same data -- the tty->winsize structure -- correct the lock dependency by introducing a new lock which specifically serializes access to tty->winsize only. [1] Lockdep report ====================================================== [ INFO: possible circular locking dependency detected ] 3.10.0-0+tip-xeon+lockdep #0+tip Not tainted ------------------------------------------------------- modprobe/277 is trying to acquire lock: (&tty->termios_rwsem){++++..}, at: [<ffffffff81452656>] tty_do_resize+0x36/0xe0 but task is already holding lock: ((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 ((fb_notifier_list).rwsem){.+.+.+}: [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0 [<ffffffff8175b797>] down_read+0x47/0x5c [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0 [<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20 [<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320 [<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper] [<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau] [<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau] [<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm] [<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau] [<ffffffff813b13db>] local_pci_probe+0x4b/0x80 [<ffffffff813b1701>] pci_device_probe+0x111/0x120 [<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0 [<ffffffff81497bab>] __driver_attach+0xab/0xb0 [<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0 [<ffffffff814971fe>] driver_attach+0x1e/0x20 [<ffffffff81496cc1>] bus_add_driver+0x111/0x290 [<ffffffff814982b7>] driver_register+0x77/0x170 [<ffffffff813b0454>] __pci_register_driver+0x64/0x70 [<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm] [<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau] [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0 [<ffffffff810c54cb>] load_module+0x123b/0x1bf0 [<ffffffff810c5f57>] SyS_init_module+0xd7/0x120 [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b -> #1 (console_lock){+.+.+.}: [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0 [<ffffffff810430a7>] console_lock+0x77/0x80 [<ffffffff8146b2a1>] con_flush_chars+0x31/0x50 [<ffffffff8145780c>] n_tty_write+0x1ec/0x4d0 [<ffffffff814541b9>] tty_write+0x159/0x2e0 [<ffffffff814543f5>] redirected_tty_write+0xb5/0xc0 [<ffffffff811ab9d5>] vfs_write+0xc5/0x1f0 [<ffffffff811abec5>] SyS_write+0x55/0xa0 [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b -> #0 (&tty->termios_rwsem){++++..}: [<ffffffff810b65c3>] __lock_acquire+0x1c43/0x1d30 [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0 [<ffffffff8175b724>] down_write+0x44/0x70 [<ffffffff81452656>] tty_do_resize+0x36/0xe0 [<ffffffff8146c841>] vc_do_resize+0x3e1/0x4c0 [<ffffffff8146c99f>] vc_resize+0x1f/0x30 [<ffffffff813e4535>] fbcon_init+0x385/0x5a0 [<ffffffff8146a4bc>] visual_init+0xbc/0x120 [<ffffffff8146cd13>] do_bind_con_driver+0x163/0x320 [<ffffffff8146cfa1>] do_take_over_console+0x61/0x70 [<ffffffff813e2b93>] do_fbcon_takeover+0x63/0xc0 [<ffffffff813e67a5>] fbcon_event_notify+0x715/0x820 [<ffffffff81762f9d>] notifier_call_chain+0x5d/0x110 [<ffffffff8107aadc>] __blocking_notifier_call_chain+0x6c/0xc0 [<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20 [<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320 [<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper] [<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau] [<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau] [<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm] [<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau] [<ffffffff813b13db>] local_pci_probe+0x4b/0x80 [<ffffffff813b1701>] pci_device_probe+0x111/0x120 [<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0 [<ffffffff81497bab>] __driver_attach+0xab/0xb0 [<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0 [<ffffffff814971fe>] driver_attach+0x1e/0x20 [<ffffffff81496cc1>] bus_add_driver+0x111/0x290 [<ffffffff814982b7>] driver_register+0x77/0x170 [<ffffffff813b0454>] __pci_register_driver+0x64/0x70 [<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm] [<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau] [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0 [<ffffffff810c54cb>] load_module+0x123b/0x1bf0 [<ffffffff810c5f57>] SyS_init_module+0xd7/0x120 [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b other info that might help us debug this: Chain exists of: &tty->termios_rwsem --> console_lock --> (fb_notifier_list).rwsem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((fb_notifier_list).rwsem); lock(console_lock); lock((fb_notifier_list).rwsem); lock(&tty->termios_rwsem); *** DEADLOCK *** 7 locks held by modprobe/277: #0: (&__lockdep_no_validate__){......}, at: [<ffffffff81497b5b>] __driver_attach+0x5b/0xb0 #1: (&__lockdep_no_validate__){......}, at: [<ffffffff81497b69>] __driver_attach+0x69/0xb0 #2: (drm_global_mutex){+.+.+.}, at: [<ffffffffa008a6dd>] drm_get_pci_dev+0xbd/0x2a0 [drm] #3: (registration_lock){+.+.+.}, at: [<ffffffff813d93f5>] register_framebuffer+0x25/0x320 #4: (&fb_info->lock){+.+.+.}, at: [<ffffffff813d8116>] lock_fb_info+0x26/0x60 #5: (console_lock){+.+.+.}, at: [<ffffffff813d95a4>] register_framebuffer+0x1d4/0x320 #6: ((fb_notifier_list).rwsem){.+.+.+}, at: [<ffffffff8107aac6>] __blocking_notifier_call_chain+0x56/0xc0 stack backtrace: CPU: 0 PID: 277 Comm: modprobe Not tainted 3.10.0-0+tip-xeon+lockdep #0+tip Hardware name: Dell Inc. Precision WorkStation T5400 /0RW203, BIOS A11 04/30/2012 ffffffff8213e5e0 ffff8802aa2fb298 ffffffff81755f19 ffff8802aa2fb2e8 ffffffff8174f506 ffff8802aa2fa000 ffff8802aa2fb378 ffff8802aa2ea8e8 ffff8802aa2ea910 ffff8802aa2ea8e8 0000000000000006 0000000000000007 Call Trace: [<ffffffff81755f19>] dump_stack+0x19/0x1b [<ffffffff8174f506>] print_circular_bug+0x1fb/0x20c [<ffffffff810b65c3>] __lock_acquire+0x1c43/0x1d30 [<ffffffff810b775e>] ? mark_held_locks+0xae/0x120 [<ffffffff810b78d5>] ? trace_hardirqs_on_caller+0x105/0x1d0 [<ffffffff810b6d62>] lock_acquire+0x92/0x1f0 [<ffffffff81452656>] ? tty_do_resize+0x36/0xe0 [<ffffffff8175b724>] down_write+0x44/0x70 [<ffffffff81452656>] ? tty_do_resize+0x36/0xe0 [<ffffffff81452656>] tty_do_resize+0x36/0xe0 [<ffffffff8146c841>] vc_do_resize+0x3e1/0x4c0 [<ffffffff8146c99f>] vc_resize+0x1f/0x30 [<ffffffff813e4535>] fbcon_init+0x385/0x5a0 [<ffffffff8146a4bc>] visual_init+0xbc/0x120 [<ffffffff8146cd13>] do_bind_con_driver+0x163/0x320 [<ffffffff8146cfa1>] do_take_over_console+0x61/0x70 [<ffffffff813e2b93>] do_fbcon_takeover+0x63/0xc0 [<ffffffff813e67a5>] fbcon_event_notify+0x715/0x820 [<ffffffff81762f9d>] notifier_call_chain+0x5d/0x110 [<ffffffff8107aadc>] __blocking_notifier_call_chain+0x6c/0xc0 [<ffffffff8107ab46>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff813d7c0b>] fb_notifier_call_chain+0x1b/0x20 [<ffffffff813d95b2>] register_framebuffer+0x1e2/0x320 [<ffffffffa01043e1>] drm_fb_helper_initial_config+0x371/0x540 [drm_kms_helper] [<ffffffff8173cbcb>] ? kmemleak_alloc+0x5b/0xc0 [<ffffffff81198874>] ? kmem_cache_alloc_trace+0x104/0x290 [<ffffffffa01035e1>] ? drm_fb_helper_single_add_all_connectors+0x81/0xf0 [drm_kms_helper] [<ffffffffa01bcb05>] nouveau_fbcon_init+0x105/0x140 [nouveau] [<ffffffffa01ad0af>] nouveau_drm_load+0x43f/0x610 [nouveau] [<ffffffffa008a79e>] drm_get_pci_dev+0x17e/0x2a0 [drm] [<ffffffffa01ad4da>] nouveau_drm_probe+0x25a/0x2a0 [nouveau] [<ffffffff8175f162>] ? _raw_spin_unlock_irqrestore+0x42/0x80 [<ffffffff813b13db>] local_pci_probe+0x4b/0x80 [<ffffffff813b1701>] pci_device_probe+0x111/0x120 [<ffffffff814977eb>] driver_probe_device+0x8b/0x3a0 [<ffffffff81497bab>] __driver_attach+0xab/0xb0 [<ffffffff81497b00>] ? driver_probe_device+0x3a0/0x3a0 [<ffffffff814956ad>] bus_for_each_dev+0x5d/0xa0 [<ffffffff814971fe>] driver_attach+0x1e/0x20 [<ffffffff81496cc1>] bus_add_driver+0x111/0x290 [<ffffffffa022a000>] ? 0xffffffffa0229fff [<ffffffff814982b7>] driver_register+0x77/0x170 [<ffffffffa022a000>] ? 0xffffffffa0229fff [<ffffffff813b0454>] __pci_register_driver+0x64/0x70 [<ffffffffa008a9da>] drm_pci_init+0x11a/0x130 [drm] [<ffffffffa022a000>] ? 0xffffffffa0229fff [<ffffffffa022a000>] ? 0xffffffffa0229fff [<ffffffffa022a04d>] nouveau_drm_init+0x4d/0x1000 [nouveau] [<ffffffff810002ea>] do_one_initcall+0xea/0x1a0 [<ffffffff810c54cb>] load_module+0x123b/0x1bf0 [<ffffffff81399a50>] ? ddebug_proc_open+0xb0/0xb0 [<ffffffff813855ae>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff810c5f57>] SyS_init_module+0xd7/0x120 [<ffffffff817677c2>] system_call_fastpath+0x16/0x1b Signed-off-by: Peter Hurley <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
Otherwise we end up dereferencing the already freed net->ipv6.mrt pointer which leads to a panic (from Srivatsa S. Bhat): BUG: unable to handle kernel paging request at ffff882018552020 IP: [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] PGD 290a067 PUD 207ffe0067 PMD 207ff1d067 PTE 8000002018552060 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: ebtable_nat ebtables nfs fscache nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables nfsd lockd nfs_acl exportfs auth_rpcgss autofs4 sunrpc 8021q garp bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter +ip6_tables ipv6 vfat fat vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support cdc_ether usbnet mii microcode i2c_i801 i2c_core lpc_ich mfd_core shpchp ioatdma dca mlx4_core be2net wmi acpi_cpufreq mperf ext4 jbd2 mbcache dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 7 Comm: kworker/u33:0 Not tainted 3.11.0-rc1-ea45e-a #4 Hardware name: IBM -[8737R2A]-/00Y2738, BIOS -[B2E120RUS-1.20]- 11/30/2012 Workqueue: netns cleanup_net task: ffff8810393641c0 ti: ffff881039366000 task.ti: ffff881039366000 RIP: 0010:[<ffffffffa0366b02>] [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] RSP: 0018:ffff881039367bd8 EFLAGS: 00010286 RAX: ffff881039367fd8 RBX: ffff882018552000 RCX: dead000000200200 RDX: 0000000000000000 RSI: ffff881039367b68 RDI: ffff881039367b68 RBP: ffff881039367bf8 R08: ffff881039367b68 R09: 2222222222222222 R10: 2222222222222222 R11: 2222222222222222 R12: ffff882015a7a040 R13: ffff882014eb89c0 R14: ffff8820289e2800 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88103fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff882018552020 CR3: 0000000001c0b000 CR4: 00000000000407f0 Stack: ffff881039367c18 ffff882014eb89c0 ffff882015e28c00 0000000000000000 ffff881039367c18 ffffffffa034d9d1 ffff8820289e2800 ffff882014eb89c0 ffff881039367c58 ffffffff815bdecb ffffffff815bddf2 ffff882014eb89c0 Call Trace: [<ffffffffa034d9d1>] rawv6_close+0x21/0x40 [ipv6] [<ffffffff815bdecb>] inet_release+0xfb/0x220 [<ffffffff815bddf2>] ? inet_release+0x22/0x220 [<ffffffffa032686f>] inet6_release+0x3f/0x50 [ipv6] [<ffffffff8151c1d9>] sock_release+0x29/0xa0 [<ffffffff81525520>] sk_release_kernel+0x30/0x70 [<ffffffffa034f14b>] icmpv6_sk_exit+0x3b/0x80 [ipv6] [<ffffffff8152fff9>] ops_exit_list+0x39/0x60 [<ffffffff815306fb>] cleanup_net+0xfb/0x1a0 [<ffffffff81075e3a>] process_one_work+0x1da/0x610 [<ffffffff81075dc9>] ? process_one_work+0x169/0x610 [<ffffffff81076390>] worker_thread+0x120/0x3a0 [<ffffffff81076270>] ? process_one_work+0x610/0x610 [<ffffffff8107da2e>] kthread+0xee/0x100 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70 [<ffffffff8162a99c>] ret_from_fork+0x7c/0xb0 [<ffffffff8107d940>] ? __init_kthread_worker+0x70/0x70 Code: 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 4c 8b 67 30 49 89 fd e8 db 3c 1e e1 49 8b 9c 24 90 08 00 00 48 85 db 74 06 <4c> 39 6b 20 74 20 bb f3 ff ff ff e8 8e 3c 1e e1 89 d8 4c 8b 65 RIP [<ffffffffa0366b02>] ip6mr_sk_done+0x32/0xb0 [ipv6] RSP <ffff881039367bd8> CR2: ffff882018552020 Reported-by: Srivatsa S. Bhat <[email protected]> Tested-by: Srivatsa S. Bhat <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
We used to keep the port's char device structs and the /sys entries around till the last reference to the port was dropped. This is actually unnecessary, and resulted in buggy behaviour: 1. Open port in guest 2. Hot-unplug port 3. Hot-plug a port with the same 'name' property as the unplugged one This resulted in hot-plug being unsuccessful, as a port with the same name already exists (even though it was unplugged). This behaviour resulted in a warning message like this one: -------------------8<--------------------------------------- WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted) Hardware name: KVM sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:04.0/virtio0/virtio-ports/vport0p1' Call Trace: [<ffffffff8106b607>] ? warn_slowpath_common+0x87/0xc0 [<ffffffff8106b6f6>] ? warn_slowpath_fmt+0x46/0x50 [<ffffffff811f2319>] ? sysfs_add_one+0xc9/0x130 [<ffffffff811f23e8>] ? create_dir+0x68/0xb0 [<ffffffff811f2469>] ? sysfs_create_dir+0x39/0x50 [<ffffffff81273129>] ? kobject_add_internal+0xb9/0x260 [<ffffffff812733d8>] ? kobject_add_varg+0x38/0x60 [<ffffffff812734b4>] ? kobject_add+0x44/0x70 [<ffffffff81349de4>] ? get_device_parent+0xf4/0x1d0 [<ffffffff8134b389>] ? device_add+0xc9/0x650 -------------------8<--------------------------------------- Instead of relying on guest applications to release all references to the ports, we should go ahead and unregister the port from all the core layers. Any open/read calls on the port will then just return errors, and an unplug/plug operation on the host will succeed as expected. This also caused buggy behaviour in case of the device removal (not just a port): when the device was removed (which means all ports on that device are removed automatically as well), the ports with active users would clean up only when the last references were dropped -- and it would be too late then to be referencing char device pointers, resulting in oopses: -------------------8<--------------------------------------- PID: 6162 TASK: ffff8801147ad500 CPU: 0 COMMAND: "cat" #0 [ffff88011b9d5a90] machine_kexec at ffffffff8103232b #1 [ffff88011b9d5af0] crash_kexec at ffffffff810b9322 #2 [ffff88011b9d5bc0] oops_end at ffffffff814f4a50 #3 [ffff88011b9d5bf0] die at ffffffff8100f26b #4 [ffff88011b9d5c20] do_general_protection at ffffffff814f45e2 #5 [ffff88011b9d5c50] general_protection at ffffffff814f3db5 [exception RIP: strlen+2] RIP: ffffffff81272ae2 RSP: ffff88011b9d5d00 RFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff880118901c18 RCX: 0000000000000000 RDX: ffff88011799982c RSI: 00000000000000d0 RDI: 3a303030302f3030 RBP: ffff88011b9d5d38 R8: 0000000000000006 R9: ffffffffa0134500 R10: 0000000000001000 R11: 0000000000001000 R12: ffff880117a1cc10 R13: 00000000000000d0 R14: 0000000000000017 R15: ffffffff81aff700 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #6 [ffff88011b9d5d00] kobject_get_path at ffffffff8126dc5d #7 [ffff88011b9d5d40] kobject_uevent_env at ffffffff8126e551 #8 [ffff88011b9d5dd0] kobject_uevent at ffffffff8126e9eb #9 [ffff88011b9d5de0] device_del at ffffffff813440c7 -------------------8<--------------------------------------- So clean up when we have all the context, and all that's left to do when the references to the port have dropped is to free up the port struct itself. CC: <[email protected]> Reported-by: chayang <[email protected]> Reported-by: YOGANANTH SUBRAMANIAN <[email protected]> Reported-by: FuXiangChun <[email protected]> Reported-by: Qunfang Zhang <[email protected]> Reported-by: Sibiao Luo <[email protected]> Signed-off-by: Amit Shah <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
Otherwise we get flooded by the kernel warning us that we are doing long sequences of IO without serialisation. For example, WARNING: CPU: 0 PID: 11136 at drivers/gpu/drm/i915/intel_sideband.c:40 vlv_sideband_rw+0x48/0x1ef() Modules linked in: CPU: 0 PID: 11136 Comm: kworker/u2:0 Tainted: G W 3.11.0-rc2+ #4 Call Trace: [<c2028564>] ? warn_slowpath_common+0x63/0x78 [<c227ad43>] ? vlv_sideband_rw+0x48/0x1ef [<c20285dd>] ? warn_slowpath_null+0xf/0x13 [<c227ad43>] ? vlv_sideband_rw+0x48/0x1ef [<c227b060>] ? vlv_dpio_write+0x1c/0x21 [<c2262b3b>] ? intel_dp_set_signal_levels+0x24a/0x385 [<c2264909>] ? intel_dp_complete_link_train+0x25/0x1d1 [<c2264c55>] ? intel_dp_check_link_status+0xf7/0x106 [<c2238ced>] ? i915_hotplug_work_func+0x17b/0x221 [<c203a204>] ? process_one_work+0x12e/0x210 [<c203a5e4>] ? worker_thread+0x116/0x1ad [<c203a4ce>] ? rescuer_thread+0x1cb/0x1cb [<c203d8f5>] ? kthread+0x67/0x6c [<c2457ebb>] ? ret_from_kernel_thread+0x1b/0x30 [<c203d88e>] ? init_completion+0x18/0x18 v2: Retire the locking in vlv_crtc_enable() and do it close to the meat. Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Jani Nikula <[email protected]> [danvet: Squash in a s/mutex_lock/mutex_unlock/ fixup spotted by the 0 day kernel build/coccinelle and reported by Dan Carpenter.] Signed-off-by: Daniel Vetter <[email protected]>
Since commit 29a241c (ACPICA: Add argument typechecking for all predefined ACPI names), _DSM parameters are validated which trigger the following warning: ACPI Warning: \_SB_.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20130517/nsarguments-95) ACPI Warning: \_SB_.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20130517/nsarguments-95) ACPI Warning: \_SB_.PCI0.P0P2.PEGP._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20130517/nsarguments-95) ACPI Warning: \_SB_.PCI0.P0P2.PEGP._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20130517/nsarguments-95) As the Intel _DSM method seems to ignore this parameter, let's comply to the ACPI spec and use a Package instead. Signed-off-by: Peter Wu <[email protected]> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=32602 Signed-off-by: Daniel Vetter <[email protected]>
We met lockdep warning when enable and disable the bearer for commands such as: tipc-config -netid=1234 -addr=1.1.3 -be=eth:eth0 tipc-config -netid=1234 -addr=1.1.3 -bd=eth:eth0 --------------------------------------------------- [ 327.693595] ====================================================== [ 327.693994] [ INFO: possible circular locking dependency detected ] [ 327.694519] 3.11.0-rc3-wwd-default #4 Tainted: G O [ 327.694882] ------------------------------------------------------- [ 327.695385] tipc-config/5825 is trying to acquire lock: [ 327.695754] (((timer))#2){+.-...}, at: [<ffffffff8105be80>] del_timer_sync+0x0/0xd0 [ 327.696018] [ 327.696018] but task is already holding lock: [ 327.696018] (&(&b_ptr->lock)->rlock){+.-...}, at: [<ffffffffa02be58d>] bearer_disable+ 0xdd/0x120 [tipc] [ 327.696018] [ 327.696018] which lock already depends on the new lock. [ 327.696018] [ 327.696018] [ 327.696018] the existing dependency chain (in reverse order) is: [ 327.696018] [ 327.696018] -> #1 (&(&b_ptr->lock)->rlock){+.-...}: [ 327.696018] [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870 [ 327.696018] [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670 [ 327.696018] [<ffffffff810b4453>] lock_acquire+0x103/0x130 [ 327.696018] [<ffffffff814d65b1>] _raw_spin_lock_bh+0x41/0x80 [ 327.696018] [<ffffffffa02c5d48>] disc_timeout+0x18/0xd0 [tipc] [ 327.696018] [<ffffffff8105b92a>] call_timer_fn+0xda/0x1e0 [ 327.696018] [<ffffffff8105bcd7>] run_timer_softirq+0x2a7/0x2d0 [ 327.696018] [<ffffffff8105379a>] __do_softirq+0x16a/0x2e0 [ 327.696018] [<ffffffff81053a35>] irq_exit+0xd5/0xe0 [ 327.696018] [<ffffffff81033005>] smp_apic_timer_interrupt+0x45/0x60 [ 327.696018] [<ffffffff814df4af>] apic_timer_interrupt+0x6f/0x80 [ 327.696018] [<ffffffff8100b70e>] arch_cpu_idle+0x1e/0x30 [ 327.696018] [<ffffffff810a039d>] cpu_idle_loop+0x1fd/0x280 [ 327.696018] [<ffffffff810a043e>] cpu_startup_entry+0x1e/0x20 [ 327.696018] [<ffffffff81031589>] start_secondary+0x89/0x90 [ 327.696018] [ 327.696018] -> #0 (((timer))#2){+.-...}: [ 327.696018] [<ffffffff810b33fe>] check_prev_add+0x43e/0x4b0 [ 327.696018] [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870 [ 327.696018] [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670 [ 327.696018] [<ffffffff810b4453>] lock_acquire+0x103/0x130 [ 327.696018] [<ffffffff8105bebd>] del_timer_sync+0x3d/0xd0 [ 327.696018] [<ffffffffa02c5855>] tipc_disc_delete+0x15/0x30 [tipc] [ 327.696018] [<ffffffffa02be59f>] bearer_disable+0xef/0x120 [tipc] [ 327.696018] [<ffffffffa02be74f>] tipc_disable_bearer+0x2f/0x60 [tipc] [ 327.696018] [<ffffffffa02bfb32>] tipc_cfg_do_cmd+0x2e2/0x550 [tipc] [ 327.696018] [<ffffffffa02c8c79>] handle_cmd+0x49/0xe0 [tipc] [ 327.696018] [<ffffffff8143e898>] genl_family_rcv_msg+0x268/0x340 [ 327.696018] [<ffffffff8143ed30>] genl_rcv_msg+0x70/0xd0 [ 327.696018] [<ffffffff8143d4c9>] netlink_rcv_skb+0x89/0xb0 [ 327.696018] [<ffffffff8143e617>] genl_rcv+0x27/0x40 [ 327.696018] [<ffffffff8143d21e>] netlink_unicast+0x15e/0x1b0 [ 327.696018] [<ffffffff8143ddcf>] netlink_sendmsg+0x22f/0x400 [ 327.696018] [<ffffffff813f7836>] __sock_sendmsg+0x66/0x80 [ 327.696018] [<ffffffff813f7957>] sock_aio_write+0x107/0x120 [ 327.696018] [<ffffffff8117f76d>] do_sync_write+0x7d/0xc0 [ 327.696018] [<ffffffff8117fc56>] vfs_write+0x186/0x190 [ 327.696018] [<ffffffff811803e0>] SyS_write+0x60/0xb0 [ 327.696018] [<ffffffff814de852>] system_call_fastpath+0x16/0x1b [ 327.696018] [ 327.696018] other info that might help us debug this: [ 327.696018] [ 327.696018] Possible unsafe locking scenario: [ 327.696018] [ 327.696018] CPU0 CPU1 [ 327.696018] ---- ---- [ 327.696018] lock(&(&b_ptr->lock)->rlock); [ 327.696018] lock(((timer))#2); [ 327.696018] lock(&(&b_ptr->lock)->rlock); [ 327.696018] lock(((timer))#2); [ 327.696018] [ 327.696018] *** DEADLOCK *** [ 327.696018] [ 327.696018] 5 locks held by tipc-config/5825: [ 327.696018] #0: (cb_lock){++++++}, at: [<ffffffff8143e608>] genl_rcv+0x18/0x40 [ 327.696018] #1: (genl_mutex){+.+.+.}, at: [<ffffffff8143ed66>] genl_rcv_msg+0xa6/0xd0 [ 327.696018] #2: (config_mutex){+.+.+.}, at: [<ffffffffa02bf889>] tipc_cfg_do_cmd+0x39/ 0x550 [tipc] [ 327.696018] #3: (tipc_net_lock){++.-..}, at: [<ffffffffa02be738>] tipc_disable_bearer+ 0x18/0x60 [tipc] [ 327.696018] #4: (&(&b_ptr->lock)->rlock){+.-...}, at: [<ffffffffa02be58d>] bearer_disable+0xdd/0x120 [tipc] [ 327.696018] [ 327.696018] stack backtrace: [ 327.696018] CPU: 2 PID: 5825 Comm: tipc-config Tainted: G O 3.11.0-rc3-wwd- default #4 [ 327.696018] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 327.696018] 00000000ffffffff ffff880037fa77a8 ffffffff814d03dd 0000000000000000 [ 327.696018] ffff880037fa7808 ffff880037fa77e8 ffffffff810b1c4f 0000000037fa77e8 [ 327.696018] ffff880037fa7808 ffff880037e4db40 0000000000000000 ffff880037e4e318 [ 327.696018] Call Trace: [ 327.696018] [<ffffffff814d03dd>] dump_stack+0x4d/0xa0 [ 327.696018] [<ffffffff810b1c4f>] print_circular_bug+0x10f/0x120 [ 327.696018] [<ffffffff810b33fe>] check_prev_add+0x43e/0x4b0 [ 327.696018] [<ffffffff810b3b4d>] validate_chain+0x6dd/0x870 [ 327.696018] [<ffffffff81087a28>] ? sched_clock_cpu+0xd8/0x110 [ 327.696018] [<ffffffff810b40bb>] __lock_acquire+0x3db/0x670 [ 327.696018] [<ffffffff810b4453>] lock_acquire+0x103/0x130 [ 327.696018] [<ffffffff8105be80>] ? try_to_del_timer_sync+0x70/0x70 [ 327.696018] [<ffffffff8105bebd>] del_timer_sync+0x3d/0xd0 [ 327.696018] [<ffffffff8105be80>] ? try_to_del_timer_sync+0x70/0x70 [ 327.696018] [<ffffffffa02c5855>] tipc_disc_delete+0x15/0x30 [tipc] [ 327.696018] [<ffffffffa02be59f>] bearer_disable+0xef/0x120 [tipc] [ 327.696018] [<ffffffffa02be74f>] tipc_disable_bearer+0x2f/0x60 [tipc] [ 327.696018] [<ffffffffa02bfb32>] tipc_cfg_do_cmd+0x2e2/0x550 [tipc] [ 327.696018] [<ffffffff81218783>] ? security_capable+0x13/0x20 [ 327.696018] [<ffffffffa02c8c79>] handle_cmd+0x49/0xe0 [tipc] [ 327.696018] [<ffffffff8143e898>] genl_family_rcv_msg+0x268/0x340 [ 327.696018] [<ffffffff8143ed30>] genl_rcv_msg+0x70/0xd0 [ 327.696018] [<ffffffff8143ecc0>] ? genl_lock+0x20/0x20 [ 327.696018] [<ffffffff8143d4c9>] netlink_rcv_skb+0x89/0xb0 [ 327.696018] [<ffffffff8143e608>] ? genl_rcv+0x18/0x40 [ 327.696018] [<ffffffff8143e617>] genl_rcv+0x27/0x40 [ 327.696018] [<ffffffff8143d21e>] netlink_unicast+0x15e/0x1b0 [ 327.696018] [<ffffffff81289d7c>] ? memcpy_fromiovec+0x6c/0x90 [ 327.696018] [<ffffffff8143ddcf>] netlink_sendmsg+0x22f/0x400 [ 327.696018] [<ffffffff813f7836>] __sock_sendmsg+0x66/0x80 [ 327.696018] [<ffffffff813f7957>] sock_aio_write+0x107/0x120 [ 327.696018] [<ffffffff813fe29c>] ? release_sock+0x8c/0xa0 [ 327.696018] [<ffffffff8117f76d>] do_sync_write+0x7d/0xc0 [ 327.696018] [<ffffffff8117fa24>] ? rw_verify_area+0x54/0x100 [ 327.696018] [<ffffffff8117fc56>] vfs_write+0x186/0x190 [ 327.696018] [<ffffffff811803e0>] SyS_write+0x60/0xb0 [ 327.696018] [<ffffffff814de852>] system_call_fastpath+0x16/0x1b ----------------------------------------------------------------------- The problem is that the tipc_link_delete() will cancel the timer disc_timeout() when the b_ptr->lock is hold, but the disc_timeout() still call b_ptr->lock to finish the work, so the dead lock occurs. We should unlock the b_ptr->lock when del the disc_timeout(). Remove link_timeout() still met the same problem, the patch: http://article.gmane.org/gmane.network.tipc.general/4380 fix the problem, so no need to send patch for fix link_timeout() deadlock warming. Signed-off-by: Wang Weidong <[email protected]> Signed-off-by: Ding Tianhong <[email protected]> Acked-by: Ying Xue <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Fix a NULL pointer deference while removing an empty directory, which was introduced by commit 3704412 ("[readdir] convert ocfs2"). BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<(null)>] (null) PGD 6da85067 PUD 6da89067 PMD 0 Oops: 0010 [#1] SMP CPU: 0 PID: 6564 Comm: rmdir Tainted: G O 3.11.0-rc1 #4 RIP: 0010:[<0000000000000000>] [< (null)>] (null) Call Trace: ocfs2_dir_foreach+0x49/0x50 [ocfs2] ocfs2_empty_dir+0x12c/0x3e0 [ocfs2] ocfs2_unlink+0x56e/0xc10 [ocfs2] vfs_rmdir+0xd5/0x140 do_rmdir+0x1cb/0x1e0 SyS_rmdir+0x16/0x20 system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [< (null)>] (null) RSP <ffff88006daddc10> CR2: 0000000000000000 [[email protected]: fix pointer math] Signed-off-by: Jie Liu <[email protected]> Reported-by: David Weber <[email protected]> Cc: Al Viro <[email protected]> Cc: Joel Becker <[email protected]> Cc: Mark Fasheh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
ACPI 5.0 specification requires the fourth parameter to the _DSM (Device Specific Method) to be of type package instead of integer. Failing to do that we get following warning on the console: ACPI Warning: \_SB_.PCI0.I2C1.TPL0._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20130517/nsarguments-95) Fix this by passing an empty package to the _DSM method. The HID over I2C specification doesn't require any specific values to be passed with this parameter. Signed-off-by: Mika Westerberg <[email protected]> Reviewed-by: Benjamin Tissoires <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
When booting secondary CPUs, announce_cpu() is called to show which cpu has been brought up. For example: [ 0.402751] smpboot: Booting Node 0, Processors #1 #2 #3 #4 #5 OK [ 0.525667] smpboot: Booting Node 1, Processors #6 #7 #8 #9 #10 #11 OK [ 0.755592] smpboot: Booting Node 0, Processors #12 #13 #14 #15 #16 #17 OK [ 0.890495] smpboot: Booting Node 1, Processors #18 #19 #20 #21 #22 #23 But the last "OK" is lost, because 'nr_cpu_ids-1' represents the maximum possible cpu id. It should use the maximum present cpu id in case not all CPUs booted up. Signed-off-by: Libin <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ tweaked the changelog, removed unnecessary line break, tweaked the format to align the fields vertically. ] Signed-off-by: Ingo Molnar <[email protected]>
Not all I/O ASIC versions have the free-running counter implemented, an early revision used in the 5000/1xx models aka 3MIN and 4MIN did not have it. Therefore we cannot unconditionally use it as a clock source. Fortunately if not implemented its register slot has a fixed value so it is enough if we check for the value at the end of the calibration period being the same as at the beginning. This also means we need to look for another high-precision clock source on the systems affected. The 5000/1xx can have an R4000SC processor installed where the CP0 Count register can be used as a clock source. Unfortunately all the R4k DECstations suffer from the missed timer interrupt on CP0 Count reads erratum, so we cannot use the CP0 timer as a clock source and a clock event both at a time. However we never need an R4k clock event device because all DECstations have a DS1287A RTC chip whose periodic interrupt can be used as a clock source. This gives us the following four configuration possibilities for I/O ASIC DECstations: 1. No I/O ASIC counter and no CP0 timer, e.g. R3k 5000/1xx (3MIN). 2. No I/O ASIC counter but the CP0 timer, i.e. R4k 5000/150 (4MIN). 3. The I/O ASIC counter but no CP0 timer, e.g. R3k 5000/240 (3MAX+). 4. The I/O ASIC counter and the CP0 timer, e.g. R4k 5000/260 (4MAX+). For #1 and #2 this change stops the I/O ASIC free-running counter from being installed as a clock source of a 0Hz frequency. For #2 it also arranges for the CP0 timer to be used as a clock source rather than a clock event device, because having an accurate wall clock is more important than a high-precision interval timer. For #3 there is no change. For #4 the change makes the I/O ASIC free-running counter installed as a clock source so that the CP0 timer can be used as a clock event device. Unfortunately the use of the CP0 timer as a clock event device relies on a succesful completion of c0_compare_interrupt. That never happens, because while waiting for a CP0 Compare interrupt to happen the function spins in a loop reading the CP0 Count register. This makes the CP0 Count erratum trigger reliably causing the interrupt waited for to be lost in all cases. As a result #4 resorts to using the CP0 timer as a clock source as well, just as #2. However we want to keep this separate arrangement in case (hope) c0_compare_interrupt is eventually rewritten such that it avoids the erratum. Signed-off-by: Maciej W. Rozycki <[email protected]> Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/5825/ Signed-off-by: Ralf Baechle <[email protected]>
When parsing lines from objdump a line containing source code starting with a numeric label is mistaken for a line of disassembly starting with a memory address. Current validation fails to recognise that the "memory address" is out of range and calculates an invalid offset which later causes this segfault: Program received signal SIGSEGV, Segmentation fault. 0x0000000000457315 in disasm__calc_percent (notes=0xc98970, evidx=0, offset=143705, end=2127526177, path=0x7fffffffbf50) at util/annotate.c:631 631 hits += h->addr[offset++]; (gdb) bt #0 0x0000000000457315 in disasm__calc_percent (notes=0xc98970, evidx=0, offset=143705, end=2127526177, path=0x7fffffffbf50) at util/annotate.c:631 #1 0x00000000004d65e3 in annotate_browser__calc_percent (browser=0x7fffffffd130, evsel=0xa01da0) at ui/browsers/annotate.c:364 #2 0x00000000004d7433 in annotate_browser__run (browser=0x7fffffffd130, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:672 #3 0x00000000004d80c9 in symbol__tui_annotate (sym=0xc989a0, map=0xa02660, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:962 #4 0x00000000004d7aa0 in hist_entry__tui_annotate (he=0xdf73f0, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:823 #5 0x00000000004dd648 in perf_evsel__hists_browse (evsel=0xa01da0, nr_events=1, helpline= 0x58b768 "For a higher level overview, try: perf report --sort comm,dso", ev_name=0xa02cd0 "cycles", left_exits=false, hbt= 0x0, min_pcnt=0, env=0xa011e0) at ui/browsers/hists.c:1659 #6 0x00000000004de372 in perf_evlist__tui_browse_hists (evlist=0xa01520, help= 0x58b768 "For a higher level overview, try: perf report --sort comm,dso", hbt=0x0, min_pcnt=0, env=0xa011e0) at ui/browsers/hists.c:1950 #7 0x000000000042cf6b in __cmd_report (rep=0x7fffffffd6c0) at builtin-report.c:581 #8 0x000000000042e25d in cmd_report (argc=0, argv=0x7fffffffe4b0, prefix=0x0) at builtin-report.c:965 #9 0x000000000041a0e1 in run_builtin (p=0x801548, argc=1, argv=0x7fffffffe4b0) at perf.c:319 #10 0x000000000041a319 in handle_internal_command (argc=1, argv=0x7fffffffe4b0) at perf.c:376 #11 0x000000000041a465 in run_argv (argcp=0x7fffffffe38c, argv=0x7fffffffe380) at perf.c:420 #12 0x000000000041a707 in main (argc=1, argv=0x7fffffffe4b0) at perf.c:521 After the fix is applied the symbol can be annotated showing the problematic line "1: rep" copy_user_generic_string /usr/lib/debug/lib/modules/3.9.10-100.fc17.x86_64/vmlinux */ ENTRY(copy_user_generic_string) CFI_STARTPROC ASM_STAC andl %edx,%edx and %edx,%edx jz 4f je 37 cmpl $8,%edx cmp $0x8,%edx jb 2f /* less than 8 bytes, go to byte copy loop */ jb 33 ALIGN_DESTINATION mov %edi,%ecx and $0x7,%ecx je 28 sub $0x8,%ecx neg %ecx sub %ecx,%edx 1a: mov (%rsi),%al mov %al,(%rdi) inc %rsi inc %rdi dec %ecx jne 1a movl %edx,%ecx 28: mov %edx,%ecx shrl $3,%ecx shr $0x3,%ecx andl $7,%edx and $0x7,%edx 1: rep 100.00 rep movsq %ds:(%rsi),%es:(%rdi) movsq 2: movl %edx,%ecx 33: mov %edx,%ecx 3: rep rep movsb %ds:(%rsi),%es:(%rdi) movsb 4: xorl %eax,%eax 37: xor %eax,%eax data32 xchg %ax,%ax ASM_CLAC ret retq Signed-off-by: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Michael Semon reported that xfs/299 generated this lockdep warning: ============================================= [ INFO: possible recursive locking detected ] 3.12.0-rc2+ #2 Not tainted --------------------------------------------- touch/21072 is trying to acquire lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 but task is already holding lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&xfs_dquot_other_class); lock(&xfs_dquot_other_class); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by touch/21072: #0: (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e #1: (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40 #2: (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35 #3: (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1 #4: (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f #5: (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 #6: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 The lockdep annotation for dquot lock nesting only understands locking for user and "other" dquots, not user, group and quota dquots. Fix the annotations to match the locking heirarchy we now have. Reported-by: Michael L. Semon <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Ben Myers <[email protected]> Signed-off-by: Ben Myers <[email protected]> (cherry picked from commit f112a04)
If EM Transmit bit is busy during init ata_msleep() is called. It is wrong - msleep() should be used instead of ata_msleep(), because if EM Transmit bit is busy for one port, it will be busy for all other ports too, so using ata_msleep() causes wasting tries for another ports. The most common scenario looks like that now (six ports try to transmit a LED meaasege): - port #0 tries for the 1st time and succeeds - ports #1-5 try for the 1st time and sleeps - port #1 tries for the 2nd time and succeeds - ports #2-5 try for the 2nd time and sleeps - port #2 tries for the 3rd time and succeeds - ports #3-5 try for the 3rd time and sleeps - port #3 tries for the 4th time and succeeds - ports #4-5 try for the 4th time and sleeps - port #4 tries for the 5th time and succeeds - port #5 tries for the 5th time and sleeps At this moment port #5 wasted all its five tries and failed to initialize. Because there are only 5 (EM_MAX_RETRY) tries available usually only five ports succeed to initialize. The sixth port and next ones usually will fail. If msleep() is used instead of ata_msleep() the first port succeeds to initialize in the first try and next ones usually succeed to initialize in the second try. tj: updated comment Signed-off-by: Lukasz Dorau <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
commit 057db84 upstream. Andrey reported the following report: ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3 ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3) Accessed by thread T13003: #0 ffffffff810dd2da (asan_report_error+0x32a/0x440) #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40) #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20) #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260) #4 ffffffff812a1065 (__fput+0x155/0x360) #5 ffffffff812a12de (____fput+0x1e/0x30) #6 ffffffff8111708d (task_work_run+0x10d/0x140) #7 ffffffff810ea043 (do_exit+0x433/0x11f0) #8 ffffffff810eaee4 (do_group_exit+0x84/0x130) #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30) #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b) Allocated by thread T5167: #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0) #1 ffffffff8128337c (__kmalloc+0xbc/0x500) #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90) #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0) #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40) #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430) #6 ffffffff8129b668 (finish_open+0x68/0xa0) #7 ffffffff812b66ac (do_last+0xb8c/0x1710) #8 ffffffff812b7350 (path_openat+0x120/0xb50) #9 ffffffff812b8884 (do_filp_open+0x54/0xb0) #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0) #11 ffffffff8129d4b7 (SyS_open+0x37/0x50) #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b) Shadow bytes around the buggy address: ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00 ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap redzone: fa Heap kmalloc redzone: fb Freed heap region: fd Shadow gap: fe The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;' Although the crash happened in ftrace_regex_open() the real bug occurred in trace_get_user() where there's an incrementation to parser->idx without a check against the size. The way it is triggered is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop that reads the last character stores it and then breaks out because there is no more characters. Then the last character is read to determine what to do next, and the index is incremented without checking size. Then the caller of trace_get_user() usually nulls out the last character with a zero, but since the index is equal to the size, it writes a nul character after the allocated space, which can corrupt memory. Luckily, only root user has write access to this file. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Andrey Konovalov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 6ff33b7 upstream. When a task enters call_refreshresult with status 0 from call_refresh and !rpcauth_uptodatecred(task) it enters call_refresh again with no rate-limiting or max number of retries. Instead of trying forever, make use of the retry path that other errors use. This only seems to be possible when the crrefresh callback is gss_refresh_null, which only happens when destroying the context. To reproduce: 1) mount with sec=krb5 (or sec=sys with krb5 negotiated for non FSID specific operations). 2) reboot - the client will be stuck and will need to be hard rebooted BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:46] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs fscache ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw i2c_piix4 i2c_core e1000 parport_pc parport shpchp nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc autofs4 mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy irq event stamp: 195724 hardirqs last enabled at (195723): [<ffffffff814a925c>] restore_args+0x0/0x30 hardirqs last disabled at (195724): [<ffffffff814b0a6a>] apic_timer_interrupt+0x6a/0x80 softirqs last enabled at (195722): [<ffffffff8103f583>] __do_softirq+0x1df/0x276 softirqs last disabled at (195717): [<ffffffff8103f852>] irq_exit+0x53/0x9a CPU: 0 PID: 46 Comm: kworker/0:2 Not tainted 3.13.0-rc3-branch-dros_testing+ #4 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013 Workqueue: rpciod rpc_async_schedule [sunrpc] task: ffff8800799c4260 ti: ffff880079002000 task.ti: ffff880079002000 RIP: 0010:[<ffffffffa0064fd4>] [<ffffffffa0064fd4>] __rpc_execute+0x8a/0x362 [sunrpc] RSP: 0018:ffff880079003d18 EFLAGS: 00000246 RAX: 0000000000000005 RBX: 0000000000000007 RCX: 0000000000000007 RDX: 0000000000000007 RSI: ffff88007aecbae8 RDI: ffff8800783d8900 RBP: ffff880079003d78 R08: ffff88006e30e9f8 R09: ffffffffa005a3d7 R10: ffff88006e30e7b0 R11: ffff8800783d8900 R12: ffffffffa006675e R13: ffff880079003ce8 R14: ffff88006e30e7b0 R15: ffff8800783d8900 FS: 0000000000000000(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3072333000 CR3: 0000000001a0b000 CR4: 00000000001407f0 Stack: ffff880079003d98 0000000000000246 0000000000000000 ffff88007a9a4830 ffff880000000000 ffffffff81073f47 ffff88007f212b00 ffff8800799c4260 ffff8800783d8988 ffff88007f212b00 ffffe8ffff604800 0000000000000000 Call Trace: [<ffffffff81073f47>] ? trace_hardirqs_on_caller+0x145/0x1a1 [<ffffffffa00652d3>] rpc_async_schedule+0x27/0x32 [sunrpc] [<ffffffff81052974>] process_one_work+0x211/0x3a5 [<ffffffff810528d5>] ? process_one_work+0x172/0x3a5 [<ffffffff81052eeb>] worker_thread+0x134/0x202 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280 [<ffffffff81052db7>] ? rescuer_thread+0x280/0x280 [<ffffffff810584a0>] kthread+0xc9/0xd1 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61 [<ffffffff814afd6c>] ret_from_fork+0x7c/0xb0 [<ffffffff810583d7>] ? __kthread_parkme+0x61/0x61 Code: e8 87 63 fd e0 c6 05 10 dd 01 00 01 48 8b 43 70 4c 8d 6b 70 45 31 e4 a8 02 0f 85 d5 02 00 00 4c 8b 7b 48 48 c7 43 48 00 00 00 00 <4c> 8b 4b 50 4d 85 ff 75 0c 4d 85 c9 4d 89 cf 0f 84 32 01 00 00 And the output of "rpcdebug -m rpc -s all": RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refresh (status 0) RPC: 61 call_refreshresult (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refresh (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 RPC: 61 call_refreshresult (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refresh (status 0) RPC: 61 call_refreshresult (status 0) RPC: 61 refreshing RPCSEC_GSS cred ffff88007a413cf0 Signed-off-by: Weston Andros Adamson <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 6fdda9a upstream. As part of normal operaions, the hrtimer subsystem frequently calls into the timekeeping code, creating a locking order of hrtimer locks -> timekeeping locks clock_was_set_delayed() was suppoed to allow us to avoid deadlocks between the timekeeping the hrtimer subsystem, so that we could notify the hrtimer subsytem the time had changed while holding the timekeeping locks. This was done by scheduling delayed work that would run later once we were out of the timekeeing code. But unfortunately the lock chains are complex enoguh that in scheduling delayed work, we end up eventually trying to grab an hrtimer lock. Sasha Levin noticed this in testing when the new seqlock lockdep enablement triggered the following (somewhat abrieviated) message: [ 251.100221] ====================================================== [ 251.100221] [ INFO: possible circular locking dependency detected ] [ 251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty raspberrypi#4053 Not tainted [ 251.101967] ------------------------------------------------------- [ 251.101967] kworker/10:1/4506 is trying to acquire lock: [ 251.101967] (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70 [ 251.101967] [ 251.101967] but task is already holding lock: [ 251.101967] (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70 [ 251.101967] [ 251.101967] which lock already depends on the new lock. [ 251.101967] [ 251.101967] [ 251.101967] the existing dependency chain (in reverse order) is: [ 251.101967] -> #5 (hrtimer_bases.lock#11){-.-...}: [snipped] -> #4 (&rt_b->rt_runtime_lock){-.-...}: [snipped] -> #3 (&rq->lock){-.-.-.}: [snipped] -> #2 (&p->pi_lock){-.-.-.}: [snipped] -> #1 (&(&pool->lock)->rlock){-.-...}: [ 251.101967] [<ffffffff81194803>] validate_chain+0x6c3/0x7b0 [ 251.101967] [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580 [ 251.101967] [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0 [ 251.101967] [<ffffffff84398500>] _raw_spin_lock+0x40/0x80 [ 251.101967] [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0 [ 251.101967] [<ffffffff81154168>] queue_work_on+0x98/0x120 [ 251.101967] [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30 [ 251.101967] [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160 [ 251.101967] [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70 [ 251.101967] [<ffffffff843a4b49>] ia32_sysret+0x0/0x5 [ 251.101967] -> #0 (timekeeper_seq){----..}: [snipped] [ 251.101967] other info that might help us debug this: [ 251.101967] [ 251.101967] Chain exists of: timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11 [ 251.101967] Possible unsafe locking scenario: [ 251.101967] [ 251.101967] CPU0 CPU1 [ 251.101967] ---- ---- [ 251.101967] lock(hrtimer_bases.lock#11); [ 251.101967] lock(&rt_b->rt_runtime_lock); [ 251.101967] lock(hrtimer_bases.lock#11); [ 251.101967] lock(timekeeper_seq); [ 251.101967] [ 251.101967] *** DEADLOCK *** [ 251.101967] [ 251.101967] 3 locks held by kworker/10:1/4506: [ 251.101967] #0: (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530 [ 251.101967] #1: (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530 [ 251.101967] #2: (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70 [ 251.101967] [ 251.101967] stack backtrace: [ 251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty raspberrypi#4053 [ 251.101967] Workqueue: events clock_was_set_work So the best solution is to avoid calling clock_was_set_delayed() while holding the timekeeping lock, and instead using a flag variable to decide if we should call clock_was_set() once we've released the locks. This works for the case here, where the do_adjtimex() was the deadlock trigger point. Unfortuantely, in update_wall_time() we still hold the jiffies lock, which would deadlock with the ipi triggered by clock_was_set(), preventing us from calling it even after we drop the timekeeping lock. So instead call clock_was_set_delayed() at that point. Cc: Thomas Gleixner <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Richard Cochran <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Sasha Levin <[email protected]> Reported-by: Sasha Levin <[email protected]> Tested-by: Sasha Levin <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 1a18a66 upstream. Guenter Roeck has got the following call trace on a p2020 board: Kernel stack overflow in process eb3e5a00, r1=eb79df90 CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 task: eb3e5a00 ti: c0616000 task.ti: ef440000 NIP: c003a420 LR: c003a410 CTR: c0017518 REGS: eb79dee0 TRAP: 0901 Not tainted (3.13.0-rc8-juniper-00146-g19eca00) MSR: 00029000 <CE,EE,ME> CR: 24008444 XER: 00000000 GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000 GPR08: 00000000 020b8000 00000000 00000000 44008442 NIP [c003a420] __do_softirq+0x94/0x1ec LR [c003a410] __do_softirq+0x84/0x1ec Call Trace: [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable) [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8 [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8 [ef441f40] [c000e7f4] ret_from_except+0x0/0x18 --- Exception: 501 at 0xfcda524 LR = 0x10024900 Instruction dump: 7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9 5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f Kernel panic - not syncing: kernel stack overflow CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 Call Trace: The reason is that we have used the wrong register to calculate the ksp_limit in commit cbc9565 (powerpc: Remove ksp_limit on ppc64). Just fix it. As suggested by Benjamin Herrenschmidt, also add the C prototype of the function in the comment in order to avoid such kind of errors in the future. Reported-by: Guenter Roeck <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: Kevin Hao <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Jiri Slaby <[email protected]>
commit d25f06e upstream. vmxnet3's netpoll driver is incorrectly coded. It directly calls vmxnet3_do_poll, which is the driver internal napi poll routine. As the netpoll controller method doesn't block real napi polls in any way, there is a potential for race conditions in which the netpoll controller method and the napi poll method run concurrently. The result is data corruption causing panics such as this one recently observed: PID: 1371 TASK: ffff88023762caa0 CPU: 1 COMMAND: "rs:main Q:Reg" #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92 #2 [ffff88023abd58b0] oops_end at ffffffff8152b570 #3 [ffff88023abd58e0] die at ffffffff81010e0b #4 [ffff88023abd5910] do_trap at ffffffff8152add4 #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95 #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b [exception RIP: vmxnet3_rq_rx_complete+1968] RIP: ffffffffa00f1e80 RSP: ffff88023abd5ac8 RFLAGS: 00010086 RAX: 0000000000000000 RBX: ffff88023b5dcee0 RCX: 00000000000000c0 RDX: 0000000000000000 RSI: 00000000000005f2 RDI: ffff88023b5dcee0 RBP: ffff88023abd5b48 R8: 0000000000000000 R9: ffff88023a3b6048 R10: 0000000000000000 R11: 0000000000000002 R12: ffff8802398d4cd8 R13: ffff88023af35140 R14: ffff88023b60c890 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3] #8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3] #9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7 The fix is to do as other drivers do, and have the poll controller call the top half interrupt handler, which schedules a napi poll properly to recieve frames Tested by myself, successfully. Signed-off-by: Neil Horman <[email protected]> CC: Shreyas Bhatewara <[email protected]> CC: "VMware, Inc." <[email protected]> CC: "David S. Miller" <[email protected]> Reviewed-by: Shreyas N Bhatewara <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Jiri Slaby <[email protected]>
commit 71abdc1 upstream. When kswapd exits, it can end up taking locks that were previously held by allocating tasks while they waited for reclaim. Lockdep currently warns about this: On Wed, May 28, 2014 at 06:06:34PM +0800, Gu Zheng wrote: > inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage. > kswapd2/1151 [HC0[0]:SC0[0]:HE1:SE1] takes: > (&sig->group_rwsem){+++++?}, at: exit_signals+0x24/0x130 > {RECLAIM_FS-ON-W} state was registered at: > mark_held_locks+0xb9/0x140 > lockdep_trace_alloc+0x7a/0xe0 > kmem_cache_alloc_trace+0x37/0x240 > flex_array_alloc+0x99/0x1a0 > cgroup_attach_task+0x63/0x430 > attach_task_by_pid+0x210/0x280 > cgroup_procs_write+0x16/0x20 > cgroup_file_write+0x120/0x2c0 > vfs_write+0xc0/0x1f0 > SyS_write+0x4c/0xa0 > tracesys+0xdd/0xe2 > irq event stamp: 49 > hardirqs last enabled at (49): _raw_spin_unlock_irqrestore+0x36/0x70 > hardirqs last disabled at (48): _raw_spin_lock_irqsave+0x2b/0xa0 > softirqs last enabled at (0): copy_process.part.24+0x627/0x15f0 > softirqs last disabled at (0): (null) > > other info that might help us debug this: > Possible unsafe locking scenario: > > CPU0 > ---- > lock(&sig->group_rwsem); > <Interrupt> > lock(&sig->group_rwsem); > > *** DEADLOCK *** > > no locks held by kswapd2/1151. > > stack backtrace: > CPU: 30 PID: 1151 Comm: kswapd2 Not tainted 3.10.39+ #4 > Call Trace: > dump_stack+0x19/0x1b > print_usage_bug+0x1f7/0x208 > mark_lock+0x21d/0x2a0 > __lock_acquire+0x52a/0xb60 > lock_acquire+0xa2/0x140 > down_read+0x51/0xa0 > exit_signals+0x24/0x130 > do_exit+0xb5/0xa50 > kthread+0xdb/0x100 > ret_from_fork+0x7c/0xb0 This is because the kswapd thread is still marked as a reclaimer at the time of exit. But because it is exiting, nobody is actually waiting on it to make reclaim progress anymore, and it's nothing but a regular thread at this point. Be tidy and strip it of all its powers (PF_MEMALLOC, PF_SWAPWRITE, PF_KSWAPD, and the lockdep reclaim state) before returning from the thread function. Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Gu Zheng <[email protected]> Cc: Yasuaki Ishimatsu <[email protected]> Cc: Tang Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]> Signed-off-by: Jiri Slaby <[email protected]>
commit 3f1f9b8 upstream. This fixes the following lockdep complaint: [ INFO: possible circular locking dependency detected ] 3.16.0-rc2-mm1+ #7 Tainted: G O ------------------------------------------------------- kworker/u24:0/4356 is trying to acquire lock: (&(&sbi->s_es_lru_lock)->rlock){+.+.-.}, at: [<ffffffff81285fff>] __ext4_es_shrink+0x4f/0x2e0 but task is already holding lock: (&ei->i_es_lock){++++-.}, at: [<ffffffff81286961>] ext4_es_insert_extent+0x71/0x180 which lock already depends on the new lock. Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ei->i_es_lock); lock(&(&sbi->s_es_lru_lock)->rlock); lock(&ei->i_es_lock); lock(&(&sbi->s_es_lru_lock)->rlock); *** DEADLOCK *** 6 locks held by kworker/u24:0/4356: #0: ("writeback"){.+.+.+}, at: [<ffffffff81071d00>] process_one_work+0x180/0x560 #1: ((&(&wb->dwork)->work)){+.+.+.}, at: [<ffffffff81071d00>] process_one_work+0x180/0x560 #2: (&type->s_umount_key#22){++++++}, at: [<ffffffff811a9c74>] grab_super_passive+0x44/0x90 #3: (jbd2_handle){+.+...}, at: [<ffffffff812979f9>] start_this_handle+0x189/0x5f0 #4: (&ei->i_data_sem){++++..}, at: [<ffffffff81247062>] ext4_map_blocks+0x132/0x550 #5: (&ei->i_es_lock){++++-.}, at: [<ffffffff81286961>] ext4_es_insert_extent+0x71/0x180 stack backtrace: CPU: 0 PID: 4356 Comm: kworker/u24:0 Tainted: G O 3.16.0-rc2-mm1+ #7 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: writeback bdi_writeback_workfn (flush-253:0) ffffffff8213dce0 ffff880014b07538 ffffffff815df0bb 0000000000000007 ffffffff8213e040 ffff880014b07588 ffffffff815db3dd ffff880014b07568 ffff880014b07610 ffff88003b868930 ffff88003b868908 ffff88003b868930 Call Trace: [<ffffffff815df0bb>] dump_stack+0x4e/0x68 [<ffffffff815db3dd>] print_circular_bug+0x1fb/0x20c [<ffffffff810a7a3e>] __lock_acquire+0x163e/0x1d00 [<ffffffff815e89dc>] ? retint_restore_args+0xe/0xe [<ffffffff815ddc7b>] ? __slab_alloc+0x4a8/0x4ce [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0 [<ffffffff810a8707>] lock_acquire+0x87/0x120 [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0 [<ffffffff8128592d>] ? ext4_es_free_extent+0x5d/0x70 [<ffffffff815e6f09>] _raw_spin_lock+0x39/0x50 [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0 [<ffffffff8119760b>] ? kmem_cache_alloc+0x18b/0x1a0 [<ffffffff81285fff>] __ext4_es_shrink+0x4f/0x2e0 [<ffffffff812869b8>] ext4_es_insert_extent+0xc8/0x180 [<ffffffff812470f4>] ext4_map_blocks+0x1c4/0x550 [<ffffffff8124c4c4>] ext4_writepages+0x6d4/0xd00 ... Reported-by: Minchan Kim <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Reported-by: Minchan Kim <[email protected]> Cc: Zheng Liu <[email protected]> Signed-off-by: Jiri Slaby <[email protected]>
[ Upstream commit 5924f17 ] When in repair-mode and TCP_RECV_QUEUE is set, we end up calling tcp_push with mss_now being 0. If data is in the send-queue and tcp_set_skb_tso_segs gets called, we crash because it will divide by mss_now: [ 347.151939] divide error: 0000 [#1] SMP [ 347.152907] Modules linked in: [ 347.152907] CPU: 1 PID: 1123 Comm: packetdrill Not tainted 3.16.0-rc2 #4 [ 347.152907] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 347.152907] task: f5b88540 ti: f3c82000 task.ti: f3c82000 [ 347.152907] EIP: 0060:[<c1601359>] EFLAGS: 00210246 CPU: 1 [ 347.152907] EIP is at tcp_set_skb_tso_segs+0x49/0xa0 [ 347.152907] EAX: 00000b67 EBX: f5acd080 ECX: 00000000 EDX: 00000000 [ 347.152907] ESI: f5a28f40 EDI: f3c88f00 EBP: f3c83d10 ESP: f3c83d00 [ 347.152907] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 [ 347.152907] CR0: 80050033 CR2: 083158b0 CR3: 35146000 CR4: 000006b0 [ 347.152907] Stack: [ 347.152907] c167f9d9 f5acd080 000005b4 00000002 f3c83d20 c16013e6 f3c88f00 f5acd080 [ 347.152907] f3c83da0 c1603b5a f3c83d38 c10a0188 00000000 00000000 f3c83d84 c10acc85 [ 347.152907] c1ad5ec0 00000000 00000000 c1ad679c 010003e0 00000000 00000000 f3c88fc8 [ 347.152907] Call Trace: [ 347.152907] [<c167f9d9>] ? apic_timer_interrupt+0x2d/0x34 [ 347.152907] [<c16013e6>] tcp_init_tso_segs+0x36/0x50 [ 347.152907] [<c1603b5a>] tcp_write_xmit+0x7a/0xbf0 [ 347.152907] [<c10a0188>] ? up+0x28/0x40 [ 347.152907] [<c10acc85>] ? console_unlock+0x295/0x480 [ 347.152907] [<c10ad24f>] ? vprintk_emit+0x1ef/0x4b0 [ 347.152907] [<c1605716>] __tcp_push_pending_frames+0x36/0xd0 [ 347.152907] [<c15f4860>] tcp_push+0xf0/0x120 [ 347.152907] [<c15f7641>] tcp_sendmsg+0xf1/0xbf0 [ 347.152907] [<c116d920>] ? kmem_cache_free+0xf0/0x120 [ 347.152907] [<c106a682>] ? __sigqueue_free+0x32/0x40 [ 347.152907] [<c106a682>] ? __sigqueue_free+0x32/0x40 [ 347.152907] [<c114f0f0>] ? do_wp_page+0x3e0/0x850 [ 347.152907] [<c161c36a>] inet_sendmsg+0x4a/0xb0 [ 347.152907] [<c1150269>] ? handle_mm_fault+0x709/0xfb0 [ 347.152907] [<c15a006b>] sock_aio_write+0xbb/0xd0 [ 347.152907] [<c1180b79>] do_sync_write+0x69/0xa0 [ 347.152907] [<c1181023>] vfs_write+0x123/0x160 [ 347.152907] [<c1181d55>] SyS_write+0x55/0xb0 [ 347.152907] [<c167f0d8>] sysenter_do_call+0x12/0x28 This can easily be reproduced with the following packetdrill-script (the "magic" with netem, sk_pacing and limit_output_bytes is done to prevent the kernel from pushing all segments, because hitting the limit without doing this is not so easy with packetdrill): 0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 32792 <mss 1460> +0 > S. 0:0(0) ack 1 <mss 1460> +0.1 < . 1:1(0) ack 1 win 65000 +0 accept(3, ..., ...) = 4 // This forces that not all segments of the snd-queue will be pushed +0 `tc qdisc add dev tun0 root netem delay 10ms` +0 `sysctl -w net.ipv4.tcp_limit_output_bytes=2` +0 setsockopt(4, SOL_SOCKET, 47, [2], 4) = 0 +0 write(4,...,10000) = 10000 +0 write(4,...,10000) = 10000 // Set tcp-repair stuff, particularly TCP_RECV_QUEUE +0 setsockopt(4, SOL_TCP, 19, [1], 4) = 0 +0 setsockopt(4, SOL_TCP, 20, [1], 4) = 0 // This now will make the write push the remaining segments +0 setsockopt(4, SOL_SOCKET, 47, [20000], 4) = 0 +0 `sysctl -w net.ipv4.tcp_limit_output_bytes=130000` // Now we will crash +0 write(4,...,1000) = 1000 This happens since ec34232 (tcp: fix retransmission in repair mode). Prior to that, the call to tcp_push was prevented by a check for tp->repair. The patch fixes it, by adding the new goto-label out_nopush. When exiting tcp_sendmsg and a push is not required, which is the case for tp->repair, we go to this label. When repairing and calling send() with TCP_RECV_QUEUE, the data is actually put in the receive-queue. So, no push is required because no data has been added to the send-queue. Cc: Andrew Vagin <[email protected]> Cc: Pavel Emelyanov <[email protected]> Fixes: ec34232 (tcp: fix retransmission in repair mode) Signed-off-by: Christoph Paasch <[email protected]> Acked-by: Andrew Vagin <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Signed-off-by: David S. Miller <[email protected]> Signed-off-by: Jiri Slaby <[email protected]>
No description provided.
The text was updated successfully, but these errors were encountered: