Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8KHz/32KHz recording and playback does not work with WM8731 (PROTO AudioCODEC) #16

Open
am1337 opened this issue Sep 13, 2013 · 1 comment

Comments

@am1337
Copy link

am1337 commented Sep 13, 2013

If I open the device with 8KHz or 32KHz sample rate "cat /proc/asound/card1/pcm0p/sub0/hw_params" says that the device uses the correct rate. But all 8KHz or 32KHz files I play using aplay, mplayer or ffmpeg are played with 48kHz, independend of there origin sample rate (they are pitched, that means I play a 6 minute 8KHz file in one minute). For example I tried a 32KHz radio stream (Hard to find one with less than 44,1KHz today). It is played much to fast (with interrupts), you will hear it using "mplayer -ao alsa:device=hw=1,0 -nolirc -cache 64 http://stream.antenne1.de/stream2/livestream.mp3". Curiously playing an 44.1KHz file or stream works, e.g. "mplayer -ao alsa:device=hw=1,0 -nolirc http://1live.akacast.akamaistream.net/7/706/119434/v1/gnl.akacast.akamaistream.net/1live -cache 64".

UPDATE:
I did some further research and found out the following:

If reading from Line-In with 8kHz setting, six times of expected data are sent.

My application opens the device plughw:1,0 with SND_PCM_FORMAT_MU_LAW, 8kHz and one channel.
cat /proc/asound/card1/pcm0c/sub0/hw_params says:

access: MMAP_INTERLEAVED
format: S16_LE
subformat: STD
channels: 2
rate: 8000 (8000/1)
period_size: 160
buffer_size: 800

The setting (mostly) look like they shoul (I don't know why there are set 2 channels, maybe a bug in my software). But if I read from the device I get for example:

7f ff ff 7f ff ff (mulaw uses one byte per sample, 7f means silence)

The first and the 4th byte are always the same, byte 2,3,5 and 6 are alway ff. If I only use the first byte and drop the other 5 recording works.

I assume the wm8731 is setup correct, but the clock is still using 48kHZ. So this needs to be changed or to be independent from playback, the unneeded bytes must be dropped.

I will check if playback works if I play each byte 6 times.

koalo pushed a commit that referenced this issue Nov 19, 2013
…optimizations

Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on
assumptions about the implementation of memset and similar functions.
The current ARM optimized memset code does not return the value of
its first argument, as is usually expected from standard implementations.

For instance in the following function:

void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
	waiter->magic = waiter;
	INIT_LIST_HEAD(&waiter->list);
}

compiled as:

800554d0 <debug_mutex_lock_common>:
800554d0:       e92d4008        push    {r3, lr}
800554d4:       e1a00001        mov     r0, r1
800554d8:       e3a02010        mov     r2, #16 ; 0x10
800554dc:       e3a01011        mov     r1, #17 ; 0x11
800554e0:       eb04426e        bl      80165ea0 <memset>
800554e4:       e1a03000        mov     r3, r0
800554e8:       e583000c        str     r0, [r3, #12]
800554ec:       e5830000        str     r0, [r3]
800554f0:       e5830004        str     r0, [r3, #4]
800554f4:       e8bd8008        pop     {r3, pc}

GCC assumes memset returns the value of pointer 'waiter' in register r0; causing
register/memory corruptions.

This patch fixes the return value of the assembly version of memset.
It adds a 'mov' instruction and merges an additional load+store into
existing load/store instructions.
For ease of review, here is a breakdown of the patch into 4 simple steps:

Step 1
======
Perform the following substitutions:
ip -> r8, then
r0 -> ip,
and insert 'mov ip, r0' as the first statement of the function.
At this point, we have a memset() implementation returning the proper result,
but corrupting r8 on some paths (the ones that were using ip).

Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

save r8:
-       str     lr, [sp, #-4]!
+       stmfd   sp!, {r8, lr}

and restore r8 on both exit paths:
-       ldmeqfd sp!, {pc}               @ Now <64 bytes to go.
+       ldmeqfd sp!, {r8, pc}           @ Now <64 bytes to go.
(...)
        tst     r2, #16
        stmneia ip!, {r1, r3, r8, lr}
-       ldr     lr, [sp], #4
+       ldmfd   sp!, {r8, lr}

Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:

save r8:
-       stmfd   sp!, {r4-r7, lr}
+       stmfd   sp!, {r4-r8, lr}

and restore r8 on both exit paths:
        bgt     3b
-       ldmeqfd sp!, {r4-r7, pc}
+       ldmeqfd sp!, {r4-r8, pc}
(...)
        tst     r2, #16
        stmneia ip!, {r4-r7}
-       ldmfd   sp!, {r4-r7, lr}
+       ldmfd   sp!, {r4-r8, lr}

Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".

Signed-off-by: Ivan Djelic <[email protected]>
Reviewed-by: Nicolas Pitre <[email protected]>
Signed-off-by: Dirk Behme <[email protected]>
Signed-off-by: Russell King <[email protected]>
koalo pushed a commit that referenced this issue Nov 19, 2013
Commit 28c70f1 ("drm/i915: use the gmbus irq for waits") switched to
using GMBUS irqs instead of GPIO bit-banging for chipset generations 4
and above.

It turns out though that on many systems this leads to spurious interrupts
being generated, long after the register write to disable the IRQs has been
issued.

Typically this results in the spurious interrupt source getting
disabled:

[    9.636345] irq 16: nobody cared (try booting with the "irqpoll" option)
[    9.637915] Pid: 4157, comm: ifup Tainted: GF            3.9.0-rc2-00341-g0863702 raspberrypi#422
[    9.639484] Call Trace:
[    9.640731]  <IRQ>  [<ffffffff8109b40d>] __report_bad_irq+0x1d/0xc7
[    9.640731]  [<ffffffff8109b7db>] note_interrupt+0x15b/0x1e8
[    9.640731]  [<ffffffff810999f7>] handle_irq_event_percpu+0x1bf/0x214
[    9.640731]  [<ffffffff81099a88>] handle_irq_event+0x3c/0x5c
[    9.640731]  [<ffffffff8109c139>] handle_fasteoi_irq+0x7a/0xb0
[    9.640731]  [<ffffffff8100400e>] handle_irq+0x1a/0x24
[    9.640731]  [<ffffffff81003d17>] do_IRQ+0x48/0xaf
[    9.640731]  [<ffffffff8142f1ea>] common_interrupt+0x6a/0x6a
[    9.640731]  <EOI>  [<ffffffff8142f952>] ? system_call_fastpath+0x16/0x1b
[    9.640731] handlers:
[    9.640731] [<ffffffffa000d771>] usb_hcd_irq [usbcore]
[    9.640731] [<ffffffffa0306189>] yenta_interrupt [yenta_socket]
[    9.640731] Disabling IRQ #16

The really curious thing is now that irq 16 is _not_ the interrupt for
the i915 driver when using MSI, but it _is_ the interrupt when not
using MSI. So by all indications it seems like gmbus is able to
generate a legacy (shared) interrupt in MSI mode on some
configurations. I've tried to reproduce this and the differentiating
thing seems to be that on unaffected systems no other device uses irq
16 (which seems to be the non-MSI intel gfx interrupt on all gm45).

I have no idea how that even can happen.

To avoid tempting this elephant into a rage, just disable gmbus
interrupt support on gen 4.

v2: Improve the commit message with exact details of what's going on.
Also add a comment in the code to warn against this particular
elephant in the room.

v3: Move the comment explaing how gen4 blows up next to the definition
of HAS_GMBUS_IRQ to keep the code-flow straight. Suggested by Chris
Wilson.

Signed-off-by: Jiri Kosina <[email protected]> (v1)
Acked-by: Chris Wilson <[email protected]>
References: https://lkml.org/lkml/2013/3/8/325
Signed-off-by: Daniel Vetter <[email protected]>
koalo pushed a commit that referenced this issue Nov 19, 2013
Now that the tty port owns the flip buffers and i/o is allowed
from the driver even when no tty is attached, the destruction
of the tty port (and the flip buffers) must ensure that no
outstanding work is pending.

Unfortunately, this creates a lock order problem with the
console_lock (see attached lockdep report [1] below).

For single console deallocation, drop the console_lock prior
to port destruction. When multiple console deallocation,
defer port destruction until the consoles have been
deallocated.

tty_port_destroy() is not required if the port has not
been used; remove from vc_allocate() failure path.

[1] lockdep report from Dave Jones <[email protected]>

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 3.9.0+ #16 Not tainted
 -------------------------------------------------------
 (agetty)/26163 is trying to acquire lock:
 blocked:  ((&buf->work)){+.+...}, instance: ffff88011c8b0020, at: [<ffffffff81062065>] flush_work+0x5/0x2e0

 but task is already holding lock:
 blocked:  (console_lock){+.+.+.}, instance: ffffffff81c2fde0, at: [<ffffffff813bc201>] vt_ioctl+0xb61/0x1230

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (console_lock){+.+.+.}:
        [<ffffffff810b3f74>] lock_acquire+0xa4/0x210
        [<ffffffff810416c7>] console_lock+0x77/0x80
        [<ffffffff813c3dcd>] con_flush_chars+0x2d/0x50
        [<ffffffff813b32b2>] n_tty_receive_buf+0x122/0x14d0
        [<ffffffff813b7709>] flush_to_ldisc+0x119/0x170
        [<ffffffff81064381>] process_one_work+0x211/0x700
        [<ffffffff8106498b>] worker_thread+0x11b/0x3a0
        [<ffffffff8106ce5d>] kthread+0xed/0x100
        [<ffffffff81601cac>] ret_from_fork+0x7c/0xb0

 -> #0 ((&buf->work)){+.+...}:
        [<ffffffff810b349a>] __lock_acquire+0x193a/0x1c00
        [<ffffffff810b3f74>] lock_acquire+0xa4/0x210
        [<ffffffff810620ae>] flush_work+0x4e/0x2e0
        [<ffffffff81065305>] __cancel_work_timer+0x95/0x130
        [<ffffffff810653b0>] cancel_work_sync+0x10/0x20
        [<ffffffff813b8212>] tty_port_destroy+0x12/0x20
        [<ffffffff813c65e8>] vc_deallocate+0xf8/0x110
        [<ffffffff813bc20c>] vt_ioctl+0xb6c/0x1230
        [<ffffffff813b01a5>] tty_ioctl+0x285/0xd50
        [<ffffffff811ba825>] do_vfs_ioctl+0x305/0x530
        [<ffffffff811baad1>] sys_ioctl+0x81/0xa0
        [<ffffffff81601d59>] system_call_fastpath+0x16/0x1b

 other info that might help us debug this:

 [ 6760.076175]  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(console_lock);
                                lock((&buf->work));
                                lock(console_lock);
   lock((&buf->work));

  *** DEADLOCK ***

 1 lock on stack by (agetty)/26163:
  #0: blocked:  (console_lock){+.+.+.}, instance: ffffffff81c2fde0, at: [<ffffffff813bc201>] vt_ioctl+0xb61/0x1230
 stack backtrace:
 Pid: 26163, comm: (agetty) Not tainted 3.9.0+ #16
 Call Trace:
  [<ffffffff815edb14>] print_circular_bug+0x200/0x20e
  [<ffffffff810b349a>] __lock_acquire+0x193a/0x1c00
  [<ffffffff8100a269>] ? sched_clock+0x9/0x10
  [<ffffffff8100a269>] ? sched_clock+0x9/0x10
  [<ffffffff8100a200>] ? native_sched_clock+0x20/0x80
  [<ffffffff810b3f74>] lock_acquire+0xa4/0x210
  [<ffffffff81062065>] ? flush_work+0x5/0x2e0
  [<ffffffff810620ae>] flush_work+0x4e/0x2e0
  [<ffffffff81062065>] ? flush_work+0x5/0x2e0
  [<ffffffff810b15db>] ? mark_held_locks+0xbb/0x140
  [<ffffffff8113c8a3>] ? __free_pages_ok.part.57+0x93/0xc0
  [<ffffffff810b15db>] ? mark_held_locks+0xbb/0x140
  [<ffffffff810652f2>] ? __cancel_work_timer+0x82/0x130
  [<ffffffff81065305>] __cancel_work_timer+0x95/0x130
  [<ffffffff810653b0>] cancel_work_sync+0x10/0x20
  [<ffffffff813b8212>] tty_port_destroy+0x12/0x20
  [<ffffffff813c65e8>] vc_deallocate+0xf8/0x110
  [<ffffffff813bc20c>] vt_ioctl+0xb6c/0x1230
  [<ffffffff810aec41>] ? lock_release_holdtime.part.30+0xa1/0x170
  [<ffffffff813b01a5>] tty_ioctl+0x285/0xd50
  [<ffffffff812b00f6>] ? inode_has_perm.isra.46.constprop.61+0x56/0x80
  [<ffffffff811ba825>] do_vfs_ioctl+0x305/0x530
  [<ffffffff812b04db>] ? selinux_file_ioctl+0x5b/0x110
  [<ffffffff811baad1>] sys_ioctl+0x81/0xa0
  [<ffffffff81601d59>] system_call_fastpath+0x16/0x1b

Cc: Dave Jones <[email protected]>
Signed-off-by: Peter Hurley <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koalo pushed a commit that referenced this issue Nov 19, 2013
…s struct file

commit e4daf1f upstream.

The following call chain:
------------------------------------------------------------
nfs4_get_vfs_file
- nfsd_open
  - dentry_open
    - do_dentry_open
      - __get_file_write_access
        - get_write_access
          - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
------------------------------------------------------------

can result in the following state:
------------------------------------------------------------
struct nfs4_file {
...
  fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0},
  fi_access = {{
      counter = 0x1
    }, {
      counter = 0x0
    }},
...
------------------------------------------------------------

1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NULL, hence nfsd_open() is called where we get status set to an error
and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach
nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented.

2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but
nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented.
Thus we leave a landmine in the form of the nfs4_file data structure in
an incorrect state.

3) Eventually, when __nfs4_file_put_access() is called it finds
fi_access[O_WRONLY] being non-zero, it decrements it and calls
nfs4_file_put_fd() which tries to fput -ETXTBSY.
------------------------------------------------------------
...
     [exception RIP: fput+0x9]
     RIP: ffffffff81177fa9  RSP: ffff88062e365c90  RFLAGS: 00010282
     RAX: ffff880c2b3d99cc  RBX: ffff880c2b3d9978  RCX: 0000000000000002
     RDX: dead000000100101  RSI: 0000000000000001  RDI: ffffffffffffffe6
     RBP: ffff88062e365c90   R8: ffff88041fe797d8   R9: ffff88062e365d58
     R10: 0000000000000008  R11: 0000000000000000  R12: 0000000000000001
     R13: 0000000000000007  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd]
 #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd]
 #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd]
 #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd]
 #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd]
 #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd]
 #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd]
 #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc]
 #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc]
 #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd]
 #19 [ffff88062e365ee8] kthread at ffffffff81090886
 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a
------------------------------------------------------------

Signed-off-by: Harshula Jayasuriya <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
@am1337
Copy link
Author

am1337 commented Apr 16, 2014

Just another hint. If I use a virtual device, e.g. dshare, Alsa resamples audio and it works.

koalo pushed a commit that referenced this issue Apr 16, 2014
[ Upstream commit 7fe0ee0 ]

Using iperf to send packets(GSO mode is on), a bug is triggered:

[  212.672781] kernel BUG at lib/dynamic_queue_limits.c:26!
[  212.673396] invalid opcode: 0000 [#1] SMP
[  212.673882] Modules linked in: 8139cp(O) nls_utf8 edd fuse loop dm_mod ipv6 i2c_piix4 8139too i2c_core intel_agp joydev pcspkr hid_generic intel_gtt floppy sr_mod mii button sg cdrom ext3 jbd mbcache usbhid hid uhci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif crct10dif_common processor thermal_sys hwmon scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh ata_generic ata_piix libata scsi_mod [last unloaded: 8139cp]
[  212.676084] CPU: 0 PID: 4124 Comm: iperf Tainted: G           O 3.12.0-0.7-default+ #16
[  212.676084] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  212.676084] task: ffff8800d83966c0 ti: ffff8800db4c8000 task.ti: ffff8800db4c8000
[  212.676084] RIP: 0010:[<ffffffff8122e23f>]  [<ffffffff8122e23f>] dql_completed+0x17f/0x190
[  212.676084] RSP: 0018:ffff880116e03e30  EFLAGS: 00010083
[  212.676084] RAX: 00000000000005ea RBX: 0000000000000f7c RCX: 0000000000000002
[  212.676084] RDX: ffff880111dd0dc0 RSI: 0000000000000bd4 RDI: ffff8800db6ffcc0
[  212.676084] RBP: ffff880116e03e48 R08: 0000000000000992 R09: 0000000000000000
[  212.676084] R10: ffffffff8181e400 R11: 0000000000000004 R12: 000000000000000f
[  212.676084] R13: ffff8800d94ec840 R14: ffff8800db440c80 R15: 000000000000000e
[  212.676084] FS:  00007f6685a3c700(0000) GS:ffff880116e00000(0000) knlGS:0000000000000000
[  212.676084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  212.676084] CR2: 00007f6685ad6460 CR3: 00000000db714000 CR4: 00000000000006f0
[  212.676084] Stack:
[  212.676084]  ffff8800db6ffc00 000000000000000f ffff8800d94ec840 ffff880116e03eb8
[  212.676084]  ffffffffa041509f ffff880116e03e88 0000000f16e03e88 ffff8800d94ec000
[  212.676084]  00000bd400059858 000000050000000f ffffffff81094c36 ffff880116e03eb8
[  212.676084] Call Trace:
[  212.676084]  <IRQ>
[  212.676084]  [<ffffffffa041509f>] cp_interrupt+0x4ef/0x590 [8139cp]
[  212.676084]  [<ffffffff81094c36>] ? ktime_get+0x56/0xd0
[  212.676084]  [<ffffffff8108cf73>] handle_irq_event_percpu+0x53/0x170
[  212.676084]  [<ffffffff8108d0cc>] handle_irq_event+0x3c/0x60
[  212.676084]  [<ffffffff8108fdb5>] handle_fasteoi_irq+0x55/0xf0
[  212.676084]  [<ffffffff810045df>] handle_irq+0x1f/0x30
[  212.676084]  [<ffffffff81003c8b>] do_IRQ+0x5b/0xe0
[  212.676084]  [<ffffffff8142beaa>] common_interrupt+0x6a/0x6a
[  212.676084]  <EOI>
[  212.676084]  [<ffffffffa0416a21>] ? cp_start_xmit+0x621/0x97c [8139cp]
[  212.676084]  [<ffffffffa0416a09>] ? cp_start_xmit+0x609/0x97c [8139cp]
[  212.676084]  [<ffffffff81378ed9>] dev_hard_start_xmit+0x2c9/0x550
[  212.676084]  [<ffffffff813960a9>] sch_direct_xmit+0x179/0x1d0
[  212.676084]  [<ffffffff813793f3>] dev_queue_xmit+0x293/0x440
[  212.676084]  [<ffffffff813b0e46>] ip_finish_output+0x236/0x450
[  212.676084]  [<ffffffff810e59e7>] ? __alloc_pages_nodemask+0x187/0xb10
[  212.676084]  [<ffffffff813b10e8>] ip_output+0x88/0x90
[  212.676084]  [<ffffffff813afa64>] ip_local_out+0x24/0x30
[  212.676084]  [<ffffffff813aff0d>] ip_queue_xmit+0x14d/0x3e0
[  212.676084]  [<ffffffff813c6fd1>] tcp_transmit_skb+0x501/0x840
[  212.676084]  [<ffffffff813c8323>] tcp_write_xmit+0x1e3/0xb20
[  212.676084]  [<ffffffff81363237>] ? skb_page_frag_refill+0x87/0xd0
[  212.676084]  [<ffffffff813c8c8b>] tcp_push_one+0x2b/0x40
[  212.676084]  [<ffffffff813bb7e6>] tcp_sendmsg+0x926/0xc90
[  212.676084]  [<ffffffff813e1d21>] inet_sendmsg+0x61/0xc0
[  212.676084]  [<ffffffff8135e861>] sock_aio_write+0x101/0x120
[  212.676084]  [<ffffffff81107cf1>] ? vma_adjust+0x2e1/0x5d0
[  212.676084]  [<ffffffff812163e0>] ? timerqueue_add+0x60/0xb0
[  212.676084]  [<ffffffff81130b60>] do_sync_write+0x60/0x90
[  212.676084]  [<ffffffff81130d44>] ? rw_verify_area+0x54/0xf0
[  212.676084]  [<ffffffff81130f66>] vfs_write+0x186/0x190
[  212.676084]  [<ffffffff811317fd>] SyS_write+0x5d/0xa0
[  212.676084]  [<ffffffff814321e2>] system_call_fastpath+0x16/0x1b
[  212.676084] Code: ca 41 89 dc 41 29 cc 45 31 db 29 c2 41 89 c5 89 d0 45 29 c5 f7 d0 c1 e8 1f e9 43 ff ff ff 66 0f 1f 44 00 00 31 c0 e9 7b ff ff ff <0f> 0b eb fe 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 c7 47 40 00
[  212.676084] RIP  [<ffffffff8122e23f>] dql_completed+0x17f/0x190
------------[ cut here ]------------

When a skb has frags, bytes_compl plus skb->len nr_frags times in cp_tx().
It's not the correct value(actually, it should plus skb->len once) and it
will trigger the BUG_ON(bytes_compl > num_queued - dql->num_completed).
So only increase bytes_compl when finish sending all frags. pkts_compl also
has a wrong value, fix it too.

It's introduced by commit 871f0d4 ("8139cp: enable bql").

Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koalo pushed a commit that referenced this issue Apr 16, 2014
[ Upstream commit 7fe0ee0 ]

Using iperf to send packets(GSO mode is on), a bug is triggered:

[  212.672781] kernel BUG at lib/dynamic_queue_limits.c:26!
[  212.673396] invalid opcode: 0000 [#1] SMP
[  212.673882] Modules linked in: 8139cp(O) nls_utf8 edd fuse loop dm_mod ipv6 i2c_piix4 8139too i2c_core intel_agp joydev pcspkr hid_generic intel_gtt floppy sr_mod mii button sg cdrom ext3 jbd mbcache usbhid hid uhci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif crct10dif_common processor thermal_sys hwmon scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh ata_generic ata_piix libata scsi_mod [last unloaded: 8139cp]
[  212.676084] CPU: 0 PID: 4124 Comm: iperf Tainted: G           O 3.12.0-0.7-default+ #16
[  212.676084] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  212.676084] task: ffff8800d83966c0 ti: ffff8800db4c8000 task.ti: ffff8800db4c8000
[  212.676084] RIP: 0010:[<ffffffff8122e23f>]  [<ffffffff8122e23f>] dql_completed+0x17f/0x190
[  212.676084] RSP: 0018:ffff880116e03e30  EFLAGS: 00010083
[  212.676084] RAX: 00000000000005ea RBX: 0000000000000f7c RCX: 0000000000000002
[  212.676084] RDX: ffff880111dd0dc0 RSI: 0000000000000bd4 RDI: ffff8800db6ffcc0
[  212.676084] RBP: ffff880116e03e48 R08: 0000000000000992 R09: 0000000000000000
[  212.676084] R10: ffffffff8181e400 R11: 0000000000000004 R12: 000000000000000f
[  212.676084] R13: ffff8800d94ec840 R14: ffff8800db440c80 R15: 000000000000000e
[  212.676084] FS:  00007f6685a3c700(0000) GS:ffff880116e00000(0000) knlGS:0000000000000000
[  212.676084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  212.676084] CR2: 00007f6685ad6460 CR3: 00000000db714000 CR4: 00000000000006f0
[  212.676084] Stack:
[  212.676084]  ffff8800db6ffc00 000000000000000f ffff8800d94ec840 ffff880116e03eb8
[  212.676084]  ffffffffa041509f ffff880116e03e88 0000000f16e03e88 ffff8800d94ec000
[  212.676084]  00000bd400059858 000000050000000f ffffffff81094c36 ffff880116e03eb8
[  212.676084] Call Trace:
[  212.676084]  <IRQ>
[  212.676084]  [<ffffffffa041509f>] cp_interrupt+0x4ef/0x590 [8139cp]
[  212.676084]  [<ffffffff81094c36>] ? ktime_get+0x56/0xd0
[  212.676084]  [<ffffffff8108cf73>] handle_irq_event_percpu+0x53/0x170
[  212.676084]  [<ffffffff8108d0cc>] handle_irq_event+0x3c/0x60
[  212.676084]  [<ffffffff8108fdb5>] handle_fasteoi_irq+0x55/0xf0
[  212.676084]  [<ffffffff810045df>] handle_irq+0x1f/0x30
[  212.676084]  [<ffffffff81003c8b>] do_IRQ+0x5b/0xe0
[  212.676084]  [<ffffffff8142beaa>] common_interrupt+0x6a/0x6a
[  212.676084]  <EOI>
[  212.676084]  [<ffffffffa0416a21>] ? cp_start_xmit+0x621/0x97c [8139cp]
[  212.676084]  [<ffffffffa0416a09>] ? cp_start_xmit+0x609/0x97c [8139cp]
[  212.676084]  [<ffffffff81378ed9>] dev_hard_start_xmit+0x2c9/0x550
[  212.676084]  [<ffffffff813960a9>] sch_direct_xmit+0x179/0x1d0
[  212.676084]  [<ffffffff813793f3>] dev_queue_xmit+0x293/0x440
[  212.676084]  [<ffffffff813b0e46>] ip_finish_output+0x236/0x450
[  212.676084]  [<ffffffff810e59e7>] ? __alloc_pages_nodemask+0x187/0xb10
[  212.676084]  [<ffffffff813b10e8>] ip_output+0x88/0x90
[  212.676084]  [<ffffffff813afa64>] ip_local_out+0x24/0x30
[  212.676084]  [<ffffffff813aff0d>] ip_queue_xmit+0x14d/0x3e0
[  212.676084]  [<ffffffff813c6fd1>] tcp_transmit_skb+0x501/0x840
[  212.676084]  [<ffffffff813c8323>] tcp_write_xmit+0x1e3/0xb20
[  212.676084]  [<ffffffff81363237>] ? skb_page_frag_refill+0x87/0xd0
[  212.676084]  [<ffffffff813c8c8b>] tcp_push_one+0x2b/0x40
[  212.676084]  [<ffffffff813bb7e6>] tcp_sendmsg+0x926/0xc90
[  212.676084]  [<ffffffff813e1d21>] inet_sendmsg+0x61/0xc0
[  212.676084]  [<ffffffff8135e861>] sock_aio_write+0x101/0x120
[  212.676084]  [<ffffffff81107cf1>] ? vma_adjust+0x2e1/0x5d0
[  212.676084]  [<ffffffff812163e0>] ? timerqueue_add+0x60/0xb0
[  212.676084]  [<ffffffff81130b60>] do_sync_write+0x60/0x90
[  212.676084]  [<ffffffff81130d44>] ? rw_verify_area+0x54/0xf0
[  212.676084]  [<ffffffff81130f66>] vfs_write+0x186/0x190
[  212.676084]  [<ffffffff811317fd>] SyS_write+0x5d/0xa0
[  212.676084]  [<ffffffff814321e2>] system_call_fastpath+0x16/0x1b
[  212.676084] Code: ca 41 89 dc 41 29 cc 45 31 db 29 c2 41 89 c5 89 d0 45 29 c5 f7 d0 c1 e8 1f e9 43 ff ff ff 66 0f 1f 44 00 00 31 c0 e9 7b ff ff ff <0f> 0b eb fe 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 c7 47 40 00
[  212.676084] RIP  [<ffffffff8122e23f>] dql_completed+0x17f/0x190
------------[ cut here ]------------

When a skb has frags, bytes_compl plus skb->len nr_frags times in cp_tx().
It's not the correct value(actually, it should plus skb->len once) and it
will trigger the BUG_ON(bytes_compl > num_queued - dql->num_completed).
So only increase bytes_compl when finish sending all frags. pkts_compl also
has a wrong value, fix it too.

It's introduced by commit 871f0d4 ("8139cp: enable bql").

Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
koalo pushed a commit that referenced this issue Aug 4, 2014
It seems following race is possible:

	cpu0					cpux
smp_init->cpu_up->_cpu_up
	__cpu_up
		kick_cpu(1)
-------------------------------------------------------------------------
		waiting online			...
		...				notify CPU_STARTING
							set cpux active
						set cpux online
-------------------------------------------------------------------------
		finish waiting online
		...
sched_init_smp
	init_sched_domains(cpu_active_mask)
		build_sched_domains
						set cpux sibling info
-------------------------------------------------------------------------

Execution of cpu0 and cpux could be concurrent between two separator
lines.

So if the cpux sibling information was set too late (normally
impossible, but could be triggered by adding some delay in
start_secondary, after setting cpu online), build_sched_domains()
running on cpu0 might see cpux active, with an empty sibling mask, then
cause some bad address accessing like following:

[    0.099855] Unable to handle kernel paging request for data at address 0xc00000038518078f
[    0.099868] Faulting instruction address: 0xc0000000000b7a64
[    0.099883] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.099895] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
[    0.099922] Modules linked in:
[    0.099940] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-rc1-00120-gb973425-dirty #16
[    0.099956] task: c0000001fed80000 ti: c0000001fed7c000 task.ti: c0000001fed7c000
[    0.099971] NIP: c0000000000b7a64 LR: c0000000000b7a40 CTR: c0000000000b4934
[    0.099985] REGS: c0000001fed7f760 TRAP: 0300   Not tainted  (3.10.0-rc1-00120-gb973425-dirty)
[    0.099997] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 24272828  XER: 20000003
[    0.100045] SOFTE: 1
[    0.100053] CFAR: c000000000445ee8
[    0.100064] DAR: c00000038518078f, DSISR: 40000000
[    0.100073]
GPR00: 0000000000000080 c0000001fed7f9e0 c000000000c84d48 0000000000000010
GPR04: 0000000000000010 0000000000000000 c0000001fc55e090 0000000000000000
GPR08: ffffffffffffffff c000000000b80b30 c000000000c962d8 00000003845ffc5f
GPR12: 0000000000000000 c00000000f33d000 c00000000000b9e4 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000001 0000000000000000
GPR20: c000000000ccf750 0000000000000000 c000000000c94d48 c0000001fc504000
GPR24: c0000001fc504000 c0000001fecef848 c000000000c94d48 c000000000ccf000
GPR28: c0000001fc522090 0000000000000010 c0000001fecef848 c0000001fed7fae0
[    0.100293] NIP [c0000000000b7a64] .get_group+0x84/0xc4
[    0.100307] LR [c0000000000b7a40] .get_group+0x60/0xc4
[    0.100318] Call Trace:
[    0.100332] [c0000001fed7f9e0] [c0000000000dbce4] .lock_is_held+0xa8/0xd0 (unreliable)
[    0.100354] [c0000001fed7fa70] [c0000000000bf62c] .build_sched_domains+0x728/0xd14
[    0.100375] [c0000001fed7fbe0] [c000000000af67bc] .sched_init_smp+0x4fc/0x654
[    0.100394] [c0000001fed7fce0] [c000000000adce24] .kernel_init_freeable+0x17c/0x30c
[    0.100413] [c0000001fed7fdb0] [c00000000000ba08] .kernel_init+0x24/0x12c
[    0.100431] [c0000001fed7fe30] [c000000000009f74] .ret_from_kernel_thread+0x5c/0x68
[    0.100445] Instruction dump:
[    0.100456] 38800010 38a00000 4838e3f5 60000000 7c6307b4 2fbf0000 419e0040 3d220001
[    0.100496] 78601f24 39491590 e93e0008 7d6a002a <7d69582a> f97f0000 7d4a002a e93e0010
[    0.100559] ---[ end trace 31fd0ba7d8756001 ]---

This patch tries to move the sibling maps updating before
notify_cpu_starting() and cpu online, and a write barrier there to make
sure sibling maps are updated before active and online mask.

Signed-off-by: Li Zhong <[email protected]>
Reviewed-by: Srivatsa S. Bhat <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
koalo pushed a commit that referenced this issue Aug 4, 2014
Several people reported the warning: "kernel BUG at kernel/timer.c:729!"
and the stack trace is:

	#7 [ffff880214d25c10] mod_timer+501 at ffffffff8106d905
	#8 [ffff880214d25c50] br_multicast_del_pg.isra.20+261 at ffffffffa0731d25 [bridge]
	#9 [ffff880214d25c80] br_multicast_disable_port+88 at ffffffffa0732948 [bridge]
	#10 [ffff880214d25cb0] br_stp_disable_port+154 at ffffffffa072bcca [bridge]
	#11 [ffff880214d25ce8] br_device_event+520 at ffffffffa072a4e8 [bridge]
	#12 [ffff880214d25d18] notifier_call_chain+76 at ffffffff8164aafc
	#13 [ffff880214d25d50] raw_notifier_call_chain+22 at ffffffff810858f6
	#14 [ffff880214d25d60] call_netdevice_notifiers+45 at ffffffff81536aad
	#15 [ffff880214d25d80] dev_close_many+183 at ffffffff81536d17
	#16 [ffff880214d25dc0] rollback_registered_many+168 at ffffffff81537f68
	#17 [ffff880214d25de8] rollback_registered+49 at ffffffff81538101
	#18 [ffff880214d25e10] unregister_netdevice_queue+72 at ffffffff815390d8
	#19 [ffff880214d25e30] __tun_detach+272 at ffffffffa074c2f0 [tun]
	#20 [ffff880214d25e88] tun_chr_close+45 at ffffffffa074c4bd [tun]
	#21 [ffff880214d25ea8] __fput+225 at ffffffff8119b1f1
	#22 [ffff880214d25ef0] ____fput+14 at ffffffff8119b3fe
	#23 [ffff880214d25f00] task_work_run+159 at ffffffff8107cf7f
	#24 [ffff880214d25f30] do_notify_resume+97 at ffffffff810139e1
	#25 [ffff880214d25f50] int_signal+18 at ffffffff8164f292

this is due to I forgot to check if mp->timer is armed in
br_multicast_del_pg(). This bug is introduced by
commit 9f00b2e (bridge: only expire the mdb entry
when query is received).

Same for __br_mdb_del().

Tested-by: poma <[email protected]>
Reported-by: LiYonghua <[email protected]>
Reported-by: Robert Hancock <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Cc: "David S. Miller" <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
koalo pushed a commit that referenced this issue Aug 4, 2014
…s struct file

The following call chain:
------------------------------------------------------------
nfs4_get_vfs_file
- nfsd_open
  - dentry_open
    - do_dentry_open
      - __get_file_write_access
        - get_write_access
          - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
------------------------------------------------------------

can result in the following state:
------------------------------------------------------------
struct nfs4_file {
...
  fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0},
  fi_access = {{
      counter = 0x1
    }, {
      counter = 0x0
    }},
...
------------------------------------------------------------

1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NULL, hence nfsd_open() is called where we get status set to an error
and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach
nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented.

2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but
nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented.
Thus we leave a landmine in the form of the nfs4_file data structure in
an incorrect state.

3) Eventually, when __nfs4_file_put_access() is called it finds
fi_access[O_WRONLY] being non-zero, it decrements it and calls
nfs4_file_put_fd() which tries to fput -ETXTBSY.
------------------------------------------------------------
...
     [exception RIP: fput+0x9]
     RIP: ffffffff81177fa9  RSP: ffff88062e365c90  RFLAGS: 00010282
     RAX: ffff880c2b3d99cc  RBX: ffff880c2b3d9978  RCX: 0000000000000002
     RDX: dead000000100101  RSI: 0000000000000001  RDI: ffffffffffffffe6
     RBP: ffff88062e365c90   R8: ffff88041fe797d8   R9: ffff88062e365d58
     R10: 0000000000000008  R11: 0000000000000000  R12: 0000000000000001
     R13: 0000000000000007  R14: 0000000000000000  R15: 0000000000000000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd]
 #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd]
 #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd]
 #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd]
 #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd]
 #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd]
 #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd]
 #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc]
 #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc]
 #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd]
 #19 [ffff88062e365ee8] kthread at ffffffff81090886
 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a
------------------------------------------------------------

Cc: [email protected]
Signed-off-by: Harshula Jayasuriya <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
koalo pushed a commit that referenced this issue Aug 4, 2014
Since commit ac4e4af ("macvtap: Consistently use rcu functions"),
Thomas gets two different warnings :

BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45891/45892
caller is macvtap_do_read+0x45c/0x600 [macvtap]
CPU: 1 PID: 45892 Comm: vhost-45891 Not tainted 3.11.0-bisecttest #13
Call Trace:
([<00000000001126ee>] show_trace+0x126/0x144)
 [<00000000001127d2>] show_stack+0xc6/0xd4
 [<000000000068bcec>] dump_stack+0x74/0xd8
 [<0000000000481066>] debug_smp_processor_id+0xf6/0x114
 [<000003ff802e9a18>] macvtap_do_read+0x45c/0x600 [macvtap]
 [<000003ff802e9c1c>] macvtap_recvmsg+0x60/0x88 [macvtap]
 [<000003ff80318c5e>] handle_rx+0x5b2/0x800 [vhost_net]
 [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost]
 [<000000000015f3ac>] kthread+0xd8/0xe4
 [<00000000006934a6>] kernel_thread_starter+0x6/0xc
 [<00000000006934a0>] kernel_thread_starter+0x0/0xc

And

BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45897/45898
caller is macvlan_start_xmit+0x10a/0x1b4 [macvlan]
CPU: 1 PID: 45898 Comm: vhost-45897 Not tainted 3.11.0-bisecttest #16
Call Trace:
([<00000000001126ee>] show_trace+0x126/0x144)
 [<00000000001127d2>] show_stack+0xc6/0xd4
 [<000000000068bdb8>] dump_stack+0x74/0xd4
 [<0000000000481132>] debug_smp_processor_id+0xf6/0x114
 [<000003ff802b72ca>] macvlan_start_xmit+0x10a/0x1b4 [macvlan]
 [<000003ff802ea69a>] macvtap_get_user+0x982/0xbc4 [macvtap]
 [<000003ff802ea92a>] macvtap_sendmsg+0x4e/0x60 [macvtap]
 [<000003ff8031947c>] handle_tx+0x494/0x5ec [vhost_net]
 [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost]
 [<000000000015f3ac>] kthread+0xd8/0xe4
 [<000000000069356e>] kernel_thread_starter+0x6/0xc
 [<0000000000693568>] kernel_thread_starter+0x0/0xc
2 locks held by vhost-45897/45898:
 #0:  (&vq->mutex){+.+.+.}, at: [<000003ff8031903c>] handle_tx+0x54/0x5ec [vhost_net]
 #1:  (rcu_read_lock){.+.+..}, at: [<000003ff802ea53c>] macvtap_get_user+0x824/0xbc4 [macvtap]

In the first case, macvtap_put_user() calls macvlan_count_rx()
in a preempt-able context, and this is not allowed.

In the second case, macvtap_get_user() calls
macvlan_start_xmit() with BH enabled, and this is not allowed.

Reported-by: Thomas Huth <[email protected]>
Bisected-by: Thomas Huth <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Tested-by: Thomas Huth <[email protected]>
Cc: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
koalo pushed a commit that referenced this issue Aug 4, 2014
When booting secondary CPUs, announce_cpu() is called to show which cpu has
been brought up. For example:

[    0.402751] smpboot: Booting Node   0, Processors  #1 #2 #3 #4 #5 OK
[    0.525667] smpboot: Booting Node   1, Processors  #6 #7 #8 #9 #10 #11 OK
[    0.755592] smpboot: Booting Node   0, Processors  #12 #13 #14 #15 #16 #17 OK
[    0.890495] smpboot: Booting Node   1, Processors  #18 #19 #20 #21 #22 #23

But the last "OK" is lost, because 'nr_cpu_ids-1' represents the maximum
possible cpu id. It should use the maximum present cpu id in case not all
CPUs booted up.

Signed-off-by: Libin <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ tweaked the changelog, removed unnecessary line break, tweaked the format to align the fields vertically. ]
Signed-off-by: Ingo Molnar <[email protected]>
koalo pushed a commit that referenced this issue Aug 4, 2014
[ Upstream commit 7fe0ee0 ]

Using iperf to send packets(GSO mode is on), a bug is triggered:

[  212.672781] kernel BUG at lib/dynamic_queue_limits.c:26!
[  212.673396] invalid opcode: 0000 [#1] SMP
[  212.673882] Modules linked in: 8139cp(O) nls_utf8 edd fuse loop dm_mod ipv6 i2c_piix4 8139too i2c_core intel_agp joydev pcspkr hid_generic intel_gtt floppy sr_mod mii button sg cdrom ext3 jbd mbcache usbhid hid uhci_hcd ehci_hcd usbcore sd_mod usb_common crc_t10dif crct10dif_common processor thermal_sys hwmon scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh ata_generic ata_piix libata scsi_mod [last unloaded: 8139cp]
[  212.676084] CPU: 0 PID: 4124 Comm: iperf Tainted: G           O 3.12.0-0.7-default+ #16
[  212.676084] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  212.676084] task: ffff8800d83966c0 ti: ffff8800db4c8000 task.ti: ffff8800db4c8000
[  212.676084] RIP: 0010:[<ffffffff8122e23f>]  [<ffffffff8122e23f>] dql_completed+0x17f/0x190
[  212.676084] RSP: 0018:ffff880116e03e30  EFLAGS: 00010083
[  212.676084] RAX: 00000000000005ea RBX: 0000000000000f7c RCX: 0000000000000002
[  212.676084] RDX: ffff880111dd0dc0 RSI: 0000000000000bd4 RDI: ffff8800db6ffcc0
[  212.676084] RBP: ffff880116e03e48 R08: 0000000000000992 R09: 0000000000000000
[  212.676084] R10: ffffffff8181e400 R11: 0000000000000004 R12: 000000000000000f
[  212.676084] R13: ffff8800d94ec840 R14: ffff8800db440c80 R15: 000000000000000e
[  212.676084] FS:  00007f6685a3c700(0000) GS:ffff880116e00000(0000) knlGS:0000000000000000
[  212.676084] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  212.676084] CR2: 00007f6685ad6460 CR3: 00000000db714000 CR4: 00000000000006f0
[  212.676084] Stack:
[  212.676084]  ffff8800db6ffc00 000000000000000f ffff8800d94ec840 ffff880116e03eb8
[  212.676084]  ffffffffa041509f ffff880116e03e88 0000000f16e03e88 ffff8800d94ec000
[  212.676084]  00000bd400059858 000000050000000f ffffffff81094c36 ffff880116e03eb8
[  212.676084] Call Trace:
[  212.676084]  <IRQ>
[  212.676084]  [<ffffffffa041509f>] cp_interrupt+0x4ef/0x590 [8139cp]
[  212.676084]  [<ffffffff81094c36>] ? ktime_get+0x56/0xd0
[  212.676084]  [<ffffffff8108cf73>] handle_irq_event_percpu+0x53/0x170
[  212.676084]  [<ffffffff8108d0cc>] handle_irq_event+0x3c/0x60
[  212.676084]  [<ffffffff8108fdb5>] handle_fasteoi_irq+0x55/0xf0
[  212.676084]  [<ffffffff810045df>] handle_irq+0x1f/0x30
[  212.676084]  [<ffffffff81003c8b>] do_IRQ+0x5b/0xe0
[  212.676084]  [<ffffffff8142beaa>] common_interrupt+0x6a/0x6a
[  212.676084]  <EOI>
[  212.676084]  [<ffffffffa0416a21>] ? cp_start_xmit+0x621/0x97c [8139cp]
[  212.676084]  [<ffffffffa0416a09>] ? cp_start_xmit+0x609/0x97c [8139cp]
[  212.676084]  [<ffffffff81378ed9>] dev_hard_start_xmit+0x2c9/0x550
[  212.676084]  [<ffffffff813960a9>] sch_direct_xmit+0x179/0x1d0
[  212.676084]  [<ffffffff813793f3>] dev_queue_xmit+0x293/0x440
[  212.676084]  [<ffffffff813b0e46>] ip_finish_output+0x236/0x450
[  212.676084]  [<ffffffff810e59e7>] ? __alloc_pages_nodemask+0x187/0xb10
[  212.676084]  [<ffffffff813b10e8>] ip_output+0x88/0x90
[  212.676084]  [<ffffffff813afa64>] ip_local_out+0x24/0x30
[  212.676084]  [<ffffffff813aff0d>] ip_queue_xmit+0x14d/0x3e0
[  212.676084]  [<ffffffff813c6fd1>] tcp_transmit_skb+0x501/0x840
[  212.676084]  [<ffffffff813c8323>] tcp_write_xmit+0x1e3/0xb20
[  212.676084]  [<ffffffff81363237>] ? skb_page_frag_refill+0x87/0xd0
[  212.676084]  [<ffffffff813c8c8b>] tcp_push_one+0x2b/0x40
[  212.676084]  [<ffffffff813bb7e6>] tcp_sendmsg+0x926/0xc90
[  212.676084]  [<ffffffff813e1d21>] inet_sendmsg+0x61/0xc0
[  212.676084]  [<ffffffff8135e861>] sock_aio_write+0x101/0x120
[  212.676084]  [<ffffffff81107cf1>] ? vma_adjust+0x2e1/0x5d0
[  212.676084]  [<ffffffff812163e0>] ? timerqueue_add+0x60/0xb0
[  212.676084]  [<ffffffff81130b60>] do_sync_write+0x60/0x90
[  212.676084]  [<ffffffff81130d44>] ? rw_verify_area+0x54/0xf0
[  212.676084]  [<ffffffff81130f66>] vfs_write+0x186/0x190
[  212.676084]  [<ffffffff811317fd>] SyS_write+0x5d/0xa0
[  212.676084]  [<ffffffff814321e2>] system_call_fastpath+0x16/0x1b
[  212.676084] Code: ca 41 89 dc 41 29 cc 45 31 db 29 c2 41 89 c5 89 d0 45 29 c5 f7 d0 c1 e8 1f e9 43 ff ff ff 66 0f 1f 44 00 00 31 c0 e9 7b ff ff ff <0f> 0b eb fe 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 c7 47 40 00
[  212.676084] RIP  [<ffffffff8122e23f>] dql_completed+0x17f/0x190
------------[ cut here ]------------

When a skb has frags, bytes_compl plus skb->len nr_frags times in cp_tx().
It's not the correct value(actually, it should plus skb->len once) and it
will trigger the BUG_ON(bytes_compl > num_queued - dql->num_completed).
So only increase bytes_compl when finish sending all frags. pkts_compl also
has a wrong value, fix it too.

It's introduced by commit 871f0d4 ("8139cp: enable bql").

Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant