uvcvideo: USB isochronous frame lost #238
Comments
Having spent a while with GPIO/oscilloscope debugging while sorting out other issues, I can say the following: a very large chunk of time is spent in_interrupt() when a set of isoc transactions completes, especially with uvcvideo transfers. The driver by default queues 5 URBs of 32 transactions each. Once a 32-transaction burst is complete, the uvcvideo completion handler is called from within the dwc_otg interrupt handler and it blocks for anything up to 700uS. This causes multiple queued isoc transactions to be dropped due to frame overrun, as seen with the -63 return code. Typical time spent in_interrupt() for a normal xfercomp interrupt (no URB completion) is approx 30uS. Still huge, but at least less than a microframe. I believe this is a rather large smoking gun. Given that the dwc driver coders apparently have never heard of bottom-half interrupt handlers, it would definitely be worth splitting completion functions off into a softirq/tasklet.
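To put those timings next to the microframe period, here is a rough back-of-the-envelope sketch in Python. The 125uS figure is the USB 2.0 high-speed microframe period; the function names are made up for illustration and have nothing to do with the driver itself.

```python
# Illustrative arithmetic only; function names are hypothetical.
MICROFRAME_US = 125  # USB 2.0 high-speed microframe period in microseconds


def microframes_spanned(handler_us: float) -> int:
    """Minimum number of microframe boundaries that pass while a handler runs."""
    return int(handler_us // MICROFRAME_US)


def fits_in_microframe(handler_us: float) -> bool:
    """True if the handler is shorter than a single microframe."""
    return handler_us < MICROFRAME_US


# Timings quoted above: ~30uS for a plain xfercomp, up to ~700uS with URB completion.
for label, duration_us in (("xfercomp only", 30), ("URB completion, worst case", 700)):
    print(label, duration_us, "uS ->",
          microframes_spanned(duration_us), "microframe boundaries crossed at minimum,",
          "fits" if fits_in_microframe(duration_us) else "does not fit")
```

With the worst-case figure, at least five microframe boundaries go by while the handler is still running, which is consistent with several queued isoc transactions being dropped in a row.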
Except the xfer completion interrupt needs to be called in interrupt context. Gordon (in reply to P33M's comment of Wednesday, 13 March 2013)
Sorry, I mean the transfer completion callback. Gordon Hollingworth, Director of Engineering (following up his own message of 13 Mar 2013, 10:44)
So what would break if we split URB completion off into a tasklet? If using the high priority softirq, nothing else scheduling-wise would be allowed to run until it completed, but the myriad hardware interrupts that the dwc usb core generates would still get serviced. The only thing I can think of is multiple calls to the completion tasklet if several URBs complete in a very short space of time, which would need some sort of queuing mechanism in case there is a pileup of completed transfers. @Berryfier: have you tried tweaking the driver parameters? Try modprobe with nodrop=1, which should deliver corrupted frames rather than no frames.
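The queuing idea can be modelled in a few lines of userspace Python. This is only a toy model of the pattern (interrupt handler queues, bottom half drains), not kernel code; the class and method names are invented for the example.

```python
from collections import deque


class CompletionQueue:
    """Toy model of deferring URB completions from an IRQ handler to a bottom half."""

    def __init__(self):
        self._pending = deque()   # completed URBs waiting to be handed to drivers
        self.scheduled = False    # stands in for "tasklet has been scheduled"
        self.delivered = []       # order in which completions reached the drivers

    def irq_complete(self, urb_id):
        """Simulated interrupt handler: just queue the URB and return quickly."""
        self._pending.append(urb_id)
        self.scheduled = True     # scheduling twice before it runs is harmless

    def run_bottom_half(self):
        """Simulated tasklet: drain everything queued so far, in arrival order."""
        self.scheduled = False
        while self._pending:
            self.delivered.append(self._pending.popleft())


q = CompletionQueue()
q.irq_complete("urb-1")
q.irq_complete("urb-2")   # second completion arrives before the bottom half has run
q.run_bottom_half()
print(q.delivered)
```

Because the bottom half drains the whole queue each time it runs, a pileup of completions costs one deferred run rather than one run per URB, and nothing is lost if the "tasklet" is scheduled again before it has executed.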
@P33M: thank you for your suggestion to tweak the driver parameters. With the camera unplugged, and as root, I entered 'sudo modprobe uvcvideo nodrop=1'. I then plugged in the camera and started guvcview. The image motion was much more fluid, there were no freeze frames, and the frame rates achieved were much greater: 15 fps at 320x240, 7 fps at 640x480. This is all with the camera sending MJPEG. I notice that with the tweak the CPU is maxed out with the 640x480 picture at 7 fps. Without the tweak, CPU use is always low, whatever the picture size, with the display rarely exceeding 1 fps. I also ran the trace and checked the kern.log. I was surprised to still see a large number of "uvcvideo: USB isochronous frame lost (-63)" messages, which appear as frequently as they do without the tweak. It seems a fix is still needed. I've not (yet) noticed any corruption of the pictures either with or without the tweak.
Is it possible this issue is somehow linked to issue #199?
Uh, I think I fixed it... https://gist.github.com/P33M/5213278 What I did was implement a tasklet to split the return of the URB to the various device drivers from the time we spend sitting with interrupts disabled. Previously my crummy uncompressed-frame webcam would fail repeatedly with lost isoc packets when trying uvccapture; now everything works first time. Comments on the code would be appreciated before I submit a pull request. This is a fairly big structural change for the driver.
I'll test it this evening and let you know. Thanks for the effort.
It would be interesting to now measure the worst-case latency with this change. If it reduces to less than a uframe time, then it makes the FIQ stuff not really needed… Gordon Hollingworth, Director of Engineering (in reply to RKlauco's comment of 21 Mar 2013, 14:13)
After playing with it a bit, it seems stable with ethernet/usb disk/webcam activity. I installed guvcview to test streaming from the cam. At 640x480 there is still the odd corrupted frame with nodrop=1, but with corrupted frames dropped I am getting approx 60% valid frames (the camera is spitting out 30fps) of uncompressed video at 640x480, a data rate of 17MB/s. Guvcview is maxing out the CPU anyway, so I can't tell if the drops are more due to corruption or lack of cycles. Increasing USB loading, as expected, does not alter the error rate of the video. Other interrupt sources (sd card, timers, etc) appear to cause a packet drop if they take longer than 125uS to complete, or if the isoc transaction happens near the end of a uframe and servicing the interrupt takes it past the SOF for the next frame. My Sidewinder X4 keyboard, previously unusable on the pi and more usable as of recent kernels, is still missing keypresses, though I think slightly fewer than before.
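For anyone wanting to check the 17MB/s figure, a quick sketch of the arithmetic. It assumes the uncompressed format is YUYV at 2 bytes per pixel (an assumption; the comment above only says "uncompressed"), and the function name is just for illustration.

```python
def raw_video_rate(width: int, height: int, fps: int, bytes_per_pixel: int = 2) -> int:
    """Payload bytes per second for an uncompressed stream (no USB overhead)."""
    return width * height * bytes_per_pixel * fps


rate = raw_video_rate(640, 480, 30)   # assumes YUYV, i.e. 2 bytes per pixel
print(rate, "bytes/s")
print(round(rate / 2**20, 1), "MiB/s")
print(0.6 * 30, "valid frames/s if about 60% of frames survive")
```

That works out to 18,432,000 bytes/s, or roughly 17.6 MiB/s of payload, which lines up with the quoted data rate.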
Can someone please point me to a guide on how to compile the kernel including the patch from P33M?
I've pushed the commit to my repo: https://github.com/P33M/linux If you follow the standard kernel compilation guide (http://elinux.org/RPi_Kernel_Compilation) but substitute my repo for the official one, you will get the patch. I strongly recommend cross-compiling from another machine unless you have 12 hours to spare. Note that you will need to install both the compiled modules AND the kernel.img to your SD card.
Thanks for the effort and the guide. And the tip to cross-compile ;)
@RKlauco That's interesting. It's possible there's another issue. The errors that I get are -63 (they do still occur, but much less often than previously), which are due to frame overruns.
Looking at the driver, -5 is returned for periodic transfers when the transfer halted but there are no other interrupts set. I've not seen them in my case, so further testing is required! Edit: -5 is a URB state and -63 is an individual iso_desc state, so rather than individual transfers failing, your URB is being aborted.
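To make the URB-level versus per-packet distinction concrete, here is a small Python model. The errno names are the standard Linux values (5 is EIO, 63 is ENOSR); the IsoUrb class and helper names are hypothetical and only mimic the idea of a URB status plus one status per isoc packet.

```python
from dataclasses import dataclass
from typing import List

# Linux errno values relevant to this thread (asm-generic numbering).
LINUX_ERRNO_NAMES = {5: "EIO", 63: "ENOSR"}


def errno_name(status: int) -> str:
    """Map a negative kernel-style status to its symbolic name."""
    if status == 0:
        return "OK"
    return LINUX_ERRNO_NAMES.get(-status, f"errno {-status}")


@dataclass
class IsoUrb:
    """Hypothetical stand-in: one overall status, one status per isoc packet."""
    status: int
    packet_status: List[int]


def summarize(urb: IsoUrb) -> str:
    if urb.status != 0:
        # Whole URB failed: every packet in it is gone.
        return f"URB aborted: {errno_name(urb.status)}"
    lost = sum(1 for s in urb.packet_status if s != 0)
    return f"{lost} of {len(urb.packet_status)} packets lost"


print(summarize(IsoUrb(0, [0] * 30 + [-63, -63])))   # a couple of overrun packets
print(summarize(IsoUrb(-5, [0] * 32)))               # the whole URB was aborted
```

The first case is what I see (individual packets marked -63 inside an otherwise good URB); the second is the -5 case, where the URB as a whole comes back failed.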
I'll test tonight with a different camera, a Microsoft LifeCam HD-3000. (In reply to P33M's comment of Fri, Mar 22, 2013, 10:28 AM)
I've done some cross comparison between the current HEAD and #255, and borrowed a Logitech Quickcam Pro 5000 (a uvcvideo device, but with a much broader feature set than my ebay special) which supports multiple framerates and mjpeg capture. For both cameras the difference is, well, an order of magnitude. With YUYV mode on the Logitech I was getting about 1/10th of a frame transmitted OK on any resolution >320x240, and a substantially reduced framerate with mjpg. CPU usage was also much higher, I assume due to uvcvideo trying to reassemble frames which included lost packets. I can also confirm that this change significantly improves the usability of my Sidewinder X4 keyboard used in X11, from maddeningly frequent keystroke misses to a couple every sentence.
Funny thing: I am doing some tests with the mentioned MS LifeCam HD-3000. (In reply to P33M's comment of Sat, Mar 23, 2013, 9:26 PM)
After rpi-update with the tasklet there are now on average just 1% "uvcvideo: USB isochronous frame lost (-63)" messages. The ones that are lost seem to be grouped. To help decide whether this issue can be closed, here is a kern.log extract with the largest grouping found.
@Berryfier - what are you doing with the webcam output - saving to the SD card? Are you also in LXDE or just terminal mode? I have noticed that other interrupt sources, if active, can cause dropped isoc packets. I get marginally fewer lost isoc packets if saving to a USB pendrive than to the SD card.
@P33M. I was using mjpg-streamer at 960x720, 10fps, displaying pictures with omxplayer, all done on the pi in terminal mode (using HDMI, USB keyboard and mouse, and two terminals, F1 and F2). A wired ethernet network is connected, but I wasn't using the pi for anything else. Maybe the pi's own housekeeping activity is sufficient to cause these now rare dropped isoc packets. As regards picture flow, I don't see any further improvement since the nodrop tweak, which I still have in place. The noticeable difference is the elimination of most 'frame lost' messages from the log.
@P33M BTW, I would also say that later on, when using guvcview in LXDE, I noticed that the CPU indicator was lower than before for the various frame rates and picture sizes, and that higher frame rates were achieved as the CPU maxed out.
OK. I tested the latest changes. I installed everything according to the instructions (last time I forgot to put the firmware files into the /boot directory). Congratulations on the fix, btw.
While much improved, the story with 2 webcams at once is a bit different, especially with large transfer sizes (used with uncompressed video streams). The core USB controller will "schedule" each host channel's access to the bus during a microframe. Most of the time the transactions are bunched up toward the start of the microframe (and successful), but for some reason they are sometimes pushed to start in the latter half. This effect is exacerbated with more HCs enabled/demanding bus time. Even with the tasklet patch, the dwc_otg irq handler still takes a good 15-20uS on a xfercomp interrupt, which has consequences for the next HC assigned an isoc transaction further along in the microframe: by the time the ARM has finished with the first transaction, the microframe has already overrun, causing a packet drop. Suggested mitigations are:
Reducing the packet size appears to drastically reduce the chance of an overrun error occurring. I've managed to get 3 simultaneous streams from 3 webcams with
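A rough sketch of why packet size matters here, in Python. It assumes the 480 Mbit/s high-speed signalling rate and the 125uS microframe, and it deliberately ignores protocol overhead and bit stuffing, so the real bus times are somewhat longer. The packet sizes are examples and the function names are made up.

```python
HS_BITS_PER_US = 480     # USB 2.0 high-speed signalling rate: 480 Mbit/s
MICROFRAME_US = 125      # microframe period in microseconds


def wire_time_us(payload_bytes: int) -> float:
    """Time the payload alone occupies the bus (no overhead, no bit stuffing)."""
    return payload_bytes * 8 / HS_BITS_PER_US


def latest_start_us(payload_bytes: int) -> float:
    """Latest offset into the microframe at which the transfer can still finish."""
    return MICROFRAME_US - wire_time_us(payload_bytes)


for size in (512, 1024, 3072):   # example per-microframe payload sizes
    print(size, "bytes:", round(wire_time_us(size), 1), "uS on the wire,",
          "must start before", round(latest_start_us(size), 1), "uS")
```

The larger the per-microframe payload, the earlier the transaction has to start to finish before the next SOF, so a 15-20uS delay from servicing another channel's interrupt is much more likely to push a large transfer over the edge than a small one.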
Using the patched kernel from https://github.com/P33M/linux, uvccapture on my Raspberry Pi now delivers ungarbled images at 1920x1080 very quickly. For me this patch fixed the webcam-related issues I had before.
@Rayn0r
@popcornmix
Block layer will allocate a spinlock for the queue if the driver does not provide one in blk_init_queue().
The reason to use the internal spinlock is that blk_cleanup_queue() will switch to use the internal spinlock in the cleanup code path.
    if (q->queue_lock != &q->__queue_lock)
        q->queue_lock = &q->__queue_lock;
However, processes which are in D state might have taken the driver provided spinlock, when the processes wake up, they would release the block provided spinlock.
=====================================
[ BUG: bad unlock balance detected! ]
3.4.0-rc7+ #238 Not tainted
-------------------------------------
fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
[<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!
other info that might help us debug this:
1 lock held by fio/3587:
 #0: (&(&vblk->lock)->rlock){......}, at: [<ffffffff8132661a>] get_request_wait+0x19a/0x250
Other drivers use block layer provided spinlock as well, e.g. SCSI.
Switching to the block layer provided spinlock saves a bit of memory and does not increase lock contention. Performance test shows no real difference is observed before and after this patch.
Changes in v2: Improve commit log as Michael suggested.
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Asias He <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Original implementation commit e54bcde ("arm64: eBPF JIT compiler") had the relevant code paths, but due to an oversight always fail jiting.
As a result, we had been falling back to BPF interpreter whenever a BPF program has JMP_JSET_{X,K} instructions.
With this fix, we confirm that the corresponding tests in lib/test_bpf continue to pass, and also jited.
...
[ 2.784553] test_bpf: #30 JSET jited:1 188 192 197 PASS
[ 2.791373] test_bpf: #31 tcpdump port 22 jited:1 325 677 625 PASS
[ 2.808800] test_bpf: #32 tcpdump complex jited:1 323 731 991 PASS
...
[ 3.190759] test_bpf: #237 JMP_JSET_K: if (0x3 & 0x2) return 1 jited:1 110 PASS
[ 3.192524] test_bpf: #238 JMP_JSET_K: if (0x3 & 0xffffffff) return 1 jited:1 98 PASS
[ 3.211014] test_bpf: #249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:1 120 PASS
[ 3.212973] test_bpf: #250 JMP_JSET_X: if (0x3 & 0xffffffff) return 1 jited:1 89 PASS
...
Fixes: e54bcde ("arm64: eBPF JIT compiler")
Signed-off-by: Zi Shen Lim <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Yang Shi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
The error path in podhd_init() tries to clear the pending timer, while the timer object is initialized at the end of init sequence, thus it may hit the uninitialized object, as spotted by syzkaller: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 1 PID: 1845 Comm: kworker/1:2 Not tainted 4.14.0-rc2-42613-g1488251d1a98 #238 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: usb_hub_wq hub_event Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x292/0x395 lib/dump_stack.c:52 register_lock_class+0x6c4/0x1a00 kernel/locking/lockdep.c:769 __lock_acquire+0x27e/0x4550 kernel/locking/lockdep.c:3385 lock_acquire+0x259/0x620 kernel/locking/lockdep.c:4002 del_timer_sync+0x12c/0x280 kernel/time/timer.c:1237 podhd_disconnect+0x8c/0x160 sound/usb/line6/podhd.c:299 line6_probe+0x844/0x1310 sound/usb/line6/driver.c:783 podhd_probe+0x64/0x70 sound/usb/line6/podhd.c:474 .... For addressing it, assure the initializations of timer and work by moving them to the beginning of podhd_init(). Fixes: 790869d ("ALSA: line6: Add support for POD X3") Reported-by: Andrey Konovalov <[email protected]> Tested-by: Andrey Konovalov <[email protected]> Cc: <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
commit cb02ffc upstream. The error path in podhd_init() tries to clear the pending timer, while the timer object is initialized at the end of init sequence, thus it may hit the uninitialized object, as spotted by syzkaller: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. CPU: 1 PID: 1845 Comm: kworker/1:2 Not tainted 4.14.0-rc2-42613-g1488251d1a98 #238 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: usb_hub_wq hub_event Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x292/0x395 lib/dump_stack.c:52 register_lock_class+0x6c4/0x1a00 kernel/locking/lockdep.c:769 __lock_acquire+0x27e/0x4550 kernel/locking/lockdep.c:3385 lock_acquire+0x259/0x620 kernel/locking/lockdep.c:4002 del_timer_sync+0x12c/0x280 kernel/time/timer.c:1237 podhd_disconnect+0x8c/0x160 sound/usb/line6/podhd.c:299 line6_probe+0x844/0x1310 sound/usb/line6/driver.c:783 podhd_probe+0x64/0x70 sound/usb/line6/podhd.c:474 .... For addressing it, assure the initializations of timer and work by moving them to the beginning of podhd_init(). Fixes: 790869d ("ALSA: line6: Add support for POD X3") Reported-by: Andrey Konovalov <[email protected]> Tested-by: Andrey Konovalov <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 2c95a32 upstream.
Block layer will allocate a spinlock for the queue if the driver does not provide one in blk_init_queue().
The reason to use the internal spinlock is that blk_cleanup_queue() will switch to use the internal spinlock in the cleanup code path.
    if (q->queue_lock != &q->__queue_lock)
        q->queue_lock = &q->__queue_lock;
However, processes which are in D state might have taken the driver provided spinlock, when the processes wake up, they would release the block provided spinlock.
=====================================
[ BUG: bad unlock balance detected! ]
3.4.0-rc7+ #238 Not tainted
-------------------------------------
fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
[<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!
other info that might help us debug this:
1 lock held by fio/3587:
 #0: (&(&vblk->lock)->rlock){......}, at: [<ffffffff8132661a>] get_request_wait+0x19a/0x250
Other drivers use block layer provided spinlock as well, e.g. SCSI.
Switching to the block layer provided spinlock saves a bit of memory and does not increase lock contention. Performance test shows no real difference is observed before and after this patch.
Changes in v2: Improve commit log as Michael suggested.
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Asias He <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
commit 2a4340c upstream. As syzkaller spotted, currently line6 drivers submit a URB with the fixed EP without checking whether it's actually available, which may result in a kernel warning like: usb 1-1: BOGUS urb xfer, pipe 3 != type 1 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 24 at drivers/usb/core/urb.c:449 usb_submit_urb+0xf8a/0x11d0 Modules linked in: CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc2-42613-g1488251d1a98 #238 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: usb_hub_wq hub_event Call Trace: line6_start_listen+0x55f/0x9e0 sound/usb/line6/driver.c:82 line6_init_cap_control sound/usb/line6/driver.c:690 line6_probe+0x7c9/0x1310 sound/usb/line6/driver.c:764 podhd_probe+0x64/0x70 sound/usb/line6/podhd.c:474 usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361 .... This patch adds a sanity check of validity of EPs at the device initialization phase for avoiding the call with an invalid EP. Reported-by: Andrey Konovalov <[email protected]> Tested-by: Andrey Konovalov <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
commit 6815a0b upstream. As syzkaller spotted, currently bcd2000 driver submits a URB with the fixed EP without checking whether it's actually available, which may result in a kernel warning like: usb 1-1: BOGUS urb xfer, pipe 1 != type 3 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1846 at drivers/usb/core/urb.c:449 usb_submit_urb+0xf8a/0x11d0 Modules linked in: CPU: 0 PID: 1846 Comm: kworker/0:2 Not tainted 4.14.0-rc2-42613-g1488251d1a98 #238 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: usb_hub_wq hub_event Call Trace: bcd2000_init_device sound/usb/bcd2000/bcd2000.c:289 bcd2000_init_midi sound/usb/bcd2000/bcd2000.c:345 bcd2000_probe+0xe64/0x19e0 sound/usb/bcd2000/bcd2000.c:406 usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361 .... This patch adds a sanity check of validity of EPs at the device initialization phase for avoiding the call with an invalid EP. Reported-by: Andrey Konovalov <[email protected]> Tested-by: Andrey Konovalov <[email protected]> Signed-off-by: Takashi Iwai <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
[ Upstream commit c42a0e2 ] We met NULL pointer BUG as follow: [ 151.760358] BUG: unable to handle kernel NULL pointer dereference at 0000000000000060 [ 151.761340] PGD 80000001011eb067 P4D 80000001011eb067 PUD 1011ea067 PMD 0 [ 151.762039] Oops: 0000 [#1] SMP PTI [ 151.762406] Modules linked in: [ 151.762723] CPU: 2 PID: 3561 Comm: mdadm-test Kdump: loaded Not tainted 4.17.0-rc1+ #238 [ 151.763542] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 [ 151.764432] RIP: 0010:remove_and_add_spares.part.56+0x13c/0x3a0 [ 151.765061] RSP: 0018:ffffc90001d7fcd8 EFLAGS: 00010246 [ 151.765590] RAX: 0000000000000000 RBX: ffff88013601d600 RCX: 0000000000000000 [ 151.766306] RDX: 0000000000000000 RSI: ffff88013601d600 RDI: ffff880136187000 [ 151.767014] RBP: ffff880136187018 R08: 0000000000000003 R09: 0000000000000051 [ 151.767728] R10: ffffc90001d7fed8 R11: 0000000000000000 R12: ffff88013601d600 [ 151.768447] R13: ffff8801298b1300 R14: ffff880136187000 R15: 0000000000000000 [ 151.769160] FS: 00007f2624276700(0000) GS:ffff88013ae80000(0000) knlGS:0000000000000000 [ 151.769971] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 151.770554] CR2: 0000000000000060 CR3: 0000000111aac000 CR4: 00000000000006e0 [ 151.771272] Call Trace: [ 151.771542] md_ioctl+0x1df2/0x1e10 [ 151.771906] ? __switch_to+0x129/0x440 [ 151.772295] ? __schedule+0x244/0x850 [ 151.772672] blkdev_ioctl+0x4bd/0x970 [ 151.773048] block_ioctl+0x39/0x40 [ 151.773402] do_vfs_ioctl+0xa4/0x610 [ 151.773770] ? dput.part.23+0x87/0x100 [ 151.774151] ksys_ioctl+0x70/0x80 [ 151.774493] __x64_sys_ioctl+0x16/0x20 [ 151.774877] do_syscall_64+0x5b/0x180 [ 151.775258] entry_SYSCALL_64_after_hwframe+0x44/0xa9 For raid6, when two disk of the array are offline, two spare disks can be added into the array. Before spare disks recovery completing, system reboot and mdadm thinks it is ok to restart the degraded array by md_ioctl(). 
Since disks in raid6 is not only_parity(), raid5_run() will abort, when there is no PPL feature or not setting 'start_dirty_degraded' parameter. Therefore, mddev->pers is NULL. But, mddev->raid_disks has been set and it will not be cleared when raid5_run abort. md_ioctl() can execute cmd 'HOT_REMOVE_DISK' to remove a disk by mdadm, which will cause NULL pointer dereference in remove_and_add_spares() finally. Signed-off-by: Yufen Yu <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Signed-off-by: Sasha Levin <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
…riptors commit 68035c8 upstream. Way back in 2017, fuzzing the 4.14-rc2 USB stack with syzkaller kicked up the following WARNING from the UVC chain scanning code: | list_add double add: new=ffff880069084010, prev=ffff880069084010, | next=ffff880067d22298. | ------------[ cut here ]------------ | WARNING: CPU: 1 PID: 1846 at lib/list_debug.c:31 __list_add_valid+0xbd/0xf0 | Modules linked in: | CPU: 1 PID: 1846 Comm: kworker/1:2 Not tainted | 4.14.0-rc2-42613-g1488251d1a98 #238 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 | Workqueue: usb_hub_wq hub_event | task: ffff88006b01ca40 task.stack: ffff880064358000 | RIP: 0010:__list_add_valid+0xbd/0xf0 lib/list_debug.c:29 | RSP: 0018:ffff88006435ddd0 EFLAGS: 00010286 | RAX: 0000000000000058 RBX: ffff880067d22298 RCX: 0000000000000000 | RDX: 0000000000000058 RSI: ffffffff85a58800 RDI: ffffed000c86bbac | RBP: ffff88006435dde8 R08: 1ffff1000c86ba52 R09: 0000000000000000 | R10: 0000000000000002 R11: 0000000000000000 R12: ffff880069084010 | R13: ffff880067d22298 R14: ffff880069084010 R15: ffff880067d222a0 | FS: 0000000000000000(0000) GS:ffff88006c900000(0000) knlGS:0000000000000000 | CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 | CR2: 0000000020004ff2 CR3: 000000006b447000 CR4: 00000000000006e0 | Call Trace: | __list_add ./include/linux/list.h:59 | list_add_tail+0x8c/0x1b0 ./include/linux/list.h:92 | uvc_scan_chain_forward.isra.8+0x373/0x416 | drivers/media/usb/uvc/uvc_driver.c:1471 | uvc_scan_chain drivers/media/usb/uvc/uvc_driver.c:1585 | uvc_scan_device drivers/media/usb/uvc/uvc_driver.c:1769 | uvc_probe+0x77f2/0x8f00 drivers/media/usb/uvc/uvc_driver.c:2104 Looking into the output from usbmon, the interesting part is the following data packet: ffff880069c63e00 30710169 C Ci:1:002:0 0 143 = 09028f00 01030080 00090403 00000e01 00000924 03000103 7c003328 010204db If we drop the lead configuration and interface descriptors, we're left with an output terminal descriptor describing a generic 
display: /* Output terminal descriptor */ buf[0] 09 buf[1] 24 buf[2] 03 /* UVC_VC_OUTPUT_TERMINAL */ buf[3] 00 /* ID */ buf[4] 01 /* type == 0x0301 (UVC_OTT_DISPLAY) */ buf[5] 03 buf[6] 7c buf[7] 00 /* source ID refers to self! */ buf[8] 33 The problem with this descriptor is that it is self-referential: the source ID of 0 matches itself! This causes the 'struct uvc_entity' representing the display to be added to its chain list twice during 'uvc_scan_chain()': once via 'uvc_scan_chain_entity()' when it is processed directly from the 'dev->entities' list and then again immediately afterwards when trying to follow the source ID in 'uvc_scan_chain_forward()' Add a check before adding an entity to a chain list to ensure that the entity is not already part of a chain. Link: https://lore.kernel.org/linux-media/CAAeHK+z+Si69jUR+N-SjN9q4O+o5KFiNManqEa-PjUta7EOb7A@mail.gmail.com/ Cc: <[email protected]> Fixes: c0efd23 ("V4L/DVB (8145a): USB Video Class driver") Reported-by: Andrey Konovalov <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Laurent Pinchart <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
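The nine descriptor bytes quoted in the commit message can be decoded with a short Python sketch. The field names follow the UVC output terminal descriptor layout as I understand it (treat them as an assumption), and the self-reference test below is only an illustration of the problem, not the check the kernel patch adds (the patch checks whether the entity is already part of a chain).

```python
import struct

# The nine bytes quoted above: 09 24 03 00 01 03 7c 00 33
RAW = bytes.fromhex("09240300 01037c00 33".replace(" ", ""))

FIELDS = ("bLength", "bDescriptorType", "bDescriptorSubtype", "bTerminalID",
          "wTerminalType", "bAssocTerminal", "bSourceID", "iTerminal")


def parse_output_terminal(raw: bytes) -> dict:
    """Decode a 9-byte output terminal descriptor (little-endian, no padding)."""
    return dict(zip(FIELDS, struct.unpack("<BBBBHBBB", raw)))


def is_self_referential(desc: dict) -> bool:
    """True if the terminal names itself as its own source."""
    return desc["bSourceID"] == desc["bTerminalID"]


desc = parse_output_terminal(RAW)
print({name: hex(value) for name, value in desc.items()})
print("self-referential:", is_self_referential(desc))
```

Decoded this way, the terminal ID and the source ID are both 0 and the terminal type is 0x0301, matching the annotations in the commit message.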
Symptoms: Video from a USB webcam is not fluid; it appears jerky or freezes on a frame. Individual images are delivered at irregular intervals.
To reproduce the problem, enable uvcvideo tracing, stream from the webcam, then disable tracing again:
echo 0xFFFF | sudo tee /sys/module/uvcvideo/parameters/trace
echo 0x0000 | sudo tee /sys/module/uvcvideo/parameters/trace
Feb 27 22:09:12 raspberrypi kernel: [ 5359.989325] uvcvideo: Frame complete (EOF found).
Feb 27 22:09:12 raspberrypi kernel: [ 5359.989374] uvcvideo: frame 20 stats: 59/1267/1593 packets, 1/39/1593 pts (early initial), 1592/1593 scr, last pts/stc/sof 3111250099/3120608861/1217
Feb 27 22:09:12 raspberrypi kernel: [ 5360.154221] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.166469] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.174708] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.186959] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.187038] uvcvideo: Frame complete (EOF found).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.187069] uvcvideo: frame 21 stats: 59/1259/1583 packets, 1/39/1583 pts (early initial), 1582/1583 scr, last pts/stc/sof 3120851941/3130210964/1417
Feb 27 22:09:12 raspberrypi kernel: [ 5360.387163] uvcvideo: Frame complete (EOF found).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.387237] uvcvideo: frame 22 stats: 59/1274/1599 packets, 1/39/1599 pts (early initial), 1598/1599 scr, last pts/stc/sof 3130453789/3139813066/1617
Feb 27 22:09:12 raspberrypi kernel: [ 5360.387420] uvcvideo: UVC Camera (046d:0991): PTS 3130453789 y 3720.807754 SOF 3720.807754 (x1 2155366602 x2 2156854928 y1 254607360 y2 256638976 SOF offset 250)
Feb 27 22:09:12 raspberrypi kernel: [ 5360.387465] uvcvideo: UVC Camera (046d:0991): SOF 3720.807754 y 994343961 ts 5360.593856 buf ts 5360.427484 (x1 246808576/12726/1837 x2 261554176/12951/1868 y1 1000000000 y2 1028159890)
Feb 27 22:09:12 raspberrypi kernel: [ 5360.387772] uvcvideo: uvc_v4l2_ioctl(VIDIOC_QBUF)
Feb 27 22:09:12 raspberrypi kernel: [ 5360.388090] uvcvideo: uvc_v4l2_ioctl(VIDIOC_DQBUF)
Feb 27 22:09:12 raspberrypi kernel: [ 5360.395368] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.551643] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.568016] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.576270] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.584517] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.588666] uvcvideo: Frame complete (EOF found).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.588705] uvcvideo: frame 23 stats: 57/1265/1583 packets, 1/39/1583 pts (early initial), 1582/1583 scr, last pts/stc/sof 3140055631/3149415169/1817
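To put numbers on the trace above, the following Python sketch counts how many "USB isochronous frame lost" messages fall between consecutive "Frame complete" messages and how far apart the completed frames are. The sample text is an excerpt of the log lines above; the `summarize` helper and the regular expression are illustrative assumptions, not part of uvcvideo or any existing tool.

```python
import re

# Matches "[ 5360.154221] uvcvideo: <message>" anywhere in a syslog line.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+uvcvideo:\s+(.*)")

# Excerpt of the kernel log above (only the completion and loss messages).
SAMPLE = """\
Feb 27 22:09:12 raspberrypi kernel: [ 5359.989325] uvcvideo: Frame complete (EOF found).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.154221] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.166469] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.174708] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.186959] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.187038] uvcvideo: Frame complete (EOF found).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.387163] uvcvideo: Frame complete (EOF found).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.395368] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.551643] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.568016] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.576270] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.584517] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.588666] uvcvideo: Frame complete (EOF found).
"""


def summarize(log_text):
    """Return (interval_seconds, lost_count) per pair of completed frames."""
    results = []
    prev_ts = None
    lost = 0
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        ts, msg = float(match.group(1)), match.group(2)
        if msg.startswith("USB isochronous frame lost"):
            lost += 1
        elif msg.startswith("Frame complete"):
            if prev_ts is not None:
                results.append((round(ts - prev_ts, 6), lost))
            prev_ts = ts
            lost = 0  # start counting losses for the next frame
    return results


for interval, lost in summarize(SAMPLE):
    print(f"frame interval {interval:.3f} s, isochronous losses: {lost}")
```

The same function can be pointed at a full capture, for example the text of `/var/log/kern.log`, to see whether the losses cluster around particular frames.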