Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uvcvideo: USB isochronous frame lost #238

Closed
Berryfier opened this issue Mar 1, 2013 · 28 comments
Closed

uvcvideo: USB isochronous frame lost #238

Berryfier opened this issue Mar 1, 2013 · 28 comments
Assignees

Comments

@Berryfier
Copy link

Symptoms: Moving pictures from a USB webcam are not fluid, they appear jerky or freeze frame. Individual images are delivered with intervals of various lengths between images.

To reproduce the problem:

  • use 2013-02-09 raspbian
  • plug in a webcam camera to the Pi USB (camera powered seperately - not from Pi)
  • at a terminal enter as root:
    sudo echo 0xFFFF > /sys/module/uvcvideo/parameters/trace
  • start an application which uses uvcvideo (e.g. mjpg-streamer)
  • after a few seconds, stop the application
  • at a terminal enter as root:
    sudo echo 0x0000 > /sys/module/uvcvideo/parameters/trace
  • look in /var/log/kern.log (example below using UVC Camera Logitech 046d:0991)

Feb 27 22:09:12 raspberrypi kernel: [ 5359.989325] uvcvideo: Frame complete (EOF found).
Feb 27 22:09:12 raspberrypi kernel: [ 5359.989374] uvcvideo: frame 20 stats: 59/1267/1593 packets, 1/39/1593 pts (early initial), 1592/1593 scr, last pts/stc/sof 3111250099/3120608861/1217
Feb 27 22:09:12 raspberrypi kernel: [ 5360.154221] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.166469] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.174708] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.186959] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.187038] uvcvideo: Frame complete (EOF found).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.187069] uvcvideo: frame 21 stats: 59/1259/1583 packets, 1/39/1583 pts (early initial), 1582/1583 scr, last pts/stc/sof 3120851941/3130210964/1417
Feb 27 22:09:12 raspberrypi kernel: [ 5360.387163] uvcvideo: Frame complete (EOF found).
Feb 27 22:09:12 raspberrypi kernel: [ 5360.387237] uvcvideo: frame 22 stats: 59/1274/1599 packets, 1/39/1599 pts (early initial), 1598/1599 scr, last pts/stc/sof 3130453789/3139813066/1617
Feb 27 22:09:12 raspberrypi kernel: [ 5360.387420] uvcvideo: UVC Camera (046d:0991): PTS 3130453789 y 3720.807754 SOF 3720.807754 (x1 2155366602 x2 2156854928 y1 254607360 y2 256638976 SOF offset 250)
Feb 27 22:09:12 raspberrypi kernel: [ 5360.387465] uvcvideo: UVC Camera (046d:0991): SOF 3720.807754 y 994343961 ts 5360.593856 buf ts 5360.427484 (x1 246808576/12726/1837 x2 261554176/12951/1868 y1 1000000000 y2 1028159890)
Feb 27 22:09:12 raspberrypi kernel: [ 5360.387772] uvcvideo: uvc_v4l2_ioctl(VIDIOC_QBUF)
Feb 27 22:09:12 raspberrypi kernel: [ 5360.388090] uvcvideo: uvc_v4l2_ioctl(VIDIOC_DQBUF)
Feb 27 22:09:12 raspberrypi kernel: [ 5360.395368] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.551643] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.568016] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.576270] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.584517] uvcvideo: USB isochronous frame lost (-63).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.588666] uvcvideo: Frame complete (EOF found).
Feb 27 22:09:13 raspberrypi kernel: [ 5360.588705] uvcvideo: frame 23 stats: 57/1265/1583 packets, 1/39/1583 pts (early initial), 1582/1583 scr, last pts/stc/sof 3140055631/3149415169/1817'''

@ghost ghost assigned ghollingworth Mar 1, 2013
@P33M
Copy link
Contributor

P33M commented Mar 13, 2013

Having spent a while with GPIO/oscilloscope debugging while sorting out other issues, I can say the following:

There is a very large chunk of time spent in_interrupt() when a set of isoc transcations complete, especially with uvcvideo transfers. The driver by default queues 5x URBs of 32 transactions each. Once a 32-burst transaction is complete, the uvcvideo completion handler is called from within the dwc_otg interrupt handler and it blocks for anything up to 700uS. This will cause multiple queued isoc transactions to get dropped due to frame overrun, as seen with the -63 return code. Typical time spent in_interrupt() for a normal xfercomp interrupt handle (no URB completion) is approx 30uS. Still huge, but at least less than a microframe.

I believe this is a rather large smoking gun - given that the dwc driver coders apparently have never heard of bottom-half interrupt handlers, it would definitely be worth splitting completion functions off into a softirq/tasklet.

@ghollingworth
Copy link

Except the xfer completion interrupt needs to be called in interrupt
context...

Gordon

On Wednesday, 13 March 2013, P33M wrote:

Having spent a while with GPIO/oscilloscope debugging while sorting out
other issues, I can say the following:

There is a very large chunk of time spent in_interrupt() when a set of
isoc transcations complete, especially with uvcvideo transfers. The driver
by default queues 5x URBs of 32 transactions each. Once a 32-burst
transaction is complete, the uvcvideo completion handler is called from
within the dwc_otg interrupt handler and it blocks for anything up to
700uS. This will cause multiple queued isoc transactions to get dropped due
to frame overrun, as seen with the -63 return code. Typical time spent
in_interrupt() for a normal xfercomp interrupt handle (no URB completion)
is approx 30uS. Still huge, but at least less than a microframe.

I believe this is a rather large smoking gun - given that the dwc driver
coders apparently have never heard of bottom-half interrupt handlers, it
would definitely be worth splitting completion functions off into a
softirq/tasklet.


Reply to this email directly or view it on GitHubhttps://github.com//issues/238#issuecomment-14834492
.

@ghollingworth
Copy link

ghollingworth commented Mar 13, 2013

Sorry I mean the transfer completion callback

Gordon

Gordon Hollingworth, Director of Engineering

On 13 Mar 2013, at 10:44, Gordon Hollingworth [email protected] wrote:

Except the xfer completion interrupt needs to be called in interrupt context...

Gordon

On Wednesday, 13 March 2013, P33M wrote:
Having spent a while with GPIO/oscilloscope debugging while sorting out other issues, I can say the following:

There is a very large chunk of time spent in_interrupt() when a set of isoc transcations complete, especially with uvcvideo transfers. The driver by default queues 5x URBs of 32 transactions each. Once a 32-burst transaction is complete, the uvcvideo completion handler is called from within the dwc_otg interrupt handler and it blocks for anything up to 700uS. This will cause multiple queued isoc transactions to get dropped due to frame overrun, as seen with the -63 return code. Typical time spent in_interrupt() for a normal xfercomp interrupt handle (no URB completion) is approx 30uS. Still huge, but at least less than a microframe.

I believe this is a rather large smoking gun - given that the dwc driver coders apparently have never heard of bottom-half interrupt handlers, it would definitely be worth splitting completion functions off into a softirq/tasklet.


Reply to this email directly or view it on GitHub.

@P33M
Copy link
Contributor

P33M commented Mar 13, 2013

So what would break if we split URB completion off into a tasklet? If using the high priority softirq, nothing else scheduling-wise would be allowed to run until it completed but the myriad of hardware interrupts that the dwc usb core generates would still get serviced.

The only thing I could think of would be multiple calls to a completion tasklet if we got multiple URBs completing in a very short space of time - needing some sort of queuing mechanism should there be a pileup of completed transfers.

@Berryfier: have you tried tweaking the driver parameters? Try modprobe with nodrop=1 which should deliver corrupted frames rather than no frames.

@Berryfier
Copy link
Author

@P33M : thank you for your suggestion to tweak the driver parameters. With the camera unplugged, and as root, I entered 'sudo modprobe uvcvideo nodrop=1'. I then plugged in the camera, and started guvcview. The image motion was much more fluid, there were no freeze frames, and the frame rates achieved were much greater; 15fps@320x240, 7 fps@640x480. This is all with the camera sending MJPEG. I notice that with the tweak the CPU is maxed out with the 640x480 picture@7fps. Without the tweak, CPU use is always low, whatever the picture size, the display rarely exceeding 1fps. I also ran the trace and checked the kern.log. I was surprised to still see the large amount of uvcvideo: USB isochronous frame lost (-63) which appears as frequently as occurs without the tweak. It seems a fix is still needed. I've not (yet) noticed any corruption of the pictures either with or without the tweak.

@ghost
Copy link

ghost commented Mar 19, 2013

Is it possible this issue is somehow linked to issue #199 ?

@P33M
Copy link
Contributor

P33M commented Mar 21, 2013

Uh

I think I fixed it...

https://gist.github.com/P33M/5213278

What I did was implement a tasklet to split the return of the URB to the various device drivers from the time we spend sitting with interrupts disabled. Previously my crummy uncompressed-frame webcam would fail repeatedly with lost isoc packets when trying uvccapture, now everything works first time.

Comments on the code would be appreciated before I submit a pull request. This is a fairly big structural change for the driver.

@ghost
Copy link

ghost commented Mar 21, 2013

I'll test it this evening and let you know. Thanks for the effort.

@ghollingworth
Copy link

ghollingworth commented Mar 21, 2013

Would be interesting to now measure the worst case latency with this change, if this reduces down to <uframe time then it make the FIQ stuff not really needed…

Gordon

Gordon Hollingworth, Director of Engineering

On 21 Mar 2013, at 14:13, RKlauco [email protected] wrote:

I'll test it this evening and let you know. Thanks for the effort.


Reply to this email directly or view it on GitHub.

@P33M
Copy link
Contributor

P33M commented Mar 21, 2013

After playing with it a bit it seems stable with ethernet/usb disk/webcam activity.

I installed guvcview to test streaming from the cam - at 640x480 there is still the odd corrupted frame with nodrop=1 but with corrupted frames dropped I am getting approx 60% valid frames (camera is spitting out 30fps) of uncompressed video at 640x480 - a data rate of 17MB/s. Guvcview is maxing out the CPU anyways so I can't tell if the drops are more due to corruption or lack of cycles.

Increasing USB loading, as expected, does not alter the error rate of the video. Other interrupt sources (sd card, timers, etc) appear to cause a packet drop if they take longer than 125uS to complete, or the isoc transaction happens near the end of a uframe and servicing the interrupt takes it past the SOF for the next frame.

My Sidewinder X4 keyboard - previously unusuable on the pi, as of recent kernels more useable - is still missing keypresses though I think slightly less than before.

@ghost
Copy link

ghost commented Mar 21, 2013

Can someone, please, point me out to some guide how to compile the kernel including the patch from P33M?

@P33M
Copy link
Contributor

P33M commented Mar 21, 2013

I've pushed the commit to my repo. https://github.com/P33M/linux

If you follow the standard kernel compilation guide: http://elinux.org/RPi_Kernel_Compilation

But substitute my repo for the official one, you will get the patch. I strongly recommend using cross-compilation from another machine unless you have 12 hours to spare.

Note that you will need to install both the compiled modules AND the kernel.img to your SD card.

@ghost
Copy link

ghost commented Mar 22, 2013

Thanks for the effort and the guide. And the tip to cross-compile ;)
I did it accordingly (replacing the raspberrypi by P33M in github address) and all seemed to work well.
I compiled the kernel image and modules, tgz them and moved both to my Pi into appropriate locations.
The Pi booted (HURAY!!!), uname -a shows
3.6.11+ #1 PREEMPT
But, unfortunately, the result is still the same - the camera complains about Non-zero status and even with nodrop=1 and timeout=5000 parameters, the behaviour is exactly the same. After 5 records, the module simply stops responding.
Is there something I should have set/patched during the kernel compilation? Or is the fix not helping in this case?

@P33M
Copy link
Contributor

P33M commented Mar 22, 2013

@RKlauco That's interesting. It's possible there's another issue - the errors that I get are -63 (they do still occur, but much less than previously) which are due to frame overruns.

  • What camera produces this result?
  • What userland application are you using to stream/record the images?
  • Can you do a lsusb -v for the camera and post the output?

Looking at the driver, -5 is returned for periodic transfers when the transfer halted but there are no other interrupts set. I've not seen them in my case - further testing required!

Edit: -5 is a URB state, -63 is a individual iso_desc state - so rather than individual transfers failing, your URB is being aborted.

@ghost
Copy link

ghost commented Mar 22, 2013

I'll test this night with different camera - Microsoft LifeCam HD-3000,
also using 1280x720 just for fun ;)
I'll update the status than.

On Fri, Mar 22, 2013 at 10:28 AM, P33M [email protected] wrote:

@RKlauco https://github.com/RKlauco That's interesting. It's possible
there's another issue - the errors that I get are -63 (they do still occur,
but much less than previously) which are due to frame overruns.

  • What userland application are you using to stream/record the images?
  • Can you do a lsusb -v for the camera and post the output?

Looking at the driver, -5 is returned for periodic transfers when the
transfer halted but there are no other interrupts set. I've not seen them
in my case - further testing required!


Reply to this email directly or view it on GitHubhttps://github.com//issues/238#issuecomment-15287653
.

@P33M
Copy link
Contributor

P33M commented Mar 23, 2013

I've done some cross comparison between the current HEAD and #255 - and borrowed a Logitech Quickcam Pro 5000 (uvcvideo device, but with a much broader feature set than my ebay special) which supports multiple framerates and mjpeg capture.

For both cameras the difference is, well, an order of magnitude. With YUYV mode on the Logitech I was getting about 1/10th of a frame transmitted OK on any resolution >320x240 - and a substantially reduced framerate with mjpg. CPU usage was also much higher I assume due to uvcvideo trying to reassemble frames which included lost packets.

I can also confirm that this change significantly improves the usability of my Sidewinder X4 keyboard used in X11 - from maddeningly frequent keystroke misses to a couple every sentence.

@ghost
Copy link

ghost commented Mar 23, 2013

Funny thing - I am doing some tests with the mentioned MS LifeCam HD-3000
(1280x720 px) and no error within logs whatsoever. The motion runs
smoothly, although it takes 75% of CPU for single camera.
I also tested the non-fixed, original kernel and that one works, too.
I'll have to dig a bit deeper and test only using the crappy MS VX-700
webcam.
Tomorrow I'll post some results.

On Sat, Mar 23, 2013 at 9:26 PM, P33M [email protected] wrote:

I've done some cross comparison between the current HEAD and #255https://github.com/raspberrypi/linux/issues/255\- and borrowed a Logitech Quickcam Pro 5000 (uvcvideo device, but with a
much broader feature set than my ebay special) which supports multiple
framerates and mjpeg capture.

For both cameras the difference is, well, an order of magnitude. With YUYV
mode on the Logitech I was getting about 1/10th of a frame transmitted OK
on any resolution >320x240 - and a substantially reduced framerate with
mjpg. CPU usage was also much higher I assume due to uvcvideo trying to
reassemble frames which included lost packets.

I can also confirm that this change significantly improves the usability
of my Sidewinder X4 keyboard used in X11 - from maddeningly frequent
keystroke misses to a couple every sentence.


Reply to this email directly or view it on GitHubhttps://github.com//issues/238#issuecomment-15344109
.

@Berryfier
Copy link
Author

After rpi-update with the tasklet there are now on average just 1% uvcvideo: USB isochronous frame lost (-63). The ones that are lost seem to be grouped. To help a decision on whether the bug issue can be closed here is a kern.log extract with the largest grouping found.
Mar 25 10:25:41 raspberrypi kernel: [ 734.924825] uvcvideo: Frame complete (EOF found).
Mar 25 10:25:41 raspberrypi kernel: [ 734.924895] uvcvideo: frame 3110 stats: 16/258/533 packets, 1/11/533 pts (early initial), 532/533 scr, last pts/stc/sof 3393711660/3396838904/1177
Mar 25 10:25:41 raspberrypi kernel: [ 734.925043] uvcvideo: UVC Camera (046d:0991): PTS 3393711660 y 3337.238159 SOF 3337.238159 (x1 2149152577 x2 2150640899 y1 220987392 y2 223019008 SOF offset 177)
Mar 25 10:25:41 raspberrypi kernel: [ 734.925084] uvcvideo: UVC Camera (046d:0991): SOF 3337.238159 y 1000297238 ts 734.802418 buf ts 734.766231 (x1 217317376/9204/1324 x2 349569024/9430/1355 y1 1000000000 y2 1028242902)
Mar 25 10:25:41 raspberrypi kernel: [ 734.925663] uvcvideo: uvc_v4l2_ioctl(VIDIOC_QBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 734.926276] uvcvideo: uvc_v4l2_ioctl(VIDIOC_DQBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 734.992755] uvcvideo: Frame complete (EOF found).
Mar 25 10:25:41 raspberrypi kernel: [ 734.992859] uvcvideo: frame 3111 stats: 16/259/533 packets, 1/12/533 pts (early initial), 532/533 scr, last pts/stc/sof 3396912178/3400037594/1244
Mar 25 10:25:41 raspberrypi kernel: [ 734.993045] uvcvideo: UVC Camera (046d:0991): PTS 3396912178 y 3403.901123 SOF 3403.901123 (x1 2149216765 x2 2150705084 y1 225443840 y2 227475456 SOF offset 177)
Mar 25 10:25:41 raspberrypi kernel: [ 734.993087] uvcvideo: UVC Camera (046d:0991): SOF 3403.901123 y 1007007491 ts 734.877129 buf ts 734.830356 (x1 219414528/9748/1392 x2 234160128/9973/1423 y1 1000000000 y2 1028204899)
Mar 25 10:25:41 raspberrypi kernel: [ 734.993661] uvcvideo: uvc_v4l2_ioctl(VIDIOC_QBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 734.994285] uvcvideo: uvc_v4l2_ioctl(VIDIOC_DQBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 735.060755] uvcvideo: Frame complete (EOF found).
Mar 25 10:25:41 raspberrypi kernel: [ 735.060872] uvcvideo: frame 3112 stats: 17/261/534 packets, 1/12/534 pts (early initial), 533/534 scr, last pts/stc/sof 3400112696/3403242287/1310
Mar 25 10:25:41 raspberrypi kernel: [ 735.061114] uvcvideo: UVC Camera (046d:0991): PTS 3400112696 y 3470.564285 SOF 3470.564285 (x1 2149280951 x2 2150769273 y1 229900288 y2 231931904 SOF offset 177)
Mar 25 10:25:41 raspberrypi kernel: [ 735.061156] uvcvideo: UVC Camera (046d:0991): SOF 3470.564285 y 1011332920 ts 734.949460 buf ts 734.898298 (x1 221511680/10292/1460 x2 236322816/10518/1491 y1 1000000000 y2 1028280905)
Mar 25 10:25:41 raspberrypi kernel: [ 735.061742] uvcvideo: uvc_v4l2_ioctl(VIDIOC_QBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 735.062367] uvcvideo: uvc_v4l2_ioctl(VIDIOC_DQBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 735.096988] uvcvideo: USB isochronous frame lost (-63).
Mar 25 10:25:41 raspberrypi kernel: [ 735.105238] uvcvideo: USB isochronous frame lost (-63).
Mar 25 10:25:41 raspberrypi kernel: [ 735.125121] uvcvideo: Frame complete (EOF found).
Mar 25 10:25:41 raspberrypi kernel: [ 735.125203] uvcvideo: frame 3113 stats: 16/255/529 packets, 1/11/529 pts (early initial), 528/529 scr, last pts/stc/sof 3403313213/3406440977/1377
Mar 25 10:25:41 raspberrypi kernel: [ 735.125382] uvcvideo: UVC Camera (046d:0991): PTS 3403313213 y 3537.227355 SOF 3537.227355 (x1 2149153096 x2 2150641418 y1 234094592 y2 236126208 SOF offset 177)
Mar 25 10:25:41 raspberrypi kernel: [ 735.125424] uvcvideo: UVC Camera (046d:0991): SOF 5585.227355 y 1027373519 ts 735.029690 buf ts 734.966316 (x1 238354432/10805/1524 x2 370671616/11032/1555 y1 1000000000 y2 1028367909)
Mar 25 10:25:41 raspberrypi kernel: [ 735.126026] uvcvideo: uvc_v4l2_ioctl(VIDIOC_QBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 735.126649] uvcvideo: uvc_v4l2_ioctl(VIDIOC_DQBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 735.137199] uvcvideo: USB isochronous frame lost (-63).
Mar 25 10:25:41 raspberrypi kernel: [ 735.145394] uvcvideo: USB isochronous frame lost (-63).
Mar 25 10:25:41 raspberrypi kernel: [ 735.193280] uvcvideo: Frame complete (EOF found).
Mar 25 10:25:41 raspberrypi kernel: [ 735.193384] uvcvideo: frame 3114 stats: 16/256/529 packets, 1/12/529 pts (early initial), 528/529 scr, last pts/stc/sof 3406513731/3409639666/1444
Mar 25 10:25:41 raspberrypi kernel: [ 735.193560] uvcvideo: UVC Camera (046d:0991): PTS 3406513731 y 3603.890411 SOF 3603.890411 (x1 2149265293 x2 2150753615 y1 238616576 y2 240648192 SOF offset 177)
Mar 25 10:25:41 raspberrypi kernel: [ 735.193602] uvcvideo: UVC Camera (046d:0991): SOF 5651.890411 y 1027654593 ts 735.098383 buf ts 735.030677 (x1 240648192/11352/1593 x2 372899840/11578/1624 y1 1000000000 y2 1028186898)
Mar 25 10:25:41 raspberrypi kernel: [ 735.194183] uvcvideo: uvc_v4l2_ioctl(VIDIOC_QBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 735.196742] uvcvideo: uvc_v4l2_ioctl(VIDIOC_DQBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 735.197408] uvcvideo: USB isochronous frame lost (-63).
Mar 25 10:25:41 raspberrypi kernel: [ 735.257501] uvcvideo: Frame complete (EOF found).
Mar 25 10:25:41 raspberrypi kernel: [ 735.257577] uvcvideo: frame 3115 stats: 17/262/531 packets, 1/12/531 pts (early initial), 530/531 scr, last pts/stc/sof 3409714248/3412838357/1510
Mar 25 10:25:41 raspberrypi kernel: [ 735.257731] uvcvideo: UVC Camera (046d:0991): PTS 3409714248 y 3670.553359 SOF 3670.553359 (x1 2149137441 x2 2150625760 y1 242810880 y2 244842496 SOF offset 177)
Mar 25 10:25:41 raspberrypi kernel: [ 735.257771] uvcvideo: UVC Camera (046d:0991): SOF 3670.553359 y 999965765 ts 735.134794 buf ts 735.098839 (x1 240713728/11865/1657 x2 372965376/12091/1688 y1 1000000000 y2 1028236901)
Mar 25 10:25:41 raspberrypi kernel: [ 735.258376] uvcvideo: uvc_v4l2_ioctl(VIDIOC_QBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 735.258957] uvcvideo: uvc_v4l2_ioctl(VIDIOC_DQBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 735.325442] uvcvideo: Frame complete (EOF found).
Mar 25 10:25:41 raspberrypi kernel: [ 735.325543] uvcvideo: frame 3116 stats: 17/264/534 packets, 1/12/534 pts (early initial), 533/534 scr, last pts/stc/sof 3412914766/3416043049/1577
Mar 25 10:25:41 raspberrypi kernel: [ 735.325713] uvcvideo: UVC Camera (046d:0991): PTS 3412914766 y 3737.216476 SOF 3737.216476 (x1 2149201627 x2 2150689948 y1 247267328 y2 249298944 SOF offset 177)
Mar 25 10:25:41 raspberrypi kernel: [ 735.325755] uvcvideo: UVC Camera (046d:0991): SOF 3737.216476 y 1004024336 ts 735.206848 buf ts 735.163060 (x1 242810880/12409/1725 x2 257622016/12635/1756 y1 1000000000 y2 1028230900)
Mar 25 10:25:41 raspberrypi kernel: [ 735.326330] uvcvideo: uvc_v4l2_ioctl(VIDIOC_QBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 735.326944] uvcvideo: uvc_v4l2_ioctl(VIDIOC_DQBUF)
Mar 25 10:25:41 raspberrypi kernel: [ 735.393416] uvcvideo: Frame complete (EOF found).

@P33M
Copy link
Contributor

P33M commented Mar 25, 2013

@Berryfier - what are you doing with the webcam output - saving to the SD card? Are you also in LXDE or just terminal mode?

I have noticed that other interrupt sources, if active, can cause dropped isoc packets. I get marginally fewer lost isoc packets if saving to a USB pendrive than the SD card.

@Berryfier
Copy link
Author

@P33M. I was using mjpg-streamer 960x720 10fps, displaying pictures with omxplayer, and all done on the pi in terminal mode (using HDMI, USB keyboard and mouse, and two terminals F1 and F2 ). An ethernet wired network is connected, but I wasn't using the pi for anything else. Maybe the pi's own housekeeping activity is sufficient to cause these now rare dropped isoc packets. As regards picture flow, I don't see any further improvement since the nodrop tweak, which I still have in place. The noticeable difference is the elimination of most 'frame lost' messages from the log.

@Berryfier
Copy link
Author

@P33M BTW, I would also say that later on, when using guvcview in the LXDE, I noticed that CPU indicator was lower than before for the various frame rates and picture sizes, and that higher frame rates were achieved as the CPU maxed out.

@ghost
Copy link

ghost commented Mar 25, 2013

OK. I tested the latest changes. I installed everything according the instructions (last time I forgot the files from firmware to put into /boot directory.
This time, the progress is much better.
Now, the HD3000 webcam works with no problem, even streaming over ethernet at 1280x720 ~2-3fps (quite impressive).
On 640x360 it gives more than 5fps, still sufficient (although there is 3-4 seconds delay).
The (until now) buggy VX-700 webcam works too!!! No errors in log, still works for 5-6 minutes (not enough time to test this minute, I'll test this evening. Anyhow, 6 minutes with no errors is a clean record.
The mjpg_streamer takes aprox. 3% of CPU no matter what resolution/framerate I select, which seems reasonable, as the camera is sending encoded mjpeg stream directly over the network and the software is not doing any calculations.

Congratulations on the fix, btw.

@ghost
Copy link

ghost commented Mar 25, 2013

Oh, one more significant improvement - until now, the HD3000 did have color artifacts all over bottom half of the picture. This is also fixed!
hd3000

@P33M
Copy link
Contributor

P33M commented Mar 31, 2013

While much improved, the story with 2 webcams at once is a bit different, especially with large transfer sizes (used with uncompressed video streams).

The core USB controller will "schedule" each host channel's access to the bus during a microframe. Most of the time the transactions are bunched up toward the start of the microframe (and successful) but for some reason sometimes they are pushed to start in the latter half. This effect is exacerbated with more HCs enabled/demanding bus time.

Even with the tasklet patch, the dwc_otg irq handler still takes a good 15-20uS on a xfercomp interrupt which has consequences for the next HC assigned an isoc transaction further along in the microframe - by the time the ARM has finished with the first transaction, the microframe has already overrun causing a packet drop.

Suggested mitigations are:-

  • Use lower resolutions on uncompressed streams
  • Force capture programs to use compressed streams (mjpg or h264)

Reducing the packet size appears to drastically reduce the chance of an overrun error occuring. I've managed to get 3 simultaneous streams from 3 webcams with
480x360 MJPG
352x288 YUYV
352x288 YUYV
and only a small amount of garbling on webcam 2, but with 2 @ 640x480 YUYV it ends up with one webcam being nearly unusable.

@Rayn0r
Copy link
Contributor

Rayn0r commented Apr 13, 2013

Using the patched kernel under https://github.com/P33M/linux with uvccapture on my Raspberry Pi, now delivers un-garbled images at 1920x1080 very quickly. For me this patch fixed the webcam related issues I had before.

@popcornmix
Copy link
Collaborator

@Rayn0r
Is that any different to runnning the kernel you get from rpi-update? I believe that has all P33M's fixes in.

@Rayn0r
Copy link
Contributor

Rayn0r commented Apr 14, 2013

@popcornmix
Judging from your comment, I assume there is a difference between running "apt-get update/upgrade" and "rpi-update". If so, I have not upgraded to the most recent kernel on my device so far. All I did was cross-compile the above mentioned kernel-repository and copy the result to my Raspberry.
Looking at the commits for both repos, I'd say the fix is contained in both of them.

@hoeken
Copy link

hoeken commented Apr 15, 2013

i just wanted to chime in and say this fix is awesome. completely fixed my USB webcam issues!

webcam-corrupted
webcam

ssvb pushed a commit to ssvb/linux-rpi that referenced this issue Jun 4, 2013
Block layer will allocate a spinlock for the queue if the driver does
not provide one in blk_init_queue().

The reason to use the internal spinlock is that blk_cleanup_queue() will
switch to use the internal spinlock in the cleanup code path.

        if (q->queue_lock != &q->__queue_lock)
                q->queue_lock = &q->__queue_lock;

However, processes which are in D state might have taken the driver
provided spinlock, when the processes wake up, they would release the
block provided spinlock.

=====================================
[ BUG: bad unlock balance detected! ]
3.4.0-rc7+ raspberrypi#238 Not tainted
-------------------------------------
fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
[<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!

other info that might help us debug this:
1 lock held by fio/3587:
 #0:  (&(&vblk->lock)->rlock){......}, at:
[<ffffffff8132661a>] get_request_wait+0x19a/0x250

Other drivers use block layer provided spinlock as well, e.g. SCSI.

Switching to the block layer provided spinlock saves a bit of memory and
does not increase lock contention. Performance test shows no real
difference is observed before and after this patch.

Changes in v2: Improve commit log as Michael suggested.

Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Asias He <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
@P33M P33M closed this as completed May 5, 2014
davet321 pushed a commit to davet321/rpi-linux that referenced this issue May 16, 2016
Original implementation commit e54bcde ("arm64: eBPF JIT compiler")
had the relevant code paths, but due to an oversight always fail jiting.

As a result, we had been falling back to BPF interpreter whenever a BPF
program has JMP_JSET_{X,K} instructions.

With this fix, we confirm that the corresponding tests in lib/test_bpf
continue to pass, and also jited.

...
[    2.784553] test_bpf: raspberrypi#30 JSET jited:1 188 192 197 PASS
[    2.791373] test_bpf: raspberrypi#31 tcpdump port 22 jited:1 325 677 625 PASS
[    2.808800] test_bpf: raspberrypi#32 tcpdump complex jited:1 323 731 991 PASS
...
[    3.190759] test_bpf: raspberrypi#237 JMP_JSET_K: if (0x3 & 0x2) return 1 jited:1 110 PASS
[    3.192524] test_bpf: raspberrypi#238 JMP_JSET_K: if (0x3 & 0xffffffff) return 1 jited:1 98 PASS
[    3.211014] test_bpf: raspberrypi#249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:1 120 PASS
[    3.212973] test_bpf: raspberrypi#250 JMP_JSET_X: if (0x3 & 0xffffffff) return 1 jited:1 89 PASS
...

Fixes: e54bcde ("arm64: eBPF JIT compiler")
Signed-off-by: Zi Shen Lim <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Yang Shi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
popcornmix pushed a commit that referenced this issue Oct 16, 2017
The error path in podhd_init() tries to clear the pending timer, while
the timer object is initialized at the end of init sequence, thus it
may hit the uninitialized object, as spotted by syzkaller:

  INFO: trying to register non-static key.
  the code is fine but needs lockdep annotation.
  turning off the locking correctness validator.
  CPU: 1 PID: 1845 Comm: kworker/1:2 Not tainted
  4.14.0-rc2-42613-g1488251d1a98 #238
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Workqueue: usb_hub_wq hub_event
  Call Trace:
   __dump_stack lib/dump_stack.c:16
   dump_stack+0x292/0x395 lib/dump_stack.c:52
   register_lock_class+0x6c4/0x1a00 kernel/locking/lockdep.c:769
   __lock_acquire+0x27e/0x4550 kernel/locking/lockdep.c:3385
   lock_acquire+0x259/0x620 kernel/locking/lockdep.c:4002
   del_timer_sync+0x12c/0x280 kernel/time/timer.c:1237
   podhd_disconnect+0x8c/0x160 sound/usb/line6/podhd.c:299
   line6_probe+0x844/0x1310 sound/usb/line6/driver.c:783
   podhd_probe+0x64/0x70 sound/usb/line6/podhd.c:474
   ....

For addressing it, assure the initializations of timer and work by
moving them to the beginning of podhd_init().

Fixes: 790869d ("ALSA: line6: Add support for POD X3")
Reported-by: Andrey Konovalov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
popcornmix pushed a commit that referenced this issue Oct 20, 2017
commit cb02ffc upstream.

The error path in podhd_init() tries to clear the pending timer, while
the timer object is initialized at the end of init sequence, thus it
may hit the uninitialized object, as spotted by syzkaller:

  INFO: trying to register non-static key.
  the code is fine but needs lockdep annotation.
  turning off the locking correctness validator.
  CPU: 1 PID: 1845 Comm: kworker/1:2 Not tainted
  4.14.0-rc2-42613-g1488251d1a98 #238
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Workqueue: usb_hub_wq hub_event
  Call Trace:
   __dump_stack lib/dump_stack.c:16
   dump_stack+0x292/0x395 lib/dump_stack.c:52
   register_lock_class+0x6c4/0x1a00 kernel/locking/lockdep.c:769
   __lock_acquire+0x27e/0x4550 kernel/locking/lockdep.c:3385
   lock_acquire+0x259/0x620 kernel/locking/lockdep.c:4002
   del_timer_sync+0x12c/0x280 kernel/time/timer.c:1237
   podhd_disconnect+0x8c/0x160 sound/usb/line6/podhd.c:299
   line6_probe+0x844/0x1310 sound/usb/line6/driver.c:783
   podhd_probe+0x64/0x70 sound/usb/line6/podhd.c:474
   ....

For addressing it, assure the initializations of timer and work by
moving them to the beginning of podhd_init().

Fixes: 790869d ("ALSA: line6: Add support for POD X3")
Reported-by: Andrey Konovalov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
popcornmix pushed a commit that referenced this issue Oct 20, 2017
commit cb02ffc upstream.

The error path in podhd_init() tries to clear the pending timer, while
the timer object is initialized at the end of init sequence, thus it
may hit the uninitialized object, as spotted by syzkaller:

  INFO: trying to register non-static key.
  the code is fine but needs lockdep annotation.
  turning off the locking correctness validator.
  CPU: 1 PID: 1845 Comm: kworker/1:2 Not tainted
  4.14.0-rc2-42613-g1488251d1a98 #238
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Workqueue: usb_hub_wq hub_event
  Call Trace:
   __dump_stack lib/dump_stack.c:16
   dump_stack+0x292/0x395 lib/dump_stack.c:52
   register_lock_class+0x6c4/0x1a00 kernel/locking/lockdep.c:769
   __lock_acquire+0x27e/0x4550 kernel/locking/lockdep.c:3385
   lock_acquire+0x259/0x620 kernel/locking/lockdep.c:4002
   del_timer_sync+0x12c/0x280 kernel/time/timer.c:1237
   podhd_disconnect+0x8c/0x160 sound/usb/line6/podhd.c:299
   line6_probe+0x844/0x1310 sound/usb/line6/driver.c:783
   podhd_probe+0x64/0x70 sound/usb/line6/podhd.c:474
   ....

For addressing it, assure the initializations of timer and work by
moving them to the beginning of podhd_init().

Fixes: 790869d ("ALSA: line6: Add support for POD X3")
Reported-by: Andrey Konovalov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Terminus-IMRC pushed a commit to Idein/linux that referenced this issue Jan 4, 2018
commit 2c95a32 upstream.

Block layer will allocate a spinlock for the queue if the driver does
not provide one in blk_init_queue().

The reason to use the internal spinlock is that blk_cleanup_queue() will
switch to use the internal spinlock in the cleanup code path.

        if (q->queue_lock != &q->__queue_lock)
                q->queue_lock = &q->__queue_lock;

However, processes which are in D state might have taken the driver
provided spinlock, when the processes wake up, they would release the
block provided spinlock.

=====================================
[ BUG: bad unlock balance detected! ]
3.4.0-rc7+ raspberrypi#238 Not tainted
-------------------------------------
fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
[<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!

other info that might help us debug this:
1 lock held by fio/3587:
 #0:  (&(&vblk->lock)->rlock){......}, at:
[<ffffffff8132661a>] get_request_wait+0x19a/0x250

Other drivers use block layer provided spinlock as well, e.g. SCSI.

Switching to the block layer provided spinlock saves a bit of memory and
does not increase lock contention. Performance test shows no real
difference is observed before and after this patch.

Changes in v2: Improve commit log as Michael suggested.

Cc: [email protected]
Cc: [email protected]
Signed-off-by: Asias He <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <[email protected]>
popcornmix pushed a commit that referenced this issue Feb 26, 2018
commit 2a4340c upstream.

As syzkaller spotted, currently line6 drivers submit a URB with the
fixed EP without checking whether it's actually available, which may
result in a kernel warning like:
  usb 1-1: BOGUS urb xfer, pipe 3 != type 1
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 24 at drivers/usb/core/urb.c:449
  usb_submit_urb+0xf8a/0x11d0
  Modules linked in:
  CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc2-42613-g1488251d1a98 #238
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Workqueue: usb_hub_wq hub_event
  Call Trace:
   line6_start_listen+0x55f/0x9e0 sound/usb/line6/driver.c:82
   line6_init_cap_control sound/usb/line6/driver.c:690
   line6_probe+0x7c9/0x1310 sound/usb/line6/driver.c:764
   podhd_probe+0x64/0x70 sound/usb/line6/podhd.c:474
   usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361
   ....

This patch adds a sanity check of validity of EPs at the device
initialization phase for avoiding the call with an invalid EP.

Reported-by: Andrey Konovalov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
popcornmix pushed a commit that referenced this issue Feb 26, 2018
commit 6815a0b upstream.

As syzkaller spotted, currently bcd2000 driver submits a URB with the
fixed EP without checking whether it's actually available, which may
result in a kernel warning like:
  usb 1-1: BOGUS urb xfer, pipe 1 != type 3
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 1846 at drivers/usb/core/urb.c:449
  usb_submit_urb+0xf8a/0x11d0
  Modules linked in:
  CPU: 0 PID: 1846 Comm: kworker/0:2 Not tainted
  4.14.0-rc2-42613-g1488251d1a98 #238
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Workqueue: usb_hub_wq hub_event
  Call Trace:
   bcd2000_init_device sound/usb/bcd2000/bcd2000.c:289
   bcd2000_init_midi sound/usb/bcd2000/bcd2000.c:345
   bcd2000_probe+0xe64/0x19e0 sound/usb/bcd2000/bcd2000.c:406
   usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361
   ....

This patch adds a sanity check of validity of EPs at the device
initialization phase for avoiding the call with an invalid EP.

Reported-by: Andrey Konovalov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
popcornmix pushed a commit that referenced this issue Aug 7, 2018
[ Upstream commit c42a0e2 ]

We met NULL pointer BUG as follow:

[  151.760358] BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
[  151.761340] PGD 80000001011eb067 P4D 80000001011eb067 PUD 1011ea067 PMD 0
[  151.762039] Oops: 0000 [#1] SMP PTI
[  151.762406] Modules linked in:
[  151.762723] CPU: 2 PID: 3561 Comm: mdadm-test Kdump: loaded Not tainted 4.17.0-rc1+ #238
[  151.763542] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
[  151.764432] RIP: 0010:remove_and_add_spares.part.56+0x13c/0x3a0
[  151.765061] RSP: 0018:ffffc90001d7fcd8 EFLAGS: 00010246
[  151.765590] RAX: 0000000000000000 RBX: ffff88013601d600 RCX: 0000000000000000
[  151.766306] RDX: 0000000000000000 RSI: ffff88013601d600 RDI: ffff880136187000
[  151.767014] RBP: ffff880136187018 R08: 0000000000000003 R09: 0000000000000051
[  151.767728] R10: ffffc90001d7fed8 R11: 0000000000000000 R12: ffff88013601d600
[  151.768447] R13: ffff8801298b1300 R14: ffff880136187000 R15: 0000000000000000
[  151.769160] FS:  00007f2624276700(0000) GS:ffff88013ae80000(0000) knlGS:0000000000000000
[  151.769971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  151.770554] CR2: 0000000000000060 CR3: 0000000111aac000 CR4: 00000000000006e0
[  151.771272] Call Trace:
[  151.771542]  md_ioctl+0x1df2/0x1e10
[  151.771906]  ? __switch_to+0x129/0x440
[  151.772295]  ? __schedule+0x244/0x850
[  151.772672]  blkdev_ioctl+0x4bd/0x970
[  151.773048]  block_ioctl+0x39/0x40
[  151.773402]  do_vfs_ioctl+0xa4/0x610
[  151.773770]  ? dput.part.23+0x87/0x100
[  151.774151]  ksys_ioctl+0x70/0x80
[  151.774493]  __x64_sys_ioctl+0x16/0x20
[  151.774877]  do_syscall_64+0x5b/0x180
[  151.775258]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

For raid6, when two disk of the array are offline, two spare disks can
be added into the array. Before spare disks recovery completing,
system reboot and mdadm thinks it is ok to restart the degraded
array by md_ioctl(). Since disks in raid6 is not only_parity(),
raid5_run() will abort, when there is no PPL feature or not setting
'start_dirty_degraded' parameter. Therefore, mddev->pers is NULL.

But, mddev->raid_disks has been set and it will not be cleared when
raid5_run abort. md_ioctl() can execute cmd 'HOT_REMOVE_DISK' to
remove a disk by mdadm, which will cause NULL pointer dereference
in remove_and_add_spares() finally.

Signed-off-by: Yufen Yu <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
popcornmix pushed a commit that referenced this issue Aug 7, 2018
[ Upstream commit c42a0e2 ]

We met NULL pointer BUG as follow:

[  151.760358] BUG: unable to handle kernel NULL pointer dereference at 0000000000000060
[  151.761340] PGD 80000001011eb067 P4D 80000001011eb067 PUD 1011ea067 PMD 0
[  151.762039] Oops: 0000 [#1] SMP PTI
[  151.762406] Modules linked in:
[  151.762723] CPU: 2 PID: 3561 Comm: mdadm-test Kdump: loaded Not tainted 4.17.0-rc1+ #238
[  151.763542] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
[  151.764432] RIP: 0010:remove_and_add_spares.part.56+0x13c/0x3a0
[  151.765061] RSP: 0018:ffffc90001d7fcd8 EFLAGS: 00010246
[  151.765590] RAX: 0000000000000000 RBX: ffff88013601d600 RCX: 0000000000000000
[  151.766306] RDX: 0000000000000000 RSI: ffff88013601d600 RDI: ffff880136187000
[  151.767014] RBP: ffff880136187018 R08: 0000000000000003 R09: 0000000000000051
[  151.767728] R10: ffffc90001d7fed8 R11: 0000000000000000 R12: ffff88013601d600
[  151.768447] R13: ffff8801298b1300 R14: ffff880136187000 R15: 0000000000000000
[  151.769160] FS:  00007f2624276700(0000) GS:ffff88013ae80000(0000) knlGS:0000000000000000
[  151.769971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  151.770554] CR2: 0000000000000060 CR3: 0000000111aac000 CR4: 00000000000006e0
[  151.771272] Call Trace:
[  151.771542]  md_ioctl+0x1df2/0x1e10
[  151.771906]  ? __switch_to+0x129/0x440
[  151.772295]  ? __schedule+0x244/0x850
[  151.772672]  blkdev_ioctl+0x4bd/0x970
[  151.773048]  block_ioctl+0x39/0x40
[  151.773402]  do_vfs_ioctl+0xa4/0x610
[  151.773770]  ? dput.part.23+0x87/0x100
[  151.774151]  ksys_ioctl+0x70/0x80
[  151.774493]  __x64_sys_ioctl+0x16/0x20
[  151.774877]  do_syscall_64+0x5b/0x180
[  151.775258]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

For raid6, when two disk of the array are offline, two spare disks can
be added into the array. Before spare disks recovery completing,
system reboot and mdadm thinks it is ok to restart the degraded
array by md_ioctl(). Since disks in raid6 is not only_parity(),
raid5_run() will abort, when there is no PPL feature or not setting
'start_dirty_degraded' parameter. Therefore, mddev->pers is NULL.

But, mddev->raid_disks has been set and it will not be cleared when
raid5_run abort. md_ioctl() can execute cmd 'HOT_REMOVE_DISK' to
remove a disk by mdadm, which will cause NULL pointer dereference
in remove_and_add_spares() finally.

Signed-off-by: Yufen Yu <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
popcornmix pushed a commit that referenced this issue Feb 20, 2020
…riptors

commit 68035c8 upstream.

Way back in 2017, fuzzing the 4.14-rc2 USB stack with syzkaller kicked
up the following WARNING from the UVC chain scanning code:

  | list_add double add: new=ffff880069084010, prev=ffff880069084010,
  | next=ffff880067d22298.
  | ------------[ cut here ]------------
  | WARNING: CPU: 1 PID: 1846 at lib/list_debug.c:31 __list_add_valid+0xbd/0xf0
  | Modules linked in:
  | CPU: 1 PID: 1846 Comm: kworker/1:2 Not tainted
  | 4.14.0-rc2-42613-g1488251d1a98 #238
  | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  | Workqueue: usb_hub_wq hub_event
  | task: ffff88006b01ca40 task.stack: ffff880064358000
  | RIP: 0010:__list_add_valid+0xbd/0xf0 lib/list_debug.c:29
  | RSP: 0018:ffff88006435ddd0 EFLAGS: 00010286
  | RAX: 0000000000000058 RBX: ffff880067d22298 RCX: 0000000000000000
  | RDX: 0000000000000058 RSI: ffffffff85a58800 RDI: ffffed000c86bbac
  | RBP: ffff88006435dde8 R08: 1ffff1000c86ba52 R09: 0000000000000000
  | R10: 0000000000000002 R11: 0000000000000000 R12: ffff880069084010
  | R13: ffff880067d22298 R14: ffff880069084010 R15: ffff880067d222a0
  | FS:  0000000000000000(0000) GS:ffff88006c900000(0000) knlGS:0000000000000000
  | CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  | CR2: 0000000020004ff2 CR3: 000000006b447000 CR4: 00000000000006e0
  | Call Trace:
  |  __list_add ./include/linux/list.h:59
  |  list_add_tail+0x8c/0x1b0 ./include/linux/list.h:92
  |  uvc_scan_chain_forward.isra.8+0x373/0x416
  | drivers/media/usb/uvc/uvc_driver.c:1471
  |  uvc_scan_chain drivers/media/usb/uvc/uvc_driver.c:1585
  |  uvc_scan_device drivers/media/usb/uvc/uvc_driver.c:1769
  |  uvc_probe+0x77f2/0x8f00 drivers/media/usb/uvc/uvc_driver.c:2104

Looking into the output from usbmon, the interesting part is the
following data packet:

  ffff880069c63e00 30710169 C Ci:1:002:0 0 143 = 09028f00 01030080
  00090403 00000e01 00000924 03000103 7c003328 010204db

If we drop the lead configuration and interface descriptors, we're left
with an output terminal descriptor describing a generic display:

  /* Output terminal descriptor */
  buf[0]	09
  buf[1]	24
  buf[2]	03	/* UVC_VC_OUTPUT_TERMINAL */
  buf[3]	00	/* ID */
  buf[4]	01	/* type == 0x0301 (UVC_OTT_DISPLAY) */
  buf[5]	03
  buf[6]	7c
  buf[7]	00	/* source ID refers to self! */
  buf[8]	33

The problem with this descriptor is that it is self-referential: the
source ID of 0 matches itself! This causes the 'struct uvc_entity'
representing the display to be added to its chain list twice during
'uvc_scan_chain()': once via 'uvc_scan_chain_entity()' when it is
processed directly from the 'dev->entities' list and then again
immediately afterwards when trying to follow the source ID in
'uvc_scan_chain_forward()'

Add a check before adding an entity to a chain list to ensure that the
entity is not already part of a chain.

Link: https://lore.kernel.org/linux-media/CAAeHK+z+Si69jUR+N-SjN9q4O+o5KFiNManqEa-PjUta7EOb7A@mail.gmail.com/

Cc: <[email protected]>
Fixes: c0efd23 ("V4L/DVB (8145a): USB Video Class driver")
Reported-by: Andrey Konovalov <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Laurent Pinchart <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
popcornmix pushed a commit that referenced this issue Feb 20, 2020
…riptors

commit 68035c8 upstream.

Way back in 2017, fuzzing the 4.14-rc2 USB stack with syzkaller kicked
up the following WARNING from the UVC chain scanning code:

  | list_add double add: new=ffff880069084010, prev=ffff880069084010,
  | next=ffff880067d22298.
  | ------------[ cut here ]------------
  | WARNING: CPU: 1 PID: 1846 at lib/list_debug.c:31 __list_add_valid+0xbd/0xf0
  | Modules linked in:
  | CPU: 1 PID: 1846 Comm: kworker/1:2 Not tainted
  | 4.14.0-rc2-42613-g1488251d1a98 #238
  | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  | Workqueue: usb_hub_wq hub_event
  | task: ffff88006b01ca40 task.stack: ffff880064358000
  | RIP: 0010:__list_add_valid+0xbd/0xf0 lib/list_debug.c:29
  | RSP: 0018:ffff88006435ddd0 EFLAGS: 00010286
  | RAX: 0000000000000058 RBX: ffff880067d22298 RCX: 0000000000000000
  | RDX: 0000000000000058 RSI: ffffffff85a58800 RDI: ffffed000c86bbac
  | RBP: ffff88006435dde8 R08: 1ffff1000c86ba52 R09: 0000000000000000
  | R10: 0000000000000002 R11: 0000000000000000 R12: ffff880069084010
  | R13: ffff880067d22298 R14: ffff880069084010 R15: ffff880067d222a0
  | FS:  0000000000000000(0000) GS:ffff88006c900000(0000) knlGS:0000000000000000
  | CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  | CR2: 0000000020004ff2 CR3: 000000006b447000 CR4: 00000000000006e0
  | Call Trace:
  |  __list_add ./include/linux/list.h:59
  |  list_add_tail+0x8c/0x1b0 ./include/linux/list.h:92
  |  uvc_scan_chain_forward.isra.8+0x373/0x416
  | drivers/media/usb/uvc/uvc_driver.c:1471
  |  uvc_scan_chain drivers/media/usb/uvc/uvc_driver.c:1585
  |  uvc_scan_device drivers/media/usb/uvc/uvc_driver.c:1769
  |  uvc_probe+0x77f2/0x8f00 drivers/media/usb/uvc/uvc_driver.c:2104

Looking into the output from usbmon, the interesting part is the
following data packet:

  ffff880069c63e00 30710169 C Ci:1:002:0 0 143 = 09028f00 01030080
  00090403 00000e01 00000924 03000103 7c003328 010204db

If we drop the lead configuration and interface descriptors, we're left
with an output terminal descriptor describing a generic display:

  /* Output terminal descriptor */
  buf[0]	09
  buf[1]	24
  buf[2]	03	/* UVC_VC_OUTPUT_TERMINAL */
  buf[3]	00	/* ID */
  buf[4]	01	/* type == 0x0301 (UVC_OTT_DISPLAY) */
  buf[5]	03
  buf[6]	7c
  buf[7]	00	/* source ID refers to self! */
  buf[8]	33

The problem with this descriptor is that it is self-referential: the
source ID of 0 matches itself! This causes the 'struct uvc_entity'
representing the display to be added to its chain list twice during
'uvc_scan_chain()': once via 'uvc_scan_chain_entity()' when it is
processed directly from the 'dev->entities' list and then again
immediately afterwards when trying to follow the source ID in
'uvc_scan_chain_forward()'

Add a check before adding an entity to a chain list to ensure that the
entity is not already part of a chain.

Link: https://lore.kernel.org/linux-media/CAAeHK+z+Si69jUR+N-SjN9q4O+o5KFiNManqEa-PjUta7EOb7A@mail.gmail.com/

Cc: <[email protected]>
Fixes: c0efd23 ("V4L/DVB (8145a): USB Video Class driver")
Reported-by: Andrey Konovalov <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Laurent Pinchart <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
popcornmix pushed a commit that referenced this issue Feb 20, 2020
…riptors

commit 68035c8 upstream.

Way back in 2017, fuzzing the 4.14-rc2 USB stack with syzkaller kicked
up the following WARNING from the UVC chain scanning code:

  | list_add double add: new=ffff880069084010, prev=ffff880069084010,
  | next=ffff880067d22298.
  | ------------[ cut here ]------------
  | WARNING: CPU: 1 PID: 1846 at lib/list_debug.c:31 __list_add_valid+0xbd/0xf0
  | Modules linked in:
  | CPU: 1 PID: 1846 Comm: kworker/1:2 Not tainted
  | 4.14.0-rc2-42613-g1488251d1a98 #238
  | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  | Workqueue: usb_hub_wq hub_event
  | task: ffff88006b01ca40 task.stack: ffff880064358000
  | RIP: 0010:__list_add_valid+0xbd/0xf0 lib/list_debug.c:29
  | RSP: 0018:ffff88006435ddd0 EFLAGS: 00010286
  | RAX: 0000000000000058 RBX: ffff880067d22298 RCX: 0000000000000000
  | RDX: 0000000000000058 RSI: ffffffff85a58800 RDI: ffffed000c86bbac
  | RBP: ffff88006435dde8 R08: 1ffff1000c86ba52 R09: 0000000000000000
  | R10: 0000000000000002 R11: 0000000000000000 R12: ffff880069084010
  | R13: ffff880067d22298 R14: ffff880069084010 R15: ffff880067d222a0
  | FS:  0000000000000000(0000) GS:ffff88006c900000(0000) knlGS:0000000000000000
  | CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  | CR2: 0000000020004ff2 CR3: 000000006b447000 CR4: 00000000000006e0
  | Call Trace:
  |  __list_add ./include/linux/list.h:59
  |  list_add_tail+0x8c/0x1b0 ./include/linux/list.h:92
  |  uvc_scan_chain_forward.isra.8+0x373/0x416
  | drivers/media/usb/uvc/uvc_driver.c:1471
  |  uvc_scan_chain drivers/media/usb/uvc/uvc_driver.c:1585
  |  uvc_scan_device drivers/media/usb/uvc/uvc_driver.c:1769
  |  uvc_probe+0x77f2/0x8f00 drivers/media/usb/uvc/uvc_driver.c:2104

Looking into the output from usbmon, the interesting part is the
following data packet:

  ffff880069c63e00 30710169 C Ci:1:002:0 0 143 = 09028f00 01030080
  00090403 00000e01 00000924 03000103 7c003328 010204db

If we drop the lead configuration and interface descriptors, we're left
with an output terminal descriptor describing a generic display:

  /* Output terminal descriptor */
  buf[0]	09
  buf[1]	24
  buf[2]	03	/* UVC_VC_OUTPUT_TERMINAL */
  buf[3]	00	/* ID */
  buf[4]	01	/* type == 0x0301 (UVC_OTT_DISPLAY) */
  buf[5]	03
  buf[6]	7c
  buf[7]	00	/* source ID refers to self! */
  buf[8]	33

The problem with this descriptor is that it is self-referential: the
source ID of 0 matches itself! This causes the 'struct uvc_entity'
representing the display to be added to its chain list twice during
'uvc_scan_chain()': once via 'uvc_scan_chain_entity()' when it is
processed directly from the 'dev->entities' list and then again
immediately afterwards when trying to follow the source ID in
'uvc_scan_chain_forward()'

Add a check before adding an entity to a chain list to ensure that the
entity is not already part of a chain.

Link: https://lore.kernel.org/linux-media/CAAeHK+z+Si69jUR+N-SjN9q4O+o5KFiNManqEa-PjUta7EOb7A@mail.gmail.com/

Cc: <[email protected]>
Fixes: c0efd23 ("V4L/DVB (8145a): USB Video Class driver")
Reported-by: Andrey Konovalov <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Laurent Pinchart <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
margro pushed a commit to margro/linux that referenced this issue May 31, 2020
…riptors

commit 68035c8 upstream.

Way back in 2017, fuzzing the 4.14-rc2 USB stack with syzkaller kicked
up the following WARNING from the UVC chain scanning code:

  | list_add double add: new=ffff880069084010, prev=ffff880069084010,
  | next=ffff880067d22298.
  | ------------[ cut here ]------------
  | WARNING: CPU: 1 PID: 1846 at lib/list_debug.c:31 __list_add_valid+0xbd/0xf0
  | Modules linked in:
  | CPU: 1 PID: 1846 Comm: kworker/1:2 Not tainted
  | 4.14.0-rc2-42613-g1488251d1a98 raspberrypi#238
  | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  | Workqueue: usb_hub_wq hub_event
  | task: ffff88006b01ca40 task.stack: ffff880064358000
  | RIP: 0010:__list_add_valid+0xbd/0xf0 lib/list_debug.c:29
  | RSP: 0018:ffff88006435ddd0 EFLAGS: 00010286
  | RAX: 0000000000000058 RBX: ffff880067d22298 RCX: 0000000000000000
  | RDX: 0000000000000058 RSI: ffffffff85a58800 RDI: ffffed000c86bbac
  | RBP: ffff88006435dde8 R08: 1ffff1000c86ba52 R09: 0000000000000000
  | R10: 0000000000000002 R11: 0000000000000000 R12: ffff880069084010
  | R13: ffff880067d22298 R14: ffff880069084010 R15: ffff880067d222a0
  | FS:  0000000000000000(0000) GS:ffff88006c900000(0000) knlGS:0000000000000000
  | CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  | CR2: 0000000020004ff2 CR3: 000000006b447000 CR4: 00000000000006e0
  | Call Trace:
  |  __list_add ./include/linux/list.h:59
  |  list_add_tail+0x8c/0x1b0 ./include/linux/list.h:92
  |  uvc_scan_chain_forward.isra.8+0x373/0x416
  | drivers/media/usb/uvc/uvc_driver.c:1471
  |  uvc_scan_chain drivers/media/usb/uvc/uvc_driver.c:1585
  |  uvc_scan_device drivers/media/usb/uvc/uvc_driver.c:1769
  |  uvc_probe+0x77f2/0x8f00 drivers/media/usb/uvc/uvc_driver.c:2104

Looking into the output from usbmon, the interesting part is the
following data packet:

  ffff880069c63e00 30710169 C Ci:1:002:0 0 143 = 09028f00 01030080
  00090403 00000e01 00000924 03000103 7c003328 010204db

If we drop the lead configuration and interface descriptors, we're left
with an output terminal descriptor describing a generic display:

  /* Output terminal descriptor */
  buf[0]	09
  buf[1]	24
  buf[2]	03	/* UVC_VC_OUTPUT_TERMINAL */
  buf[3]	00	/* ID */
  buf[4]	01	/* type == 0x0301 (UVC_OTT_DISPLAY) */
  buf[5]	03
  buf[6]	7c
  buf[7]	00	/* source ID refers to self! */
  buf[8]	33

The problem with this descriptor is that it is self-referential: the
source ID of 0 matches itself! This causes the 'struct uvc_entity'
representing the display to be added to its chain list twice during
'uvc_scan_chain()': once via 'uvc_scan_chain_entity()' when it is
processed directly from the 'dev->entities' list and then again
immediately afterwards when trying to follow the source ID in
'uvc_scan_chain_forward()'

Add a check before adding an entity to a chain list to ensure that the
entity is not already part of a chain.

Link: https://lore.kernel.org/linux-media/CAAeHK+z+Si69jUR+N-SjN9q4O+o5KFiNManqEa-PjUta7EOb7A@mail.gmail.com/

Cc: <[email protected]>
Fixes: c0efd23 ("V4L/DVB (8145a): USB Video Class driver")
Reported-by: Andrey Konovalov <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Laurent Pinchart <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants