Possible race condition in dwc_otg driver #830
For reference, which list is being modified?
Oh, sorry. I'm talking about dwc_otg_hc_t.free_hc_list.
I pushed my first fix to my GitHub account: https://github.com/flipreverse/linux/commits/dwcotgfix
You're adding a lock at every circleq_* call site. Many of these run in different contexts, particularly with regard to whether the top-level HCD lock is held or not. This makes an A-B-B-A type deadlock far more likely. There should be no need for an additional lock: holding the HCD spinlock while manipulating the list should be sufficient.
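A minimal userspace sketch of the "one lock, taken once per entry point" discipline described above. Assumptions: a pthread mutex stands in for the HCD spinlock (in the kernel this would be a spinlock, typically taken with interrupts disabled), the free list is singly linked for brevity rather than a DWC_CIRCLEQ, and all names (`struct hcd`, `hcd_get_channel`, etc.) are hypothetical, not dwc_otg code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical channel descriptor; not the driver's dwc_hc_t. */
struct chan {
    struct chan *next;
    int num;
};

/* One top-level lock protects the free list; there is no per-list lock. */
struct hcd {
    pthread_mutex_t lock;   /* stands in for the HCD spinlock */
    struct chan *free_head; /* singly linked here for brevity */
    int free_count;
};

/* Caller must hold hcd->lock. */
static void free_list_push_locked(struct hcd *hcd, struct chan *ch)
{
    ch->next = hcd->free_head;
    hcd->free_head = ch;
    hcd->free_count++;
}

/* Caller must hold hcd->lock. Returns NULL when no channel is free. */
static struct chan *free_list_pop_locked(struct hcd *hcd)
{
    struct chan *ch = hcd->free_head;
    if (ch != NULL) {
        hcd->free_head = ch->next;
        ch->next = NULL;
        hcd->free_count--;
    }
    return ch;
}

/* Entry points take the single lock exactly once, so there is no second
 * lock whose acquisition order could differ between paths (no A-B-B-A). */
static struct chan *hcd_get_channel(struct hcd *hcd)
{
    pthread_mutex_lock(&hcd->lock);
    struct chan *ch = free_list_pop_locked(hcd);
    pthread_mutex_unlock(&hcd->lock);
    return ch;
}

static void hcd_put_channel(struct hcd *hcd, struct chan *ch)
{
    pthread_mutex_lock(&hcd->lock);
    free_list_push_locked(hcd, ch);
    pthread_mutex_unlock(&hcd->lock);
}
```

The `_locked` suffix documents that a helper expects the caller to already hold the lock, which keeps nested acquisition out of the list code entirely.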
I've been running my fix for a few hours. No deadlock happened, but the same NULL pointer panic occurred again. So why is the cqe_next pointer NULL?
Please post the output of lsusb -v and the dmesg log since boot.
May I have your email address so I can send you the requested output?
Sent an email to the address you have listed on GitHub.
A lot of crashes later, I've got a bit more information about the error itself. Any further advice on what I could do to find the source of this bug?
Hm. Thinking about this a bit: what happens if we have no host channels left and try to assign one? Ordinarily this won't happen, because it's very unlikely that you will have that many transfers active on the bus at the same time; you would need 8 simultaneous outstanding transactions. With your particular use case I can see this happening. Edit: ordinarily this should be guarded by !dwc_circleq_empty in both cases, but I wonder whether the list itself gets corrupted if you e.g. hit zero available channels.
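To make the exhaustion case concrete, here is a small sketch of a channel pool whose assignment refuses instead of taking an entry from an empty list, which is the role the !dwc_circleq_empty guard plays above. Assumptions: the pool is a simple stack of free channel numbers rather than the driver's free_hc_list, the count of 8 follows the figure quoted in the comment, and all names (`struct chan_pool`, `pool_assign`, `pool_release`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_HOST_CHANNELS 8 /* figure quoted in the thread; illustrative */

/* Hypothetical pool: a stack of free channel numbers. */
struct chan_pool {
    int free_idx[NUM_HOST_CHANNELS];
    int nfree;
};

static void pool_init(struct chan_pool *p)
{
    for (int i = 0; i < NUM_HOST_CHANNELS; i++)
        p->free_idx[i] = i;
    p->nfree = NUM_HOST_CHANNELS;
}

/* Returns a channel number, or -1 when every channel is in use. This is
 * the analogue of checking that the free list is non-empty before taking
 * its first entry; the caller should leave the transfer queued. */
static int pool_assign(struct chan_pool *p)
{
    if (p->nfree == 0)
        return -1;
    return p->free_idx[--p->nfree];
}

/* Returns a channel to the pool. Releasing into an already-full pool
 * means some channel was released twice, which is a bug. */
static void pool_release(struct chan_pool *p, int ch)
{
    assert(p->nfree < NUM_HOST_CHANNELS);
    p->free_idx[p->nfree++] = ch;
}
```

With 8 transfers outstanding, a ninth `pool_assign` returns -1 rather than touching a nonexistent entry.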
Yeah, this might happen. But the hc list isn't empty.
This produces the following output:
Can you post the output of
I wonder whether this isn't related to #1256.
I have also encountered this problem, on kernels 4.1.19-v7+ and 4.4.11-v7+ (built from git, with some extra printks). I have hit it with various combinations of USB devices, but for me it seems to require two USB cameras (tried with multiple UVC and PlayStation Eye cameras) and a custom PIC24-based full-speed USB device. Adding continuous reads from an old full-speed USB flash drive seems to make it occur more quickly. I have determined that the free_hc_list is being corrupted because an hc is appended to the list twice. Occasionally, dwc_otg_hcd_handle_hc_intr() reads that a given host channel has a pending interrupt, but the corresponding dwc_hc_t structure is already on the free_hc_list (at the end of the list in every case I've seen). When dwc_otg_hcd_handle_hc_n_intr() then calls release_channel(), the dwc_hc_t structure is appended to the free_hc_list again, so it is effectively in the list twice (its own cqe_next now points to itself). Later, it gets removed from the free_hc_list twice. The first removal clears the cqe_next pointer, so it is NULL when the entry is removed a second time, resulting in the NULL pointer dereference. I don't know enough about the hardware to say whether there is an extra interrupt or the original interrupt isn't being cleared, or why.
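The double-append mechanism described above can be reproduced in a few lines. This is a simplified userspace model, not the driver's dwc_list.h: it uses a sentinel-headed circular doubly linked list, assumed to follow the same insert-at-tail and remove-then-clear steps as the DWC_CIRCLEQ_INSERT_TAIL and DWC_CIRCLEQ_REMOVE_INIT macros, although the real BSD-style head (first/last pointers) differs in detail. The names `list_insert_tail` and `list_remove_init` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a free-list entry; not the driver's dwc_hc_t. */
struct hc {
    int num;
    struct hc *next, *prev;
};

/* An empty list is a sentinel head pointing at itself. */
static void list_init(struct hc *head)
{
    head->next = head->prev = head;
}

/* Append elm at the tail. If elm is already the tail, head->prev == elm,
 * so elm ends up with next == prev == elm (a self-loop). */
static void list_insert_tail(struct hc *head, struct hc *elm)
{
    elm->next = head;
    elm->prev = head->prev;
    head->prev->next = elm;
    head->prev = elm;
}

/* Unlink elm, then clear its pointers. If elm was already removed,
 * elm->next is NULL and the first store dereferences NULL. */
static void list_remove_init(struct hc *elm)
{
    elm->next->prev = elm->prev;
    elm->prev->next = elm->next;
    elm->next = elm->prev = NULL;
}
```

After a second append of the tail entry, the first removal is a no-op on the neighbours (they still point at the entry) and clears its pointers; any later removal of that same entry then writes through a NULL `next`, matching the "new predecessor of the next element" store that fails in the original report.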
If that's the case, then there is a path in the driver that is re-entrant in some unexpected way (for example, two hardware interrupts arriving for the same channel). Given a double entry at the end of the list, I'd suspect that bogus interrupts occurring in quick succession could cause this.
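One defensive option for the re-entrancy suspected above is to make channel release idempotent, so a spurious second interrupt for the same channel cannot append it twice. This is a hypothetical sketch only: the `on_free_list` flag, `release_channel_once` and `take_channel` are illustrative names, not dwc_otg code, and this is not necessarily what the fix in #2010 does. It also hides the symptom rather than explaining why the extra interrupt arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry with a flag recording free-list membership. */
struct hc {
    int num;
    bool on_free_list;
    struct hc *next, *prev;
};

/* An empty list is a sentinel head pointing at itself. */
static void list_init(struct hc *head)
{
    head->next = head->prev = head;
}

/* Appends hc to the free list unless it is already there.
 * Returns true if released, false for a duplicate release. */
static bool release_channel_once(struct hc *free_head, struct hc *hc)
{
    if (hc->on_free_list)
        return false; /* spurious second release: ignore it */
    hc->next = free_head;
    hc->prev = free_head->prev;
    free_head->prev->next = hc;
    free_head->prev = hc;
    hc->on_free_list = true;
    return true;
}

/* Removes and returns the first free channel, or NULL if none. */
static struct hc *take_channel(struct hc *free_head)
{
    struct hc *hc = free_head->next;
    if (hc == free_head)
        return NULL;
    hc->next->prev = hc->prev;
    hc->prev->next = hc->next;
    hc->next = hc->prev = NULL;
    hc->on_free_list = false;
    return hc;
}
```

A real driver would still want to log the duplicate so the underlying interrupt problem remains visible.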
fixed in #2010 |
The rpi-update firmware contains the latest kernel fixes. Please update and test.
Hi all,
I discovered a serious bug in the dwc driver. When I connect multiple USB devices and each of them generates a significant amount of traffic, I get a NULL pointer dereference error after a while.
The error message itself is quoted below. I narrowed the bug down to drivers/usb/host/dwc_otg/dwc_otg_hcd.c, line 1205. To be more precise, the assignment of the new predecessor of the next element fails.
This might happen when one context tries to remove an item from the free list that another context has already removed. The first context then faults inside the DWC_CIRCLEQ_REMOVE_INIT macro.
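Rather than crashing inside the remove macro, a debug build could validate an entry before unlinking it, in the spirit of the Linux kernel's CONFIG_DEBUG_LIST checks. The sketch below assumes a sentinel-headed circular list as a stand-in for DWC_CIRCLEQ, and `list_remove_init_checked` is a hypothetical helper: it refuses to unlink an entry whose pointers are already cleared or whose neighbours do not point back at it, so a second removal is reported instead of dereferencing NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model entry; not the driver's dwc_hc_t. */
struct hc {
    int num;
    struct hc *next, *prev;
};

static void list_init(struct hc *head)
{
    head->next = head->prev = head;
}

static void list_insert_tail(struct hc *head, struct hc *elm)
{
    elm->next = head;
    elm->prev = head->prev;
    head->prev->next = elm;
    head->prev = elm;
}

/* Checked remove-and-clear. Returns false, without touching the list,
 * if elm was already removed or its neighbours disagree with it. */
static bool list_remove_init_checked(struct hc *elm)
{
    if (elm->next == NULL || elm->prev == NULL)
        return false; /* already removed: would dereference NULL */
    if (elm->next->prev != elm || elm->prev->next != elm)
        return false; /* neighbour links inconsistent: list corrupted */
    elm->next->prev = elm->prev;
    elm->prev->next = elm->next;
    elm->next = elm->prev = NULL;
    return true;
}
```

Note that this catches the second removal, not the earlier double append that caused it; the two checks complement each other.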
As a first guess, I protected the list with a spinlock. I'm currently testing my fix.
I'll let you know as soon as I know whether this fixes the issue.
Meanwhile, can you please tell me whether there might be other causes of this bug?
Thanks!
Greetings
flip