xhci: Regression for cdc_acm device (bisected) #4061
Comments
Is the failure instant (i.e. the first time a disconnect is signalled)?
The failure occurs most of the time, but not in 100% of the cases, so it might require a few tries.
A minimal test case for my setup is the following:
In the success case, pv exits with
Note that this method may need a few retries to trigger the error condition; so far, 5 tries have always been enough for me.
A few more observations:
Can you reproduce it with the Pi0? If not, I can try to get my hands on one.
I have limited bandwidth to look at issues such as these right now. I will update the issue when I have had a chance to try reproducing it.
I reproduced the issue with a Pi0.
Is the SSH connection over Wi-Fi (i.e. are you using a Pi0W)?
Yes.
If this is not easily fixable, would it be possible to add a kernel cmdline parameter to switch back to the old behaviour? I am kind of stuck with an old kernel version here... :-/
I've not had time to look at this. There's no way to sensibly disable a quirk once applied, as there's no sysfs node to override the value used at probe time. Another quirk applied to the VLI controller is causing issues elsewhere (#3981), so for now I think both need to be opt-in for testing.
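The point about not being able to disable a quirk can be modelled in a few lines of Python. This is a sketch under an assumption, not the driver's code: it assumes the xhci_hcd quirks parameter is ORed into the quirks the driver detects for the controller, which fits the explanation above. effective_quirks is a hypothetical name, and the bit values are the 5.10 ones given further down the thread.

```python
# Bit values for the two VLI-related quirks as quoted in this thread (5.10).
XHCI_EP_CTX_BROKEN_DCS = 1 << 41
XHCI_AVOID_DQ_ON_LINK = 1 << 42


def effective_quirks(detected: int, param: int) -> int:
    """Hypothetical model of how the final quirk set is formed.

    Assumption: the user-supplied parameter is ORed into the quirks the
    driver detected for the controller, so it can set bits but never clear them.
    """
    return detected | param


# Quirks applied automatically to the VLI controller (only the two bits
# discussed in this thread are modelled).
detected = XHCI_EP_CTX_BROKEN_DCS | XHCI_AVOID_DQ_ON_LINK

for param in (0x0, XHCI_EP_CTX_BROKEN_DCS, 0xFFFF):
    quirks = effective_quirks(detected, param)
    # AVOID_DQ_ON_LINK stays set no matter which parameter value is given.
    print(f"param={param:#x} avoid_dq_on_link={bool(quirks & XHCI_AVOID_DQ_ON_LINK)}")

# Opt-in variant: nothing is applied automatically, so the parameter decides.
print(f"opt-in, param=0 -> {effective_quirks(0, 0):#x}")
```

Under this model, making the quirks opt-in is the only way to let users turn them off: once nothing is set at probe time, the parameter alone determines which bits end up enabled.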
I've pushed a commit to https://github.com/P33M/linux/tree/nobble-vli-quirks that should allow you to restore the pre-October 2020 behaviour. To build and install it, follow the guide here: https://www.raspberrypi.org/documentation/linux/kernel/building.md
I've installed the kernel and will test in the next few days.
XHCI_EP_CTX_BROKEN_DCS and XHCI_AVOID_DQ_ON_LINK are (1<<41) and (1<<42) respectively, so I think the 0x10000000000 quirk value is wrong. Or am I missing something?
Ah. Rebasing the patch onto 5.10 shifted the bits up, because upstream had started using the original bit positions. The correct values are 0x200_0000_0000 and 0x400_0000_0000 respectively.
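The arithmetic behind these values can be checked quickly in Python: shifting 1 left by 41 and 42 gives the two masks, their OR is the value to pass via xhci_hcd.quirks=, and the 0x10000000000 value questioned above turns out to be bit 40, which neither quirk uses after the rebase. The variable names mirror the kernel macros but are only local Python names here.

```python
# Quirk masks after the 5.10 rebase, as stated in the comment above.
XHCI_EP_CTX_BROKEN_DCS = 1 << 41
XHCI_AVOID_DQ_ON_LINK = 1 << 42

# Python accepts underscores in hex literals, matching the notation above.
assert XHCI_EP_CTX_BROKEN_DCS == 0x200_0000_0000
assert XHCI_AVOID_DQ_ON_LINK == 0x400_0000_0000

# Enabling both quirks means ORing the two masks together.
both = XHCI_EP_CTX_BROKEN_DCS | XHCI_AVOID_DQ_ON_LINK

# The previously used value is a single, different bit.
wrong = 0x10000000000
print("bit index of the old value:", wrong.bit_length() - 1)
print("overlaps the new masks:", bool(wrong & both))

print(f"xhci_hcd.quirks={both:#x}")
```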
OK, I finally had time to prepare a test setup that reproduces it quickly.
The strange thing is that all of them trigger the error case. Do you have a suggestion for what I should try next? Should I cherry-pick your patch onto the original 5.4 kernel that triggered the bug the first time? Or maybe it's firmware-related rather than kernel-related?
That's confusing. If the commit in October 2020 is responsible, then at least one of the quirks options you used should fix it. It's unlikely that an upstream change would have broken cdc_acm in exactly the same way as the quirk does. Are you sure that you're booting with the correct kernel each time?
Yes, I explicitly set CONFIG_LOCALVERSION to verify that. Only the dtb and dtbo files are unchanged, but these should not matter. Also
without /boot/cmdline.txt modification should verify that your patch was indeed applied (otherwise the quirks would be 0x0000060000000890, right?)
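To see which quirk bits a value like 0x0000060000000890 contains, the set bit positions can be listed with a few lines of Python. set_bits and describe are hypothetical helper names. Mapping the low bits (4, 7 and 11) to named XHCI_* quirks is deliberately left out, since those definitions live in the xhci.h of the kernel actually running and can differ between versions. Following the reasoning in the comment above, a patched kernel booted without the cmdline change would be expected to show the value with bits 41 and 42 cleared.

```python
# VLI-related quirk masks for 5.10, as given earlier in the thread.
XHCI_EP_CTX_BROKEN_DCS = 1 << 41
XHCI_AVOID_DQ_ON_LINK = 1 << 42
VLI_QUIRKS = XHCI_EP_CTX_BROKEN_DCS | XHCI_AVOID_DQ_ON_LINK


def set_bits(value: int) -> list[int]:
    """Return the indices of all bits set in `value`, lowest first."""
    return [i for i in range(value.bit_length()) if value >> i & 1]


def describe(quirks: int) -> str:
    """Format a quirks value and list which of the two VLI bits it carries."""
    return (f"{quirks:#018x}: bits {set_bits(quirks)}, "
            f"VLI bits {set_bits(quirks & VLI_QUIRKS)}")


print(describe(0x0000060000000890))  # value quoted above (VLI quirks applied)
print(describe(0x0000060000000890 & ~VLI_QUIRKS))  # same value, VLI bits cleared
```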
Same issue here. Did anyone find a fix for this?
EDIT: dmesg doesn't report an unplug event when the USB device doesn't respond, but it sometimes shows this error:
The RPi 3B+ is not affected.
Please run rpi-update and test. This issue may be related to #3981; I see stall events when running the test script on a Pi 0W.
The issue is still there after rpi-update on my side.
I finally managed to spend some time on more detailed testing, and it turns out that my previous test (#4061 (comment)) was invalidated by a broken USB hub ;-(
Where do we go from here, @P33M? It would be great to have a solution that doesn't require a patched kernel (especially now that Bullseye has been released). One more observation:
So the breakage with AVOID_DQ_ON_LINK only occurs if an external hub is used? Please post a full
Can you confirm that the Pi 0 test case also triggers the issue with an external hub, and that the Pi0 connects at high speed?
Yes. I will double-check this tomorrow to be 100% sure.
See the lsusb output below. Please ignore the "1a86:7523 QinHeng Electronics CH340 serial converter" device; it's unrelated.
Unfortunately, that can take a bit longer, as I don't have the necessary hardware here at the moment.
The double-check confirmed it. When booting the patched kernel with xhci_hcd.quirks=0x60000000000, the issue only occurs if the cdc_acm device is connected via a USB hub (D-Link DUB-H4).
Using a Pi0W, the script appears to hang after the first loop, hub or not. If I look at the g_serial module refcount, it goes negative:
I got my hands on the Pi0W again and used the same SD card as in #4061 (comment) (so an old installation). My /boot/config.txt contains dtoverlay=dwc2, and the uname -a output on the Pi0W is: Linux pizero 5.10.17+ #1403 Mon Feb 22 11:26:13 GMT 2021 armv6l GNU/Linux
One more thing: the Pi0 bug also occurs without the external USB hub.
Hi,
I recently ran into issues with a serial USB device (cdc_acm driver) on a Raspberry Pi 4B 2GB Revision 1.1. I bisected this on the Hexxeh/rpi-firmware repository, and f9a31df39c03911ae42e8b43b89313e25c30661c is the first bad commit (e530832 is the last good one):
As it's USB-related, a strong candidate would be #3929 (pinging @P33M).
Steps to reproduce:
I have a piece of software that keeps the serial device /dev/ttyACM0 open for continuous communication. From time to time the other communication partner is reset/restarted, which normally shows up in dmesg as an unplug followed by a replug:
After upgrading to commit f9a31df or newer, no unplug is detected and the communication just appears dead, but the device is not closed. The unplug and replug are only logged in dmesg once my software closes /dev/ttyACM0 after hitting its timeouts.
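The symptom described above (the port silently stops accepting data instead of being reported as gone) can be told apart from a detected disconnect by writing with a timeout. Below is a minimal Python sketch, not the reporter's software: probe_writer is a hypothetical helper that writes to an already-open file descriptor until a write fails (which is assumed to be how a detected unplug surfaces, typically as an error such as EIO on a hung-up tty), the descriptor stays unwritable for timeout seconds (assumed to match the hang in this report), or max_bytes have been written.

```python
import errno
import os
import select


def probe_writer(fd: int, timeout: float = 2.0,
                 max_bytes: int = 1 << 20, chunk: bytes = b"x" * 512) -> str:
    """Write to `fd` until it fails, stalls, or accepts `max_bytes`.

    Returns "disconnected (<errno name>)" if a write fails, "stalled" if
    `fd` stays unwritable for `timeout` seconds, otherwise "healthy".
    """
    os.set_blocking(fd, False)  # never block inside os.write()
    written = 0
    while written < max_bytes:
        _, writable, _ = select.select([], [fd], [], timeout)
        if not writable:
            return "stalled"  # nothing drains the buffer: the hang case
        try:
            written += os.write(fd, chunk)
        except BlockingIOError:
            continue  # buffer filled up between select() and write()
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            return f"disconnected ({name})"
    return "healthy"
```

A descriptor for the port could come from os.open("/dev/ttyACM0", os.O_RDWR | os.O_NOCTTY). Serial settings such as raw mode are not configured here, and a device that simply stops reading would also show up as "stalled", so the result is a hint to correlate with dmesg rather than proof of the regression.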