usb connection issues #265

Open
xkonni opened this issue Oct 10, 2024 · 72 comments
Labels
fix/pcb Fix request for PCB

Comments

@xkonni

xkonni commented Oct 10, 2024

got 2 crkbd rev 4.1, love them, typing on my old 60% is a pain now.

but for some reason the usb connections on both devices are rather unstable on my machines (linux pc, 2 dell laptops with linux). first I thought it was a hw issue, but the second (one from a diy store in germany, one from aliexpress) shows the exact same issues.

using your firmware with the vial keymap. tried some options (remove USB_SUSPEND_WAKEUP_DELAY, increase it, ...) but the devices remain unstable. sometimes they run for hours, then they fail every few seconds.

Could this be related to #229 ?

any help is highly appreciated!

logs:

Sep 29 21:15:06 annoyance kernel: input: foostan Corne v4 as /devices/pci0000:00/0000:00:01.2/0000:20:00.0/0000:21:08.0/0000:2a:00.1/usb1/1-3/1-3:1.0/0003:4653:0004.0058/input/input158
Sep 29 21:15:06 annoyance kernel: hid-generic 0003:4653:0004.0058: input,hidraw6: USB HID v1.11 Keyboard [foostan Corne v4] on usb-0000:2a:00.1-3/input0
Sep 29 21:15:06 annoyance kernel: hid-generic 0003:4653:0004.0059: hiddev99,hidraw7: USB HID v1.11 Device [foostan Corne v4] on usb-0000:2a:00.1-3/input1
Sep 29 21:15:06 annoyance kernel: input: foostan Corne v4 Mouse as /devices/pci0000:00/0000:00:01.2/0000:20:00.0/0000:21:08.0/0000:2a:00.1/usb1/1-3/1-3:1.2/0003:4653:0004.005A/input/input159
Sep 29 21:15:06 annoyance kernel: input: foostan Corne v4 System Control as /devices/pci0000:00/0000:00:01.2/0000:20:00.0/0000:21:08.0/0000:2a:00.1/usb1/1-3/1-3:1.2/0003:4653:0004.005A/input/input160
Sep 29 21:15:06 annoyance kernel: input: foostan Corne v4 Consumer Control as /devices/pci0000:00/0000:00:01.2/0000:20:00.0/0000:21:08.0/0000:2a:00.1/usb1/1-3/1-3:1.2/0003:4653:0004.005A/input/input161
Sep 29 21:15:06 annoyance kernel: input: foostan Corne v4 Keyboard as /devices/pci0000:00/0000:00:01.2/0000:20:00.0/0000:21:08.0/0000:2a:00.1/usb1/1-3/1-3:1.2/0003:4653:0004.005A/input/input162
Sep 29 21:15:06 annoyance kernel: hid-generic 0003:4653:0004.005A: input,hidraw8: USB HID v1.11 Mouse [foostan Corne v4] on usb-0000:2a:00.1-3/input2
Sep 29 21:15:09 annoyance kernel: usb 1-3: reset full-speed USB device number 50 using xhci_hcd
Sep 29 21:15:09 annoyance kernel: usb 1-3: device descriptor read/all, error -71
Sep 29 21:15:09 annoyance kernel: usb 1-3: reset full-speed USB device number 50 using xhci_hcd
Sep 29 21:15:09 annoyance kernel: usb 1-3: device descriptor read/64, error -71
Sep 29 21:15:09 annoyance kernel: usb 1-3: device descriptor read/64, error -71
Sep 29 21:15:10 annoyance kernel: usb 1-3: reset full-speed USB device number 50 using xhci_hcd
Sep 29 21:15:10 annoyance kernel: usbhid 1-3:1.0: can't add hid device: -71
Sep 29 21:15:10 annoyance kernel: usbhid 1-3:1.0: probe with driver usbhid failed with error -71
Sep 29 21:15:11 annoyance kernel: usb 1-3: reset full-speed USB device number 50 using xhci_hcd
Sep 29 21:15:11 annoyance kernel: usb 1-3: reset full-speed USB device number 50 using xhci_hcd
Sep 29 21:15:13 annoyance kernel: usb 1-3: reset full-speed USB device number 50 using xhci_hcd
Sep 29 21:15:13 annoyance kernel: usb 1-3: device descriptor read/64, error -71
Sep 29 21:15:13 annoyance kernel: usb 1-3: device descriptor read/64, error -71
Sep 29 21:15:13 annoyance kernel: usb 1-3: reset full-speed USB device number 50 using xhci_hcd
Sep 29 21:15:13 annoyance kernel: usb 1-3: device descriptor read/64, error -71
Sep 29 21:15:14 annoyance kernel: usb 1-3: device descriptor read/64, error -71
Sep 29 21:15:14 annoyance kernel: usb 1-3: reset full-speed USB device number 50 using xhci_hcd
Sep 29 21:15:15 annoyance kernel: usb 1-3: reset full-speed USB device number 50 using xhci_hcd
Sep 29 21:15:15 annoyance kernel: usb 1-3: device firmware changed
Sep 29 21:15:15 annoyance kernel: usb 1-3: USB disconnect, device number 50
Sep 29 21:15:16 annoyance kernel: usb 1-3: new full-speed USB device number 51 using xhci_hcd
Sep 29 21:15:16 annoyance kernel: usb 1-3: unable to read config index 0 descriptor/all
Sep 29 21:15:16 annoyance kernel: usb 1-3: can't read configurations, error -71
Sep 29 21:15:16 annoyance kernel: usb 1-3: new full-speed USB device number 52 using xhci_hcd
Sep 29 21:15:16 annoyance kernel: usb 1-3: New USB device found, idVendor=4653, idProduct=0004, bcdDevice= 4.10
Sep 29 21:15:16 annoyance kernel: usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Sep 29 21:15:16 annoyance kernel: usb 1-3: Product: Corne v4
Sep 29 21:15:16 annoyance kernel: usb 1-3: Manufacturer: foostan
Sep 29 21:15:16 annoyance kernel: usb 1-3: SerialNumber: vial:f64c2b3c
Sep 29 21:15:16 annoyance kernel: input: foostan Corne v4 as /devices/pci0000:00/0000:00:01.2/0000:20:00.0/0000:21:08.0/0000:2a:00.1/usb1/1-3/1-3:1.0/0003:4653:0004.005B/input/input163
Sep 29 21:15:16 annoyance kernel: hid-generic 0003:4653:0004.005B: input,hidraw6: USB HID v1.11 Keyboard [foostan Corne v4] on usb-0000:2a:00.1-3/input0
Sep 29 21:15:16 annoyance kernel: hid-generic 0003:4653:0004.005C: hiddev99,hidraw8: USB HID v1.11 Device [foostan Corne v4] on usb-0000:2a:00.1-3/input1
Sep 29 21:15:16 annoyance kernel: input: foostan Corne v4 Mouse as /devices/pci0000:00/0000:00:01.2/0000:20:00.0/0000:21:08.0/0000:2a:00.1/usb1/1-3/1-3:1.2/0003:4653:0004.005D/input/input164
Sep 29 21:15:16 annoyance kernel: input: foostan Corne v4 System Control as /devices/pci0000:00/0000:00:01.2/0000:20:00.0/0000:21:08.0/0000:2a:00.1/usb1/1-3/1-3:1.2/0003:4653:0004.005D/input/input165
Sep 29 21:15:16 annoyance kernel: input: foostan Corne v4 Consumer Control as /devices/pci0000:00/0000:00:01.2/0000:20:00.0/0000:21:08.0/0000:2a:00.1/usb1/1-3/1-3:1.2/0003:4653:0004.005D/input/input166
Sep 29 21:15:16 annoyance kernel: input: foostan Corne v4 Keyboard as /devices/pci0000:00/0000:00:01.2/0000:20:00.0/0000:21:08.0/0000:2a:00.1/usb1/1-3/1-3:1.2/0003:4653:0004.005D/input/input167
Sep 29 21:15:16 annoyance kernel: hid-generic 0003:4653:0004.005D: input,hidraw9: USB HID v1.11 Mouse [foostan Corne v4] on usb-0000:2a:00.1-3/input2
Sep 29 21:15:20 annoyance kernel: usb 1-3: reset full-speed USB device number 52 using xhci_hcd
Sep 29 21:15:21 annoyance kernel: usb 1-3: reset full-speed USB device number 52 using xhci_hcd
Sep 29 21:15:23 annoyance kernel: usb 1-3: reset full-speed USB device number 52 using xhci_hcd
Sep 29 21:15:23 annoyance kernel: usb 1-3: reset full-speed USB device number 52 using xhci_hcd
Sep 29 21:15:24 annoyance kernel: usb 1-3: reset full-speed USB device number 52 using xhci_hcd
Sep 29 21:15:27 annoyance kernel: usb 1-3: reset full-speed USB device number 52 using xhci_hcd
Sep 29 21:15:27 annoyance kernel: usb 1-3: reset full-speed USB device number 52 using xhci_hcd
Sep 29 21:15:28 annoyance kernel: usb 1-3: reset full-speed USB device number 52 using xhci_hcd
Sep 29 21:15:31 annoyance kernel: usb 1-3: reset full-speed USB device number 52 using xhci_hcd
Sep 29 21:15:31 annoyance kernel: usb 1-3: Device not responding to setup address.
Sep 29 21:15:31 annoyance kernel: usb 1-3: Device not responding to setup address.
Sep 29 21:15:31 annoyance kernel: usb 1-3: device not accepting address 52, error -71
Sep 29 21:15:31 annoyance kernel: usb 1-3: WARN: invalid context state for evaluate context command.
Sep 29 21:15:31 annoyance kernel: usb 1-3: reset full-speed USB device number 52 using xhci_hcd
Sep 29 21:15:31 annoyance kernel: xhci_hcd 0000:2a:00.1: ERROR: unexpected setup context command completion code 0x11.
Sep 29 21:15:31 annoyance kernel: usb 1-3: hub failed to enable device, error -22
Sep 29 21:15:31 annoyance kernel: usb 1-3: WARN: invalid context state for evaluate context command.
Sep 29 21:15:31 annoyance kernel: usb 1-3: reset full-speed USB device number 52 using xhci_hcd
Sep 29 21:15:31 annoyance kernel: xhci_hcd 0000:2a:00.1: ERROR: unexpected setup address command completion code 0x11.
Sep 29 21:15:32 annoyance kernel: xhci_hcd 0000:2a:00.1: ERROR: unexpected setup address command completion code 0x11.
Sep 29 21:15:32 annoyance kernel: usb 1-3: device not accepting address 52, error -22
Sep 29 21:15:32 annoyance kernel: usb 1-3: WARN: invalid context state for evaluate context command.
Sep 29 21:15:32 annoyance kernel: usb 1-3: reset full-speed USB device number 52 using xhci_hcd
Sep 29 21:15:32 annoyance kernel: xhci_hcd 0000:2a:00.1: ERROR: unexpected setup address command completion code 0x11.
Sep 29 21:15:32 annoyance kernel: xhci_hcd 0000:2a:00.1: ERROR: unexpected setup address command completion code 0x11.
Sep 29 21:15:32 annoyance kernel: usb 1-3: device not accepting address 52, error -22
Sep 29 21:15:32 annoyance kernel: usb 1-3: USB disconnect, device number 52
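
For anyone who wants to compare logs across machines, here is a minimal sketch that tallies the events in a kernel log like the one above. It assumes the message wording shown in this log, and the function name `summarize_usb_log` is made up for illustration. On Linux, errno 71 is EPROTO (protocol error) and errno 22 is EINVAL.

```python
import re
from collections import Counter

# Linux errno values seen in the log above (assumed to be a Linux host).
ERRNO_NAMES = {71: "EPROTO", 22: "EINVAL"}


def summarize_usb_log(lines):
    """Count resets, disconnects and error codes in kernel USB log lines."""
    counts = Counter()
    for line in lines:
        if "reset full-speed USB device" in line:
            counts["reset"] += 1
        if "USB disconnect" in line:
            counts["disconnect"] += 1
        match = re.search(r"error -(\d+)", line)
        if match:
            code = int(match.group(1))
            counts[ERRNO_NAMES.get(code, f"errno {code}")] += 1
    return counts
```

Feeding it the lines from `journalctl -k` gives a quick count of resets versus disconnects, which may make reports from different machines easier to compare.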
@foostan
Owner

foostan commented Oct 13, 2024

Thank you for the information. I have received some reports, but I have not yet been able to identify what the cause is. I will review some of the design policies and try to improve them.

@xkonni
Author

xkonni commented Oct 13, 2024

if you need any further information or have an idea how to fix existing pcbs I'm all ears!

@l4u

l4u commented Oct 14, 2024

@xkonni can you let us know the distro and kernel versions please?

@xkonni
Author

xkonni commented Oct 14, 2024

sure, here are my 3 test computers

  • Dell Latitude 7420, ubuntu 22.04 with 6.8.0-45-generic
  • Dell XPS 7390, arch with 6.11.3-arch1-1
  • PC, arch with 6.11.3-arch1-1

@foostan
Owner

foostan commented Oct 14, 2024

Is yours cherry or chocolate?
How about the communication between the left and right sides? Is only the USB connection unstable?

@xkonni
Author

xkonni commented Oct 14, 2024

I got two 4.1 here, one cherry, one choc. they behave exactly the same. On a usual day they work. Does not matter which one I use.

Then after a while the usb issues appear. Changing usb from left to right does not help, switching cherry to choc does not help.
My left side is normally plugged in via usb, right via trrs. Sometimes the left side still works, right does not. But then replugging the left or switching to the right just leads to more usb errors in the kernel log.

A cold boot sometimes helps.

@foostan
Owner

foostan commented Oct 14, 2024

Thank you for sharing the details!

@foostan foostan added the fix/pcb Fix request for PCB label Oct 19, 2024
@github-project-automation github-project-automation bot moved this to Backlog in KBD Roadmap Oct 19, 2024
@foostan foostan moved this from Backlog to Researching in KBD Roadmap Oct 19, 2024
@PaulRopel

I’m also experiencing some disconnect issues with my Corne v4.1. When I plug it in and use it to practice on keybr.com, it works well. However, if I set it aside (while it’s still plugged in), switch to browsing, or use my Mac keyboard, the Corne v4.1 stops responding when I try to use it again, even though the LED remains on. Let me know if I can help with troubleshooting.

@dahmwern

dahmwern commented Nov 1, 2024

I'm having similar issues as well. It seems that the non-plugged side disconnects most often, but sometimes I get disconnects on the plugged-in side as well. I saw the LEDs flicker when this happened, which isn't surprising, but it was a series of very short bursts of flickers, which makes me think it's a power-related issue.

@foostan
Owner

foostan commented Nov 1, 2024

This is just a guess, but from some reports I've heard it seems to be a power supply issue. There are some parts that are not very well designed, and some of them may be defective.

I'd like to isolate some of the root causes, and I'd appreciate any information you could give me.

  • Does the problem happen on a different PC?
  • Does the problem happen on another Corne v4 (if you have one)?

@dahmwern

dahmwern commented Nov 1, 2024

> I'd like to isolate some of the root causes, and I'd appreciate any information you could give me.
>
> • Does the problem happen on a different PC?
> • Does the problem happen on another Corne v4 (if you have one)?

I've had the issue on a Mac. I do have a spare PC that I can test with later this weekend and report back.

No spare Corne v4.1 keyboards assembled to test with easily.

@dahmwern

dahmwern commented Nov 1, 2024

Update:

Set up:

  1. Connected via USB C to MacBook Pro directly and with USB C hub
  2. USB C to right half of keyboard
  3. TRS between halves

I used the Corne V4.1 all day today, about 10 hours of use during work. I experienced a total of about 10 losses of function, some back to back, with varying amount of time between them.

Left hand (slave side) had about 6-7 losses of function. Right hand (master side) had about 3-4 losses of function. On one occasion, losses of function occurred every 30 seconds and required the keyboard to be reflashed.

Each time there was loss of function, it was preceded by LED flickering.

Hope this helps! I'm happy to set up more Corne V4.1s to test in varying conditions.

@dahmwern

dahmwern commented Nov 6, 2024

Another update:

I swapped my keyboard out with another Corne v4.1 PCB this evening. I did this to verify that there were no hardware issues with the first PCB. I also used the same firmware to avoid SW variation.

I confirmed the same behavior with keyboard lockup on one side resulting in requiring a power cycle to recover.

This is a big issue! Right now I can't use my (5) Corne v4.1's nor can I use a v4.1 as my daily driver with these reliability issues.

@foostan have you looked into this any further?

@foostan
Owner

foostan commented Nov 6, 2024

Thank you for your confirmation. Unfortunately, this problem does not occur in my environment, so I cannot investigate further.

@alessiocurri

alessiocurri commented Nov 6, 2024

Hi,
i can report i have the same issue.
The keyboard locks up so much it's impossible to use. I tested the keyboard with two different sets of pcbs (both chocolate), with multiple computers (mostly linux, a windows one out of desperation).
I also tried flashing a custom KMKFw one-side-only setup and, later, a custom QMK firmware. Multiple USB cables, HUBs, no Hubs, Hid-remapper in front of the keyboard. Same result.
The two pcbs were sourced from different vendors in Europe, i tested both.

How can we help you further investigate this issue?

edit:
i forgot to add, the keyboard seems to lock up less often with QMK.

@chadhakala

FYI the second USB port (on the opposite hand) will work, though your special keybinds may behave differently from your custom layout. I found this to be a pleasant surprise, since a USB port joint was damaged on mine and the opposite hand lets me work around the one broken USB jack.
Not sure if this will solve your problem, but it's worth a shot, and worth knowing that it appears to differ from older branches in that way.

@foostan
Owner

foostan commented Nov 6, 2024

Another possibility is that the PCB is simply damaged. Please also contact your supplier for further information.

@alessiocurri

@foostan 4 different PCBs from 2 different vendors show the exact same issue, both used as a pair and as a single unit (with a custom firmware).
The same issue is reported by other users in this thread.
I assembled the keyboard myself, and inspected the second set of PCBs very carefully when i received them: the only reason I bought a second set was to test if my unit was the issue.
The custom software was tested on a generic RP2040 to check its stability: no issues for days, while the same (KMKfw) software running on the corne has usb issues after a few minutes. I can reproduce this with all 4 of my units (2 left and 2 right ones) and it works fine on any other RP2040 i tested.

I have spent quite a lot of time trying to debug this and i'm 100% positive it's not a single unit, and it's not my computer, the usb cable or similar.

What i'm hoping to get here is some help in further debugging what is an issue with the USB on the keyboard, and hopefully find a solution/workaround to help other users that may have the same issue.

So, in that light, is there any other info i can provide?

@chadhakala no, the usb port is not damaged at all.

@foostan
Owner

foostan commented Nov 7, 2024

Thank you for sharing the details. I'm glad you're being helpful.

So what you're reporting means that the issue is more likely to occur with KMKfw than with QMK? I'll give KMKfw a try. Thanks again.

@dahmwern

dahmwern commented Nov 7, 2024

@foostan I don't think he's saying the KMKfw is worse, but rather by testing the same firmware on a generic RP2040 and on the Corne v4.1 board, the issue is only present on the Corne v4.1. This eliminates as many noise factors as conveniently possible.

The help needed is some debugging on the Corne v4.1 USB HW design to understand what's unique to the design that's causing the issue.

Please let us know if you need data. I am fully willing to support as needed. I would love to help solve this.

@foostan
Owner

foostan commented Nov 7, 2024

I'm sorry, of course. I didn't mean KMKfw is worse. I would like to isolate the problem and investigate the cause in detail.

Thank you for your cooperation. Let's share information on this issue.

@ChadHacksaLot

> @foostan 4 different PCBs from 2 different vendors show the exact same issue, both used as a pair and as a single unit (with a custom firmware). The same issue is reported by other users in this thread. I assembled the keyboard myself, and inspected the second set of PCBs very carefully when i received them: the only reason I bought a second set was to test if my unit was the issue. The custom software was tested on a generic RP2040 to check its stability: no issues for days, while the same (KMKfw) software running on the corne has usb issues after a few minutes. I can reproduce this with all 4 of my units (2 left and 2 right ones) and it works fine on any other RP2040 i tested.
>
> I have spent quite a lot of time trying to debug this and i'm 100% positive it's not a single unit, and it's not my computer, the usb cable or similar.
>
> What i'm hoping to get here is some help in further debugging what is an issue with the USB on the keyboard, and hopefully find a solution/workaround to help other users that may have the same issue.
>
> So, in that light, is there any other info i can provide?
>
> @chadhakala no, the usb port is not damaged at all.

@alessiocurri My apologies, I didn't realize this thread was all about the lockup. I have faced this issue as well, along with other unique issues for which I don't have systematic evidence of a USB fault.

The last time I used the corne I did face this exact lockup issue and stopped using it completely for that reason. I am following all these threads, so my apologies; I'm a little embarrassed for chiming in without even reading the full thread. I'm pretty sure I meant to respond to a different comment and was unaware there was even an issue for the lockup.

@alessiocurri

@dahmwern exactly what i meant ;)

@foostan here https://gist.github.com/alessiocurri/18e6b0c48a74c37dee766a71a22ac62a you can find my config for a left-only corne 4.1, no TRRS cable nor right side necessary.
This script will run fine on any circuitpython, i tried on versions 8.x, 9.1 and 9.2 (no changes).

To install KMKfw I just copied the KMKfw files from the github repo, added the neopixel.py library (you can also use the .pyc, it should be the same) and my code.py and boot.py (the latter is not strictly necessary). Please note the default layer is empty. To test you need to switch to another layer, the leds will highlight the active keys.

In this config i can replicate the usb lockup in 20 minutes on average, using all 4 boards (tested with and without the switches, no difference). The easiest way to check the status is to use a serial console on the virtual com port exposed. There you can find the python REPL. That virtual serial port will disappear when the usb issue presents itself.
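
A minimal host-side sketch of that check, for anyone who wants timestamps instead of watching a console: it polls for the serial device node and records when it vanishes. The helper name `watch_port` and the device path in the usage note are assumptions; the actual node name depends on the OS and on what else is plugged in.

```python
import os
import time


def watch_port(path, checks, interval=1.0,
               exists=os.path.exists, sleep=time.sleep, now=time.time):
    """Poll `path` `checks` times and return the times at which it vanished."""
    drops = []
    present = exists(path)
    for _ in range(checks):
        sleep(interval)
        seen = exists(path)
        if present and not seen:
            # The port was there on the previous poll and is gone now.
            drops.append(now())
        present = seen
    return drops
```

On Linux something like `watch_port("/dev/ttyACM0", checks=3600)` would cover an hour at one poll per second. The `exists`, `sleep` and `now` parameters are only there so the loop can be exercised without real hardware.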

@ChadHacksaLot no prob at all, it's probably me owing an apology... in my reply i was very blunt, probably a tad too much :)

@viscount-monty

Just wanted to add that I'm experiencing what sounds like the exact same issue with my corne v4.1.

Same behaviour everyone above is describing - sometimes one side becomes unresponsive, sometimes both, sometimes it works nearly all day, sometimes it's only seconds or minutes until the next lockup after disconnecting and reconnecting the USB cable.

One time, the right side even changed colour to the pattern pictured below:
image

Same behaviour when plugged into

  • Desktop PC running Windows 10

  • The same PC running Linux Mint 22 Cinnamon

  • Pixel 6 phone (Android 14)

  • lsusb during failure, both sides, Linux Mint

    Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 001 Device 003: ID 0665:5161 Cypress Semiconductor USB to Serial
    Bus 001 Device 004: ID 3434:d030 Keychron  Keychron Link 
    Bus 001 Device 005: ID 0b05:18a3 ASUSTek Computer, Inc. AURA MOTHERBOARD
    Bus 001 Device 006: ID 8087:0aaa Intel Corp. Bluetooth 9460/9560 Jefferson Peak (JfP)
    Bus 001 Device 011: ID 1532:008f Razer USA, Ltd Razer Naga Pro
    Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
    
  • lsusb after disconnect/reconnect

    Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 001 Device 003: ID 0665:5161 Cypress Semiconductor USB to Serial
    Bus 001 Device 004: ID 3434:d030 Keychron  Keychron Link 
    Bus 001 Device 005: ID 0b05:18a3 ASUSTek Computer, Inc. AURA MOTHERBOARD
    Bus 001 Device 006: ID 8087:0aaa Intel Corp. Bluetooth 9460/9560 Jefferson Peak (JfP)
    Bus 001 Device 011: ID 1532:008f Razer USA, Ltd Razer Naga Pro
    Bus 001 Device 023: ID 4653:0004 foostan Corne v4
    Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
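
The difference between the two listings can be checked automatically. This is a small sketch that looks for the keyboard's vendor/product ID (`4653:0004`, as shown above) in `lsusb` text; the function name `find_device` is made up for illustration.

```python
import re

LSUSB_LINE = re.compile(
    r"Bus (\d+) Device (\d+): ID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\s*(.*)"
)


def find_device(lsusb_output, vid="4653", pid="0004"):
    """Return bus/device/name for the first matching ID, or None if absent."""
    for line in lsusb_output.splitlines():
        match = LSUSB_LINE.search(line)
        if match and match.group(3).lower() == vid and match.group(4).lower() == pid:
            return {
                "bus": int(match.group(1)),
                "device": int(match.group(2)),
                "name": match.group(5).strip(),
            }
    return None
```

The text can come from `subprocess.run(["lsusb"], capture_output=True, text=True).stdout`; a `None` result corresponds to the "during failure" listing.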
    
    

I absolutely adore this keyboard when it works, I would love to assist in some way. I have career experience in PCB design and experience in micro-controller firmware programming - let me know what I can do to help or point me in a direction please :)

@dahmwern

dahmwern commented Nov 8, 2024

@viscount-monty I've experienced that same LED pattern during lockup. Your description is consistent with my experience.

What information do you need to analyze the PCB design for potential USB related comms issues?

@foostan
Owner

foostan commented Nov 9, 2024

It seems that it may or may not occur depending on the environment. I don't know under what conditions it occurs, but has anyone noticed any electrical abnormalities when the problem occurs, such as a short interruption or a significant drop in voltage or current?

@foostan
Owner

foostan commented Nov 13, 2024

Considering the current case, the distance between the IC and the USB connector doesn't seem to matter much, because this issue also occurs on the side that is not connected via USB.

@george-norton

If you are familiar with the SWD interface and can reproduce the failure, you may be able to power the board via the OLED header, connect to SWD and see if you can get the MCU to fail without any USB connection present at all.

@alessiocurri

@george-norton i tested this with circuitpython and a simple program that lights the leds in sequence: when the USB goes down, the mcu is still running fine.
I think the processor itself is not affected, but the software is: when KMKfw tries to send data via USB, the python core crashes after a few seconds. I assume QMK has a similar behavior, but i have not tested it.

@foostan I'm a hobbyist in this field (electronics), so this is a bit of guesswork.
I think the secondary unit is stuck because it cannot sync with the primary, which in turn is stuck trying to write to the USB (at least with the default qmk firmware).
I have verified that the MCUs of both the primary and the secondary unit are still alive (with the aforementioned script) and will talk via the interconnecting cable (in my example i'm flipping the gpio connected to the TRS cable up and down at ~1Hz on one side, and turning on a led on the other).
In other words, i would consider the other side being stuck only a side effect of not being able to talk to the primary side.
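
The heartbeat idea can be sketched in plain Python so the logic is easy to check on a desktop. This is not the script referred to above: the class names are hypothetical, and on a real board the level would be written to and read from a GPIO rather than passed around as a value.

```python
class HeartbeatSender:
    """Toggles a logic level every half period (1 Hz by default)."""

    def __init__(self, period_s=1.0):
        self.half_period = period_s / 2
        self.level = False
        self.last_toggle = 0.0

    def update(self, now):
        if now - self.last_toggle >= self.half_period:
            self.level = not self.level
            self.last_toggle = now
        return self.level


class HeartbeatMonitor:
    """Reports the link as alive while level changes keep arriving."""

    def __init__(self, timeout_s=2.0):
        self.timeout = timeout_s
        self.last_level = None
        self.last_edge = None

    def update(self, level, now):
        if level != self.last_level:
            # Any change of level counts as a sign of life from the sender.
            self.last_level = level
            self.last_edge = now
        return self.last_edge is not None and now - self.last_edge <= self.timeout
```

The receiving half would drive its LED from the monitor's return value, so a stuck sender shows up as the LED going dark after the timeout rather than staying frozen in its last state.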

@fabianmuehlberger

@george-norton You are right. It would be beneficial to rule out other causes. I assume this could be tested by running the RP2040 with some test C or Python code, just logging the behavior via the debug serial header. For this test, the USB lines should be fully deactivated (GPIOs disabled) to mitigate interference.

I also assume the error codes could indicate the problem we are facing here.
The question is: What USB connection issues and logs are present in the following cases? And are there other signs indicating the root cause.

  1. In case of EMI affecting the Crystal.
  2. Insufficient Power for the MCU.
  3. EMI problem in USB differential pair.

> Considering the current case, the distance between the IC and the USB connector doesn't seem to matter much, because this issue also occurs on the side that is not connected via USB.

The main reason I mentioned this was that the design is not within the specification for routing USB lines. This does not mean it has to be a problem, but it is certainly one of the first areas to look at when problems like this occur.

@alessiocurri

I tested the case printed with conductive filament but, as expected tbh, there was no change.
At this point I'm out of tests I can do.

@foostan while you investigate, i think you should add a warning in the readme.md file about this issue.

@foostan
Owner

foostan commented Nov 15, 2024

Added a notice about this issue to the README: https://github.com/foostan/crkbd?tab=readme-ov-file#notice

@viscount-monty

> Added a notice about this issue to the README: https://github.com/foostan/crkbd?tab=readme-ov-file#notice

Thanks for including that notice @foostan 👍 are you able to test a board with an Abracon ABM8-272-T3 and its associated 15p load caps? Was the change to an alternate crystal due to availability issues? I noticed Digikey have 0 in stock for the Abracon ABM8-272-T3 👎

@george-norton raises a very valid point regarding the crystal load capacitance. The source to which he is referring is pictured below
From Hardware Design with RP2040
image

Having said that, my understanding of @alessiocurri's reports is that he observes both RP2040 devices remaining functional (changing LED colours on both sides) after they have experienced a USB disconnect due to phone EMI.

I will attempt to replicate those results, and look into the excellent debugging suggestions by @george-norton and @fabianmuehlberger

  • Attempt inducing a lockup when running code which drives a simple pattern on the LEDs
    • To see if the RP2040s are still running correctly when the lockups occur
  • Attempt to induce a lockup when the RP2040 is powered via the OLED header and connected to the SWD interface
    • An additional method to confirm the outcome of the above test.

I also have a bunch of Pi Pico/W units with various dev boards and sensors, I will see if I can replicate the issue on any of those!

@foostan
Owner

foostan commented Nov 15, 2024

> Was the change to an alternate crystal due to availability issues?

I had not considered it properly. I'll look into the Abracon ABM8-272-T3 and other options.

I investigated the EMI effect of a mobile phone on Cornelius v2. This board is not a split keyboard, but it uses the same circuits and parts.

The investigation resulted in the USB connection being cut off and locked, just like the Corne v4. This shows that the problem is not with the USB-related wiring or part placement but with the selection of parts or the circuit design.

By the way, the Cornelius has an aluminum body, so I don't think this will actually be an issue.
https://github.com/foostan/corneliuskbd/tree/main/pcbs/v2/hotswap
image

@fabianmuehlberger

> • Attempt inducing a lockup when running code which drives a simple pattern on the LEDs
>   • To see if the RP2040s are still running correctly when the lockups occur
> • Attempt to induce a lockup when the RP2040 is powered via the OLED header and connected to the SWD interface
>   • An additional method to confirm the outcome of the above test.

Rather than visually observing the LED, you could just output a high frequency and measure it (basically what an LED does ;), so you can just hook up an oscilloscope to it :)

@foostan
Owner

foostan commented Nov 16, 2024

I found that the Cornelius board and the Corne v4 board have some different characteristics.

I was able to cause the problem with Cornelius yesterday, but since then I have not been able to cause any problems. On the other hand, I have been able to cause the problem many times with Corne v4. This means that Corne v4 is very unstable compared to Cornelius. In addition to the part selection and circuit, it suggests that there are problems with the part placement and wiring.

Placing a mobile phone right next to the RP2040 and crystal of a Corne v4 immediately causes the problem. On the other hand, no problems occurred when moving the phone close to the TRRS connector. The noise seems to be significantly reduced even when the phone is only about 10 cm away. The RP2040 and crystal are placed at the very edge of the PCB, so it is possible that they are easily affected by external factors. Simply shifting them to the center may have an effect.

By the way, would it be okay to make the DC-DC converter and Flash memory parts smaller? The current parts are too large and there is little freedom in placement. Doesn't it need 128M of Flash memory?

@george-norton

There are small form factor, large capacity flash parts available. See C2843335.

@foostan
Owner

foostan commented Nov 23, 2024

I'm moving the RP2040 to the center of the PCB.
image

@viscount-monty

@foostan may I ask why you didn't go for a ground plane pour on both sides of the corne?

The image you posted of the Cornelius appears to show a ground plane pour on the micro-controller side, though it seems to lack any 'stitching' vias. If there is no ground plane pour on the other layer that would make sense though.

I'm not certain it would make a difference, but I'm used to seeing/designing RF boards with top and bottom ground plane pours, stitched with vias at reasonable intervals.

Example:
image

@foostan
Owner

foostan commented Nov 25, 2024

@viscount-monty Can you tell me what its significance is? I didn't do it because I didn't know what effect it would have.

@fabianmuehlberger

> @foostan may I ask why you didn't go for a ground plane pour on both sides of the corne?
>
> The image you posted of the Cornelius appears to show a ground plane pour on the micro-controller side, though it seems to lack any 'stitching' vias. If there is no ground plane pour on the other layer that would make sense though.

Instead of a pour on top and bottom, I highly recommend making a 4-layer board.

The top layer is segmented by the traces, so a solid inner ground layer would be beneficial.

  1. Easier routing
  2. Clear current return paths
  3. The high speed lines can be via fenced for isolation.

@foostan
Owner

foostan commented Nov 25, 2024

I have already designed and prototyped a four-layer version. Although the routing was certainly simpler, I concluded that the benefits were not enough to justify the increased cost.

@viscount-monty

@foostan Certainly. Proper grounding/shielding reduces EMI in both directions: it limits the RF the board emits, which could interfere with other devices, and it shields the board from other sources of RF interference. This is similar to how a coaxial cable has a shield/ground that entirely surrounds the signal conductor. This kind of grounding/shielding is required to make RF PCBs function correctly and pass compliance testing.

To think of it another way, a PCB trace could accidentally behave as an antenna for external interference if not shielded/grounded sufficiently.

@foostan
Owner

foostan commented Nov 28, 2024

So, let's put a ground around the edge of the board as much as possible. That alone will have an effect. As mentioned above, I will not use a 4-layer board.

@foostan
Owner

foostan commented Nov 28, 2024

The latest board is here. I'll confirm again and create a prototype.

  • Move a MCU to the center of a board
  • Important signal lines should be as thick and short as possible, and the number of vias is reduced as much as possible.
  • Put GND for EMI and EMS.

image
image

@viscount-monty

Nice work, looks great! I'm looking forward to hearing how the prototype turns out 🤞

@dahmwern

Can't wait to hear the results. If you need other people to help you test your boards... Just saying :)

@fabianmuehlberger

fabianmuehlberger commented Nov 30, 2024

The latest board is here. I'll confirm again and create a prototype.

* Move a MCU to the center of a board

* Important signal lines should be as thick and short as possible, and the number of vias is reduced as much as possible.

* Put GND for EMI and EMS.

Not quite. The USB lines should not be "as thick as possible" but rather have the correct impedance.

Regarding trace length: for a high-speed USB signal, a conservative approach for a 2-layer board (according to the article linked below) would be to stay under the 25% limit, which is roughly 20 mm of trace length.
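
One way to see where a figure like 20 mm can come from is to read the 25% limit as "the propagation delay along the trace stays under 25% of the signal rise time". The sketch below is only that reading, under stated assumptions: a 0.5 ns rise time (a commonly cited minimum for high-speed USB 2.0), an effective permittivity of 3.4 for a surface trace on FR-4, and a 4 ns full-speed rise time for comparison. The function name and all default values are illustrative, not taken from the article.

```python
import math

C_MM_PER_NS = 299.792458  # speed of light in vacuum, in mm per ns


def critical_length_mm(rise_time_ns, eps_eff=3.4, fraction=0.25):
    """Trace length whose one-way delay equals `fraction` of the rise time.

    eps_eff=3.4 is an assumed effective permittivity for a surface trace on FR-4.
    """
    velocity_mm_per_ns = C_MM_PER_NS / math.sqrt(eps_eff)
    return fraction * rise_time_ns * velocity_mm_per_ns


# Assumed rise times: 0.5 ns for high-speed USB, 4 ns for full-speed USB.
print(f"high-speed: {critical_length_mm(0.5):.1f} mm")
print(f"full-speed: {critical_length_mm(4.0):.1f} mm")
```

Under these assumptions the high-speed case lands near 20 mm, and a slower edge relaxes the limit in proportion to the rise time.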

Besides good shielding from EMI, this is also an important factor for signal integrity. Below is a guide for 2-layer boards.
https://resources.altium.com/p/routing-requirements-usb-20-2-layer-pcb
Impedance calculator: https://www.pcbway.com/pcb_prototype/impedance_calculator.html
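
To illustrate why "as thick as possible" and "correct impedance" pull in different directions, here is a minimal sketch using the widely quoted IPC-2141 closed-form approximations for a surface microstrip and an edge-coupled differential pair. It assumes a solid ground plane on the opposite side of a 1.6 mm FR-4 board (relative permittivity 4.4) with 35 µm copper, and a 0.15 mm pair gap; these dimensions and the function names are assumptions for illustration. The formulas are approximations with a limited validity range, so a proper calculator such as the one linked above should be used for real designs.

```python
import math


def microstrip_z0(w_mm, h_mm, t_mm=0.035, eps_r=4.4):
    """Single-ended microstrip impedance (IPC-2141 approximation), in ohms.

    w_mm: trace width, h_mm: dielectric height to the ground plane,
    t_mm: copper thickness. Roughly valid for 0.1 < w/h < 2.0.
    """
    return 87.0 / math.sqrt(eps_r + 1.41) * math.log(5.98 * h_mm / (0.8 * w_mm + t_mm))


def differential_z(w_mm, s_mm, h_mm, t_mm=0.035, eps_r=4.4):
    """Edge-coupled differential impedance for a pair with gap s_mm, in ohms."""
    z0 = microstrip_z0(w_mm, h_mm, t_mm, eps_r)
    return 2.0 * z0 * (1.0 - 0.48 * math.exp(-0.96 * s_mm / h_mm))


# Sweep trace width on an assumed 1.6 mm board with an assumed 0.15 mm gap.
for w in (0.3, 0.5, 0.8, 1.2, 2.0):
    print(f"w = {w:.1f} mm -> Zdiff ~ {differential_z(w, 0.15, 1.6):.0f} ohm")
```

The sweep shows the impedance falling as the trace gets wider, so the width has to be chosen to hit the target (90 Ω differential for USB) rather than simply maximized.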

@foostan
Owner

foostan commented Nov 30, 2024

Thanks for the detailed information and feedback. I will try to calculate the impedance and improve it.

@fabianmuehlberger

fabianmuehlberger commented Dec 1, 2024

To be clear: impedance is not the only criterion for USB. Implementation according to best practices and hardware design guides, as mentioned in other posts, is critical if you want your product to be within the specification.

This is a general overview of the EMC USB specification https://www.we-online.com/components/media/o109031v410%20ANP024d_The%20USB%20Interface%20from%20EMC%20Point%20of%20View.pdf

If you plan to sell your product in the EU, you have to comply with the regulations described here https://single-market-economy.ec.europa.eu/sectors/electrical-and-electronic-engineering-industries-eei/electromagnetic-compatibility-emc-directive_en
For other markets, similar regulations apply.

@b3n-l

b3n-l commented Dec 6, 2024

Just as an external comment, I've got an RP2040-based split board (Fingerpunch ximi v2) that exhibits similar behaviour when a mobile phone is placed near the board.

I managed to reduce the symptoms significantly by using a shielded USB-C cable.

@PaulRopel

The latest board is here. I'll confirm again and create a prototype.

* Move a MCU to the center of a board

* Important signal lines should be as thick and short as possible, and the number of vias is reduced as much as possible.

* Put GND for EMI and EMS.

image image

Hi, not pushing, I'd just like to know: how long does it take to make the prototype?

@foostan
Owner

foostan commented Dec 11, 2024

I ordered it last week.

@bridgerbrown

Another temporary fix seems to be using a powered USB hub with EMI shielding. I've been using my Corne v4.1 through one for a couple of weeks without this issue, then today I started plugging it directly into my laptop via USB-C and the issue started. I'm going to try a better-shielded USB-C cable for travel now.

My details are the same as others have already stated: the left half is plugged in via USB, and the right half is connected to the left via TRRS. When a phone gets close, the right half stops taking input after a short period (but keeps its RGB on with the animation paused), while the left still works. I have to unplug and replug for it to start working again, and then it lasts anywhere from 5-30 s, depending on how close the phone is, I guess. The issue goes away with enough distance.

Projects
Status: Researching