-
-
Notifications
You must be signed in to change notification settings - Fork 257
Trezor MB Java API - USB attach takes long time on Windows #98
Comments
I looked into this, but this is not so easy to fix. Background: While libopencm3 handles low-level usb messages in interrupts, one has to regularly call usbd_poll from the main thread to even answer low level requests. Trezor firmware doesn't do this when it is busy, e.g. computing the bip32 master key from the bip39 seed. In this case the OS usb stack or some other part of Java asks Trezor for a STRING REPORT (a very low level message that is usually handled by libopencm3). I'm not sure what triggers it. An enumerate for example would trigger this. When this request isn't answered within a few milli seconds, the libopencm3 usb stack gives up. This causes chaos, especially when later the STRING REPORT is handled and there is an unmatched packet in the queue. The workaround on the PC side is to not do anything like an enumerate, directly after a package was sent to the Trezor for at least three seconds. python-trezor uses a timeout of 10 seconds before reenumerating to check if device is still there. Maybe we could call usbd_poll in long running operations like pbkdf2, but I'm not sure how often one must call it to avoid this problem. If it must be called more often than once every 10 ms, then the same problem would be with public key derivation and signing. Also simply calling usbd_poll is dangerous as it also handles high-level messages and lead to unexpected recursions. |
Thanks for the reply! I must say I dont understand much of what you wrote, but Im not very good at low level stuff :-) Just my limited view - could it be much simplier? Again, KeepKey does not suffer from this problem. Fixing this would not only help my app Trezor SSH Agent but also improve experience of Trezor users on Multibit. M. |
I'm not so familiar with the Java code that I know who reenumerates the usb. But someone does this every 0.5 s. If you can somehow stop this task, that would help. If not, at least make the polling much slower, but then it can still happen if it polls at the wrong moment. Keepkey has the same problem, but only if additional passphrase is enabled. If passphrase isn't enabled keepkey never performs the time consuming bip39 computation (I think it caches the bip32-root node in that case). |
I submitted new issue to hid4java project. The problem is opened in three different projects now, lets hope someone will fix it. I know I cant. |
There are three things contributing to this problem; fixing one of them would be enough:
I'm hoping to get 3. fixed in the next firmware. I'm still trying to understand why this hang-up happens and how to detect that it happened. I already know how to fix it, when it happened.... |
Not sure if I understand what 0.5 second constant means. In our stack we enumerate 500ms AFTER the previous enumeration was finished. If that's not the case (i.e. mbhw enumerates every 500ms without waiting for the result of previous operation), I suggest to change the logic. |
I didn't find this code that enumerates... Yes this completely explains it. The Trezor firmware has a bug that if an enumerate happens during the "Waking up" phase, the device will go to this state where it cannot answer control requests and gets very slow. If you enumerate 0.5 seconds after the last enumeration finished, then it is guaranteed to happen. Is it easy to change the Java implementation? At least don't enumerate the first five seconds after the last message was sent to the Trezor? This would be a workaround for current firmware. |
Of course, the bug is really on the Trezor side. It should be able to handle an enumerate at any time. One problem is that the usb stack on Trezor side implemented in libopencm3 gets confused. I opened libopencm3/libopencm3#668 and have a preliminary fix for this. But it still needs some more thought. |
This patch adds calls to usbPoll in the progress callback. This should address trezor#98. We call usbDelay instead of Poll, to call usbd_poll several times. Otherwise it would only handle one event instead of handling all events that were pending so far. The ugly magic number 5 is a guess. Note that we also need to set usbTiny, so that we don't recursively process messages. Since we don't know whether usbTiny is set, we need to store the old value (especially true for u2f).
Note that the above fix is not enough; one also needs to fix another bug in libopencm3. Also it would probably better to shorten the update interval for bip39. |
This patch adds calls to usbPoll in the progress callback. This should address trezor#98. We call usbDelay instead of Poll, to call usbd_poll several times. Otherwise it would only handle one event instead of handling all events that were pending so far. The ugly magic number 5 is a guess. Note that we also need to set usbTiny, so that we don't recursively process messages. Since we don't know whether usbTiny is set, we need to store the old value (especially true for u2f). This fix also relies on another fix in libopencm3.
I have opened the pull request libopencm3/libopencm3#672. Can you try if it solves your problems? You can try the below patch to build the firmware with docker.
I also have a branch in trezor-mcu called pollfix that calls |
This patch adds calls to usbPoll in the progress callback. This should address #98. We call usbDelay instead of Poll, to call usbd_poll several times. Otherwise it would only handle one event instead of handling all events that were pending so far. The ugly magic number 5 is a guess. Note that we also need to set usbTiny, so that we don't recursively process messages. Since we don't know whether usbTiny is set, we need to store the old value (especially true for u2f). This fix also relies on another fix in libopencm3.
BIP39 Interval was shortened in trezor/trezor-crypto@9101c05 Pulled the change from pollfix branch in 9287dd7 Last thing missing is upstream merge of libopencm3/libopencm3#672 Let's wait a few days and use local copy of libopencm3 if this does not happen before the next release. Closing this for now. |
First choice where to report the bug should be Multibit hardware project, which Im referring to as Java API. I already did that - link bellow. But I have a strong feeling (not 100% however) that problem is in Trezor firmware.
The problem is that after providing the PIN to Trezor, doing software detach in HID Java API and starting the API again, USB init takes about 30 seconds to complete. The only cure is to do hard USB detach (unplug cable :-) So why do I feel like its a Trezor problem?
Repeat scenario is described here:
Multibit-Legacy/multibit-hardware#29
FW: 1.3.5, 1.3.6
Thanks, M.
The text was updated successfully, but these errors were encountered: