-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crash when USB connection is terminated #393
Comments
This is what happens (wireshark) when the device is disconnected.
pcap file: https://pcap.honeynet.org.my/v1/submission/view.php?id=1111.php |
The "crash" comes from the By commenting on the following line, we can experience that the roadrunner suspends when the USB connection suspends; Yet the roadrunner recovers once we start the connection again.
|
Yes, the restart is caused by the watchdog. You can turn it off by compiling with DEBUG=1 if you like. I did a quick test this morning with a DEBUG compiled firmware and it seemed to hand instead of restarting. I did not have time to dig any more into this though. Interesting that it seems to recover for you. |
I have seen the same issue with crazyflie_ros/crazyflie_cpp. I did debug the problem and found that it is caused by the logging task keep filling up the message queue with new data even though USB is disconnected (I think one can also reproduce by pulling the cable). Unfortunately, I was not able to figure out why this causes a crash (the queue has a finite size), and how to fix it. For a CF 2.0, I have not seen recovery after the issue (a reboot is required). |
@whoenig To have a recovery of the CF2.0, I disabled the Are you sure it's due to overflow? I experience the CF2.0 to "wait" for the USB connection to come back. I feel it is more like a feature of STM32 for the USB driver. |
I debugged this more today and found a much simpler reproducible. It's essentially enough to just query for the platformVersion once. There will be a response, but afterwards the heart beat LED is not blinking anymore (DEBUG build, I guess in release the watchdog would kick in). The CF will still respond to other messages, but all other tasks seem to be suspended until messages over USB are exchanged. A debugger just shows that time is spent in various USB related interrupts. So far, I could not find the underlying issue (and I don't know much about the STM32 USB OTG driver that is being used). However, this shows that this is not necessarily a logging related issue. |
Great! It is good to have a quick way to trigger the problem. We suspect that it is related to task levels or interrupts taking too long but we have not investigated it further yet. The watchdog is reset on idle and if there is no idle time it will not be reset. |
Is there an easy way/documentation on how to debug such issues? The task priority for the CRTP tasks is lower compared to some of the other tasks. I am not sure how to profile time spent in interrupts. |
@whoenig unfortunately not. There is some support in FreeRTOS to monitor tasks but nothing for interrupts. Instrumenting the code is maybe the way forward? |
Following the proposed solution:
|
I think this is a ST Library issue. Found this in st forum. I will try the fix and get back. |
Tried the fix and now the it is not able to connect USB at all = FAIL |
The forum post and the USB device lib used in the crazyflie firmware are both pretty old. Do you think it would help to upgrade to the latest STM libraries? I assume this is a bigger change, since there might be a bunch of dependencies that need to be resolved. |
@whoenig Yes I think updating the libraries is the next thing to try. @krichardsson did a try last Friday but apparently there are quite some things to be resolved so didn't get all the way. We will try again as soon as we have time. |
It seems as the problem occurs when the Crazyflie tries to send data over USB when the USB connection is terminated. The USB library continues to request data (from an interrupt) after the USB connection is terminated and this seems to cause some form of infinite loop in the interrupt code (implemented in the USB lib). I think the CF code is OK and that the problem lies in the USB lib. @whoenig, I have tried to upgrade to V2.2.1 / 17-March-2018 of the USB library (https://www.st.com/content/st_com/en/products/embedded-software/mcu-mpu-embedded-software/stm32-embedded-software/stm32-standard-peripheral-library-expansion/stsw-stm32046.html) but have not yet managed to get it to work. |
Thanks for the update Kristoffer! What is the issue when you try to upgrade the USB library? Does it compile and simply not run, or are there issues with compiling because they changed their API in the meantime? |
It did compile, the API/lib is fairly similar, the problem was that I not could get normal operation to work correctly. It worked as expected for a few packets (around 5 if I remember correctly) but after that some packets were "stuck" and interleaved with garbage. I got the feeling that it might be related to some form of buffer overrun in a circular buffer or similar, maybe cased by miss-configuration? @evoggy spent some more time on this yesterday, not sure of the outcome yet. |
I think I've found a fix for this, but there's still some debugging to be done. I'll write more details once it's cleaned up and pushed. |
This should be fixed now, but please test it out a bit and re-open the issue if there's still problems. The main issue was the fifo empty interrupt flag not being cleared correctly by the ST lib. Previously the flag would be cleared once the transfer is completed. So you would send data tough the USB, the data would be moved to the USB peripheral and the ISR would constantly run until the host had fetched the data. Since this is host dependent the time spent here may vary and we've seen some long latency propagate though the system when running logging on USB. As a side-effect of this if the cable is pulled or the host program terminated while we're waiting to the transfer to complete the ISR would continue to run which would have nasty side-effects like queues filing up and the SYSLINK assert mentioned above being hit and the watchdog triggering. The fix for this is detailed here (https://community.st.com/s/question/0D50X00009XkbWBSAZ/bug-in-stm32usbotgdriver-causes-100-cpu-in-fifo-empty-interrupts). Instead of waiting until the transfer is done the fifo empty interrupt is switched off after the data is sent to the USB peripheral. But even with fixed the above there's some congestion in the system since logging continues to run, lots of data has been moved to the USB peripheral that will not be sent and so on. So together with clearing out the queues for sending data, flushing the TX in the USB peripheral and stopping the logging by fixing #428 the connection performance has been improved and is now very stable on connect/disconnect. |
USB connection in the python client does not seem to work on OSX after this change. Not 100% sure but it seems likely that it is related to this issue |
The fix for this worked well on Linux, but not at all on Mac OSX or Windows. Looking into this again to see if we can fix the fix... |
From issue bitcraze/crazyflie-lib-python#110
How to reproduce:
Connecting to usb://0
in terminal)Symptoms:
The text was updated successfully, but these errors were encountered: