asyncio error during large image upload #30
Is the device resetting at all when this happens, or does it seem healthy as far as you can tell? You can try increasing the timeout arguments to the upload command. I'll look into why a task is getting cancelled; I expected it to raise a timeout error instead.
The logs do not show any evidence of an MCU reset, reboot, or fault during the upload. I enabled the Zephyr mcumgr_smp debug logging and there is no indication of a problem on that end. mcumgr-web is able to upload the large image, although it occasionally hangs and has to be restarted. I am also able to consistently upload a small dummy image with smpclient. I'll look into the upload timeout arguments next.
What board / MCU are you using? I think that I can borrow an Arm Mac to see if I can reproduce it. Please consider forking and opening a PR if you find that the example script or anything else needs some changes.
@bjandre It appears to be an asyncio bug causing the nasty traceback, though I suspect the root cause is that smpclient is not receiving the notify within the 2.5 second timeout. The issue is still open at Python: https://bugs.python.org/issue39032 The Stack Overflow poster also included a workaround that I'll look into.
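A minimal sketch of how a notification wait with a 2.5 s timeout can surface as a distinct, catchable exception instead of a bare task cancellation. It assumes the BLE notify callback pushes each payload into an `asyncio.Queue`; `wait_for_notify` and `NotifyTimeoutError` are hypothetical names, not smpclient's API.

```python
import asyncio


class NotifyTimeoutError(Exception):
    """Hypothetical error type (not part of smpclient): no SMP notify arrived in time."""


async def wait_for_notify(queue: asyncio.Queue, timeout: float = 2.5) -> bytes:
    """Wait for the next notification that a BLE notify callback put on `queue`.

    Assumption: the transport's notify handler calls queue.put_nowait(payload).
    On timeout, asyncio.wait_for cancels the pending queue.get() and raises
    asyncio.TimeoutError, which is re-raised here as a dedicated exception type.
    """
    try:
        return await asyncio.wait_for(queue.get(), timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise NotifyTimeoutError(f"no notification within {timeout} s") from exc
```

Keeping the timeout inside the transport, rather than cancelling the caller's task from outside, is what lets application code distinguish "the peripheral never answered" from "someone cancelled me".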
We're based on an nRF52840 DK (Zephyr board nrf52840dk_nrf52840) with upstream Zephyr 3.6, not the Nordic SDK. I experimented with the timeout parameter and it didn't help. It seems like a notify is being dropped, but I'm not sure if the issue is on the Zephyr or smpclient side. I'm still investigating the root cause. If we can get a cleaner exception when a notify timeout occurs, that would make it easier to trap at the application level and allow retries.
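Assuming the transport did raise a timeout error (rather than cancelling the task), an application-level retry could look like the sketch below. `with_retries` is a hypothetical helper, not part of smpclient; it also assumes the request is safe to resend, which is plausible for an image-upload chunk but should be checked against the actual protocol behavior.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    request: Callable[[], Awaitable[T]],
    attempts: int = 3,
    backoff: float = 0.5,
) -> T:
    """Run `request()` up to `attempts` times, retrying only on timeouts.

    Hypothetical helper (not part of smpclient). `request` must be a factory
    that builds a fresh coroutine per attempt, since a coroutine can only be
    awaited once.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts):
        try:
            return await request()
        except asyncio.TimeoutError:
            # Linear backoff gives the peripheral time to settle before resending.
            await asyncio.sleep(backoff * attempt)
    # Last attempt: let any exception reach the caller.
    return await request()
```

Wrap whatever coroutine issues the SMP request in a zero-argument lambda and pass that as `request`. Only timeouts are retried; any other error propagates immediately.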
@bjandre If you don't mind wiping the flash on your DK, can I ask that you run the FW upgrade HW integration test for the nrf52840dk:
And paste the output. Expected:
I don't have an nRF52840 DK with me, but our custom board is compatible. When I run the upgrade script, it detects the device by name "A SMP DUT", but it fails to connect by device address:
Increasing the connection timeout doesn't help, but modifying upgrade.py:67 to use the device name instead of address allows it to connect.
That makes sense: macOS does not allow the use of BLE MAC addresses (hbldh/bleak#140). It's good that it's so reproducible! I will borrow a Mac and see what happens. I'm assuming this is on Arm. I suspect that the FW has not actually received all 5 of the SMP packet fragments (2475 / 5 = 495 bytes each), so it doesn't notify, and then the request times out. I wonder if it is a synchronization issue wherein the loop at smpclient/smpclient/transport/ble.py, lines 131 to 137 (commit 26a3a2d), is not actually waiting for the previous transfers to complete. It would slow things way down, but I wonder if adding an
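The arithmetic behind the suspicion: a 2475-byte SMP packet split at 495 bytes takes exactly 5 GATT writes, while the mtu=244 from the original report takes 11 (ten of 244 bytes and a final 35-byte write). Below is a minimal sketch of such a fragment loop with an optional pause between writes. `fragments`, `send_packet`, and the `write` callable are hypothetical stand-ins, not the code at the linked lines; in practice `write` might wrap Bleak's `BleakClient.write_gatt_char(..., response=False)`.

```python
import asyncio
from typing import Awaitable, Callable, Iterator


def fragments(packet: bytes, size: int) -> Iterator[bytes]:
    """Split one SMP packet into chunks of at most `size` bytes."""
    if size < 1:
        raise ValueError("size must be positive")
    for offset in range(0, len(packet), size):
        yield packet[offset : offset + size]


async def send_packet(
    packet: bytes,
    size: int,
    write: Callable[[bytes], Awaitable[None]],
    delay: float = 0.0,
) -> int:
    """Write each fragment in order, optionally sleeping `delay` s between writes.

    Returns the number of writes issued. Note that awaiting `write` only
    guarantees the OS accepted the data, not that the peripheral received it.
    """
    count = 0
    for chunk in fragments(packet, size):
        await write(chunk)
        count += 1
        if delay:
            await asyncio.sleep(delay)
    return count
```

A fixed delay is the bluntest form of pacing: it slows every upload by `delay` per fragment whether or not the link needed it.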
Yes, an Arm-based Mac. I tried adding the asyncio.sleep and the results were inconclusive. In general it seemed to get the same few KB into the upload, but on a few runs it made it noticeably further before hanging.
https://developer.apple.com/documentation/corebluetooth/cbperipheral/writevalue(_:for:type:)
https://stackoverflow.com/questions/65586100/rate-limiting-a-corebluetooth-write-without-response
I'm starting to suspect that Core Bluetooth writeValue returns without guaranteeing that the transmission has completed. Here's where Bleak uses the Core Bluetooth API: Generally, I've noticed BLE being better on macOS/iOS. I wonder if it's so good that it's "too good for the peripheral", i.e. it's sending the fragmented packets too quickly. The manual fragmentation that's done is required on Windows, at least with the BLE adapter that I'm using. I wonder whether removing the fragmentation and telling the CB API to send the entire 2475 B packet would fix it.
It may be possible to make Bleak properly async with this: https://developer.apple.com/documentation/corebluetooth/cbperipheraldelegate/peripheralisready(tosendwritewithoutresponse:)/ Alternatively, use write with response and/or send all 2475 bytes at once.
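One way to read "properly async" here: block each write-without-response until the platform signals it is ready again, instead of returning as soon as the OS queues the data. The sketch below models that with an `asyncio.Event` that a readiness callback (such as Core Bluetooth's `peripheralIsReady(toSendWriteWithoutResponse:)`) would set. `ReadyGate` is hypothetical, not a Bleak class, and it conservatively assumes one write per readiness signal, whereas Core Bluetooth may buffer several.

```python
import asyncio
from typing import Awaitable, Callable


class ReadyGate:
    """Hypothetical flow-control gate for write-without-response.

    `mark_ready()` is meant to be called from the platform's "ready to send"
    callback; `send()` waits for that signal before each write.
    """

    def __init__(self, write: Callable[[bytes], Awaitable[None]]) -> None:
        self._write = write
        self._ready = asyncio.Event()
        self._ready.set()  # assume the link starts out able to accept a write

    def mark_ready(self) -> None:
        self._ready.set()

    async def send(self, chunk: bytes, timeout: float = 1.0) -> None:
        # Raises asyncio.TimeoutError if no readiness signal arrives in time.
        await asyncio.wait_for(self._ready.wait(), timeout)
        self._ready.clear()  # consumed until the next readiness callback
        await self._write(chunk)
```

Unlike a fixed sleep, this paces writes at whatever rate the link actually drains, and a missing callback turns into a timeout rather than a silent hang.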
Currently testing from a MacBook Pro 14-inch (Nov 2023, M3 Pro) on Sonoma 14.5, using this branch: https://github.com/intercreate/smpclient/tree/fix-%2331/image-upload-match The strange thing is that I was initially able to reproduce the error. I started fiddling with the timeouts and making some changes to The test setup is to have an nRF52840 DK with the
You have to keep switching between a/b because an upload of an image that is already uploaded will simply complete immediately:
Uploads are slower than I expected: typically about 15 KB/s, where Windows gets 25 KB/s. The MTU negotiates to 495. Write with response does not work; I guess it's not set up on the peripheral:
Sending the entire packet also does not work (it hangs and times out). Using the upload example script, I have discovered something strange: after a successful upload, the next upload will fail with
In Bleak, I've added this delegate method:

```python
def peripheralIsReady_toSendWriteWithoutResponse_(
    self, peripheral: CBPeripheral, characteristic: CBCharacteristic
) -> None:
    logger.debug("%s %s", type(peripheral), type(characteristic))
    logger.error("Called!")
```

It registers, but it does not get called. In `write_gatt_char`:

```python
async def write_gatt_char(
    self,
    characteristic: BleakGATTCharacteristic,
    data: Buffer,
    response: bool,
) -> None:
    value = NSData.alloc().initWithBytes_length_(data, len(data))
    await self._delegate.write_characteristic(
        characteristic.obj,
        value,
        (
            CBCharacteristicWriteWithResponse
            if response
            else CBCharacteristicWriteWithoutResponse
        ),
    )
    # Poll until Core Bluetooth reports it can accept another write without response.
    while not self._peripheral.canSendWriteWithoutResponse():
        logger.error("Not ready to send!")
        await asyncio.sleep(0.010)
    logger.debug(f"Write Characteristic {characteristic.uuid} : {data}")
```

This results in repeated "Not ready to send!" errors between writes, so the canSendWriteWithoutResponse flag does seem to be False for ~50 ms after every message. But even without this delay, issuing a write without response does not trigger any error callbacks. https://developer.apple.com/documentation/corebluetooth/cbperipheral/cansendwritewithoutresponse
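The `while` loop in the `write_gatt_char` patch has no upper bound, so a flag that never flips back would hang the write forever. Below is a sketch of the same polling with a deadline; it also returns how long it waited, which is handy for checking the ~50 ms figure. `wait_until` is a hypothetical helper; in the patch, the predicate would be something like `self._peripheral.canSendWriteWithoutResponse`.

```python
import asyncio
import time
from typing import Callable


async def wait_until(
    predicate: Callable[[], bool],
    interval: float = 0.010,
    timeout: float = 1.0,
) -> float:
    """Poll `predicate` every `interval` seconds until it returns True.

    Hypothetical helper. Returns the time waited in seconds; raises
    asyncio.TimeoutError if the predicate is still False after `timeout` s.
    """
    start = time.monotonic()
    while not predicate():
        elapsed = time.monotonic() - start
        if elapsed >= timeout:
            raise asyncio.TimeoutError(f"still not ready after {elapsed:.3f} s")
        await asyncio.sleep(interval)
    return time.monotonic() - start
```

Logging the returned duration per write would show directly whether the link is consistently busy for ~50 ms, without flooding the log with one error line per poll.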
Gonna try hard-coding the MTU to 244, which seems to be what your Mac negotiated. It still worked OK.
I tried moving the DK further away from the Mac, past several walls. This makes the transfer fail, but in a sort of reasonable way:
Discussing here: hbldh/bleak#1589
@bjandre Can you try increasing the delay between the packets? Also, let me know the Mac model and OS version. Thanks!
I'm using a MacBook Pro with an Apple M1 Pro chip, on macOS Sonoma. I'm glad you were able to reproduce the issue, at least initially. I tried the asyncio.sleep we discussed previously with a few values up to 1 s, but it didn't change anything. If there is a different delay you'd like me to adjust, please let me know where to set it and I will try it.
Can you try changing this line to return self.mtu? This should prevent SMP packet fragmentation. If it helps, then it broadly confirms the source of the issue.
JP update: I can confirm that mcumgr-web does not fragment the packets and instead uploads packets sized to the MTU. https://github.com/boogie/mcumgr-web/blob/main/js/mcumgr.js#L224
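To put the trade-off in numbers, here is a rough sketch that estimates how many SMP requests and GATT writes an upload needs when packets are sized to a 2475-byte buffer versus to the MTU. `plan_upload` is a hypothetical helper, and the 75-byte per-request overhead and 100,000-byte image are illustrative assumptions, not measured values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadPlan:
    requests: int  # SMP image-upload requests, each awaiting its own response
    writes_per_request: int  # GATT writes needed for one full-size request


def plan_upload(image_size: int, packet_size: int, mtu: int, overhead: int) -> UploadPlan:
    """Rough estimate of request and write counts for an image upload.

    Hypothetical helper. `overhead` is an assumed per-request cost (SMP
    header plus CBOR fields); the real figure depends on the encoder.
    """
    payload = packet_size - overhead
    if payload <= 0 or mtu <= 0:
        raise ValueError("packet_size must exceed overhead and mtu must be positive")
    return UploadPlan(
        requests=math.ceil(image_size / payload),
        writes_per_request=math.ceil(packet_size / mtu),
    )
```

Under these assumptions, `plan_upload(100_000, 2475, 244, 75)` gives 42 requests of 11 writes each, while `plan_upload(100_000, 244, 244, 75)` gives 592 single-write requests. Sizing packets to the MTU removes fragmentation entirely, so the host can never outrun the peripheral within a request, at the cost of roughly fourteen times as many request/response round trips.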
I'm trying to build a small smpclient-based application using the BLE transport. I am running into an asyncio error during DFU image upload that I'm not sure how to debug. I have been able to reproduce it using the code in examples/ble/upload.py. The upload is large, and smpclient gets through a few 2475-byte buffers with mtu=244 before it crashes.
Any debugging help would be appreciated.
smpclient version: 3.2.0
examples/ble/upload.py output: