-
-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bluetooth connection hangs / too many messages - with v3.3.0b6 #1095
Comments
I am able to reproduce the problem, even with observing disabled. It appears to cause the Bluetooth chip to crash (not only does it stop printing, it also disconnects). Getting stuck on shutdown seems like a second bug - probably due to use after free due to pybricks/pybricks-micropython@7d34df4 hack. |
Crashing of the bluetooth chip sounds familiar to me. When I was experimenting with simultaneously scanning and advertising this happened to me as well, and in the end led me to the more stable but less favourable option to switch between scanning and advertising instead. This inherently limited the number of commands to the bt-chip. Your observation that this also happens with observing disabled on the other hand is new to me. You are still printing to the console and scanning simultaneously? I unfortunately have never found a way to get feedback from the BT-chip, like an error or warning that indicated that it needed more time to process things before a new command could be send. |
Good question. I tested it and it still crashes without the Running the same program while not connected to the computer runs longer but crashes the Bluetooth chip eventually too. |
Many Bluetooth tasks can get stuck (can't be canceled) if the Bluetooth chip crashes. Adding checks for this at every yield point would be very expensive in terms of code size. So for now, we just add another hack to cancel all tasks on shutdown to avoid undefined behavior. In some cases, this undefined behavior was causing the hub to not power off unless the batteries were removed. Issue: pybricks/support#1095
This part of the issue is fixed by pybricks/pybricks-micropython@7a8a849. So at least we don't have to remove the batteries any more. Still investigating how to prevent the Bluetooth chip from crashing in the first place. |
Sounds like the bluetooth crash is not an easy topic. Thank you so much @dlech @NStrijbosch for all your continued efforts and not giving up on this. I did another test and made an observation which I find interesting. Maybe it is something completely different and unrelated, but I imagine in case this is unintentional behavior it might potentially also help with the crash. I use v3.3.0b6 and: from pybricks.hubs import TechnicHub
from pybricks.tools import wait
hub = TechnicHub(broadcast_channel=1)
count = 0
while(True):
hub.ble.broadcast("hel", count)
count = count + 1
wait(10) If I run the code while connected to pybricks-ide I recieve a broadcast every 100ms and the number in the message increments by 10 every time. It stays like this when disconnecting the computer while the program is running. This is what I would have expected. When I start the program again, this time with the hub not connected to the computer, the broadcast is 3 to 4 times faster at irregular intervals and eventually crashes. The timestamps that are transmitted until it crashes look like this: To recieve the broadcast I used this nodejs script: |
The broadcasting 3x as frequently when not connected bug should be fixed now by pybricks/pybricks-micropython@3317369. |
I'm starting to think this is simply not fixable (bug in Bluetooth chip). By reducing the frequency of calling Is there a known version of Nard's firmware that can run for >5 hours without the Bluetooth chip crashing? |
I checked out pybricks/pybricks-micropython@af8626c and ran the following (outdated/modified) programs to broadcast with just a 10 ms delay: Technic Hub from pybricks.hubs import ThisHub
from pybricks.parameters import Color
from pybricks.tools import wait
from pybricks.experimental import Broadcast
hub = ThisHub()
hub.light.on(Color.WHITE)
# Initialize broadcast with two topics.
radio = Broadcast(["counter", "hue"])
while True:
for i in range(100):
# Send one byte on the number signal.
radio.send_bytes("counter", bytes([i]))
wait(10)
# Receive two bytes on the hue signal.
hue_data = radio.receive_bytes("hue")
# If we received something, set the light.
if hue_data:
hue = int.from_bytes(hue_data, "little")
hub.light.on(Color(hue)) On Prime Hub from pybricks.hubs import PrimeHub
from pybricks.parameters import Color
from pybricks.tools import wait
from pybricks.experimental import Broadcast
hub = PrimeHub()
hub.light.on(Color.WHITE)
# Initialize broadcast with two topics.
radio = Broadcast(topics=["counter", "hue"])
while True:
# Read the counter value.
counter_data = radio.receive_bytes("counter")
# If we received it, display it, and send a response.
if counter_data:
count = counter_data[0] % 100
hub.display.number(count)
# Send back a hue value.
hue_data = int(count * 3.6).to_bytes(2, "little")
radio.send_bytes("hue", hue_data)
# Stop if we reach the end.
if count > 99:
break
# Don't need to update all the time.
wait(10) test.mp4I started it 30 minutes ago, and it is still going! To be continued.... |
Still going after 7 hours! 🕐 At 7 * 60 * 60 * 100 = 2 520 000 messages going both ways this probably shows that it could work? |
Would you mind attaching the firmware.zip for that here here? |
I picked a build fairly randomly. It could be others work too. I should probably also test both scripts on the latest firmware. |
Still going after 12 hours. One of the hubs is about to run low on battery though 😄 |
Just to get the details correct: although you call the broadcast 100 times per second, it will only update the advertising data maximal 20 times per second (depending on which commit you used, I introduced some pseudo random period at some point). Nonetheless still impressive that it ran for at least twelve hours. This is also about the build Anika was running on a City hub during 7 days of LEGO world without many issues. That one was only receiving data though. |
The firmware and test program that Laurens shared still locks up the Bluetooth chip on my Technic hub in about a minute or so when running while connected to Pybricks Code. It seems to be running longer when not connected to Pybricks Code though. So I will need to go back and try the current firmware without being connected to Pybricks code and see how that goes. |
The Bluetooth chips on the Technic and City hubs seem to lock up if the advertisement date is updated too many times. So far, we have only found one condition where this doesn't happen. The Bluetooth chip can't be connected to any other devices and advertising has to be stopped and restarted between each update. This is quite limiting, but seems to be the best we can do for now. Issue: pybricks/support#1095
I think I finally made a breakthrough. I found some alternate commands to set the advertising data and one of them appears to avoid the lock up! For those interested in the details, there are GAP Set Advertisement Token, GAP Remove Advertisement Token, and GAP Update Advertisement Tokens commands that update individual advertisement data type values instead of doing the whole thing at once. I tried these commands, but they result in the same lockup. But this leads me to believe that there is some sort of fancy data management in the Bluetooth chip that parses the data for each AD structure. It seems quite plausible that there is a memory leak related to this that is what eventually causes the Bluetooth chip to lock up. Fortunately though, the chip also supports some standard Bluetooth HCI commands. There is one for setting LE Advertisement Data. I tried this one and it appears to update the data without triggering the memory leak! Interestingly, with the standard HCI command we a command complete event as if we had called the TI vendor-specific command. But the docs say "Note that while all commands can be sent by the application processor to the Network Processor using HCI, not all events will be returned as they will be trapped and possibly discarded by the Host layer of the |
Discussed in https://github.com/orgs/pybricks/discussions/1091
Originally posted by Debenben June 3, 2023
I can reproduce something similar to the reported problem with v3.3.0b6 and the following code:
When running the above program the hub stops broadcasting "hello" after around 10 to 40 seconds. If connected to a computer it also stops printing "running" in the pybricks-ide. When connected to the computer and using long-press the hub starts blinking blue very fast and is no longer responsive, I need to remove the batteries to get it back to normal. If not connected to the computer I can stop the program with long-press on the button and then the hub behavior is back to normal.
The hub.light.on and print command is not necessary, the behavior is also there when only using hub.ble.observe and hub.ble.broadcast commands. The crash and weired behavior is gone when adding wait(1) in the loop.
The text was updated successfully, but these errors were encountered: