BLE broadcast: communication between pybricks hubs #80
Nice work! This could be a great first step in Bluetooth communication, and compatibility with the official MINDSTORMS app is a nice touch. It could be a nice addition/alternative to the BLE UART in pybricks/support#262 since both approaches have different use cases. As you point out, there's still some work to be done before we can merge it, but a PR is just the right spot for that :)
I've taken a closer look and reworked parts of it to address some of your questions and some stability issues: The aim was to move all the platform-agnostic logic to The With the original PR, the hubs would crash (watchdog timer) quite frequently. While I haven't ruled out all possible causes, I think part of it may be related to how data is allocated and used. When transmitting, I think the process could be trying to access data that no longer exists. When handling advertisement data, it could sometimes write data to heap even after the user program ends. This update resolves this by allocating a number of read-only signals statically. There is one static object for transmission. A next step could be to look at how the various processes should work, especially considering (re)starting of scanning and advertising.
I've added a timestamp to incoming messages, which could be a way to address this. This is not extensively tested yet, so not everything may work as advertised (ha!). EDIT: With these updates, the hub is no longer crashing and rebooting with the watchdog timer. |
662a229 to faa1f8c
Thanks for the great (offline) discussion so far @NStrijbosch. I've added my changes on top of your branch. One of the next steps would be to convert the transmission into a task which updates and waits for the advertising data to be updated. It could (should?) also start another (cancellable) task to stop advertising after a timeout, and not wait for it at the user level. Combined, this ensures that we 1) don't have to wait a second between transmissions and 2) can safely update the transmitted data without overriding ongoing tasks. |
I've added an update with two additional methods ( You can now send and receive a tuple with up to 8 values at once, as follows:

    radio = Broadcast(topics=["info"])
    radio.send("info", ("Hello", 1.234, "world!"))

The tuple may contain integers (2 bytes or 4 bytes), floats (4 bytes), or strings (N + 1 bytes), as long as the total fits within 20 bytes. This facilitates many use cases, including things like remote control of a vehicle, e.g.:

    radio.send("info", (steering, speed, light_status))

On the receiving end, you could do:

    data = radio.receive("info")
    if data is not None:
        steering, speed, light_status = data
        # Apply steering, speed, etc.

The original methods to send raw bytes are still available, though they are renamed to
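A rough way to check whether a tuple fits the 20-byte budget described above. This is only a sketch: the helper names `payload_size` and `fits` are hypothetical, the rule "2 bytes if the integer fits in a signed 16-bit range, otherwise 4" is an assumption, and any per-value type overhead the real encoder may add is ignored.

```python
MAX_PAYLOAD = 20  # byte budget quoted in the comment above
MAX_VALUES = 8    # maximum tuple length quoted in the comment above


def payload_size(values):
    """Estimate the encoded size of a tuple in bytes (assumed rules, see lead-in)."""
    if len(values) > MAX_VALUES:
        raise ValueError("at most 8 values per message")
    total = 0
    for value in values:
        if isinstance(value, bool):
            raise TypeError("bool is not covered by this sketch")
        if isinstance(value, int):
            # Assumption: 2 bytes if the value fits in int16, otherwise 4.
            total += 2 if -32768 <= value <= 32767 else 4
        elif isinstance(value, float):
            total += 4
        elif isinstance(value, str):
            # N bytes of text plus one extra byte, per the description above.
            total += len(value.encode("utf-8")) + 1
        else:
            raise TypeError("unsupported type: %s" % type(value).__name__)
    return total


def fits(values):
    """True if the estimated size stays within the 20-byte budget."""
    return payload_size(values) <= MAX_PAYLOAD
```

Under these assumptions the example tuple `("Hello", 1.234, "world!")` comes to 6 + 4 + 7 bytes, which leaves a little room for one more small integer.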
Great stuff going on here. Following with interest... |
6837d44 to 7fb0166
This is fixed through: 7fb0166 Besides this the following claim was false:
It turns out the BT-chip can scan and transmit at the same time |
7fb0166 to a72d891
We've now added a broadcast process that ensures we don't queue up too much stuff if you send things in a tight loop. This makes things quite a bit smoother and more reliable. That also means some additional work is required to start and stop this background process cleanly. This isn't done yet, so you might see some unexpected behavior after running your first program if you try this build.
Thanks for this great work! |
FYI, LEGO has fixed the advertising data bugs where the length was missing and the byte order of the company ID was wrong in the MINDSTORMS app v10.3.0. Unfortunately, they are still using ADV_SCAN_IND instead of ADV_NONCONN_IND. |
    @@ -152,7 +153,8 @@ void pbio_broadcast_parse_advertising_data(const uint8_t *data, uint8_t size) {
         }

         // We only process data with the right header
    -    if (memcmp(data, &transmit_signal.header[0], 3)) {
    +    if ((data[0] != size - 1) && data[1] != transmit_signal.AD_type &&
    +        memcmp(&data[2], &transmit_signal.company_id, 2)) {
memcmp(&data[2], &transmit_signal.company_id, 2)
Can probably be simplified to something like
pbio_get_uint16_le(&data[2]) != transmit_signal.company_id
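For readers following along without the C sources, the header check under review can be modelled in Python roughly as follows. This is a sketch, not the driver code: the function name `has_valid_header` is hypothetical, `0xFF` is the standard AD type for manufacturer-specific data, and the company ID `0x0397` is an assumed value that is not stated in this thread. The sketch rejects a packet as soon as any one of the three fields mismatches, which matches the intent of the comment "We only process data with the right header".

```python
AD_TYPE_MANUFACTURER_DATA = 0xFF  # standard BLE AD type for manufacturer specific data
LEGO_COMPANY_ID = 0x0397          # assumed value, not stated in this thread


def has_valid_header(data, ad_type=AD_TYPE_MANUFACTURER_DATA, company_id=LEGO_COMPANY_ID):
    """Return True only if length byte, AD type and company ID all match."""
    size = len(data)
    if size < 4:
        return False
    # The length byte counts everything after itself.
    if data[0] != size - 1:
        return False
    if data[1] != ad_type:
        return False
    # The company ID is stored little-endian in bytes 2..3,
    # which is what a helper like pbio_get_uint16_le would read.
    return int.from_bytes(data[2:4], "little") == company_id
```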
0b56ef2 to cd98faa
I will just leave some issues I am trying to fix here: When running a program on the technic hub from
Both issues are hard to debug since I am not able to find a case with a 100% chance on either of the issues above. Besides this I only observed the issues when running from main.py, so I cannot print messages to the terminal to see where the program is stuck... Anyhow, I still have some ideas to test. I will update the observations above accordingly. |
Does this make the problem go away?

    diff --git a/pybricks/util_pb/pb_task.c b/pybricks/util_pb/pb_task.c
    index 73b58558..84e20878 100644
    --- a/pybricks/util_pb/pb_task.c
    +++ b/pybricks/util_pb/pb_task.c
    @@ -42,7 +42,7 @@ void pb_wait_task(pbio_task_t *task, mp_int_t timeout) {
         pbio_task_cancel(task);
         while (task->status == PBIO_ERROR_AGAIN) {
    -        MICROPY_VM_HOOK_LOOP
    +        MICROPY_EVENT_POLL_HOOK
         }
         nlr_jump(nlr.ret_val);

This isn't a proper fix because it would cause tasks to keep running after an exception and lead to memory corruption, etc. But it would tell us that the problem is that a task was requested to be canceled but never properly canceled.
The problem is not gone. But it does change the behaviour in the blocking state: With this change a single press will shutdown the hub, instead of the long press before. So at least One of my own test was to change |
If the task times out, it still has to be cancelled and wait for cancellation to complete. But some tasks are not currently cancellable at all or in some regions of code, so will cause a lock up. Since the hub powers off/reboots when you make the change I suggested, that indicates that it caused a hard crash, which was expected. |
@dlech You are correct. Some of the tasks I added were not cancellable. I am trying to understand what is going on. I suspect that the assumption that the BTchip will perform all commands without an error is not justified anymore when stretching the limit of fast advertisement updates. To debug the status of the BTchip I use a pybricks-micropython/lib/pbio/drv/bluetooth/bluetooth_stm32_cc2640.c Lines 49 to 59 in da8fd73
|
Yes. And in the case of actually connecting to other devices, we need to consider the handle of the remote device as well. Basically, we need to replace all instances of pybricks-micropython/lib/pbio/drv/bluetooth/bluetooth_stm32_cc2640.c Lines 549 to 576 in da8fd73
This is dangerous since printing also uses Bluetooth. It will likely change the behavior of what you are trying to inspect. Also, using printf in any pbio code is dangerous because it can cause reentrancy issues: it runs the contiki event loop while we are already in the contiki event loop. One trick I use sometimes is to make the print buffer very big so that printf doesn't block when called. This avoids the reentrancy problems in most cases.
This uses a UART to print debug messages. This works around the problem of printf using Bluetooth. Although it still isn't perfect because it is blocking and can break things by changing the timing. To use it, you have to change the configuration to make the hub only have 3 ports instead of 4 and then add some code in the early init in platform.c to configure the UART on port D. If you look way back in git history, you can probably find a time where this was implemented. |
This sounds like exactly what I hoped for! I will look back in the Git history to check the implementation. Deep down: 48c442b
Using the UART hack on port D I was able to trace the cause of the blocking state. It seems that the pybricks-micropython/lib/pbio/drv/bluetooth/bluetooth_stm32_cc2640.c Lines 1703 to 1709 in da8fd73
Consequently, there are no new events registered anymore. When waiting, e.g., for an event that indicates that advertisement data is updated, this event is not triggered. Leading to the blocking state. |
I wonder if we are getting an SPI error. If you override the weak function |
I added the following to lib/pbio/platform/technic_hub/platform.c:

    void HAL_SPI_ErrorCallback(SPI_HandleTypeDef *hspi) {
        DBG("SPI error");
    }

including the required definition of But unfortunately the DBG print in
Hmm... it sounds as if we could be just missing an interrupt then. I don't see how else |
Could it be just a single interrupt? The hub is scanning, so advertising data should be received by the BTchip all the time. So shouldn't this interrupt be triggered again after missing one? At the moment when it is stuck, the interrupt handler is not called anymore. In other words: couldn't the cause be that the BTchip just stopped sending stuff?
The interrupt is for the end of a DMA transfer, so there will never be another interrupt until a new DMA transfer is started. But that doesn't happen because the loop that starts the transfers is locked waiting for the previous transfer to complete. On the SPI bus, unlike UART, the MCU has to initiate all communications. |
Another thought. We could check the return value of |
Clear! |
The advertising state `advertising_now` needs to be changed when advertising for broadcasts and when connecting to code.pybricks.com. It also turned out that changing the maximum number of peripheral devices to two for the Prime Hub removed the implicit advertisement stop after connecting to code.pybricks.com. This is also fixed by an explicit advertisement stop.
when the broadcast process is stopped, also stop scanning and stop advertising.
Since the Bluetooth and BLE roadmap won't be finalized for the upcoming release, it's better to import this class via experimental. This means that we can still change it even if we merge it.
Simultaneous advertising and scanning is unreliable with the stm32_cc2640. This commit introduces a broadcast process in the driver that periodically switches between scanning (30 ms) and advertising (20 ms). This gets as close to simultaneous sending and receiving of broadcasts as possible.
For endurance testing, this longer test is helpful.
To stop the broadcast process after the program is stopped by the user
Stopping the broadcast process is not sufficient. When the process is stopped, the hub is likely to still be advertising or scanning. Hence, after stopping the process, advertising and scanning also need to be stopped.
Add the info() method. The argument is the name of a topic. It returns a tuple containing: index, timestamp, and rssi value corresponding to the last received data for this topic.
To comply with the conventional structure of advertising data the number of bytes should correspond to the complete number of bytes (including the meta data).
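As an illustration of the length convention mentioned in the last commit message, the following sketch builds one manufacturer-specific AD structure in Python. It assumes that the length byte counts every byte after itself (AD type, company ID and payload), which is the usual BLE layout. The function name `build_advertising_data` and the company ID `0x0397` are assumptions made for this example, not taken from the PR.

```python
MAX_ADV_DATA = 31  # size limit of legacy BLE advertising data


def build_advertising_data(payload, company_id=0x0397):
    """Return length byte + AD type 0xFF + little-endian company ID + payload."""
    body = bytes([0xFF]) + company_id.to_bytes(2, "little") + bytes(payload)
    if len(body) + 1 > MAX_ADV_DATA:
        raise ValueError("advertising data is limited to 31 bytes")
    # The length byte covers the metadata (type, company ID) and the payload.
    return bytes([len(body)]) + body
```

With this layout a receiver can verify `data[0] == len(data) - 1` before looking at anything else.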
0a7c4bb to af8626c
Hi! What I need is bidirectional communication between two technic hubs, ideally with "minimal" lag. To test responsiveness (a bit empirically) I connected two motors to each hub to ports A and B and used the position of the motor on port A (the I noticed that the communication works for a few seconds, then stops for a few seconds, then resumes again... I load the programs on both hubs from Is there something I can do to troubleshoot this?
Thank you for the feedback. I do not recognize the drop in communication that takes seconds. Can you share a small code example such that I can reproduce this?
With this implementation it is important to distinguish between the two possible scenarios to run a program:
|
Well, the two programs I use are short enough. Here is the "commander":

    from pybricks.hubs import TechnicHub
    from pybricks.pupdevices import Motor
    from pybricks.parameters import Color, Port
    from pybricks.robotics import DriveBase
    from pybricks.tools import wait, StopWatch
    from pybricks.experimental import Broadcast

    hub = TechnicHub()
    ma = Motor(Port.A)
    mb = Motor(Port.B)
    radio = Broadcast(topics=["data", "cmd"])
    data_angle = 0
    cmd_angle = 0

    while True:
        cmd_angle = ma.angle()
        mb.track_target(data_angle)
        radio.send("cmd", (cmd_angle))
        data = radio.receive("data")
        if data:
            data_angle = data
            hub.light.on(Color.GREEN)
        else:
            hub.light.on(Color.RED)
        wait(10)

And here is the "reporter":

    from pybricks.hubs import TechnicHub
    from pybricks.pupdevices import Motor
    from pybricks.parameters import Color, Port
    from pybricks.robotics import DriveBase
    from pybricks.tools import wait, StopWatch
    from pybricks.experimental import Broadcast

    hub = TechnicHub()
    ma = Motor(Port.A)
    mb = Motor(Port.B)
    radio = Broadcast(topics=["data", "cmd"])
    data_angle = 0
    cmd_angle = 0

    while True:
        data_angle = ma.angle()
        mb.track_target(cmd_angle)
        radio.send("data", (data_angle))
        cmd = radio.receive("cmd")
        if cmd:
            cmd_angle = cmd
            hub.light.on(Color.GREEN)
        else:
            hub.light.on(Color.RED)
        wait(10)

They are actually mostly identical, with just the topics reversed.
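Two plain-Python details in the programs above may be worth checking, independently of the radio timing. The snippet below only demonstrates language behaviour. Whether `Broadcast.send` accepts a bare integer, and what `receive` returns for a single value, is not stated in this thread; the two helper functions are hypothetical and exist only to make the comparison explicit.

```python
angle = 0

not_a_tuple = (angle)   # parentheses alone do not create a tuple, this is still an int
one_tuple = (angle,)    # the trailing comma is what makes a one-element tuple


def has_data_truthy(data):
    """Mirrors `if data:`; a received value of 0 counts as "no data"."""
    return bool(data)


def has_data_explicit(data):
    """Mirrors `if data is not None:`; only a missing message counts as "no data"."""
    return data is not None
```

If a received angle can be exactly 0, the `if data:` form would switch the light to red even though a message arrived, so the `is not None` form used in the earlier example is the safer test.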
BTW, I'd like to know the actual rate limit for bidirectional communication :-) It's a bit off-topic, but I wonder if a Technic Hub could be configured as a master for the Nordic UART Service (NUS), and connect to the other hub using the NUS, and if this would result in better performance than this amazing but tricky broadcast machinery (which, I admit, is a stroke of genius in its way of piggybacking on the BLE device advertising data frames...). |
Thanks I will try to reproduce your issue in the coming days.
In the current implementation this is 20 Hz for the Technic/City hubs (these switch between scanning (20ms) and advertising (30ms)). For the Inventor/Prime/Essential hubs this rate might be better, since these hubs are able to scan and advertise at the same time. However, you have to consider that it is a lossy communication protocol, i.e., increasing the communication rate might lead to a lot of lost data.
I think this should be possible, and it could definitely give you better performance than the current broadcast. At the time this broadcast implementation seemed to be an easier way to get hub-to-hub communication working for a lot of use cases.
I think I have a reasonable explanation for the behavior I am observing.
The two hubs are not synchronizing. What I fear is happening is that the 50ms period (20ms scan + 30ms send) is not "exact", and the two hub periods change "phase" periodically (relatively slowly). I think that this could be verified with a BLE radio scanner on a third device (like a PC or a radio-enabled CPU like the ESP32), and I have seen instructions on how to make it in other related issues (like here) but I still did not have the time to do it. Now, supposing that I am right, and given the fact that technic hubs are not able to advertise and receive at the same time, to make this work in the bidirectional case we'd need to sync their periods in some way. I already have a few ideas, but before explaining them I'd like to check a few things about hardware capabilities.
|
My general idea is to give more information to the broadcast system initialization about the "network topology".
Each brick is supposed to follow the topic list cyclically, and for each topic either receive or transmit for the transmission time period of that topic. Each packet should contain a number that tells how many microseconds the transmitter is inside its transmission time window. Note two more things:
I know this complicates things, but I fear that it is the only way to make it work with the technic hub BLE chip. The technic hub is dear to me because it is likely the best hardware we can have for Lego robotic competitions. I care about latency because our robots move at about 1m/s (line followers are even faster). |
One final comment, with one more detail on where we could put the timestamp inside the packet payload. We are already using every available byte, but I think we could repurpose the topic CRC32 four bytes.
My general idea is to use "sender IDs" to understand "what is being transmitted", and let each sender write ideally to a single topic, and at most a handful of topics (we already have a static limit of four topics for the technic hub...).
This would leave the whole radio framework unchanged but would support the "sync dance" described in the previous message. If you think all of this makes sense, I can really help with the implementation on top of this PR!!! |
@massimiliano-mantione I am rather skeptical about using broadcast for real-time synchronization between hubs. There is no guarantee that the message is received and many other factors can break the communication. A reliable synchronization protocol would require acknowledging messages and would introduce a substantial delay in the transmission. I see broadcast as more like an asynchronous communication channel. The main benefit of using it is that you don't need to know the topology. New hubs just join the network and you don't need to pair them. So my suggestion would be to support sharing topics by multiple hubs and collect responses from all devices (now responses are collected per topic hash and the last response wins). |
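The difference between "last response wins per topic" and "collect responses from all devices" can be sketched with a small container. Everything here is hypothetical: the class name `TopicInbox`, its methods, and the idea of a `sender_id` (for example a device address) are illustration only and do not describe the PR's data structures.

```python
class TopicInbox:
    """Keep the latest message per (topic, sender) instead of per topic only."""

    def __init__(self):
        self._latest = {}

    def store(self, topic, sender_id, data):
        # A newer message from the same sender replaces that sender's old one,
        # but never a message from a different sender.
        self._latest[(topic, sender_id)] = data

    def receive_all(self, topic):
        """Return {sender_id: data} for every sender seen on this topic."""
        return {s: d for (t, s), d in self._latest.items() if t == topic}
```

With such a structure several hubs could share one topic without pairing, and the receiver would decide how to combine the per-sender values.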
Thanks for all the great discussions and ideas everyone! Feel free to open new discussions if you'd like to explore certain topics further. |
This test uses the LED to indicate data dropouts.
The broadcast process periodically switches between scanning and advertising. Previously this was done with a fixed timing interval (50ms). This could lead to this process being fully synchronized, i.e., both hubs are scanning at the same time and advertising at the same time, which makes it impossible to transfer data via the broadcast. This commit introduces a pseudo-random timing for both the scanning and the advertising interval, each in the range [15,35] ms. This makes it virtually impossible to have full synchronization for multiple periods in a row.
That is correct, and I think the issue you notice is indeed related to this. Ideally the advertising and scanning are completely opposite, i.e., the broadcast process is as asynchronous as possible. In the current version available here the constant period length is not exactly constant, leading to the process getting slowly in sync and slowly out of sync again. This can be observed as data dropouts for multiple seconds. One solution I could think of to minimize the chance of the process being in sync is to make the period random, which is included by f747335. This solution should work for any number of hubs. I did some experiments and this greatly improved the bidirectional communication between two Technic hubs that transfer data at a high rate. Of course there are the expected data dropouts, but they don't take seconds anymore.
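The effect of randomizing the intervals can be illustrated with a small desktop simulation. This is a simplified model, not the firmware: it assumes both hubs start scanning at the same instant, uses the 20 ms scan and 30 ms advertise split mentioned in the discussion for the fixed case, draws each interval uniformly from [15, 35] ms for the randomized case, and counts a millisecond as usable whenever one hub scans while the other advertises. All function names are hypothetical.

```python
import random


def schedule(total_ms, next_interval):
    """One entry per millisecond: True while scanning, False while advertising."""
    states = []
    scanning = True  # assumption: every hub starts in the scanning state
    while len(states) < total_ms:
        states.extend([scanning] * next_interval(scanning))
        scanning = not scanning
    return states[:total_ms]


def usable_fraction(hub_a, hub_b):
    """Fraction of time in which one hub scans while the other advertises."""
    usable = sum(1 for a, b in zip(hub_a, hub_b) if a != b)
    return usable / len(hub_a)


def fixed(scanning):
    # Fixed 50 ms period: 20 ms scanning, 30 ms advertising.
    return 20 if scanning else 30


rng = random.Random(0)  # seeded so the run is repeatable


def jittered(scanning):
    # Pseudo-random interval in [15, 35] ms for both states.
    return rng.randint(15, 35)


TOTAL_MS = 60_000
in_sync = usable_fraction(schedule(TOTAL_MS, fixed), schedule(TOTAL_MS, fixed))
randomized = usable_fraction(schedule(TOTAL_MS, jittered), schedule(TOTAL_MS, jittered))
```

In this model two perfectly synchronized hubs with fixed intervals never overlap usefully, so `in_sync` is zero, while the randomized schedules drift apart quickly and spend a substantial share of the time in complementary states.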
I rebased this branch here: https://github.com/pybricks/pybricks-micropython/tree/broadcast-rebase. The basics still seem to work, but for Prime Hub it now only works if it is not connected to the computer. So I did not force push this branch yet. |
This adds new methods to broadcast and observe advertising data using Bluetooth Low Energy. The communication protocol uses the LEGO manufacturer-specific data similar to the experimental protocol from the official Robot Inventor firmware. Multiple Python objects are serialized using a compressed format to squeeze as many objects into the 31 byte advertising data as possible. Since the BLE object is a singleton initialized on the hub object, the initialization parameters are added to the Hub constructors. Each hub can only broadcast on a single "channel" and can receive on multiple channels. Channels are limited to 16 to avoid allocating too much memory.
A pull request with similar functionality inspired by this one has been merged in #158. Thanks everyone for all the contributions, especially @NStrijbosch ! |
This PR is a starting point to support communication between Primehubs, Inventorhubs, Essentialhubs, Technichubs and Cityhubs.
The protocol is based on the protocol used by the hub to hub wordblocks of the official LEGO MINDSTORMS app. This implies communication is possible between a hub running pybricks and an inventor hub programmed using wordblock using the official LEGO app.
Code example
When running this code on multiple prime/inventor hubs the screen will sync.
Known issues:
Although this PR contains the essentials to get the communication working, there are a few minor issues:
- adv_type=ADV_SCAN_IND. Currently, this does not work on the Technic Hub, hence it only sends with adv_type=ADV_IND. Communication between pybricks hubs works fine at the moment, but I am still looking for a fix to advertise with ADV_SCAN_IND. EDIT: BTChip reports error: bleIncorrectMode when sending with ADV_SCAN_IND
- To have bidirectional communication hubs need to both scan (receive) and advertise (transmit). I am not sure how this is exactly implemented on BT-chip on the Technic Hub/City but the implementation in this PR is as follows:
  - when a broadcast is initiated the hub starts scanning
  - when transmit() is called the hub starts advertising. I think the BT-chip stops scanning temporarily.
  - 1000 ms after calling transmit() the hub stops advertising. Scanning restarts automatically (no explicit call is necessary)

  I implemented the 1000ms delay using an etimer, and it seems to work when starting a program from code.pybricks.com. However, when I store the program in main.py the timer never expires, i.e., the hub advertises indefinitely and never restarts scanning. This prevents bidirectional communication when storing the program on the hub. I am not sure what all the processes are doing in the background that cause this behaviour. Maybe you have some insights to fix this bug.

Anyhow, thanks in advance for all feedback you may have!
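The scan/advertise sequence described in the second issue can be modelled as a two-state machine. The sketch below is a plain-Python model for reasoning about the behaviour, not the firmware implementation (which uses an etimer in C). The class name `BroadcastState`, the `poll` method and the use of a millisecond clock passed in by the caller are all assumptions made for this example.

```python
ADVERTISE_TIMEOUT_MS = 1000  # delay described in the PR text above


class BroadcastState:
    def __init__(self):
        self.state = "scanning"       # a new broadcast starts out scanning
        self.advertise_started = None

    def transmit(self, now_ms):
        """Start (or restart) advertising; scanning pauses in the meantime."""
        self.state = "advertising"
        self.advertise_started = now_ms

    def poll(self, now_ms):
        """Call periodically; falls back to scanning once the timeout has elapsed."""
        if (self.state == "advertising"
                and now_ms - self.advertise_started >= ADVERTISE_TIMEOUT_MS):
            self.state = "scanning"
            self.advertise_started = None
        return self.state
```

In this model the symptom reported for main.py corresponds to `poll` never observing the timeout, so the state stays at "advertising" and scanning never resumes.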