Transfer reception dataflow is all wrong, performance is a nightmare, determinism went out the window, everything is terrible #185
Comments
Marked as "bug" because it is a serious problem, even though it is not visible from outside.
If what you mostly seek is to remove the linear-time searching, it could be enough to store the data in self-balancing binary trees, couldn't it? We could cut it from O(n) to O(log n).
I think we should separate the question of the data structure used for port ID mapping from the question of sharding. I am quite certain that sharding as described in my first post shall be implemented, because the gains are huge.

Now, to the question of which data structure to use. Let's have a look: http://bigocheatsheet.com/. What we care about is the worst-case complexity of searching and insertion. Finally, I can put my l33t gimp skills to good use: ignoring the lame options, we have our linked list and four kinds of trees. So the use case is simple: we receive a transfer and go look for the matching receiver.
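For concreteness, a minimal sketch of the lookup use case under discussion (not libuavcan code; all names are illustrative): given a received transfer, find the matching receiver by port ID. With a self-balancing tree such as the red-black tree typically behind `std::map`, the worst-case search and insertion are O(log n) instead of the O(n) scan of a plain linked list.

```c++
// Illustrative only: a receiver table keyed by port ID, backed by a
// self-balancing tree (std::map is typically a red-black tree).
#include <cstdint>
#include <map>

struct TransferReceiver {};  // stand-in for the real transfer reassembly state

std::map<std::uint16_t, TransferReceiver> receivers_by_port_id;

// Worst-case O(log n) lookup, versus O(n) for a linked-list scan.
TransferReceiver* findReceiver(std::uint16_t port_id)
{
    const auto it = receivers_by_port_id.find(port_id);
    return (it == receivers_by_port_id.end()) ? nullptr : &it->second;
}
```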
I would also like to add that the data structure should be heavily optimized for very small values of n.
I think it is too expensive to shard the receive list, hence my proposal to not shard but to use a more efficient data structure instead. Let's do the math. Cost of sharding: we have to keep 256 separate lists (one per node ID per transfer kind), which is on the order of 1 KB of list pointers alone.

We could also put all the TransferReceivers into hashmaps, which would reduce the average lookup time without using additional memory.
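A back-of-the-envelope check of that overhead, as a hedged sketch (the 4-byte pointer size assumes a typical 32-bit MCU target; the node and kind counts come from the discussion in this thread):

```c++
// Rough arithmetic only; not part of libuavcan.
#include <cstddef>

constexpr std::size_t kNodeIds      = 128;  // worst-case number of node IDs
constexpr std::size_t kKinds        = 2;    // message and service transfers
constexpr std::size_t kPointerBytes = 4;    // assumed 32-bit target

// 128 * 2 * 4 = 1024 bytes, i.e. about 1 KB of list heads for the sharded layout.
static_assert(kNodeIds * kKinds * kPointerBytes == 1024,
              "sharding cost is roughly 1 KB of pointers on a 32-bit target");
```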
Yep, looks right and perfectly acceptable. 1 KB is not that much; memory-limited applications can always opt for libcanard.
I'll revisit this bug in the future. My gut reaction is "egads! 256 std::maps!?!" but I won't rule it out.
Just 256 instances of Map&lt;uint16_t, TransferReceiver&gt;, which is a thin wrapper over a linked list, not std::map.
Invariance to the number of nodes in the network is extremely important. The ability of a node to adhere to real-time constraints should not be conditional on the number of other nodes on the network. Meaning that the per-frame lookup cost must not depend on the number of nodes on the bus.
What I neglected to provide for is that there are two kinds of service transfers: requests and responses. It is possible for a node to act as a service server and a client for any service ID at the same time, meaning that the proposed dual sharding between message and service transfers does not cover all cases. The sharding must be done across three kinds: messages, service requests, and service responses, resulting in the following arrangement:

```c++
Map<uint16_t, TransferReceiver> receivers_by_port_id_[128 * 3];
```

The rest still holds.
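As an illustration of how that three-way sharded array could be indexed per received frame, here is a minimal sketch; the enum and function names are assumptions for the example, not the actual libuavcan API.

```c++
#include <cstddef>
#include <cstdint>

// Assumed names for illustration only.
enum class TransferKind : std::uint8_t { Message = 0, ServiceRequest = 1, ServiceResponse = 2 };

constexpr std::size_t kMaxNodeIds = 128;
constexpr std::size_t kKinds      = 3;

// Resolves which of the 128 * 3 per-node, per-kind maps to search,
// in constant time, before the O(number_of_ports) port ID lookup.
constexpr std::size_t shardIndex(std::uint8_t source_node_id, TransferKind kind)
{
    return static_cast<std::size_t>(source_node_id) * kKinds + static_cast<std::size_t>(kind);
}
```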
So the current reception architecture was created back in 2014 when I was working on a very, very early proof-of-concept (the protocol was first announced in 2014). A lot of ground-breaking changes have been introduced since then, so we ended up with this weird logic which was never designed to be this way; it just happened to be this way due to multiple partial mutations.
Currently, every received frame goes through the following pipeline:

1. The matching listener (`TransferListener`) is searched for by port ID in a linked list. The remaining dimensions are node ID and kind. Observe that the same listener serves both services and messages under the same port ID, which is odd.
2. The listener then searches for the matching receiver state (`TransferReceiver`) using kind and source node ID (another linked list traversal).
3. The frame is handed over to the `TransferReceiver` object, which reconstructs the transfer.

So, plenty of unnecessary indirection and linear-time searching is afoot which we must fix.
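To make the indirection concrete, here is a rough sketch of the current two-step lookup described above; it is not the actual libuavcan code, and all names are illustrative.

```c++
// Illustrative only (names assumed, not the real libuavcan API):
// the current design performs two linear scans per received frame.
#include <cstdint>
#include <list>

struct TransferReceiver {};                 // per-sender transfer reassembly state

struct TransferListener
{
    std::uint16_t port_id {};
    std::list<TransferReceiver> receivers;  // searched by kind + source node ID
};

std::list<TransferListener> listeners;      // searched by port ID

TransferListener* findListener(std::uint16_t port_id)
{
    for (auto& l : listeners)               // linear scan #1: O(number of listeners)
    {
        if (l.port_id == port_id) { return &l; }
    }
    return nullptr;                         // scan #2 then happens inside the listener
}
```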
We could follow the example of libcanard and use `uint32_t` transfer descriptors: https://github.com/UAVCAN/libcanard/blob/master/DESIGN.md#rx-transfer-states, which would reduce the number of indirections to just one (see the descriptor sketch below). Libcanard was designed much later, so we were able to get things right on the first attempt. However, there is a limitation that will be problematic for libuavcan: storing all states in a single linked list requires traversal of a huge linked list for every received frame.

An obvious solution is to shard the list in two, per kind: one for messages, one for services. In this case, no extra dynamic indirection is introduced, since the number of options is fixed (always two) and the logic can be implemented statically (meaning that there will be two separate lists, one for services, the other for messages). The remaining number of dimensions to map through is then reduced to two: port ID and source node ID.
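For reference, a hedged sketch of what such a packed transfer descriptor could look like. The exact field widths and layout used by libcanard are specified in the DESIGN.md linked above; the values below are illustrative assumptions rather than the real encoding.

```c++
#include <cstdint>

// Illustrative packing of the transfer-identifying dimensions into one word.
// Field widths are assumptions for this sketch, not libcanard's actual layout.
constexpr std::uint32_t makeTransferDescriptor(std::uint16_t data_type_id,   // 16 bits assumed
                                               std::uint8_t  transfer_kind,  //  2 bits assumed
                                               std::uint8_t  src_node_id,    //  7 bits assumed
                                               std::uint8_t  dst_node_id)    //  7 bits assumed
{
    return static_cast<std::uint32_t>(data_type_id) |
           (static_cast<std::uint32_t>(transfer_kind) << 16) |
           (static_cast<std::uint32_t>(src_node_id)   << 18) |
           (static_cast<std::uint32_t>(dst_node_id)   << 25);
}
```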
That two-way sharding seems better, but what if we're dealing with a hard real-time application? The worst-case frame processing complexity would be `O(number_of_ports * number_of_nodes)`, which is, frankly, terrible, especially if we can't control the number of nodes on the bus, which is the case for COTS hardware.

How about we shard the lists further? Both the number of nodes and the number of ports have well-known limits: 128 and 65536/512, respectively. The latter is impractical to deal with, but how about node ID? If we were to really put our foot down and say that libuavcan can't work with memory-tight MCUs (let's face it, that is already the case, so what), we could keep a dedicated list of `TransferReceiver` per node ID, per kind. The right linked list can be resolved by constant-complexity static array indexing, and then the right listener can be found by traversal with a complexity of `O(number_of_ports)`, where the argument is well controlled by the application (we always know how many subjects/services we subscribe to!). The resulting structure would be `Map<uint16_t, TransferReceiver> receivers_by_port_id_[128 * 2]`, where `uint16_t` is the type of port ID, 128 is the worst-case number of nodes, and 2 is the number of transfer kinds (services or messages). The `Map` template is just a thin wrapper over a linked list. Frame dispatching would look roughly like the sketch below.

This would be a monumental improvement to determinism & performance, and some improvement to the ROM footprint.
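A minimal self-contained sketch of that dispatch path under the proposed layout; the frame fields, the `TransferKind` encoding, and the `Map::access` lookup are assumed names for illustration, not the actual libuavcan interfaces.

```c++
#include <cstdint>

// Assumed illustrative types; the real Map is libuavcan's thin linked-list wrapper.
enum class TransferKind : std::uint8_t { Message = 0, Service = 1 };

struct TransferReceiver { void acceptFrame(/* frame */) {} };

template <typename Key, typename Value>
struct Map  // stand-in: linear search over a short per-shard list
{
    Value* access(Key /*key*/) { return nullptr; }  // placeholder lookup
};

Map<std::uint16_t, TransferReceiver> receivers_by_port_id_[128 * 2];

void dispatchFrame(std::uint8_t src_node_id, TransferKind kind, std::uint16_t port_id)
{
    // Constant-time shard selection by (source node ID, kind)...
    auto& shard = receivers_by_port_id_[src_node_id * 2 + static_cast<std::uint8_t>(kind)];

    // ...followed by an O(number_of_ports) search within that shard only.
    if (TransferReceiver* const receiver = shard.access(port_id))
    {
        receiver->acceptFrame(/* frame */);
    }
}
```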