IO performance is highly dependent on the size of the TX queue #189
Comments
I'm unaware of the details of this library, but if there are nodes that are accessed more frequently, we could use a splay tree (it pushes the most-used nodes closer to the root). The drawback is that its internal representation changes upon each read (though this could be stopped after a 'characteristic' number of reads, in order to have more predictable read performance). I guess that to decide on the best approach, simulations need to be run with a characteristic payload that covers as many usage edge cases as possible.
(I'm sorry if this is totally irrelevant; I'm trying to make my way in UAVs.)
Thanks for your input. Most applications that libuavcan (or UAVCAN in general) is designed for don't really care about the average case. We must optimize for the worst case. Hence, while the experience of non-real-time systems such as Android may still be valuable, it can't be applied here directly.
A simple height-balanced tree (AVL) can be used then. If extra performance is needed at the cost of some extra work upon insertion, a red-black tree is the way to go. All of these can be implemented over an underlying array to take advantage of potential CPU caching. (Android is just an example of how to have an array powering a map interface.)
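To make the "underlying array" idea above concrete, here is a minimal sketch of an index-linked node pool: nodes live in a fixed-capacity array and reference each other by index instead of pointer, which keeps them contiguous in memory and makes out-of-memory a predictable, bounded condition. The names (`Node`, `NodePool`) and the free-list-through-`left` trick are illustrative assumptions, not libuavcan API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct Node
{
    std::int32_t left   = -1;   // index into the pool; -1 means "no child"
    std::int32_t right  = -1;
    std::int8_t  height = 0;
    int          key    = 0;
};

template <std::size_t Capacity>
class NodePool
{
    std::array<Node, Capacity> nodes_{};
    std::int32_t free_head_ = 0;

public:
    NodePool()
    {
        // Thread all slots into a free list, reusing the 'left' field as the link.
        for (std::size_t i = 0; i + 1 < Capacity; i++)
        {
            nodes_[i].left = static_cast<std::int32_t>(i + 1);
        }
        nodes_[Capacity - 1].left = -1;
    }

    std::int32_t allocate(int key)
    {
        if (free_head_ < 0)
        {
            return -1;                       // pool exhausted: deterministic OOM
        }
        const std::int32_t idx = free_head_;
        free_head_ = nodes_[idx].left;       // unlink from the free list
        nodes_[idx] = Node{-1, -1, 0, key};
        return idx;
    }

    void release(std::int32_t idx)
    {
        nodes_[idx].left = free_head_;       // push the slot back onto the free list
        free_head_ = idx;
    }

    Node& operator[](std::int32_t idx) { return nodes_[idx]; }
};
```

An AVL tree built over such a pool does all of its child navigation through indices, so the rotation logic is unchanged; only `nullptr` checks become `-1` checks.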
Would you be interested in drafting a proof-of-concept implementation? (@thirtytwobits FYI)
I'd be willing to. Basically these need to be done on
I guess the backing AVL tree needs access to the allocator. Moving on, the TX queue can inherit from the tree so that we can do some smart remove-expired-nodes while traversing. There are some parts of the code that traverse the linked list without using its functions, but I think I can cope with that. Am I missing something?
That makes sense. I think the optimal approach is to implement the changes in the current upstream, and we will then reimplement them in the rewrite, unless @thirtytwobits has a different view (his call).
Either that, or you could use a fixed-capacity array as backing storage, as you described earlier. The size of the array could be parametrized via preprocessor options or template parameters, or a reference to the backing storage could simply be supplied from outside.
OK. I'm developing at https://github.com/Zarkopafilis/libuavcan/tree/txq (check the latest commits if you want to see the progress; it's not buildable yet). One more thing I noticed: I'm developing without a backing array right now, as it seems that it may cause many problems with memory copying, since consistently adding and removing items could cause bad performance. With the tree, special care must be taken regarding when we clean up multiple expired frames. For now, I'm sticking to trying to remove only one if out of memory, with maybe a third retry after cleaning up everything. @thirtytwobits code reviews, enhancements and what-not are very welcome! :)
I've created some avl_tree under
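The out-of-memory retry policy described above ("remove only one if out of memory, with a retry after cleaning up everything") can be sketched as a small helper. Everything here is illustrative: the callable names and the `void*` allocation interface are assumptions, not the libuavcan allocator API.

```cpp
// Retry policy sketch: first try to allocate; on failure, free just one
// expired entry and retry; as a last resort, clean up everything and retry once more.
template <typename Alloc, typename RemoveOne, typename RemoveAll>
void* allocateWithRetry(Alloc&& try_alloc, RemoveOne&& remove_one_expired, RemoveAll&& remove_all_expired)
{
    void* p = try_alloc();
    if (p != nullptr)
    {
        return p;
    }
    remove_one_expired();      // cheap first attempt: free a single expired slot
    p = try_alloc();
    if (p != nullptr)
    {
        return p;
    }
    remove_all_expired();      // last resort: full cleanup, then the final retry
    return try_alloc();
}
```

The appeal of structuring it this way is that the expensive full-cleanup path is only ever reached when the cheap path has already failed, which keeps the common-case cost low while the worst case stays bounded.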
Update: I just finished the most optimal CanTxQueue interface I could imagine. Currently it looks like this:

```cpp
class UAVCAN_EXPORT CanTxQueue : public AvlTree<CanTxQueueEntry>
{
private:
    static bool AVLTxQueueEntryInsertComparator(const CanTxQueueEntry& lhs, const CanTxQueueEntry& rhs)
    {
        return rhs.frame.priorityHigherThan(lhs.frame);
    }

    ISystemClock& sysclock_;
    uint32_t rejected_frames_cnt_;

    void safeIncrementRejectedFrames();

public:
    CanTxQueue(IPoolAllocator& allocator, ISystemClock& sysclock, std::size_t allocator_quota)
        : AvlTree(allocator, allocator_quota, AVLTxQueueEntryInsertComparator)
        , sysclock_(sysclock)
        , rejected_frames_cnt_(0)
    {}

    ~CanTxQueue();

    /* The AVL tree allocates the AvlTree::Node, while this (CanTxQueue) allocates the CanTxQueueEntry.
     * Same logic for removal. */
    void push(const CanFrame& frame, MonotonicTime tx_deadline, Qos qos, CanIOFlags flags);
    void remove(CanTxQueueEntry*& entry);

    /* Tries to look up the rightmost node. If the frame is expired, garbage-collects all the expired frames. */
    const CanTxQueueEntry* getTopPriorityNonExpiredPendingEntry() const;

    /* When OOM, try to avoid garbage-collecting all the expired frames and instead swiftly remove one or two. */
    void removeFirstWithQosLowerOrThenEqualThan(Qos qos) const;
    void removeFirstWithQosLargerThan(Qos qos) const;

    // The 'or equal' condition is necessary to avoid frame reordering.
    bool topPriorityHigherOrEqual(const CanFrame& rhs_frame) const;

    uint32_t getRejectedFrameCount() const { return rejected_frames_cnt_; }

    void removeOneExpiredFrame() const;
    void removeAllExpiredFrames() const;
};
```

It's "optimal" for the following reasons:
There is a comment: (For now, frames are ordered by greatest priority. Perhaps some caching can be done, so that when the highest-priority entry is lower than all the other messages present on the bus, some cleanups are performed before we can retransmit?)

@pavel-kirienko The OOM retry could not even be possible without doing a stop-the-world GC pause: even if we swiftly manage to find space for a new entry, the AVL tree could fail to allocate a new node immediately afterwards, because it has no knowledge of the outside world. This is an extreme edge case, but perhaps, upon the slightest OOM, we should clean up as much data as we can and then proceed.

That's it for now. (Sorry for the wall of text.) I still have some internal AVL-tree implementation to do while these sub-issues are resolved.
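The `getTopPriorityNonExpiredPendingEntry` behavior documented in the interface above (look up the rightmost node; garbage-collect expired frames found on the way) can be illustrated with a `std::multiset` standing in for the AVL tree. The `Entry` type, its fields, and the time representation are all assumptions for the sketch, not the libuavcan types.

```cpp
#include <iterator>
#include <set>

struct Entry
{
    int  priority;       // larger value = higher priority
    long deadline_usec;  // the entry expires once 'now' exceeds this deadline
};

struct ByPriority
{
    bool operator()(const Entry& a, const Entry& b) const { return a.priority < b.priority; }
};

// Returns the highest-priority non-expired entry, erasing expired ones on the way;
// returns nullptr if the whole queue has expired or is empty.
const Entry* topNonExpired(std::multiset<Entry, ByPriority>& queue, long now_usec)
{
    while (!queue.empty())
    {
        auto top = std::prev(queue.end());   // rightmost element = highest priority
        if (top->deadline_usec >= now_usec)
        {
            return &*top;
        }
        queue.erase(top);                    // expired: garbage-collect and keep looking
    }
    return nullptr;
}
```

Since a balanced tree keeps the maximum at the rightmost position, each lookup step is O(log N), and every erase performed here is work that would have had to happen eventually anyway.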
Please see #195. I suggest we simplify the logic as described for the sake of robustness and determinism (and the ROM footprint too, although it's not very important). Do you think that makes sense? If it does, then designing the new interface should be a no-brainer: no cleanups and traversals are needed, and no QoS used.
I'm almost ready to do a PR, but I get this in tests:
I'm using dockcross-linux-x64 on macOS Mojave and I don't think I messed with that one. Do you have any idea what is causing this?
Signal 11 means segmentation fault. Could be a dangling pointer or whatever. I suggest you use a debugger or Valgrind to find the culprit.
Fixed in #198.
Whenever the library emits a frame, it puts it into a prioritized transmission queue. There is one such TX queue per interface. Each entry in the queue has a fixed lifetime so that if the interface is saturated or down, old entries can be removed in favor of newer ones.
The queue is a sorted linked list of entries, and as such, it has a linear time complexity, which causes severe performance degradation when there is a down interface, especially in complex applications where TX queues are allowed to use a lot of memory.
The worst thing about it is that it can lead to bad side effects, like a multi-interface node failing to meet its real-time constraints when an interface goes down because the library is suddenly eating all its processing time.
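The linear cost described above comes from the insertion walk: to keep a singly linked list sorted by priority, every `push` has to traverse the list to find the right position. A minimal sketch of that insertion (with an illustrative `TxItem` type, not the actual libuavcan entry type) makes the O(N) step explicit:

```cpp
struct TxItem
{
    int     priority;   // larger value = more urgent
    TxItem* next;
};

// Inserts 'item' so that the list stays sorted from highest to lowest priority.
// Returns the (possibly new) head of the list.
TxItem* sortedInsert(TxItem* head, TxItem* item)
{
    if (head == nullptr || item->priority > head->priority)
    {
        item->next = head;
        return item;
    }
    TxItem* p = head;
    while (p->next != nullptr && p->next->priority >= item->priority)
    {
        p = p->next;    // linear walk: this loop is the O(N) cost per insertion
    }
    item->next = p->next;
    p->next = item;
    return head;
}
```

With K frames already queued, enqueuing N more frames costs O(N * K) in total, which is exactly the degradation observed when a down interface lets the queue grow large.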
There are two okay solutions:
Choose the first option: log(N) is not so bad, and constant time might be theoretically impossible.
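As a sense of what "log(N) is not so bad" buys over the linked list: any heap- or tree-ordered structure over fixed storage gives worst-case O(log N) insertion. A minimal sketch using a binary heap over a fixed-capacity array (via `std::push_heap`/`std::pop_heap`) is shown below; the class name, the capacity, and the use of a bare `int` priority are all illustrative assumptions, not the structure the library ended up adopting.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

class FixedPriorityQueue
{
    static constexpr std::size_t Capacity = 32;   // illustrative fixed capacity
    std::array<int, Capacity> storage_{};         // int stands in for a frame priority
    std::size_t size_ = 0;

public:
    bool push(int priority)
    {
        if (size_ == Capacity)
        {
            return false;                          // predictable, bounded OOM
        }
        storage_[size_++] = priority;
        std::push_heap(storage_.begin(), storage_.begin() + size_);   // O(log N) sift-up
        return true;
    }

    bool empty() const { return size_ == 0; }

    int top() const { return storage_[0]; }        // highest priority, O(1)

    void pop()
    {
        std::pop_heap(storage_.begin(), storage_.begin() + size_);    // O(log N) sift-down
        --size_;
    }
};
```

Note that a plain heap only supports removing the top element efficiently, which is why the discussion above reaches for an AVL tree instead: the TX queue also needs to evict arbitrary expired or low-QoS entries, and a balanced tree supports that in O(log N) as well.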