-
-
Notifications
You must be signed in to change notification settings - Fork 195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TID distance of 1 may cause timed out state to be selected for incoming packet #129
Comments
I apologize for the slow response. I've read your post back in June, but I knew immediately that sorting this out is going to take time so I had to table the issue until later. The later is now.
I've just spent an hour digging through my old records trying to understand what is happening. Long story short, the problem is that the arguments to This is the pseudocode from Specification: Observe that the local state is on the right; the received transfer-ID is on the left. The algorithm requires us to compute the number of increment operations that need to be applied to the received value in order to equalize it with the local state. This algorithm is implemented correctly in libuavcan and pyuavcan: Notice that in libcanard the argument order is different, hence the problem you've observed. Swapping the argument order on your branch while keeping the correct comparison |
Hi Pavel, my apologies too, things have been hectic my end! I will get this done and back to you tonight/tomorrow. |
Fixed in #142 |
遇到的同样的问题,发现原因是:发送端应该每条消息使用独立的transfer_id,而不要所有的消息共用同一个递增的transfer_id。 |
A publishing node shall use a separate transfer-ID counter per subject (topic). Usage of a shared transfer-ID counter for all subjects is non-compliant and will cause issues. |
We've been running a system with lots of nodes, some sharing the same set of messages.
We found that after a target node had timeout out once the system enters a strange state where new messages (with a higher TID) pick up state from an old transfer and don't process because of TID/TOGGLE mismatches.
On further inspection we can see that
Tracing this back through the code I think there is a bug in the computation of the
not_previous_tid
variable - this causesneeds_restart
to not be true and the transfer to be rejected for bad TIDs and TOGGLEsSubstituting:
for
Fixes this issue. This seems plausible as well, as a distance of 1 is indeed
not_previous_tid
Does this seem reasonable or are we maybe missing something here?
The text was updated successfully, but these errors were encountered: