TID distance of 1 may cause timed out state to be selected for incoming packet #129

Closed
magshaw opened this issue Jun 14, 2019 · 5 comments

@magshaw (Contributor) commented Jun 14, 2019

We've been running a system with lots of nodes, some sharing the same set of messages.

We found that after a target node has timed out once, the system enters a strange state where new messages (with a higher TID) pick up state from an old transfer and are not processed because of TID/TOGGLE mismatches.

On further inspection, tracing this back through the code, I think there is a bug in the computation of the not_previous_tid variable: it causes needs_restart to stay false, so the transfer is rejected for bad TIDs and TOGGLEs.

Replacing:

    // Original check: only a forward distance greater than 1 counts as "not the previous TID"
    const bool not_previous_tid =
        computeTransferIDForwardDistance((uint8_t) rx_state->transfer_id, TRANSFER_ID_FROM_TAIL_BYTE(tail_byte)) > 1;

with:

    // Proposed check: any non-zero forward distance counts as "not the previous TID"
    const bool not_previous_tid =
        computeTransferIDForwardDistance((uint8_t) rx_state->transfer_id, TRANSFER_ID_FROM_TAIL_BYTE(tail_byte)) != 0;

fixes this issue. This also seems plausible on its face, since a forward distance of 1 should indeed count as not_previous_tid.

Does this seem reasonable or are we maybe missing something here?

pavel-kirienko self-assigned this Jun 14, 2019
@pavel-kirienko (Member)

I apologize for the slow response. I read your post back in June, but I knew immediately that sorting this out was going to take time, so I had to table the issue until later. That later is now.

> Does this seem reasonable or are we maybe missing something here?

I've just spent an hour digging through my old records trying to understand what is happening. Long story short, the problem is that the arguments to computeTransferIDForwardDistance() are swapped. Being human, I am prone to mistakes of that sort.

This is the pseudocode from the Specification:

[figure: Specification pseudocode for comparing the received transfer-ID against the local state]

Observe that the local state is on the right; the received transfer-ID is on the left. The algorithm requires us to compute the number of increment operations that need to be applied to the received value in order to equalize it with the local state. This algorithm is implemented correctly in libuavcan and pyuavcan:

https://github.com/UAVCAN/libuavcan/blob/67e56232362aea9f9606d0e80454e6abcae5ff5a/libuavcan/src/transport/uc_transfer_receiver.cpp#L201

https://github.com/UAVCAN/pyuavcan/blob/281680b28657574818348ec5a67347f129525d44/pyuavcan/transport/can/_session/_transfer_receiver.py#L53
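
For context, the helper itself is plain modular arithmetic. Here is a minimal sketch, assuming the 5-bit transfer-ID of UAVCAN/CAN so distances wrap modulo 32 (behaviorally equivalent to the libcanard helper, not a verbatim quote of it):

    #include <stdint.h>

    #define TRANSFER_ID_BIT_LEN 5U  /* transfer-ID width on UAVCAN/CAN */

    /* Number of increments needed to advance `a` until it equals `b`,
     * wrapping modulo 2^TRANSFER_ID_BIT_LEN; inputs are assumed to lie
     * within the valid 5-bit range. */
    static int16_t computeTransferIDForwardDistance(uint8_t a, uint8_t b)
    {
        int16_t d = (int16_t)(b - a);
        if (d < 0)
        {
            d = (int16_t)(d + (int16_t)(1U << TRANSFER_ID_BIT_LEN));
        }
        return d;
    }

With this definition, computeTransferIDForwardDistance(received, local) == 1 identifies a frame that duplicates the previous transfer, which is exactly the case the > 1 comparison is meant to exclude.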

Notice that in libcanard the argument order is different, hence the problem you've observed. Swapping the argument order on your branch while keeping the correct comparison > 1 (strictly greater) makes your unit tests pass.
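
Concretely, using the variable names from the snippet quoted above, the corrected check would look roughly like this (a sketch of the argument swap, not the exact patch):

    /* Forward distance from the RECEIVED transfer-ID to the LOCAL state;
     * a distance of exactly 1 means this frame belongs to the previous transfer. */
    const bool not_previous_tid =
        computeTransferIDForwardDistance(TRANSFER_ID_FROM_TAIL_BYTE(tail_byte),
                                         (uint8_t) rx_state->transfer_id) > 1;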

@mike7c2 commented Aug 19, 2019

Hi Pavel, my apologies too, things have been hectic at my end! I will get this done and back to you tonight/tomorrow.

@pavel-kirienko (Member)

Fixed in #142

@ImFovery

We ran into the same issue and found the cause: the sender should use an independent transfer_id for each message, rather than having all messages share a single incrementing transfer_id.

@pavel-kirienko (Member)

A publishing node shall use a separate transfer-ID counter per subject (topic). Usage of a shared transfer-ID counter for all subjects is non-compliant and will cause issues.
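
For illustration, a publisher could keep one counter per subject rather than a single global one. This sketch is hypothetical and not part of the libcanard API; the names are invented:

    #include <stdint.h>

    /* Hypothetical per-subject publisher state: each subject (topic)
     * owns its own transfer-ID counter. */
    typedef struct
    {
        uint16_t subject_id;
        uint8_t  transfer_id;
    } SubjectPublisher;

    /* Return the transfer-ID to use for the next transfer on this
     * subject, then advance the counter within the 5-bit range. */
    static uint8_t nextTransferID(SubjectPublisher* pub)
    {
        const uint8_t tid = pub->transfer_id;
        pub->transfer_id = (uint8_t)((pub->transfer_id + 1U) % 32U);
        return tid;
    }

Sharing one counter across all subjects makes receivers observe large or irregular TID jumps on each individual subject, which trips exactly the distance checks discussed above.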
