
fix: update handling of SAF message propagation and deletion #3164

Merged: 4 commits merged into development on Aug 11, 2021

Conversation

philipr-za (Contributor)

Description

This PR adds two changes to the way SAF messages are handled to fix two subtle bugs spotted while developing cucumber tests.

The first issue was that when a Node propagates a SAF message it is storing to other nodes in its neighbourhood, the broadcast strategy it used only chose from currently connected base nodes. This meant that if the Node had an active connection to a Communication Client (wallet), it would not send the SAF message directly to that client but instead to other base nodes in the network region. As a result, the wallet would only receive new SAF messages when it actively requested them on connection, even though it was directly connected to the node.

This PR adds a new broadcast strategy called DirectOrClosestNodes, which first checks whether the node has a direct active connection to the destination and, if it does, sends the SAF message directly to it.
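
For illustration, a minimal sketch of that selection logic, assuming hypothetical NodeId and ConnectivityPool types rather than the real comms API:

```rust
// Illustrative sketch only: NodeId and ConnectivityPool are hypothetical
// stand-ins for the real comms types.
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct NodeId(u64);

struct ConnectivityPool {
    // Peers we currently hold an active connection to.
    active: HashMap<NodeId, ()>,
}

impl ConnectivityPool {
    fn has_active_connection(&self, peer: &NodeId) -> bool {
        self.active.contains_key(peer)
    }

    // Placeholder for "closest connected peers"; the real strategy ranks
    // connected base nodes by distance to the destination.
    fn closest_connected(&self, _dest: &NodeId, fanout: usize) -> Vec<NodeId> {
        self.active.keys().take(fanout).cloned().collect()
    }
}

/// DirectOrClosestNodes: send straight to the destination if we already
/// hold an active connection to it, otherwise fall back to the closest
/// connected nodes in the network region.
fn select_destinations(pool: &ConnectivityPool, dest: &NodeId, fanout: usize) -> Vec<NodeId> {
    if pool.has_active_connection(dest) {
        vec![dest.clone()]
    } else {
        pool.closest_connected(dest, fanout)
    }
}

fn main() {
    let dest = NodeId(42);
    let mut active = HashMap::new();
    active.insert(dest.clone(), ());
    let pool = ConnectivityPool { active };
    // Directly connected, so the message goes straight to the destination.
    assert_eq!(select_destinations(&pool, &dest, 8), vec![dest]);
}
```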

The second issue was a subtle problem where, when a node started to send SAF messages to a destination, it would remove the messages from the database based only on whether the outbound messages had been put onto the outbound message pipeline. The problem occurs when the TCP connection to that peer is actually broken: the sending of those messages fails at the end of the pipeline, but the SAF messages have already been deleted from the database.

This PR changes the way SAF messages are deleted. When a client asks a node for SAF messages, it also provides the timestamp of the most recent SAF message it has received. The node then sends all SAF messages it holds for that client since the timestamp, and deletes all SAF messages from before the specified timestamp. This serves as a form of ack that the client received the older messages at some point and they are no longer needed.
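
A rough sketch of the timestamp-as-ack pruning, assuming a hypothetical StoredMessage/SafDb pair standing in for the real store-and-forward database:

```rust
// Hypothetical StoredMessage and SafDb types; the real SAF store is keyed
// per destination and uses proper database-backed timestamps.

#[derive(Clone, Debug)]
struct StoredMessage {
    stored_at: u64, // seconds since epoch, for illustration
    body: Vec<u8>,
}

struct SafDb {
    // Messages held for a single destination, oldest first.
    messages: Vec<StoredMessage>,
}

impl SafDb {
    /// Return every message newer than `since`, and delete everything at or
    /// before it. The client-supplied `since` timestamp acts as an ack: the
    /// client has already received the older messages, so they can go.
    fn fetch_since_and_prune(&mut self, since: u64) -> Vec<StoredMessage> {
        let newer: Vec<StoredMessage> = self
            .messages
            .iter()
            .filter(|m| m.stored_at > since)
            .cloned()
            .collect();
        self.messages.retain(|m| m.stored_at > since);
        newer
    }
}

fn main() {
    let mut db = SafDb {
        messages: vec![
            StoredMessage { stored_at: 100, body: b"old".to_vec() },
            StoredMessage { stored_at: 200, body: b"new".to_vec() },
        ],
    };
    // The client reports it has everything up to t=100: it receives the
    // t=200 message, and the t=100 message is pruned as acknowledged.
    let sent = db.fetch_since_and_prune(100);
    assert_eq!(sent.len(), 1);
    assert_eq!(db.messages.len(), 1);
}
```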

How Has This Been Tested?

Unit tests have been updated to test this functionality.

Checklist:

  • I'm merging against the development branch.
  • I have squashed my commits into a single commit.

SWvheerden (Collaborator) previously approved these changes on Aug 6, 2021

Looks good, this is a good fix.

@sdbondi (Member) left a comment

Looks good - I had assumed that we would keep the broadcast strats the same and just make Closest send to the dest peer if a connection is available, rather than creating a new broadcast strat. (Perhaps a miscommunication, I had thought maybe a DirectOrClosest would be needed on the connectivity_manager.select_connections call, not as a broadcast strat)

Any reason/use-case that we need to keep the previous Closest behaviour around?
A quick look, I can't see a reason to have a new strat. If that is the case, I'd say rather fix ClosestNodes to send directly if connected.

Review thread on comms/dht/src/store_forward/service.rs (outdated, resolved)
@philipr-za (Contributor, Author) commented Aug 10, 2021

> A quick look, I can't see a reason to have a new strat. If that is the case, I'd say rather fix ClosestNodes to send directly if connected.

Cool, I was a bit hesitant to assume there was no use for it, so I left it in as-is just in case, but if you think it can be cut I will take it out. I think I will leave it called DirectOrClosestNodes just so that the name makes the behaviour clear.

@philipr-za (Contributor, Author) commented

> Looks good - I had assumed that we would keep the broadcast strats the same and just make Closest send to the dest peer if a connection is available, rather than creating a new broadcast strat. (Perhaps a miscommunication, I had thought maybe a DirectOrClosest would be needed on the connectivity_manager.select_connections call, not as a broadcast strat)
>
> Any reason/use-case that we need to keep the previous Closest behaviour around?
> A quick look, I can't see a reason to have a new strat. If that is the case, I'd say rather fix ClosestNodes to send directly if connected.

So I looked at this a bit and noted that the Join process still kind of needs ClosestNodes without the direct option. I also think there might be other applications where sending to a neighbourhood without doing the direct case might be useful.

The Join process might need an update in the future when we move to k-buckets, but I am going to leave that consideration for the future.

@stringhandler changed the title from "Update handling of SAF message propagation and deletion" to "fix: update handling of SAF message propagation and deletion" on Aug 10, 2021
aviator-app bot commented Aug 10, 2021

PR queued successfully. Your position in queue is: 3

aviator-app bot commented Aug 10, 2021

PR is on top of the queue now

philipr-za added a commit to philipr-za/tari that referenced this pull request Aug 10, 2021
# Based on tari-project#3164

This PR addresses the following scenario spotted by @stanimal:

- NodeA sends to NodeB (offline)
- NodeA goes offline
- NodeB receives the tx and cancels it (weird, I know)
- NodeA comes online and broadcasts the transaction
- NodeB is not aware of the transaction; the transaction is complete for NodeA

This is handled by adding logic so that if a FinalizedTransaction is received with no active Receive Protocols, the database is checked for a matching cancelled inbound transaction from the same pubkey. If one is found, the receiver might as well restart that protocol and accept the finalized transaction (sketched below, after the commit message).

A cucumber test is provided to test this case.

This required adding functionality to the Transaction and Output Manager services to reinstate a cancelled inbound transaction; unit tests are provided for that.
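
Purely as an illustrative sketch of that recovery flow, assuming hypothetical WalletDb and InboundTx types (the real wallet service API differs):

```rust
use std::collections::HashMap;

type TxId = u64;
type PublicKey = [u8; 32];

#[derive(Clone)]
struct InboundTx {
    source: PublicKey,
    cancelled: bool,
}

struct WalletDb {
    inbound: HashMap<TxId, InboundTx>,
}

/// Called when a finalized transaction arrives but no Receive Protocol is
/// active for it. If a matching cancelled inbound transaction from the same
/// pubkey exists, reinstate it so the finalized transaction can be accepted.
fn maybe_reinstate(db: &mut WalletDb, tx_id: TxId, source: &PublicKey) -> bool {
    match db.inbound.get_mut(&tx_id) {
        Some(tx) if tx.cancelled && &tx.source == source => {
            tx.cancelled = false; // restart the receive protocol from here
            true
        }
        _ => false, // unknown transaction or wrong sender: ignore
    }
}

fn main() {
    let sender = [7u8; 32];
    let mut db = WalletDb { inbound: HashMap::new() };
    db.inbound.insert(1, InboundTx { source: sender, cancelled: true });
    // NodeB cancelled tx 1; then NodeA's finalized transaction arrives.
    assert!(maybe_reinstate(&mut db, 1, &sender));
    assert!(!db.inbound[&1].cancelled);
}
```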
aviator-app bot commented Aug 10, 2021

PR failed to merge with reason: Some CI status(es) failed
Failed CI(s): ci/circleci: run-integration-tests

aviator-app bot commented Aug 10, 2021

PR queued successfully. Your position in queue is: 1

aviator-app bot commented Aug 10, 2021

PR failed to merge with reason: Some CI status(es) failed
Failed CI(s): ci/circleci: run-integration-tests

aviator-app bot commented Aug 11, 2021

PR queued successfully. Your position in queue is: 2

aviator-app bot commented Aug 11, 2021

PR is on top of the queue now

aviator-app bot merged commit cedb4ef into tari-project:development on Aug 11, 2021