fix: update handling of SAF message propagation and deletion #3164
Conversation
Looks good, this is a good fix
Looks good - I had assumed that we would keep the broadcast strategies the same and just make Closest send to the destination peer if a connection is available, rather than creating a new broadcast strategy. (Perhaps a miscommunication; I had thought a DirectOrClosest might be needed on the connectivity_manager.select_connections call, not as a broadcast strategy.)
Any reason/use-case for keeping the previous Closest behaviour around?
At a quick look, I can't see a reason to have a new strategy. If that is the case, I'd say rather fix ClosestNodes to send directly if connected.
Cool, I was a bit hesitant to assume there was no use for it, so I left it in as-is just in case, but if you think it can be cut I will take it out. I think I will leave it called `DirectOrClosestNodes` so that the name makes the behaviour clear.
Force-pushed from 15cf624 to 03c5146
So I looked at this a bit and noted that the Join process still needs ClosestNodes without the direct option. I also think there might be other applications where sending to a neighbourhood without the direct case is useful. The Join process might need an update in the future when we move to k-buckets, but I am going to leave that consideration for the future.
PR queued successfully. Your position in queue is: 3
PR is on top of the queue now
Based on tari-project#3164. This PR addresses the following scenario spotted by @stanimal:
- NodeA sends to NodeB (offline)
- NodeA goes offline
- NodeB receives the tx and cancels it (weird, I know)
- NodeA comes back online and broadcasts the transaction
- NodeB is not aware of the transaction, but it is complete for NodeA

This is handled by adding logic so that if a FinalizedTransaction is received with no active Receive Protocol, the database is checked for a matching cancelled inbound transaction from the same pubkey. If one exists, the receiver might as well restart that protocol and accept the finalized transaction. A cucumber test is provided for this case. This required adding functionality to the Transaction and Output Manager services to reinstate a cancelled inbound transaction; unit tests are provided for that.
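The recovery path described above can be sketched roughly as follows. This is a hypothetical simplification: the real wallet services carry far more state, and names like `Wallet` and `handle_finalized` are illustrative, not the actual tari API.

```rust
use std::collections::HashMap;

type TxId = u64;
type PublicKey = &'static str;

#[derive(Debug, PartialEq)]
enum TxStatus {
    Cancelled,
    Completed,
}

struct Wallet {
    active_receive_protocols: HashMap<TxId, ()>,
    inbound_txs: HashMap<TxId, (PublicKey, TxStatus)>,
}

impl Wallet {
    /// On a FinalizedTransaction with no matching active receive protocol,
    /// look for a cancelled inbound transaction from the same sender and,
    /// if one exists, reinstate it and accept the finalized transaction.
    fn handle_finalized(&mut self, tx_id: TxId, sender: PublicKey) -> bool {
        if self.active_receive_protocols.contains_key(&tx_id) {
            return true; // normal path: the live protocol handles it
        }
        match self.inbound_txs.get_mut(&tx_id) {
            Some((pk, status)) if *pk == sender && *status == TxStatus::Cancelled => {
                *status = TxStatus::Completed; // reinstate and accept
                true
            }
            _ => false, // unknown transaction: ignore
        }
    }
}

fn main() {
    let mut wallet = Wallet {
        active_receive_protocols: HashMap::new(),
        inbound_txs: HashMap::new(),
    };
    // NodeB cancelled inbound tx 7 from "nodeA" while NodeA was offline.
    wallet.inbound_txs.insert(7, ("nodeA", TxStatus::Cancelled));
    // NodeA comes back online and broadcasts the finalized transaction.
    assert!(wallet.handle_finalized(7, "nodeA"));
    assert_eq!(wallet.inbound_txs[&7].1, TxStatus::Completed);
    // A finalized transaction with no history is still rejected.
    assert!(!wallet.handle_finalized(99, "nodeA"));
}
```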
PR failed to merge with reason: Some CI status(es) failed
PR queued successfully. Your position in queue is: 1
PR failed to merge with reason: Some CI status(es) failed
PR queued successfully. Your position in queue is: 2 |
PR is on top of the queue now |
Description
This PR adds two changes to the way SAF messages are handled to fix two subtle bugs spotted while developing cucumber tests.
The first issue: when a Node propagated a SAF message it was storing to other nodes in its neighbourhood, the broadcast strategy it used only chose from currently connected base nodes. This meant that if the Node had an active connection to a Communication Client (wallet), it would send the SAF message not directly to that client but to other base nodes in the network region. As a result, the wallet would only receive new SAF messages when it actively requested them on connection, even though it was directly connected to the node.
This PR adds a new broadcast strategy called `DirectOrClosestNodes`, which first checks whether the node has a direct active connection to the destination and, if it does, sends the SAF message directly.

The second issue was a subtle problem: when a node started sending SAF messages to a destination, it removed the messages from the database based only on whether the outbound messages were put onto the outbound message pipeline. If the TCP connection to that peer was actually broken, the sending of those messages would fail at the end of the pipeline, but the SAF messages had already been deleted from the database.
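A rough sketch of the `DirectOrClosestNodes` selection described above, with hypothetical names (`NodeId`, `direct_or_closest_nodes`) and plain XOR distance standing in for the real tari_comms peer-selection machinery:

```rust
use std::collections::HashSet;

type NodeId = u64;

/// Pick recipients for a stored SAF message: if we already hold an active
/// connection to the destination, deliver straight to it; otherwise fall
/// back to the n closest connected peers (closeness modelled here as XOR
/// distance to the destination).
fn direct_or_closest_nodes(
    dest: NodeId,
    active_connections: &HashSet<NodeId>,
    n: usize,
) -> Vec<NodeId> {
    if active_connections.contains(&dest) {
        // Direct case: the wallet/client is connected, send only to it.
        return vec![dest];
    }
    // Closest-nodes fallback: sort connected peers by XOR distance.
    let mut peers: Vec<NodeId> = active_connections.iter().copied().collect();
    peers.sort_by_key(|p| *p ^ dest);
    peers.truncate(n);
    peers
}

fn main() {
    let conns: HashSet<NodeId> = [2, 3, 9].into_iter().collect();
    // Destination 3 is directly connected, so it is the sole recipient.
    assert_eq!(direct_or_closest_nodes(3, &conns, 2), vec![3]);
    // Destination 1 is not connected: pick the two XOR-closest peers.
    assert_eq!(direct_or_closest_nodes(1, &conns, 2), vec![3, 2]);
}
```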
This PR changes the way SAF messages are deleted. When a client asks a node for SAF messages, it also provides the timestamp of the most recent SAF message it has received. The node then sends all SAF messages it holds for that client since that timestamp and deletes all SAF messages from before the specified timestamp. This serves as a form of ack that the client has received the older messages at some point and they are no longer needed.
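The since-timestamp request/ack flow can be sketched as follows; `SafDatabase` and `fetch_since_and_ack` are illustrative names, not the actual store-and-forward API:

```rust
/// A stored SAF message keyed by the time it was stored (epoch seconds
/// for simplicity).
#[derive(Clone, Debug, PartialEq)]
struct SafMessage {
    stored_at: u64,
    body: String,
}

struct SafDatabase {
    messages: Vec<SafMessage>,
}

impl SafDatabase {
    /// The client supplies the timestamp of the newest SAF message it has
    /// seen. We return everything newer and treat the timestamp as an ack:
    /// anything at or before it can be deleted. The messages sent now stay
    /// in the database until a later request acks them, so a broken
    /// connection cannot lose them.
    fn fetch_since_and_ack(&mut self, since: u64) -> Vec<SafMessage> {
        let to_send: Vec<SafMessage> = self
            .messages
            .iter()
            .filter(|m| m.stored_at > since)
            .cloned()
            .collect();
        self.messages.retain(|m| m.stored_at > since);
        to_send
    }
}

fn main() {
    let mut db = SafDatabase {
        messages: vec![
            SafMessage { stored_at: 10, body: "acked earlier".into() },
            SafMessage { stored_at: 20, body: "new".into() },
        ],
    };
    // Client has seen up to t=10: it receives the t=20 message, and only
    // the already-acked t=10 message is deleted.
    let sent = db.fetch_since_and_ack(10);
    assert_eq!(sent.len(), 1);
    assert_eq!(sent[0].stored_at, 20);
    assert_eq!(db.messages.len(), 1);
}
```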
How Has This Been Tested?
Unit tests have been updated to test this functionality.