-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: selective ack construction breaks on overflow #137
fix: selective ack construction breaks on overflow #137
Conversation
29f899e
to
f6cafbe
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, this doesn't fix the Sending packet larger than max size
error for me. The reason is not because of the packet numbers overflow but because of issue #138 which also leads to overflow when building the acks .
Anyways, this PR still improves on what you mentioned in the PR description.
src/recv.rs
Outdated
let mut present_packets = self.pending.keys().copied().collect::<HashSet<_>>(); | ||
|
||
let mut acked = vec![]; | ||
while !present_packets.is_empty() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of #138 and never decreasing last_ack
, this loop will run until we overflow, which creates a very large acked vector, here is a real example from my tests running this fix:
let last_ack = 47998
let present_packets = [47990, 47991]
// After the loop acked.len() is 65530.
Here, the flaw is that last_ack
is not expected to be bigger than the pending packets, but if we want to be more defensive, we can do something like
present_packets.retain(|&k| k >= last_ack);
before running the loop.
This, of course, will not solve the core problem, but at least it will get us rid of trying to send packets over the discv5 limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let last_ack = 47998
let present_packets = [47990, 47991]
I don't believe this case is possible
Here, the flaw is that last_ack is not expected to be bigger than the pending packets, but if we want to be more defensive, we can do something like
There is nothing wrong with last_ack
being bigger then a pending packet seq_num, that is intentional in the design. I don't think the "example case" give above is a valid test case
let last_ack = 47998
let present_packets = [47990, 47991]
Because in the example above last_ack
would be 47988 not 47998
present_packets.retain(|&k| k >= last_ack);
This would cause us to generate invalid selective acks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let last_ack = 47998 let present_packets = [47990, 47991]
I don't believe this case is possible
This is happening right now in our implementation, it is not an imaginary example.
You are saying that this case is not possible but then you are saying that there is "nothing wrong last_ack
being bigger than a pending packet seq_num, how both can be true, if we DO NOT overflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let last_ack = 47998 let present_packets = [47990, 47991]
I don't believe this case is possible
This is happening right now in our implementation, it is not an imaginary example.
Can logs be posted then?
You are saying that this case is not possible but then you are saying that there is "nothing wrong
last_ack
being bigger than a pending packet seq_num, how both can be true?
last_ack is Returns the last sequence number in a contiguous sequence from the initial sequence number.
, the example case above shows a divergence of 2^16-8 packets.
How can a client have received packets
47998, 2^16+47990, 2^16+47991, but nothing in between. I don't think they can as that is bigger then the send_window. Why would the other client keep sending packets until 2^16+47990, if we didn't ack 2^16-8 packets already.
If this is a problem, then what we can do is clamp the loop to something like 1k,
The max amount of bytes in the send_window is 1048576 or 1MB, 1048576/(960 max packet size we send)=1093 packets in the air, so I am not sure how we get 2^16-8
packets in the air, hence the example is invalid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is very confusing though, with the clamp I added on the total selective ack size it will fix us creating a packet of 8k bytes, but that is super confusing why have 2^16 packets in the air nice find! Hopefully we fix that soon,
Our client can currently only send 2^16 data packets we might as well fix that as well I guess.
fa4ee0e
to
14a0f0d
Compare
14a0f0d
to
c94c80c
Compare
What was wrong?
I am benchmarking trin and I noticed, (this was happening in the wild, but there wasn't a reliable way to reproduce it until now)
We were trying to send packets of over 8000 bytes which was causing us to drop our discv5 peer leading to the discv5 handshake being redone, which in result causes failures.
In short the bug was if we were sending a selective ack and the packets we were confirming as present overflowed we would linearly mark 2^16 as false, which is invalid the selective ack is supposed to be
So start at ack_num + 2 and onward for received.
How was it fixed?
Looping through the pending keys is flawed as it is sorted sequentially, which means if we are overflowing which we should do the code is set to while loop till it finds the sequence number matching our next ack, this doesn't account for the cyclical property of the receive buffer and will spam the selective ack with garbage falses.
Instead we want to loop from ack_num+2, and check if the sequence number is available to mark the bit true.