Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do proper congestion control before the first ACK has come in #1603

Merged
merged 16 commits into from
Jan 31, 2024

Conversation

larseggert
Copy link
Collaborator

@larseggert larseggert commented Jan 30, 2024

The small fix here is to neqo-transport/src/recovery.rs, to properly declare packets lost when a PTO fires before we have gotten a first ACK.

All the rest of the PR is to fix the various tests that had baked in the assumption that the initial cwnd wouldn't shrink before an ACK was received. The easiest way (= requiring the fewest changes) to do this was to insert PING frames into the handshake method used for the tests, causing ACKs to be emitted and preventing the PTO from firing. As a side effect, that also leaves a connection idle after the handshake, leading to simplifications. A few tests did their own force idle and required other fixes.

The idle_timeout_crazy_rtt test kept failing. The reason here was that there is a bug in our PTO calculation (in pto_period_inner) when the PTO count became large (> 26). I have an ugly, inefficient fix for this in the PR below, but we need a proper one, hence making this a draft PR.

CC @martinthomson

@larseggert larseggert linked an issue Jan 30, 2024 that may be closed by this pull request
@codecov-commenter
Copy link

codecov-commenter commented Jan 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (fbc5d62) 87.54% compared to head (019937b) 87.48%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1603      +/-   ##
==========================================
- Coverage   87.54%   87.48%   -0.07%     
==========================================
  Files         118      118              
  Lines       38608    38452     -156     
==========================================
- Hits        33798    33638     -160     
- Misses       4810     4814       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that #1605 will solve our problem with the PTO going to zero. I've also suggested a pretty big simplification to the test.

Overall though, this is a nice improvement.

let output = a.process(input.as_ref(), now).dgram();
if should_ping {
a.test_frame_writer = None;
did_ping[a.role()] = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me see if I can work out your logic a bit better, because it doesn't seem like you need to ping on both peers for this to work.

I think that you can just have the server reach Confirmed and then force a PING into the packet it sends then. The client will acknowledge that immediately, but have nothing further to send.

This arrangement has the client send a PING as well as soon as the connection is complete. That happens when the state is Connected at the TLS layer, which corresponds with the Connected state for QUIC (I don't know why you need to check the TLS layer). The client is already sending a TLS Finished as a Handshake packet at that point, which causes the server to send HANDSHAKE_DONE, so you don't need to do anything to get the server to respond immediately.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I saw was the client sending a bunch of new connection IDs to the server which remained unACK'ed (due to ACK delay being in effect). This change forces a PING into that packet, to trigger a server ACK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's right. I forget that I added that feature. Won't it be enough for the client to do that then? You can want for Confirmed and send a ping with that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try. (I think i tried to do it client-only and there were some more test failures.)

Copy link
Collaborator Author

@larseggert larseggert Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that works, because the HandshakeDone doesn't elicit an immediate ACK. So when the handshake() method is done, the client still has the delayed ACK timer pending and is not idle.

There is a case to be made to send an immediate ACK on HandshakeDone, which would fix this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Well, it would fix it for a subset of tests, but many are still failing with a client-only PING and an immediate ACK on HandshakeDone.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK to merge this as-is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes. It's a bit ugly having the check for "Connected" on the client always, but I think you have it right. We can't rely on the server Connected signal and the server Confirmed happens after the ping. Thanks for checking.

@larseggert larseggert marked this pull request as ready for review January 31, 2024 09:37
@martinthomson martinthomson merged commit 5d54da7 into mozilla:main Jan 31, 2024
9 checks passed
@larseggert larseggert deleted the fix-pre-ack-cc branch February 1, 2024 05:53
@mxinden mxinden mentioned this pull request Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pto_period_inner can return 0ns
3 participants