Huge timeout in L1 QNS test #2088

Closed
larseggert opened this issue Sep 5, 2024 · 3 comments · Fixed by #2114 · May be fixed by #1998

larseggert (Collaborator) commented Sep 5, 2024

We're failing this L1 QNS test against the kwik server because the test takes longer than 300 s.

That seems to be because we set huge timeouts of ~90 seconds during the connection-close process, and so don't move on to the next connection in a reasonable time.

client  | 93s 96ms DEBUG Setting timeout of 19.971247ms
client  | 93s 96ms DEBUG stream 0 complete
client  | 93s 96ms DEBUG stream BiDi creatable
client  | 93s 96ms DEBUG [Client 2cc1e081e1fe5325] output_path send_profile SendProfile { limit: 1152, pto: None, probe: (), paced: false }
client  | 93s 96ms DEBUG Building Short dcid Some(CID [8]: f734cc7d2e99a52a)
client  | 93s 96ms DEBUG TX blocked, profile=SendProfile { limit: 1152, pto: None, probe: (), paced: false }
client  | 93s 96ms DEBUG [Client 2cc1e081e1fe5325] delay duration 19.923328ms
client  | 93s 96ms DEBUG Setting timeout of 19.923328ms
client  | 93s 96ms DEBUG [Client 2cc1e081e1fe5325] State change from Confirmed -> Closing { error: Application(0), timeout: Instant { tv_sec: 667, tv_nsec: 656536104 } }
client  | 93s 96ms WARN Unhandled event StateChange(Closing { error: Application(0), timeout: Instant { tv_sec: 667, tv_nsec: 656536104 } })
client  | 93s 96ms DEBUG [Client 2cc1e081e1fe5325] output_path send_profile SendProfile { limit: 1152, pto: None, probe: (), paced: false }
client  | 93s 96ms DEBUG Building Short dcid Some(CID [8]: f734cc7d2e99a52a)
client  | 93s 96ms DEBUG [Recvd-ap] immediate_ack at Instant { tv_sec: 573, tv_nsec: 125867796 }
client  | 93s 96ms DEBUG [Client 2cc1e081e1fe5325] pn=3 type=Short pri-path:f734cc7d2e99a52a 0.0.0.0:42465->193.167.100.100:443 IpTos(Cs0, NotEct) len 42
client  |   TX -> Ack { largest_acknowledged: 2, ack_delay: 13, first_ack_range: 2, ack_ranges: [], ecn_count: None }
client  |   TX -> ConnectionClose { error_code: Application(0), frame_type: 0, reason_phrase: "kthxbye!" }
client  | 93s 96ms DEBUG Setting timeout of 94.530598598s
sim     | Forwarding 82 bytes 193.167.0.100:42465 -> 193.167.100.100:443, dropped 48/149 (32.2%)
sim     | Forwarding 1352 bytes 193.167.0.100:42465 -> 193.167.100.100:443, dropped 48/150 (32.0%)
sim     | Forwarding 31 bytes 193.167.0.100:42465 -> 193.167.100.100:443, dropped 48/151 (31.8%)
sim     | Dropping 42 bytes 193.167.0.100:42465 -> 193.167.100.100:443, dropped 49/152 (32.2%)
sim     | Forwarding 24 bytes 193.167.100.100:443 -> 193.167.0.100:42465, dropped 34/135 (25.2%)
client  | 93s146ms DEBUG [Client 2cc1e081e1fe5325] output_path send_profile SendProfile { limit: 1152, pto: None, probe: (), paced: false }
client  | 93s146ms DEBUG Building Short dcid Some(CID [8]: f734cc7d2e99a52a)
client  | 93s146ms DEBUG [Recvd-ap] immediate_ack at Instant { tv_sec: 573, tv_nsec: 176033952 }
client  | 93s146ms DEBUG [Client 2cc1e081e1fe5325] pn=4 type=Short pri-path:f734cc7d2e99a52a 0.0.0.0:42465->193.167.100.100:443 IpTos(Cs0, NotEct) len 43
client  |   TX -> Ack { largest_acknowledged: 2, ack_delay: 6283, first_ack_range: 2, ack_ranges: [], ecn_count: None }
client  |   TX -> ConnectionClose { error_code: Application(0), frame_type: 0, reason_phrase: "kthxbye!" }
client  | 93s147ms DEBUG Setting timeout of 94.48035613s
client  | 93s147ms DEBUG Setting timeout of 94.480342525s
sim     | Forwarding 43 bytes 193.167.0.100:42465 -> 193.167.100.100:443, dropped 49/153 (32.0%)
sim     | Forwarding 29 bytes 193.167.100.100:443 -> 193.167.0.100:42465, dropped 34/136 (25.0%)
client  | 93s178ms DEBUG [Client 2cc1e081e1fe5325] output_path send_profile SendProfile { limit: 1152, pto: None, probe: (), paced: false }
client  | 93s178ms DEBUG Building Short dcid Some(CID [8]: f734cc7d2e99a52a)
client  | 93s178ms DEBUG [Recvd-ap] immediate_ack at Instant { tv_sec: 573, tv_nsec: 207809755 }
client  | 93s178ms DEBUG [Client 2cc1e081e1fe5325] pn=5 type=Short pri-path:f734cc7d2e99a52a 0.0.0.0:42465->193.167.100.100:443 IpTos(Cs0, NotEct) len 43
client  |   TX -> Ack { largest_acknowledged: 2, ack_delay: 10255, first_ack_range: 2, ack_ranges: [], ecn_count: None }
client  |   TX -> ConnectionClose { error_code: Application(0), frame_type: 0, reason_phrase: "kthxbye!" }
client  | 93s178ms DEBUG Setting timeout of 94.448622416s
client  | 93s178ms DEBUG Setting timeout of 94.44861391s
sim     | Dropping 43 bytes 193.167.0.100:42465 -> 193.167.100.100:443, dropped 50/154 (32.5%)
sim     | Forwarding 21 bytes 193.167.100.100:443 -> 193.167.0.100:42465, dropped 34/137 (24.8%)
client  | 108s111ms DEBUG [Client 2cc1e081e1fe5325] output_path send_profile SendProfile { limit: 1152, pto: None, probe: (), paced: false }
client  | 108s111ms DEBUG Building Short dcid Some(CID [8]: f734cc7d2e99a52a)
client  | 108s111ms DEBUG [Recvd-ap] immediate_ack at Instant { tv_sec: 588, tv_nsec: 140480140 }
client  | 108s111ms DEBUG [Client 2cc1e081e1fe5325] pn=6 type=Short pri-path:f734cc7d2e99a52a 0.0.0.0:42465->193.167.100.100:443 IpTos(Cs0, NotEct) len 45
client  |   TX -> Ack { largest_acknowledged: 2, ack_delay: 1876839, first_ack_range: 2, ack_ranges: [], ecn_count: None }
client  |   TX -> ConnectionClose { error_code: Application(0), frame_type: 0, reason_phrase: "kthxbye!" }
client  | 108s111ms DEBUG Setting timeout of 79.515901617s
client  | 108s111ms DEBUG Setting timeout of 79.515885948s
sim     | Dropping 21 bytes 193.167.100.100:443 -> 193.167.0.100:42465, dropped 35/138 (25.4%)
sim     | Dropping 45 bytes 193.167.0.100:42465 -> 193.167.100.100:443, dropped 51/155 (32.9%)
sim     | Forwarding 204 bytes 193.167.100.100:443 -> 193.167.0.100:42465, dropped 35/139 (25.2%)
client  | 108s773ms DEBUG [Client 2cc1e081e1fe5325] output_path send_profile SendProfile { limit: 1152, pto: None, probe: (), paced: false }
client  | 108s773ms DEBUG Building Short dcid Some(CID [8]: f734cc7d2e99a52a)
client  | 108s773ms DEBUG [Recvd-ap] immediate_ack at Instant { tv_sec: 588, tv_nsec: 803084953 }
client  | 108s773ms DEBUG [Client 2cc1e081e1fe5325] pn=7 type=Short pri-path:f734cc7d2e99a52a 0.0.0.0:42465->193.167.100.100:443 IpTos(Cs0, NotEct) len 45
client  |   TX -> Ack { largest_acknowledged: 2, ack_delay: 1959665, first_ack_range: 2, ack_ranges: [], ecn_count: None }
client  |   TX -> ConnectionClose { error_code: Application(0), frame_type: 0, reason_phrase: "kthxbye!" }
client  | 108s774ms DEBUG Setting timeout of 78.853327351s
client  | 108s774ms DEBUG Setting timeout of 78.853304208s
sim     | Forwarding 204 bytes 193.167.100.100:443 -> 193.167.0.100:42465, dropped 35/140 (25.0%)
client  | 108s774ms DEBUG [Client 2cc1e081e1fe5325] output_path send_profile SendProfile { limit: 1152, pto: None, probe: (), paced: false }
client  | 108s774ms DEBUG Building Short dcid Some(CID [8]: f734cc7d2e99a52a)
client  | 108s774ms DEBUG [Recvd-ap] immediate_ack at Instant { tv_sec: 588, tv_nsec: 804030443 }
client  | 108s774ms DEBUG [Client 2cc1e081e1fe5325] pn=8 type=Short pri-path:f734cc7d2e99a52a 0.0.0.0:42465->193.167.100.100:443 IpTos(Cs0, NotEct) len 45
client  |   TX -> Ack { largest_acknowledged: 2, ack_delay: 1959783, first_ack_range: 2, ack_ranges: [], ecn_count: None }
client  |   TX -> ConnectionClose { error_code: Application(0), frame_type: 0, reason_phrase: "kthxbye!" }
client  | 108s774ms DEBUG Setting timeout of 78.852422597s
client  | 108s774ms DEBUG Setting timeout of 78.852411897s
sim     | Forwarding 45 bytes 193.167.0.100:42465 -> 193.167.100.100:443, dropped 51/156 (32.7%)
sim     | Forwarding 45 bytes 193.167.0.100:42465 -> 193.167.100.100:443, dropped 51/157 (32.5%)
server  | QLog: done with 9840ca8b42f950a3.qlog
server  | QLog: done with d58f290ebd9bcfa9d5.qlog
server  | QLog: done with 2cc1e081e1fe5325.qlog
client  | 187s628ms DEBUG [Client 2cc1e081e1fe5325] State change from Closing { error: Application(0), timeout: Instant { tv_sec: 667, tv_nsec: 656536104 } } -> Closed(Application(0))
client  | 187s628ms INFO Closing timer expired
client  | 187s628ms DEBUG Output::None
client  | 187s628ms WARN Unhandled event StateChange(Closed(Application(0)))
client  | 187s628ms DEBUG Timer fired while closed
client  | 187s628ms DEBUG Output::None
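
For context, RFC 9000 (Section 10.2) recommends that an endpoint remain in the closing state for at least three times the current probe timeout (PTO), so an inflated RTT estimate translates directly into a ~94 s closing timer like the one in the log. A minimal sketch of that relationship, with hypothetical names rather than neqo's actual API:

```rust
use std::time::Duration;

// Sketch only (not neqo's API): RFC 9000 §10.2 recommends staying in the
// closing state for at least three times the current PTO. A PTO inflated
// to ~31.5 s by a bad RTT estimate yields the ~94.5 s timer seen above.
fn closing_period(pto: Duration) -> Duration {
    3 * pto
}

fn main() {
    let inflated_pto = Duration::from_secs_f64(31.51);
    println!("{:?}", closing_period(inflated_pto)); // ≈94.53 s
}
```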
larseggert (Collaborator, Author)

I wonder if the stack has always done this, and whether the old demo client simply did not wait until a previous connection was completely gone before starting the next one. Any idea, @mxinden?

larseggert (Collaborator, Author)

Similar issue here, PTO of ~17 seconds:

client  | 206s733ms DEBUG [Client 710a2ea0c5c12d7ba3d2c3c5e0ab45] output_path send_profile SendProfile { limit: 1252, pto: None, probe: (), paced: false }
client  | 206s733ms DEBUG Building Short dcid Some(CID [20]: 04afedde3e680e1abe8bf7ef01165b6e88757819)
client  | 206s733ms DEBUG TX blocked, profile=SendProfile { limit: 1252, pto: None, probe: (), paced: false }
client  | 206s733ms DEBUG [Client 710a2ea0c5c12d7ba3d2c3c5e0ab45] delay duration 17.26252894s
client  | 206s733ms DEBUG Setting timeout of 17.26252894s
sim     | Dropping 1352 bytes 193.167.0.100:38464 -> 193.167.100.100:443, dropped 238/762 (31.2%)
client  | 223s998ms DEBUG [LossRecovery] PTO timer fired for ap
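
For reference, RFC 9002 (Section 6.2.1) derives the PTO as smoothed_rtt + max(4 · rttvar, granularity) + max_ack_delay, doubled on each consecutive PTO, so an RTT estimate seeded from an oversized guess quickly produces timers in the tens of seconds. A hedged sketch of that formula (illustrative names, not neqo's internals):

```rust
use std::cmp::max;
use std::time::Duration;

// Illustrative only: the RFC 9002 §6.2.1 probe timeout with exponential
// backoff. Not neqo's actual implementation.
fn pto(smoothed_rtt: Duration, rttvar: Duration, max_ack_delay: Duration, backoff: u32) -> Duration {
    const GRANULARITY: Duration = Duration::from_millis(1);
    (smoothed_rtt + max(4 * rttvar, GRANULARITY) + max_ack_delay) * 2u32.pow(backoff)
}

fn main() {
    // A ~2 s smoothed RTT with two backoffs already lands in this range.
    let p = pto(Duration::from_secs(2), Duration::from_secs(1), Duration::from_millis(25), 2);
    println!("{p:?}"); // ≈24.1 s
}
```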

mxinden (Collaborator) commented Sep 6, 2024

> but whether the old demo client would not wait until a previous connection is completely gone before starting the next one.

I believe it did.

// The old demo client's loop returned only once the connection had
// reached Closed, i.e., it did wait for the close to complete:
if let Http3State::Closed(..) = client.state() {
    return Ok(client.state());
}


To make sure I understand the above correctly, it is not that we should start a new connection while another is still closing, but that Neqo should be more aggressive in considering a connection lost, instead of trying to close it for 90+ seconds, right?

larseggert added a commit to larseggert/neqo that referenced this issue Sep 16, 2024
fix: Don't encode large RTT guesses in tickets

Because under lossy conditions (e.g., QNS `handshakeloss` test), the
guess can be multiple times the actual RTT, which when encoded in the
resumption ticket will cause an extremely slow second handshake, often
causing the test to time out.

Broken out of mozilla#1998
Fixes mozilla#2088
github-merge-queue bot pushed a commit that referenced this issue Sep 17, 2024
* fix: Don't encode large RTT guesses in tickets

Because under lossy conditions (e.g., QNS `handshakeloss` test), the
guess can be multiple times the actual RTT, which when encoded in the
resumption ticket will cause an extremely slow second handshake, often
causing the test to time out.

Broken out of #1998
Fixes #2088

* Fixes & tests

* Suggestion from @mxinden

* Fix
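
The gist of the merged fix, per the commit message above, is to stop carrying an unconfirmed RTT guess into the resumption ticket. A rough sketch of that idea, with hypothetical names rather than the actual neqo code:

```rust
use std::time::Duration;

// Sketch of the idea only: if the RTT estimate is still a guess (no real
// samples, e.g. after a lossy handshake), omit it from the resumption
// ticket so the next connection starts from the default initial RTT.
fn rtt_for_ticket(estimate: Duration, is_guessed: bool) -> Option<Duration> {
    if is_guessed {
        None
    } else {
        Some(estimate)
    }
}
```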