Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High latency on linux with delayed ack enabled #480

Closed
ansiwen opened this issue Oct 4, 2023 · 10 comments
Closed

High latency on linux with delayed ack enabled #480

ansiwen opened this issue Oct 4, 2023 · 10 comments

Comments

@ansiwen
Copy link

ansiwen commented Oct 4, 2023

I'm not really sure the issue is ocaml-tls, but it's my best guess for now.

I'm running a Cohttp server with TLS, and I was investigating a suspiciously high latency (~40ms roundtrip) over sub-ms network when sending requests in a persistent TLS connection from an arbitrary https client on Linux. The same did not appear, when doing the requests from MacOS. Looking at the network traces I saw the following issue:

Bildschirmfoto 2023-10-04 um 14 35 19

The delay between frame 36 and 37 comes from the delayed ack, that is activated by default on Linux. But the question is: why is the server sending only the first fragment of the response and then waiting for the ACK? After the ACK arrives at the server, it immediately sends the rest of the response (TLS segment 2 and 3) in frame 38. (That there are 3 segments is probably an Cohttp issue, and maybe not even a real issue.)

It does not seem to be the mirage TCP/IP stack, because I can also reproduce it when I build it with the unix socket interface.

If I enable TCP_QUICKACK on the client socket, the issue almost disappears, but sill the request-response needs two roundtrips instead of one, because the server side is waiting for the ACK of the first part of the response, before sending the rest.

The first TLS segment contains the HTTP response code line:
Bildschirmfoto 2023-10-04 um 14 36 34

Some versions:

ocaml 4.14.0
mirage                  3.10.8
tls-mirage              0.15.3
conduit-mirage          5.0.0
cohttp-mirage           5.0.0
@hannesm
Copy link
Member

hannesm commented Oct 4, 2023

Hi, first thanks for your report.

Since TLS does no buffering (the write / send_application_data immediately calls the underlying FLOW.write), I'm surprised by this behaviour. Would you mind to sprinkle some log messages around (in TLS and TCP) to figure out which function call is taking so much time? Alternatively, break into a debugger in the Unix version and figure out where the data is?

@hannesm
Copy link
Member

hannesm commented Oct 4, 2023

Thinking a bit more about that, my question is: what is in 36?

My intuition is that the TLS handshake is already established, now in 34 the client requested a resource. The server may do whatever computation is needed, and only when that is finished (webmachine, database fetching, serializing data, ...), a write is called. May it be that the server logic takes 40ms? Again, having log messages with precise timestamps in the tls layer whenever a write is executed (and on the underlying tcp layer, when a write is executed) may be useful to dig deeper.

@ansiwen
Copy link
Author

ansiwen commented Oct 4, 2023

Hi @hannesm, thanks for looking at this.

I can produce more logs for sure, but I will first point out some things that I know already now:

Yes, the TLS handshake is established already, it's a persistent TLS connection, in which several HTTP requests are sent sequentially.

Frame 36 is the first of three TLS segments which contains HTTP/1.1 200 OK\n\r, that is what I marked in the second screenshot. And then the server is really waiting for the ACK from the client, it's not computing anything. It sends the first TLS segment, waits for the ACK of that first segment, and then sends segment 2 and 3 (that is headers and body in this case) in the next frame (38). I know that, because if I disable the delayed ack on the client side (check out TCP_QUICKACK if you want to know what that is, it was new to me), the ACK is sent immediately, and then also the server responds with the rest of the HTTP response immediately without any waiting time. So what I don't understand is, which layer on the server side is waiting for the ACK from the client for frame 36 before sending the rest of the response is sent?

In the meantime I checked Cohttp, and the response function has a ~flush argument, that defaults to true and says "If flush is true, then every response chunk will be flushed to the network rather than being buffered.", so that explains why every chunk comes in its own segment. So switching flush to false will probably circumvent the problem.

Oh, I have an idea! Does the mirage TCP/IP stack also implement the Nagle's Algorithm? That would explain why I see it with both the mirage and the Linux TCP/IP stack, and it is known that the Nagle's Algorithm (disabled with the TCP_NODELAY socket option) and the delayed ACKs don't work well together, because the Nagle's Algorithm does wait for ACKs.

@hannesm
Copy link
Member

hannesm commented Oct 5, 2023

Thanks for your comment.

Oh, I have an idea! Does the mirage TCP/IP stack also implement the Nagle's Algorithm? That would explain why I see it with both the mirage and the Linux TCP/IP stack, and it is known that the Nagle's Algorithm (disabled with the TCP_NODELAY socket option) and the delayed ACKs don't work well together, because the Nagle's Algorithm does wait for ACKs.

You may want to ask on the mirage/mirage-tcpip repository, or on the MirageOS mailing list about the design and implementation of the mirage-tcpip stack.

@hannesm
Copy link
Member

hannesm commented Oct 5, 2023

To summarize:

  1. you can observe the behaviour with mirage-tcpip
  2. you can observe the behaviour with a Linux TCP/IP stack
  3. you can not observe the behaviour with a Linux TCP/IP stack when you set the TCP_NODELAY socket option (on the server side)
  4. you can not observe the behaviour with a client that either is macOS or delayed ACK is disabled (TCP_QUICKACK socket option on the client socket)

Is this a valid summary of the scenarios you tested?

@ansiwen
Copy link
Author

ansiwen commented Oct 5, 2023

Yes, 1, 2 and 4 are correct, sorry for my confusing prose style. 🙈
Number 3: that would be my guess now, but I haven't tried yet. (Can I easily set socket options with the mirage socket interface? Then I could quickly confirm that.)

I would now agree, that this is probably not related to the TLS stack. It's rather an issue of the not-disabled Nagle in both TCP stacks. Shall we close this issue and I find a better place for it?

@ansiwen
Copy link
Author

ansiwen commented Oct 5, 2023

Here are some logs:

Logs (Click me)
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [359] (state-in
                         ((handshake
                           ((session
                             ((TLS13
                               ((common_session_data13
                                 ((server_random
                                   "\031\141\2285z\180\138l\002\128\003\1894c!\175\237\141\127\251\012\003\025\217L\200\172\252\198o\144k")
                                  (client_random
                                   "@?(u1@\025\149\1882M}\209\024RD\216`\005\139w\017\011%O\248hQV\011\191\160")
                                  (peer_certificate_chain ())
                                  (peer_certificate ()) (trust_anchor ())
                                  (received_certificates ())
                                  (own_certificate (certificate))
                                  (own_private_key ("private key"))
                                  (own_name ()) (client_auth false)
                                  (master_secret
                                   "\007\236 U\131j\030\145n\128\171\244\219::\011 \240\\\137\248\246\001\184\150\012b\189\161\162\246;")
                                  (alpn_protocol ())))
                                (ciphersuite13 AES_128_GCM_SHA256)
                                (master_secret
                                 ((secret
                                   "\007\236 U\131j\030\145n\128\171\244\219::\011 \240\\\137\248\246\001\184\150\012b\189\161\162\246;")
                                  (cipher AES_128_GCM_SHA256) (hash sha256)))
                                (resumption_secret "") (state Established)
                                (resumed false)
                                (client_app_secret
                                 "O\150dG\rQ\244\255\230\027i\016\146\246\200\240w\168\170\018\1698\025\175_P\129.T\b>+")
                                (server_app_secret
                                 "4\194fo\239QEd=!\217\222\167\240\029{M\235\246\203\243C\167\170\246\230U\171\134~M\210")))))
                            (protocol_version TLS_1_3) (early_data_left 0)
                            (machina (Server13 Established13))
                            (config
                             ((ciphers
                               (AES_128_GCM_SHA256 AES_256_GCM_SHA384
                                CHACHA20_POLY1305_SHA256 AES_128_CCM_SHA256
                                ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
                                ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
                                ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256))
                              (protocol_versions (TLS_1_2 TLS_1_3))
                              (signature_algorithms
                               (ECDSA_SECP256R1_SHA256 ECDSA_SECP384R1_SHA384
                                ECDSA_SECP521R1_SHA512 ED25519
                                RSA_PSS_RSAENC_SHA256 RSA_PSS_RSAENC_SHA384
                                RSA_PSS_RSAENC_SHA512 RSA_PKCS1_SHA256
                                RSA_PKCS1_SHA384 RSA_PKCS1_SHA512
                                RSA_PKCS1_SHA224 ECDSA_SECP256R1_SHA1
                                RSA_PKCS1_SHA1 RSA_PKCS1_MD5))
                              (use_reneg false) (authenticator ())
                              (peer_name ())
                              (own_certificates
                               (Single ((certificate) "private key")))
                              (acceptable_cas ())
                              (session_cache SESSION_CACHE)
                              (ticket_cache TICKET_CACHE) (cached_session ())
                              (cached_ticket ()) (alpn_protocols ())
                              (groups
                               (X25519 P384 P256 P521 FFDHE2048 FFDHE3072
                                FFDHE4096 FFDHE6144 FFDHE8192))
                              (zero_rtt 0) (ip ())))
                            (hs_fragment "")))
                          (decryptor
                           (((sequence 6) (cipher_st <aead-state>))))
                          (encryptor
                           (((sequence 18) (cipher_st <aead-state>))))
                          (fragment "")))

17 03 03 00 69 13 75 85  c1 8a c3 80 b7 83 8b ab
ff 90 83 18 bb 48 e9 83  87 8b d9 4e d8 0e b3 8a
0b 4e 4d a5 a8 f5 2c fd  54 2a db 82 e7 aa e8 7b
ef 61 6e 06 b9 d3 30 6d  28 3d b4 47 6f 1d 6f 0e
45 32 d7 63 81 62 a0 2c  bb 20 e8 1a a2 01 06 9f
25 46 be c1 5d 29 c0 90  ee 06 52 f1 f0 9f 6e 7a
4b a4 81 f8 67 b4 a2 f0  ab 09 9f 0f 4d 17
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [360] (wire-in "")
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [361] (record-in
                         (((content_type APPLICATION_DATA) (version TLS_1_2))
                          "\019u\133\193\138\195\128\183\131\139\171\255\144\131\024\187H\233\131\135\139\217N\216\014\179\138\011NM\165\168\245,\253T*\219\130\231\170\232{\239an\006\185\2110m(=\180Go\029o\014E2\215c\129b\160,\187 \232\026\162\001\006\159%F\190\193])\192\144\238\006R\241\240\159nzK\164\129\248g\180\162\240\171\t\159\015M\023"))
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [362] (frame-in
                         (APPLICATION_DATA
                           "GET /api/v1/info HTTP/1.1\r\
                          \nHost: 127.0.0.1:8443\r\
                          \nUser-Agent: curl/8.0.1\r\
                          \nAccept: */*\r\
                          \n\r\
                          \n"))

47 45 54 20 2f 61 70 69  2f 76 31 2f 69 6e 66 6f
20 48 54 54 50 2f 31 2e  31 0d 0a 48 6f 73 74 3a
20 31 32 37 2e 30 2e 30  2e 31 3a 38 34 34 33 0d
0a 55 73 65 72 2d 41 67  65 6e 74 3a 20 63 75 72
6c 2f 38 2e 30 2e 31 0d  0a 41 63 63 65 70 74 3a
20 2a 2f 2a 0d 0a 0d 0a
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [363] (application-data-in "")
2023-10-05 17:04:40 -00:00: DBG [cohttp.lwt.server] [364] Handle request: ((headers ((Host 127.0.0.1:8443) (User-Agent curl/8.0.1) (Accept */*)))
 (meth GET) (scheme ()) (resource /api/v1/info) (version HTTP_1_1)
 (encoding Unknown)).
2023-10-05 17:04:40 -00:00: DBG [http.access] [365] IP of client is 192.168.1.100
2023-10-05 17:04:40 -00:00: DBG [http.access] [366] [3] serving //127.0.0.1:8443/api/v1/info.
2023-10-05 17:04:40 -00:00: INF [http.access] [367] request GET /api/v1/info
2023-10-05 17:04:40 -00:00: DBG [http.access] [368] request headers Host: 127.0.0.1:8443
User-Agent: curl/8.0.1
Accept: */*


2023-10-05 17:04:40 -00:00: INF [http.access] [369] response 200 response time 190us
2023-10-05 17:04:40 -00:00: DBG [http.access] [370] GET /api/v1/info path: v3b13, v3b12, v3b11, v3b10, v3b9, v3b8, v3b7, v3b6, v3b5, v3b4, v3b3, v3c3, v3c4, v3d4, v3e5, v3f6, v3g7, v3g8, v3h10, v3i12, v3l13, v3m16, v3n16, v3o16, v3o18

48 54 54 50 2f 31 2e 31  20 32 30 30 20 4f 4b 0d
0a
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [371] (application-data-out "")
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [372] (frame-out (APPLICATION_DATA "HTTP/1.1 200 OK\r\n"))
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [373] (record-out
                         (APPLICATION_DATA
                          "\205\163S\160\005\165C\211G\128#\161\189\132\221\227\227v\192U[9\203Z\141-Lq\191%\232r\026\135"))

17 03 03 00 22 cd a3 53  a0 05 a5 43 d3 47 80 23
a1 bd 84 dd e3 e3 76 c0  55 5b 39 cb 5a 8d 2d 4c
71 bf 25 e8 72 1a 87
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [374] (wire-out "")
2023-10-05 17:04:40 -00:00: DBG [ipv4] [375] ip write: mtu is 1500, hdr_len is 20, size 59 payload len 0, needed_bytes 79

43 6f 6e 74 65 6e 74 2d  54 79 70 65 3a 20 61 70
70 6c 69 63 61 74 69 6f  6e 2f 6a 73 6f 6e 0d 0a
76 61 72 79 3a 20 41 63  63 65 70 74 2c 20 41 63
63 65 70 74 2d 45 6e 63  6f 64 69 6e 67 2c 20 41
63 63 65 70 74 2d 43 68  61 72 73 65 74 2c 20 41
63 63 65 70 74 2d 4c 61  6e 67 75 61 67 65 0d 0a
45 54 61 67 3a 20 22 37  62 61 62 36 32 35 31 30
65 30 35 63 33 33 32 37  33 35 36 32 34 62 63 37
61 35 38 35 61 33 30 22  0d 0a 44 61 74 65 3a 20
54 68 75 2c 20 30 35 20  4f 63 74 20 32 30 32 33
20 31 37 3a 30 34 3a 34  30 20 47 4d 54 0d 0a 63
6f 6e 74 65 6e 74 2d 6c  65 6e 67 74 68 3a 20 34
35 0d 0a 0d 0a
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [376] (application-data-out "")
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [377] (frame-out
                         (APPLICATION_DATA
                           "Content-Type: application/json\r\
                          \nvary: Accept, Accept-Encoding, Accept-Charset, Accept-Language\r\
                          \nETag: \"7bab62510e05c332735624bc7a585a30\"\r\
                          \nDate: Thu, 05 Oct 2023 17:04:40 GMT\r\
                          \ncontent-length: 45\r\
                          \n\r\
                          \n"))
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [378] (record-out
                         (APPLICATION_DATA
                           ")n\197\133\224m\199(8\127\031C~\169\2173\154\212\219V=\018#\182h\158\197\225?~\208\189\"Z\014~\140\006b\197\154\
                          \n8\163|\203RO\014o=W\232\
                          \n\127c\018\023\156\242R4\193v\136\"\026D\237\005*\029x`\028\246\030$\173\192mz\247\027@\144\023C\229\
                          \n\233\188g.\177\204\027\154>X\150=\202\138dN\r\240\230+\151+\170\153\012\194\247\1710\029\251\188/v\020\0020\130\188\181\160\024`E#\173S\242\130jU\144\027\rQ\198]\229`\193\171{\019@\217j\003Ww\233@\245OuGc\004\131\185|\007!\190\235\186{s\026\1699\137\014\232Ip\198 9\139:\130\153\154yV`\161jm\006!\179\1931\210\206\199\171\148\233${\234A"))

17 03 03 00 d6 29 6e c5  85 e0 6d c7 28 38 7f 1f
43 7e a9 d9 33 9a d4 db  56 3d 12 23 b6 68 9e c5
e1 3f 7e d0 bd 22 5a 0e  7e 8c 06 62 c5 9a 0a 38
a3 7c cb 52 4f 0e 6f 3d  57 e8 0a 7f 63 12 17 9c
f2 52 34 c1 76 88 22 1a  44 ed 05 2a 1d 78 60 1c
f6 1e 24 ad c0 6d 7a f7  1b 40 90 17 43 e5 0a e9
bc 67 2e b1 cc 1b 9a 3e  58 96 3d ca 8a 64 4e 0d
f0 e6 2b 97 2b aa 99 0c  c2 f7 ab 30 1d fb bc 2f
76 14 02 30 82 bc b5 a0  18 60 45 23 ad 53 f2 82
6a 55 90 1b 0d 51 c6 5d  e5 60 c1 ab 7b 13 40 d9
6a 03 57 77 e9 40 f5 4f  75 47 63 04 83 b9 7c 07
21 be eb ba 7b 73 1a a9  39 89 0e e8 49 70 c6 20
39 8b 3a 82 99 9a 79 56  60 a1 6a 6d 06 21 b3 c1
31 d2 ce c7 ab 94 e9 24  7b ea 41
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [379] (wire-out "")

7b 22 76 65 6e 64 6f 72  22 3a 22 4e 69 74 72 6f
6b 65 79 20 47 6d 62 48  22 2c 22 70 72 6f 64 75
63 74 22 3a 22 4e 65 74  48 53 4d 22 7d
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [380] (application-data-out "")
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [381] (frame-out
                         (APPLICATION_DATA
                          "{\"vendor\":\"Nitrokey GmbH\",\"product\":\"NetHSM\"}"))
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [382] (record-out
                         (APPLICATION_DATA
                           "\226\025\138\145\186=\185\r\240\235YA\254<\232\017\235\189A\192Q\190v\244\r\
                          \n\134W\149\003\154b|\188j\018<\002\144T\t[M\151mZ_q\218\027\029\183\201\213\027\228\223I\162\030s'"))

17 03 03 00 3e e2 19 8a  91 ba 3d b9 0d f0 eb 59
41 fe 3c e8 11 eb bd 41  c0 51 be 76 f4 0d 0a 86
57 95 03 9a 62 7c bc 6a  12 3c 02 90 54 09 5b 4d
97 6d 5a 5f 71 da 1b 1d  b7 c9 d5 1b e4 df 49 a2
1e 73 27
2023-10-05 17:04:40 -00:00: DBG [tls.tracing] [383] (wire-out "")
2023-10-05 17:04:40 -00:00: DBG [ipv4] [384] ip write: mtu is 1500, hdr_len is 20, size 20 payload len 0, needed_bytes 40
2023-10-05 17:04:41 -00:00: DBG [tcp.window] [385] sequence validation: seq=3941238200 range=3941238200[262140] res=true
2023-10-05 17:04:41 -00:00: DBG [tcp.state] [386] Established  - Recv_ack(2374477059) -> Established
2023-10-05 17:04:41 -00:00: DBG [ipv4] [387] ip write: mtu is 1500, hdr_len is 20, size 306 payload len 0, needed_bytes 326
2023-10-05 17:04:41 -00:00: DBG [tcp.window] [388] sequence validation: seq=3941238200 range=3941238200[262140] res=true
2023-10-05 17:04:41 -00:00: DBG [tcp.state] [389] Established  - Recv_ack(2374477345) -> Established
2023-10-05 17:04:41 -00:00: DBG [tcp.window] [390] sequence validation: seq=3941238200 range=3941238200[262140] res=true
2023-10-05 17:04:41 -00:00: DBG [tcp.state] [391] Established  - Recv_ack(2374477345) -> Established

@hannesm
Copy link
Member

hannesm commented Oct 6, 2023

Since, as you mentioned, this has something to do with "cohttp" and "TCP", I'll close this issue here.

@hannesm hannesm closed this as completed Oct 6, 2023
@hannesm
Copy link
Member

hannesm commented Oct 6, 2023

What I'd propose to proceed with, considering that clearly TCP_NODELAY is a decent socket option (as your issue has shown). Using different functions (write / write_nodelay) does not compose well (i.e. the TLS stack won't offer such a multitude of functions that are mostly the same, only calling some other function on the lower level).

Thus, maybe in the TCP interface we need a way to set socket options (or flow options) on a flow. Then, you could set the TCP_NODELAY before passing it to TLS, and the TCP stack could decide to use write / write_nodelay depending on the socket options configured.

So, to me the solution would be (I have not looked into cohttp at all):

  • propose a PR to mirage-tcpip (src/core/tcp.ml{i}) to set socket options (for now, NODELAY)
  • propose a PR to mirage-tcpip (src/tcp) to implement that interface change
  • figure out whether there are any users of write_nodelay, deprecate or remove it from the interface
  • figure out a way to set NODELAY in cohttp/conduit (i.e. either access the flow on the application side, or pass in some argument somewhere)

(take it or leave it, it is only what I'd propose -- feel free to find and propose other solutions)

@ansiwen
Copy link
Author

ansiwen commented Oct 6, 2023

@hannesm thanks for the guidelines. Sounds good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants