1. Issue Description
Dubbo's netty 3 server implementation does not enable the TCP_NODELAY option, which causes the server side to respond late when the client side is in delayed ack mode and the response size is less than the MSS. However, the netty 4 server implementation does enable this option.
Considering that Netty 4 enables this option by default (Enable TCP_NODELAY and SCTP_NODELAY by default, Consider turning on TCP_NODELAY by default), Dubbo's netty 3 server implementation should also enable it by default.
2. Solution
Simply set this option when constructing ServerBootstrap:
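A minimal sketch of what this could look like, assuming Netty 3's string-keyed channel options (the bootstrap setup around the setOption call is illustrative and not Dubbo's actual NettyServer code; this is a configuration fragment, not a runnable program):

```java
import java.util.concurrent.Executors;

import org.jboss.netty.bootstrap.ServerBootstrap;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

// Illustrative netty 3 server bootstrap; only the setOption line is the fix.
ServerBootstrap bootstrap = new ServerBootstrap(
        new NioServerSocketChannelFactory(
                Executors.newCachedThreadPool(),    // boss threads
                Executors.newCachedThreadPool()));  // worker threads

// Enable TCP_NODELAY on every accepted (child) connection so that
// responses smaller than the MSS are flushed immediately instead of
// being held back by Nagle's algorithm.
bootstrap.setOption("child.tcpNoDelay", true);
```

In netty 3, options prefixed with "child." apply to the accepted connection sockets rather than the listening socket, which is where the option matters here.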
3. Issue Demonstration
Here is an example captured, with 10.5.160.181 as the client side and 10.5.169.180 as the server side.
3.1 Case with normal ack
The demo server side logic costs 10-11 ms, so normally, the client side response time is around 12 ms.
The frame 9 highlighted below is a normal request, whose request id is 0x02.
The frame 11 highlighted below is the response, whose response id is 0x02.
The actual response time is 12 ms.
Also, we can see from the above screenshots that both the client side and the server side send acks normally.
3.2 Case with delayed ack
Now let's take a look at what happens when the client side is in delayed ack mode.
From the screenshots below, there is no standalone ack packet, which means the client is in delayed ack mode: the ack is returned along with a data packet.
The frame 239 highlighted below is a request, whose request id is 0x7f.
We can see that packet 240 came back immediately after frame 239 was sent to the server; since the server logic costs 10-11 ms, this packet cannot be the response.
As we can see, the frame 240's response id is 0x7e, which is the response to the previous request.
Then frame 241 was sent, whose request id is 0x80.
After frame 241 was sent, the response to request 0x7f was returned (frame 242).
Request 0x7f was sent at 14:37:53.267, so we know the response was ready on the server side around 14:37:53.277. However, since the server side did not enable TCP_NODELAY, Nagle's algorithm keeps the response from being sent until an ack for the previous small packet is received or a timeout (40 ms) occurs.
So the response is held on the server side and is only sent when another request arrives from the client side, at 14:37:53.292730.
In this case, the client side response time (25ms) is much slower than it should be (12ms).
Even worse, in this situation the response time is determined by the interval at which the client side sends requests.
We ran some experiments, and the results show that in this situation, if the client sends requests at 50 QPS, the client side response time is 20 ms, because the request sending interval is 20 ms (1000/50). If the client QPS is 40, the response time is 25 ms; if the QPS is 30, the response time is 33 ms; and so on.
The response time increases as the QPS decreases, until the QPS reaches 25: once the sending interval reaches 40 ms (1000/25), TCP cancels the delayed ack mode.
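The relationship above can be captured in a small model. This is entirely my own construction for illustration; it assumes the 12 ms baseline response time and the 40 ms delayed-ack threshold observed in the experiments:

```java
public class DelayedAckLatencyModel {
    static final double BASELINE_MS = 12.0;               // normal response time observed
    static final double DELAYED_ACK_THRESHOLD_MS = 40.0;  // interval at which delayed ack is cancelled

    // Expected client-side response time (ms) for a given request rate,
    // when the server does NOT enable TCP_NODELAY.
    public static double expectedResponseTimeMs(double qps) {
        double intervalMs = 1000.0 / qps; // time between consecutive requests
        if (intervalMs >= DELAYED_ACK_THRESHOLD_MS) {
            // Interval >= 40 ms: TCP cancels delayed ack, so Nagle no
            // longer stalls the response and latency returns to baseline.
            return BASELINE_MS;
        }
        // Otherwise the response is held until the next request acks the
        // previous packet, so latency tracks the sending interval.
        return Math.max(BASELINE_MS, intervalMs);
    }
}
```

Plugging in the measured rates reproduces the observations: 50 QPS gives 20 ms, 40 QPS gives 25 ms, and 25 QPS falls back to the 12 ms baseline.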
4. Conclusion
Based on the above example, we can see that not enabling the TCP_NODELAY option degrades performance dramatically in such cases, so I hope this fix can be applied so that we can always expect stable performance.
BTW, I could submit a PR if necessary.