tcp: RFC6298 compliant TCP RTO calculation
The Linux RTO calculation is adjusted to be compliant with the RFC 6298
standard: MinRTO is no longer added to the computed RTO, and RTO damping
and overestimation are reduced.

In the RFC 6298 standard TCP Retransmission Timeout (RTO) calculation, the
computed RTO is rounded up to the minimum RTO (MinRTO) if it is less. The
Linux implementation, in a departure from the standard, effectively adds
the defined MinRTO to the calculated RTO. When comparing both approaches,
the Linux calculation performs worse for sender-limited TCP flows such as
Telnet, SSH, or constant-bit-rate encoded transmissions, especially for
round-trip times (RTT) of 50 ms to 800 ms.

Compared to the Linux implementation, the RTO calculation proposed in
RFC 6298 performs better and adapts more precisely to current network
characteristics. Extensive measurements for bulk data did not show a
negative impact of the adjusted calculation.

Performance comparison for sender-limited flows:
Rate: 10 Mbit/s, Delay: 200 ms, Delay Variation: 10 ms, Time between each
scheduled segment: 1 s, Number of data segments: 300, Mean of 8 runs

Mean Response Waiting Time [milliseconds] vs. Packet Error Rate [percent]:

PER [%]    0.5    1      1.5    2      3      5      7      10
old      205.8  208.3  217.0  220.3  227.8  249.9  271.0  308.9
new      204.3  206.5  207.1  210.5  217.3  224.2  237.8  258.3

Detailed Analysis:
https://docs.google.com/document/d/1pKmPfnQb6fDK4qpiNVkN8cQyGE4wYDZukcuZfR-BnnM/edit?usp=sharing

Signed-off-by: Daniel Metz <[email protected]>
danielmgit committed Jun 10, 2016
1 parent 3d5479e commit 578910c
73 changes: 16 additions & 57 deletions net/ipv4/tcp_input.c
@@ -680,8 +680,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb)
 /* Called to compute a smoothed rtt estimate. The data fed to this
  * routine either comes from timestamps, or from segments that were
  * known _not_ to have been retransmitted [see Karn/Partridge
- * Proceedings SIGCOMM 87]. The algorithm is from the SIGCOMM 88
- * piece by Van Jacobson.
+ * Proceedings SIGCOMM 87].
  * NOTE: the next three routines used to be one big routine.
  * To save cycles in the RFC 1323 implementation it was better to break
  * it up into three procedures. -- erics
@@ -692,59 +691,20 @@ static void tcp_rtt_estimator(struct sock *sk, long mrtt_us)
 	long m = mrtt_us;	/* RTT */
 	u32 srtt = tp->srtt_us;
 
-	/* The following amusing code comes from Jacobson's
-	 * article in SIGCOMM '88. Note that rtt and mdev
-	 * are scaled versions of rtt and mean deviation.
-	 * This is designed to be as fast as possible
-	 * m stands for "measurement".
-	 *
-	 * On a 1990 paper the rto value is changed to:
-	 * RTO = rtt + 4 * mdev
-	 *
-	 * Funny. This algorithm seems to be very broken.
-	 * These formulae increase RTO, when it should be decreased, increase
-	 * too slowly, when it should be increased quickly, decrease too quickly
-	 * etc. I guess in BSD RTO takes ONE value, so that it is absolutely
-	 * does not matter how to _calculate_ it. Seems, it was trap
-	 * that VJ failed to avoid. 8)
-	 */
 	if (srtt != 0) {
-		m -= (srtt >> 3);	/* m is now error in rtt est */
-		srtt += m;		/* rtt = 7/8 rtt + 1/8 new */
-		if (m < 0) {
-			m = -m;		/* m is now abs(error) */
-			m -= (tp->mdev_us >> 2);	/* similar update on mdev */
-			/* This is similar to one of Eifel findings.
-			 * Eifel blocks mdev updates when rtt decreases.
-			 * This solution is a bit different: we use finer gain
-			 * for mdev in this case (alpha*beta).
-			 * Like Eifel it also prevents growth of rto,
-			 * but also it limits too fast rto decreases,
-			 * happening in pure Eifel.
-			 */
-			if (m > 0)
-				m >>= 3;
-		} else {
-			m -= (tp->mdev_us >> 2);	/* similar update on mdev */
-		}
-		tp->mdev_us += m;		/* mdev = 3/4 mdev + 1/4 new */
-		if (tp->mdev_us > tp->mdev_max_us) {
-			tp->mdev_max_us = tp->mdev_us;
-			if (tp->mdev_max_us > tp->rttvar_us)
-				tp->rttvar_us = tp->mdev_max_us;
-		}
-		if (after(tp->snd_una, tp->rtt_seq)) {
-			if (tp->mdev_max_us < tp->rttvar_us)
-				tp->rttvar_us -= (tp->rttvar_us - tp->mdev_max_us) >> 2;
-			tp->rtt_seq = tp->snd_nxt;
-			tp->mdev_max_us = tcp_rto_min_us(sk);
-		}
+		m -= (srtt >> 3);	/* m' = m - srtt/8 = (R' - SRTT) */
+		srtt += m;		/* srtt = srtt + m' = srtt + m - srtt/8 */
+		if (m < 0)
+			m = -m;		/* abs(m') */
+		m -= (tp->mdev_us >> 2);	/* m'' = |m'| - mdev/4 */
+		tp->mdev_us += m;	/* mdev = mdev + m'' */
+		tp->rttvar_us = tp->mdev_us;
+		if (after(tp->snd_una, tp->rtt_seq))
+			tp->rtt_seq = tp->snd_nxt;
 	} else {
 		/* no previous measure. */
-		srtt = m << 3;		/* take the measured time to be rtt */
-		tp->mdev_us = m << 1;	/* make sure rto = 3*rtt */
-		tp->rttvar_us = max(tp->mdev_us, tcp_rto_min_us(sk));
-		tp->mdev_max_us = tp->rttvar_us;
+		srtt = m << 3;		/* srtt = rtt (but stored as * 8) */
+		tp->mdev_us = tp->rttvar_us = m << 1;	/* = rtt/2 (as * 4) */
 		tp->rtt_seq = tp->snd_nxt;
 	}
 	tp->srtt_us = max(1U, srtt);
@@ -809,17 +769,16 @@ static void tcp_set_rto(struct sock *sk)
 	 * is invisible. Actually, Linux-2.4 also generates erratic
 	 * ACKs in some circumstances.
 	 */
-	inet_csk(sk)->icsk_rto = __tcp_set_rto(tp);
+	u32 min_rto = tcp_rto_min_us(sk);
 
-	/* 2. Fixups made earlier cannot be right.
-	 * If we do not estimate RTO correctly without them,
-	 * all the algo is pure shit and should be replaced
-	 * with correct one. It is exactly, which we pretend to do.
-	 */
+	if (((tp->srtt_us >> 3) + tp->rttvar_us) < min_rto)
+		inet_csk(sk)->icsk_rto = usecs_to_jiffies(min_rto);
+	else
+		inet_csk(sk)->icsk_rto = __tcp_set_rto(tp);
 
 	/* NOTE: clamping at TCP_RTO_MIN is not required, current algo
 	 * guarantees that rto is higher.
 	 */
 	tcp_bound_rto(sk);
 }
