Internet DRAFT - draft-hughes-restart
draft-hughes-restart
INTERNET-DRAFT Amy S. Hughes, Joe Touch, John Heidemann
draft-hughes-restart-00.txt ISI
Dec. 1, 2001
Expires: Jun. 1, 2002
Issues in TCP Slow-Start Restart After Idle
Status of this Memo
This document is an Internet-Draft and is NOT offered in accordance
with Section 10 of RFC2026, and the author does not provide the IETF
with any rights other than to publish as an Internet-Draft.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html
The distribution of this document is unlimited.
Abstract
This draft discusses variations in the TCP 'slow-start restart' (SSR)
algorithm, and the unintended failure of some variations to properly
restart in some environments. SSR is intended to avoid line-rate
bursts after idle periods, where TCP accumulates permission to send
in the form of ACKs, but does not consume that permission
immediately. SSR's original "restart after send is idle" is commonly
implemented as "restart after receive is idle". The latter
unintentionally fails to restart for bidirectional connections where
the sender's burst is triggered by a reverse-path data packet, such
as in persistent HTTP. Both the former and latter are shown to permit
bursts in other circumstances. Several solutions are discussed, and
their implementations evaluated.
This document updates draft-ietf-tcpimpl-restart-01.txt. It is a
product of the LSAM, X-Bone, and DynaBone projects at ISI. Comments
are solicited and should be addressed to the authors.
Expires Jun. 1, 2002 [Page 1]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
Introduction
Slow-Start Restart (SSR) describes one TCP behavior to respond to
long sending pauses in an open connection. When a sender becomes
idle, the normal ACK-clocking mechanism which regulates traffic is no
longer present and the sender may introduce a burst of packets into
the network as large as the current congestion window (CWND). Such a
burst may be too large for the intermediate routers to handle and may
be too large for the receiver to handle at one time as well.
A send timer was first proposed [JK92] to detect idle sending
periods; the recommended response is to close the congestion window
and perform a new slow-start. However, a footnote to this first
proposed solution noted that send/receive symmetry on the channel
meant that a receive timer could be used instead to achieve the same
results. As this second solution takes advantage of a timer that is
already required (to detect packet loss) it was implemented by
Jacobson and Karels. This solution has been repeated in
implementations which derive from their work.
Bursty connections, such as the persistent connections required in
HTTP/1.1 [FGMFB97] have been found to interact in meaningful ways
with SSR [Hei97]. In fact, it was discovered that SSR never occurs
with HTTP/1.1 [Poo97]. This is because a new request will reset the
receive timer (as suggested in the footnote in [JK92]) and the
sending pause will not be detected [Tou97].
Further, both timer solutions depend on the retransmit timeout (RTO)
and cannot detect send pauses that are shorter than this duration.
In such cases, the sender may transmit a burst as large as the full
congestion window.
Burst Detection.
There are several ways of determining whether a connection is at risk
of sending a burst of packets into the channel. We will discuss each
method below, from the least radical to the most radical.
Receive Timer:
The use of a receive timer is the most common burst detection method.
It is attractive because it is simple and makes use of an existing
timer. However, a receive timer does not properly detect bursts in
HTTP/1.1 because the timer is cancelled when the request packet is
received. Further, when the connection is idle for less than a full
RTO, a burst cannot be detected. Such a burst can happen when the
connection is "nearly idle" or when ACKs are lost or reordered.
Expires Jun. 1, 2002 [Page 2]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
Send Timer:
A send timer is the reciprocal solution to using a receive timer.
While it requires a new timestamp field to be maintained, it clearly
detects send pauses and corrects the problem presented by HTTP/1.1.
However, as with the receive timer, it cannot detect bursts that
could happen before a full RTO.
Packet Counting:
An alternative method examines the unused portion of the congestion
window to determine if the capacity to burst exists. This method is
simple, it uses existing information to make its decision, and it
solves both the HTTP/1.1. problem as well as the RTO problem. In
addition, it addresses the problem that needs to be solved (bursts)
instead of a specific circumstance where the problem could happen
(send pauses). However, where timer detection avoids defining a
burst (it defines idle periods instead), here a burst must be defined
before it can be detected. One possible definition is the situation
where the available portion of the sending window is some proportion
of the entire congestion window, say 50%. Another definition places
a numerical limit on the available portion of the congestion window,
say 4 or CWND-1 packets.
Burst Response
Once a burst is detected, there are several different ways to take
action. The different possibilities are listed below, again from
least to most radical.
Full Restart:
Reducing the congestion window to one packet and re-entering slow-
start, the original slow-start restart, is one response. This was
the solution proposed by Jacobson and Karels. This is a very
conservative response and it defeats most of the speedup that
HTTP/1.1 provides [HOT97]. Current proposals [FAP97] have suggested
increasing the initial window from 1 packet to 4 packets. Further,
depending on the method of burst detection, Full Restart can be far
more punitive than it should be. Coupled with a timer, full restart
is most likely to respond to a completely empty congestion window.
Coupled with Packet Counting, the response could close the window too
far, even smaller than the amount of outstanding data.
Window Limiting:
This is a modified version of Full Restart which solves the problem
created by using Packet Counting to detect bursts. With this type of
Expires Jun. 1, 2002 [Page 3]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
response, the congestion window is reduced to the amount of
outstanding data plus the slow-start initial window (1, 2, or 4). It
works exactly like Full Restart in the idle case, but is successful
at controlling bursts in an active connection. Further, in an active
connection, it effectively implements a leaky bucket of the initial
window size for the accumulation of send opportunity based on the
receipt of ACKs. This solution is fairly conservative, especially as
it defaults to Full Restart, but more importantly, sending
opportunity is simply lost if not used, and is not available for
paced output. Also, it forces negative congestion feedback on the
congestion window.
Burst Size Limitation:
When a burst is detected, its effects are limited, the sender may not
send any more than a preset number of packets into the network. This
is less conservative than the first two responses in that it does not
affect the size of the congestion window, and it is simple to
implement, simply count up the number of packets you can send and
stop when you reach the limit. The burst count is reset according to
some policy, such as when ACKs are received, each time send is
called, or when some timer triggers. The behavior is determined by
how these parameters are combined to reset the count.
Pacing:
When a burst is detected, packets are periodically sent into the
network until the sender starts receiving acks and normal maintenance
can be resumed [VH97,PK98]. This solution is very easy on the
network and scales well in cases of high bw/delay. However, it
requires a new timer and additional research for parameter tuning.
Implemented Solutions
Now we will examine combinations of the different detection and
response methods presented above. Each of the solutions below have
been implemented in some form.
BSD Implementation (Jacobson and Karels)
The most common implementation uses a receive timer coupled with Full
Restart. This is the implementation that causes the interaction
problems with HTTP/1.1. The obvious alternative is to implement a
send timer as originally intended and use Full Restart. There are
several drawbacks to this solution. First, a send timer adds
additional state and serves no purpose other than to correct the
bursting behavior after send pauses. Second, forcing a slow-start in
this situation is problematic for HTTP/1.1. A slow-start for each
Expires Jun. 1, 2002 [Page 4]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
new user request adds a delay burden to characteristically small HTTP
responses. Further, the HTTP user request pattern is unpredictable.
It is possible for the user to make a new request before the send
timer expires, triggering a burst that would defeat such a timer.
Maximum Burst Limitation (Floyd)
Floyd has proposed a coupling of Packet Counting with Burst Size
Limitation. It was developed to avoid bursts caused by gaps in the
ACK stream. Such gaps cause the send window to slide forward in
jumps, rather than smoothly by 1-2 MTUs; subsequent send calls can
thus generate bursts. This solution has been implemented in "ns" and
it prevents the sender from transmitting a series of back-to-back
packets larger than the user configured burst limit (suggested to be
5 packets) [NS97].
Maxburst defines a new function, send_much, with a parameter,
maxburst, which is called every time an external event occurs, such
as ACK is received or a timeout occurs. Each time send_much is
called, at most maxburst packets are emitted. This forces interleaved
processing of ACKs and sending of data. Sends initiated by the
arrival of data in the buffer are not affected; as a result, if there
is less than RTO of delay, and the send buffer is filled by the
application, Maxburst can still send a burst the size of an entire
window.
Maxburst does not explicitly address the sending of bursts after an
idle period; it relies on the existing mechanism to collapse CWND
after an RTO, because send_much does not replace send as the
application interface to transmitting data.
The limit of 5 packets is designed to allow non-sequental ACK losses,
and allow the CWND to grow normally under slow-start. Here is a list
of the possible burst limits, and the need for each:
1 packet per ACK or nothing gets done
2 packets per ACK if ACKs are compressed
3 packets during window growth; 2 for the compressed ACK, 1 for
the growth
4 packets per ACK received if ACKs are lost (not back-to-back),
and ACKs are compressed
5 packets per ACK received if ACKs are lost (not back-to-back),
ACKs are compressed, and the window is growing
Expires Jun. 1, 2002 [Page 5]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
Use it or lose it (UI/LI) (Hughes, Touch, and Heidemann)
One proposed solution combines Packet Counting with Window Limiting,
and the IETF community refers to it as "Use-It-or-Lose-It" (it was
originally called Congestion Window Monitoring, CWM, but we will use
the acronym UI/LI, pronounced 'wee-lee', here). Whenever (CWND -
outstanding data > 4), we reduce CWND to (outstanding data + 4). The
choice of 4 packets is discussed in with the implementation details
below. UI/LI allows the congestion window to grow normally but
shrinks the congestion window as the sender becomes idle. It also
prevents the sender from transmitting any bursts larger than 4
packets in response to a new request. Because UI/LI is not dependent
on any timers, the loss of an ACK or a nearly idle connection cannot
cause any bursts. UI/LI is similar to Maxburst, but avoids the burst
by reducing CWND, rather than by inhibiting the sends directly. As a
result, we avoid the potential problem of sequential calls to
"tcp_output", which would cause bursts in Maxburst. UI/LI also
causes TCP to use the feedback of 'not using the CWND fast enough',
which results in a decrease in the CWND.
UI/LI effectively imposes a leaky bucket type limitation on the
congestion window. The window is allowed to grow and be managed
normally but the sender is not allowed to save up any sending
opportunities. Any opportunity that is not used is lost. This
property of UI/LI forces interleaved reception of ACKs and processing
of sends.
Burst-or-lose (Touch, Floyd)
We have considered a modified version of Maxburst, in which all
sends, including those initiated by data arrival from the
application, limit their burst size each time they are called, and
reset their flag at that time. We call this burst-or-lose (BOL). The
result is very similar to UI/LI, but the bucket reflects only the
permission to send created as the result of an ACK receipt, rather
than the overall limit of the congestion window.
In this case, it is unncessary to collapse CWND after an RTO. The
primary reason for the collapse is to avoid bursts; BOL avoids bursts
completely, but recovers as quickly as the ACK clocking can be
restored.
In the case when there is no data to send, multiple ACK arrivals do
not increase the bucket available for sending. This avoids the bursts
after idle periods. In the case when ACKs are lost, the remaining ACK
generates permission to send only a fixed amount of data. The
application cannot defeat this bucket by calling send multiple times.
Expires Jun. 1, 2002 [Page 6]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
Rate Based Pacing (Visweswaraiah and Heidemann)
Rate Based Pacing (RBP) combines the Pacing response with either a
Send Timer or Packet Counting. It avoids slow-start when resuming
after sending pauses and allows the normal clocking of packets to be
gracefully restarted. When a burst potential is detected, the
algorithm meters a small burst of packets into the channel [VH97].
RBP is the least conservative solution to the bursting problem
because it continues to make use of the pre-pause congestion window.
If network conditions have changed significantly, maintaining the
previous window could cause the paced connection to be overly
aggressive as compared to other connections. (Although some work
suggests congestion windows are stable over multi-minute timeframes
[BSSK97].) More recently pacing has been suggested for use in
wireless networking scenarios [BPK97], and for satellite connections.
Solution Comparison
Below we present a comparison of the solutions presented above, plus
the original Send Timer solution. The Send Timer solution is not in
use, but we implemented it in FreeBSD for comparison purposes.
We contrast each solution on four criteria. A "yes" in the "HTTP
Burst" column indicates that this solution will solve the bursting
problem created by HTTP/1.1. A "yes" in the "Other Burst" column
indicates that the solution will prevent bursts in other situations
as well. A "yes" in the "Adj CWND" column indicates that the
solution involves adjusting the value of CWND. The final column,
"Code Size", gives the number of lines of code needed to correctly
implement the solution.
|
| HTTP Burst Other Burst Adj CWND Code Size
-----------+-------------------------------------------------
Send Timer | yes no yes ~5 lines
-----------+-------------------------------------------------
Rcv Timer | no no yes 3 lines
-----------+-------------------------------------------------
Maxburst | yes yes no ~5 lines
-----------+-------------------------------------------------
UI/LI | yes yes yes 3 lines
-----------+-------------------------------------------------
BOL | yes yes no ~5 lines
-----------+-------------------------------------------------
RBP | yes yes no ~30 lines
Expires Jun. 1, 2002 [Page 7]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
There are other conditions under which to compare these algorithms,
including behavior after a single ACK loss, behavior after multiple ACK
losses (sequential and non-sequential), and the window regrowth proper-
ties. These are outside the scope of this document.
|
| Recovery rate
-----------+-------------------------------------------------
Send Timer | SS then CA
-----------+-------------------------------------------------
Rcv Timer | SS then CA
-----------+-------------------------------------------------
Maxburst | (not applicable)
-----------+-------------------------------------------------
UI/LI | SS and or CA, depending on SSTHRESH
-----------+-------------------------------------------------
BOL | (not applicable)
-----------+-------------------------------------------------
RBP | 1 RTT
The only significant difference between UI/LI and BOL is whether CWND is
affected, and thus the recovery period after the loss of an ACK. UI/LI
interprets ACK loss as an event warranting traditional TCP window recov-
ery, whereas BOL and Maxburst do not. Both UI/LI and BOL avoid the need
for idle detection for the purpose of CWND adjustment, and implement a
type of leaky bucket. Although UI/LI was intended as a type of leaky-
bucket system, BOL is much more directly analogous.
Experimental Comparisons
Packet traces of the current FreeBSD implementation of SSR (using the
receive timer), of a modified version of FreeBSD using a send timer, and
of UI/LI with HTTP/1.1 support the above observations. The graphs below
show traces of two HTTP/1.1 requests. The first request opens the CWND.
How the server responds to the packet containing the second request
reveals the differences between implementations.
The first graph shows the trace with FreeBSD v2.2.6, but the relevant
portions of the code have not been altered in more current releases. In
response to the first request, the server performs slow-start normally.
After 10 seconds, and after the retransmission timeout (RTO), the second
request arrives. At this point, receipt of the second request resets
the receive timer. The server, under the belief that communication has
not gone idle, sends a CWND-sized burst of packets into the network.
Expires Jun. 1, 2002 [Page 8]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
Idle trace with Receive Timer
sequence number (in bytes)
+------+-------+------+-------+------+-------+------+D-A---+
+ + + + + + + DDAA |
100000 ++ Client_ACK A DDAA ++
| Client_SEG C D A A |
| Server_SEG D D A |
80000 ++ D A ++
| D A |
| D |
60000 ++ D AA D ++
| D A |
| D A |
| D ADDAA |
40000 ++ D A ++
| D A |
| D AAD A |
20000 ++ D A ++
| D A A |
+ +D AAD A+ + + + + + |
0 ++A-D-A+-------+------+-------+------+-C-----+------+-----++
0 2 4 6 8 10 12 14
time since first SYN (in seconds)
The second graph shows the trace with FreeBSD v2.2.6 altered to use a
send timer to detect idle communication. As expected, the server
responds to the first request with normal slow-start. Now, the receipt
of the second request comes in after the RTO and the idle period is cor-
rectly detected. In response, the server re-enters slow-start for the
second response.
Expires Jun. 1, 2002 [Page 9]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
Idle trace with Send Timer
sequence number (in bytes)
+-------------+-------------+------------+-------------+DAAA
+ + + + D AAAA|
100000 ++ Client_ACK A DADA ++
| Client_SEG C DA A |
| Server_SEG D D AAD |
80000 ++ DAAA ++
| D AAD |
| D AAD |
60000 ++ D AA D AA AA ++
| D AA |
| DAAA |
| D AAAA |
40000 ++ D AA ++
| DADA |
| D AAAA |
20000 ++ DADA ++
| DDAAD |
+ DAAAA A + + + |
0 ++----AADAA---+-------------+--------CC--+-------------+--++
0 5 10 15 20
time since first SYN (in seconds)
The third graph shows the trace with FreeBSD altered to use UI/LI to
limit bursts. Again, the first request and response are normal. This
shows that UI/LI allows the congestion window to grow normally. With
the second request we see the main difference of UI/LI. Again, the sec-
ond request comes after the RTO, and here we see the main difference
with UI/LI. In response to the second request, the server does not send
a burst the size of CWND or enter slow-start. Instead, it sends several
small 4-packet bursts, with the CWND growing normally from this point.
Expires Jun. 1, 2002 [Page 10]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
Idle trace with UI/LI
sequence number (in bytes)
+-------+------+-------+------+-------+------+------DD-A---+
+ + + + + + + D+A |
100000 ++ Client_ACK A D A A ++
| Client_SEG C DDAA |
| Server_SEG D D A |
80000 ++ DD AD ++
| D AD A |
| D AD AA |
60000 ++ D A D ++
| DDA A |
| D A |
| D AD A |
40000 ++ DDAA ++
| D A |
| DDAAA |
20000 ++ D AD A ++
| D A |
+ D A D AAA + + + + + |
0 ++A-D-A-D------+-------+------+---C---+------+-------+----++
0 2 4 6 8 10 12 14
time since first SYN (in seconds)
Note, RTO is the usual timer limit, but any value would have the same
results, depending on when the second request was presented in relation
to the timer. If the second request arrives after the timer expires,
the reponse will behave as presented in these graphs. If the second
request arrives before the timer expires and after all outstanding pack-
ets have been acknowledged, systems with a send or receive timer will
send a CWND-sized burst. If the second request arrives before the last
ACK but after the last packet sent, the server has the potential to send
a burst no larger than CWND. UI/LI will limit any burst to 4 packets
regardless of the timing of the second request.
Implementation of UI/LI
UI/LI requires a simple modification to existing TCP output routines.
The changes required replace the current idle detection code. Below is
the suggested pseudocode from Appendix C of [JK92]:
int idle = (snd_max == snd_una);
if (idle && now - lastrcv >= rto)
cwnd = 1;
UI/LI would replace that code as below:
Expires Jun. 1, 2002 [Page 11]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
int idle = (snd_max == snd_una);
int maxwin = 4 + snd_nxt - snd_una;
if (cwnd > maxwin)
cwnd = maxwin;
Packet counting is implemented by line 2. Lines 3 and 4 implement
Window Limitation. Note that the comparison in line 1 (defining
"idle" [JK92]) is required for later Silly Window Syndrome avoidance
and could now be moved there.
We have implemented UI/LI in FreeBSD. The affected code is in the
file /sys/netinet/tcp_output.c. The lines:
idle = (tp->snd_max == tp->snd_una);
if (idle && tp->t_idle >= tp->t_rxtcur)
tp->snd_cwnd = tp->t_maxseg;
Are replaced with:
idle = (tp->snd_max == tp->snd_una);
maxwin = (4 * tp->t_maxseg) + tp->snd_nxt - tp->snd_una;
if (tp->snd_cwnd > maxwin)
tp->snd_cwnd = maxwin;
The choice of limiting the available congestion window to 4 packets
is based on the normal operation of TCP. An ACK received by the
sender may be in response to the receipt of 2 packets, allowing
another 2 to be sent. Further, normal window growth may require the
sending of a third packet. Lastly, in slow-start with delayed ACKs,
the receipt of an ACK can trigger the sending of 4 packets. Thus, 4
packets is a reasonable burst to send into the network.
Increasing the initial window in slow-start to 4 packets has already
been proposed [FAP97]. The effects of this change have been explored
in simulation in [PN98] and in practice in [AHO98]. Such a modifica-
tion to TCP would cause the same behavior as our solution in the
cases where the pause timer has expired. It does not address the
pre-timeout bursting situation we are concerned with.
Implementation of BOL
BOL can be implemented with a small amount of additional state, and a
small number of lines in TCP's input, output, and timer routines. The
state indicates the number of packets that can be emitted before the
next timer or ACK receipt. The other lines increment the state when a
notable event occurs, and decrement it when packets are sent.
Our implementation uses two state variables: bucket - the number of
Expires Jun. 1, 2002 [Page 12]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
packets that can be immediately sent, and ack_ratio - the number of
packets per ACK (upper bound). ack_ratio is implied by RFC-2581, as
at least one ACK "SHOULD" be sent for every 2 MSSs received, i.e.,
ACK compression [APS99]. This is our default; we parameterize it so
that when others modify TCP to use less frequent ACKs, our portion of
the code will operate correctly. It should be noted that less fre-
quent ACKs will, by definition, increase the burstiness of the source
as a result. We also assume 'initial_win' is defined, currently 2
MSS, but may be configurable as per Experimental RFC-2414 [FAP97].
There are two events that set the value of bucket. When an ACK is
received, we set bucket to ack_ratio * 2 + 1. This is for the same
reasons as Maxburst uses '5' - to allow for TCP to run at full rate
in the presence of isolated ACK losses. On timeout events, the bucket
is set to init_win.
When packets are emitted in the existing tcp_output routine, the
bucket is decremented. The bucket does not become negative; if it
would, that data is not sent, and the routine returns. This 'refusal
to send' behavior is identical to not having sufficient send window;
the application will retry, typically due to some timer event.
The code is as follows:
/* define the state for BOL */
int ack_ratio, /* max number of packets per ack */
int bucket, /* max burst size */
#DEFINE init_win 2
/* set the bucket on ACK receipt */
bucket = ack_ratio * 2 + 1;
/* set bucket on timeout */
bucket = init_win;
/* decrement the bucket accordingly */
can_send = min(bucket - want_send, 0);
bucket -= can_send;
do_send(can_send);
return can_send;
Local connections
We recently discovered cases where SSR adversely affected local con-
nections, i.e., connections within the same subnet. There is already
code in the current BSD-derived TCP implementation that disables ini-
tializing the congestion window (CWND) to a small value, when both
Expires Jun. 1, 2002 [Page 13]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
ends of a connection share a subnet. This code, from FreeBSD
tcp_input.c, is outlined below, where inp = tp->t_inpcb.
/*
* Don't force slow-start on local network.
*/
if (!in_localaddr(inp->inp_faddr))
tp->snd_cwnd = mss;
RFC-2414 observes that there are 3 types of window sizes used in BSD
TCP:
1. intital window
2. restart after timeout
3. restart after loss
Recent work indicates that these windows should be treated differ-
ently; e.g., the restart-after-loss window should always be set to 1,
and the initial window can be set to 4 or not used for local connec-
tions [FAP97]. We proposed that it is consistent to extend this local
exception to the restart-after-timeout window, as shown below as a
FreeBSD code segment:
idle = (tp->snd_max == tp->snd_una);
maxwin = (4 * tp->t_maxseg) + tp->snd_nxt - tp->snd_una;
if ((tp->snd_cwnd > maxwin) &&
!in_localaddr(tcp->t_inpcb->inp_faddr))
tp->snd_cwnd = maxwin;
This modification is considered reasonable because it corresponds to
the initial window determination. There are other cases where TCP
uses "local" rules, such as MSS determination, and piggybacking.
Performance Issues
The purpose of these modifications is to avoid line-rate bursts, pri-
marily as the result of the source being idle, but also due to ACK
losses. A result of many of the proposed solutions is to limit the
source, either by reducing the congestion window, or limiting the
transmission of bursts. The performance effects of congestion window
reduction are well-studied, and are not addressed here. BOL can limit
the source without modifying the congestion window, and there are
concerns that this may affect the performance of TCP.
In particular, some implementations of TCP send ACKs in a bursty
fashion, either by reducing the send/receive context switching, or by
reducing the overall number of ACKs. These modifications are used to
increase the performance of TCP, but can also increase the burstiness
Expires Jun. 1, 2002 [Page 14]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
of the source. BOL, by reducing the burstiness of the source, is con-
sidered to potentially limit the effectiveness of these enhancements.
TCP currently recommends that an ACK "SHOULD" be sent for at least
every two MSS's received. We assume that this recommendation implies
an interleaving factor for the source as well. If this factor is
encoded in a TCB variable (ack_ratio), then BOL can allow for corre-
sponding bursts consistent with reasonable (non-adjacent) ACK losses.
We assume that TCPs should not delay ACK processing to aggregate the
receive-side timeslices, in order to reduce send/receive interleav-
ing, unless this is also encoded in their ack_ratio variable. Our
initial measurements indicate that even the recommended interleaving
does not substantially affect the performance of TCP.
In addition, we recommend that any burst-avoidance modification be
disabled for direct LAN connections. For such connections, the con-
gestion window mechanism is already disabled in several TCP implemen-
tations. The resulting burstiness is handled by the link layer and
receiver buffers.
Conclusions
At this time, we propose BOL as a simple, minimal and effective fix
to the 'bug' in current TCP implementations that is exploited by
HTTP/1.1. Modifications can be made to TCP to solve the slow-start
restart problem that are consistent with the original congestion
avoidance specifications (i.e. a send timer). However, we feel that
the original intended behavior is not appropriate to some current
applications, specifically HTTP. Thus, we recommend BOL to prevent
bursts into the network. Not only does this solution solve the cur-
rent problem in a simple way, it will prevent bursting in any other
situation that might arise. The packet bursts which we allow are con-
sistent with congestion window growth algorithms and with Floyd's
conclusion about increasing the initial window size.
BOL, as well as the other solutions listed, need to be re-evaluated
within emerging TCP implementations, e.g., SACK [JB88]. In general,
TCP has no rate pacing and uses congestion control to avoid bursts in
current implementations. A more explicit mechanism, such as RBP or
similar proposals may be desirable in the future.
Security implications
There are security risks associated with these algorithms that can
result in denial of service attacks. Intermediate routers can delay
data or ACKs in ways that can cause the restarting party to restart
more conservatively. They also result in less probability of the
Expires Jun. 1, 2002 [Page 15]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
restarting party generating a line-rate burst.
In Send-Timer and Maxburst algorithms, as well as in a no-restart
system, an attacker (either an intermediate router or data receiver)
can clump the delivery of ACKs, which can result in line-rate bursts.
The Receive-Timer, Rate-Based Pacing, UI/LI, and BOL algorithms use
data or ACK arrival to moderate outgoing data and avoid bursts. If
the attacker attempts to deliver data or ACKs so as to cause a burst,
these algorithms successfully avoid creating a burst. In the process,
their throughput is reduced.
We believe the latter algorithms should be considered more robust
than the former, because slow-start restart focuses on avoiding
bursts, rather than maximizing throughput. In the recommended algo-
rithms (BOL now, RBP for the future), the system succesfully avoids
generating a burst as the result of an attacker.
Acknowledgements
We would like to acknowledge Kacheong Poon, Sally Floyd, members of
the End-to-End-Interest Mailing List, and members of the IETF TCP
Implementation Working Group for their feedback and discussions on
this document. We would also like to thank Ruth Aydt, Steve Tuecke,
and Carl Kesselman for their observation of the local connection
issue.
References
[AHO98] Mark Allman, Chris Hayes, and Shawn Ostermann. An Evaluation
of TCP with Larger Initial Windows. ACM Computer Communications
Review, 28(3), 41-52 , July 1998.
[APS99] M. Allman, V. Paxson, W. Stevens. TCP Congestion Control,
April 1999. RFC-2581.
[BPK97] Hari Balakrishnan, Venkata N. Padmanabhan, and Randy H. Katz.
The Effects of Asymmetry on TCP Performance. In Proceedings of
the ACM/IEEE Mobicom, Budapest, Hungary, ACM. September, 1997.
[BSSK97] Hari Balakrishnan, Srinivasan Seshan, Mark Stemm, and Randy
H. Katz. Analyzing Stability in Wide-Area Network Performance.
In Proceedings of the ACM SIGMETRICS, Seattle WA, USA, ACM.
June, 1997.
[FGMFB97] R. Fielding, Jim Gettys, Jeffrey C. Mogul, H. Frystyk, and
Tim Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1,
Expires Jun. 1, 2002 [Page 16]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
January 1997. RFC-2068.
[FAP97] Mark Allman, Sally Floyd, and Craig Partridge. Increasing
TCP's Initial Window, September 1998. RFC-2414.
[Hei97] John Heidemann. Performance Interactions Between P-HTTP and
TCP Implementations. ACM Computer Communications Review, 27(2),
65-73, April 1997.
[HOT97] John Heidemann, Katia Obraczka, and Joe Touch. Modeling the
Performance of HTTP Over Several Transport Protocols. ACM/IEEE
Transactions on Networking 5(5), 616-630, October, 1997.
[JB88] Van Jacobson and R.T. Braden. TCP extensions for long-delay
paths, October 1988. RFC 1072.
[Jac88] Van Jacobson. Congestion Avoidance and Control. Proc. SIG-
COMM '88, in ACM Computer Communication Review, 18(4):314-329,
August 1988. NOTE: This paper lacks the appendix that describes
slow-start restart, which appears only in the unpublished [JK92].
[JK92] Van Jacobson and Mike Karels. Congestion Avoidance and Con-
trol. This is a revised version of [Jac88], which adds Karels as
co-author and includes an additional appendix. The revised ver-
sion has not been published, but is available at
ftp://ftp.ee.lbl.gov/papers/congavoid.ps.Z
[NS97] ns Network Simulator. http://www-mash.cs.berkeley.edu/ns/,
1997.
[PK98] Venkata N. Padmanabhan and Randy H. Katz. TCP Fast Start: A
Technique for Speeding Up Web Transfers. In Proceedings of the
IEEE GLOBECOM Internet Mini-Conference, Sydney, Australia, Novem-
ber, 1998.
[PN98] K. Poduri and K. Nichols. Simulation Studies of Increased Ini-
tial TCP Window Size, September 1998. RFC-2415.
[Poo97] Kacheong Poon, Sun Microsystems, tcp-implementors mailing
list, August, 1997.
[Tou97] Joe Touch, ISI, tcp-implementors mailing list, August 12,
1997.
[VH97] Vikram Visweswaraiah and John Heidemann. Improving Restart of
Idle TCP Connections. Technical Report 97-661, University of
Southern California, November 1997.
Expires Jun. 1, 2002 [Page 17]
Hughes et al. Issues in TCP Slow-Start Restart Dec. 1, 2001
Authors/ Address
Amy S. Hughes, Joe Touch, John Hiedemann
University of Southern California/Information Sciences Institute
4676 Admiralty Way
Marina del Rey, CA 90292-6695
USA
Phone: +1 310-822-1511
Fax: +1 310-823-6714
URLs: http://www.isi.edu/~ahughes
http://www.isi.edu/touch
http://www.isi.edu/~johnh
Email: ahughes@isi.edu
touch@isi.edu
johnh@isi.edu
Expires Jun. 1, 2002 [Page 18]