Internet DRAFT - draft-heavens-problems-rsts
Internet Draft Ian Heavens
Expires December 15, 1996 Fore Systems
June 1996
RSTs Considered Harmful
draft-heavens-problems-rsts-02.txt
Status of this Memo
This memo is being distributed to members of the Internet community
in order to solicit their reactions to the proposals contained in it.
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''
To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts
Shadow Directories on ds.internic.net (US East Coast), nic.nordu.net
(Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
Rim).
Abstract
This memo argues that the danger of accepting segments from old TCP
connections arises for connections terminated by RST segments, as well
as for those terminated by exchange of FIN segments. In addition,
TIME-WAIT state alone does not provide complete protection. The
likelihood of data corruption is significant, in that it exceeds the
probability of corruption after FIN exchange, the case for which
TIME-WAIT state was designed.
Heavens [Page 1]
Internet Draft RSTs Considered Harmful June 1996
Table of Contents
1. Introduction
1.1 Overview
1.2 Background
1.3 RST-Terminated Connections
2. Old Segment Acceptance from RST-Terminated Connections
2.1 RST-Terminated Connections from Established State
2.2 RST-Terminated Connections during Closedown
2.3 Proof by Demonstration
2.4 Other Hazards
2.5 Relative Probabilities
3. TIME-WAIT after RST Transmission
3.1 User Abort with TIME-WAIT
3.2 RST Loss and Data Retransmission
3.3 RST Loss and Idle Connections
Appendix A: A Different Interpretation of RFC-1122
Appendix B: Relative Probabilities of Hazards
Appendix C: Traffic Statistics for TCP Connections
Glossary
o FIN-Terminated Connection
A synchronised TCP connection which terminates by the 3-way
handshake, involving the exchange and reliable acknowledgement
of FIN segments.
o RST-Terminated Connection
A synchronised TCP connection which terminates by transmission
or reception of a RST.
o MSL
Maximum Segment Lifetime
1. Introduction
1.1 Overview
Chapter 1 describes mechanisms for closing TCP connections, and the
significance of the TIME-WAIT state.
Chapter 2 identifies a series of connection terminations involving
RSTs that may lead to data corruption.
Chapter 3 shows how the use of TIME-WAIT state alone can provide some
protection against this and identifies scenarios where this solution
is insufficient.
1.2 Background
FINs, RSTs, Timers and ICMP Messages
There are four mechanisms available in [RFC-793] to close a TCP con-
nection: FINs, RSTs, Timeouts and ICMP messages.
FINs may be used to close down a connection in an orderly fashion,
guaranteeing reliable delivery of all data segments transmitted
before the FIN in each direction. The requirement to reliably ack-
nowledge FINs in both directions leads to a number of half-closed
states: FIN-WAIT-1, FIN-WAIT-2, CLOSING, CLOSE-WAIT, LAST-ACK and
TIME-WAIT.
A RST closes a connection abruptly, immediately removing connection
state on transmission or reception. There are no interim states;
transition is to CLOSED on transmission or reception of a RST.
Timeouts also close a connection abruptly; a connection that times
out optionally transmits a RST, or it may assume that the peer has
disappeared. Timeouts also cause an immediate transition to CLOSED.
ICMP messages do not usually terminate a synchronised connection,
although they can. As with connections terminated by RST or
timeout, there is an immediate state transition to CLOSED.
This memo restricts its attention to connections closed by FINs and
RSTs.
TIME-WAIT
The TIME-WAIT state has two functions in the TCP protocol. The first
is asymmetric: to ensure the reliable acknowledgement of FINs
transmitted in CLOSE-WAIT state and so the completion of the 3-way
closing handshake. The second is symmetric: to ensure that all TCP
segments, generated in either direction during the lifetime of the
connection, have drained from the network before initiation of a new
incarnation of the connection. The clock-based ISN protects slow
connections against this threat [RFC-793]. For fast connections, which
consume sequence numbers faster than the ISN clock advances, this is
no longer true. In this case, TIME-WAIT prevents the acceptance of
old duplicate segments by a new incarnation utilising identical port
numbers. The relative threats are explained in the Appendix of
[RFC-1185], and in section 1.2 of [RFC-1323]. The problem is summarised
in relation to the danger of premature termination of TIME-WAIT state
by RST reception (TIME-WAIT assassination) in [RFC-1337].
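The distinction between slow and fast connections can be checked
numerically. The following is a minimal sketch, assuming the ISN clock
of [RFC-793], which advances once every 4 microseconds (250,000
sequence numbers per second). The function names are illustrative
only, and sequence number wrap-around within the old connection is
ignored.

```python
# Sketch: does an old incarnation's sequence-number consumption outrun
# the clock-based ISN? Assumes the RFC-793 ISN clock (one tick every
# 4 microseconds). Names here are hypothetical, not from any TCP stack.

SEQ_SPACE = 2 ** 32              # TCP sequence numbers are 32 bits wide
ISN_TICKS_PER_SECOND = 250_000   # one tick per 4 microseconds


def isn_at(t_seconds: float) -> int:
    """Clock-based ISN at time t, measured from when the clock read zero."""
    return int(t_seconds * ISN_TICKS_PER_SECOND) % SEQ_SPACE


def isn_is_overrun(rate_bytes_per_second: float, lifetime_seconds: float) -> bool:
    """True if the old incarnation sent more bytes than the ISN clock
    advanced during its lifetime, so that a new incarnation opened
    immediately afterwards starts behind the old sequence numbers."""
    consumed = rate_bytes_per_second * lifetime_seconds
    clock_advance = ISN_TICKS_PER_SECOND * lifetime_seconds
    return consumed > clock_advance
```

Under these assumptions a connection averaging 100,000 bytes/second is
protected by the ISN alone, while one averaging 1,000,000 bytes/second
is not and must rely on TIME-WAIT.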
No equivalent mechanism to TIME-WAIT exists for connections ter-
minated by transmission of a RST segment. Although RST transmission
is omitted from the TCP Connection State Diagram, the text of [RFC-
793] clearly states that where the transmission of a RST results in a
state change, it is to CLOSED state. Similarly, reception of a RST
causes a state change to CLOSED.
1.3 RST-Terminated Connections
There are several ways in which previously synchronised connections
are terminated by RST transmission. These include User Abort [RFC-
793] and reception of data after half-duplex close [RFC-1122]. How-
ever, not all RSTs result in connection termination. Reception of a
SYN segment addressed to a port for which there is no listening
socket results in transmission of a RST. This is associated with no
connection and is equivalent to an ICMP Port Unreachable. The origi-
nator of the SYN changes state from SYN-SENT to CLOSED on reception
of the RST, and the connection is never synchronised. Other connec-
tions in non-synchronised states respond to an unacceptable ACK,
security or precedence mismatch by transmitting a RST. In all these
cases, no connection has been synchronised nor data sent, so that
there is no danger of old data segments being accepted by subsequent
incarnations of the connection.
This memo distinguishes those synchronised connections which ter-
minate by transmission or reception of a RST by referring to them as
"RST-terminated connections".
2. Old Segment Acceptance from RST-Terminated Connections
Several scenarios result in the spurious acceptance of old segments
from RST-terminated connections. Two types of examples are given
here: connections aborted in Established state, and connections
aborted during the 3-way closing handshake.
2.1 RST-Terminated Connections from Established State
There are two instances of RST-terminated connections from Esta-
blished state which involve the hazard of old data acceptance by a
subsequent incarnation of the connection.
The first is a User Abort issued in Established state; the second a
half-duplex close with unread data [RFC-1122, p.88]. The sequence of
events in both cases is identical: a RST is sent by the socket from
Established state, as a result of an abort, or a close with pending
unread data.
In the worst failure mode, the socket issuing the abort is acting as
a data sink. In this case a window of data segments may be in tran-
sit when the RST is received at the data source. Any of these seg-
ments - which are not duplicates - may corrupt a subsequent incarna-
tion of the connection.
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
2. ... <SEQ=101><ACK=480><CTL=ACK> <-- ESTABL.
(User Abort)
3. ... <SEQ=101><CTL=RST> <-- CLOSED
4. ESTABL. --> <SEQ=480><ACK=101><DATA=80><CTL=ACK> ...
5. ESTABL. <-- <SEQ=101><ACK=480><CTL=ACK> ...
6. ESTABL. --> <SEQ=560><ACK=101><DATA=80><CTL=ACK> ...
7. ESTABL. --> <SEQ=640><ACK=101><DATA=80><CTL=ACK> ...
8. CLOSED <-- <SEQ=101><CTL=RST> ...
Figure 1. Connection closed by User Abort
This is shown in Figure 1. TCP A is the data source and TCP B is the
data sink. Line 1 shows a normal data segment from TCP A. An ACK
segment is transmitted by TCP B on line 2. TCP B user issues an
abort, transmits a RST, and enters CLOSED state on line 3, as speci-
fied in [RFC-793]. Normal data continues to be transmitted by TCP A
on line 4. Line 5 shows the arrival at TCP A of the ACK generated on
line 2. This may open the window and elicit further segments from
TCP A on lines 6 and 7, until the arrival of the RST at TCP A on line
8. At this point TCP A enters CLOSED state, and three data segments
from TCP A are in transit to TCP B.
The connection is reopened by the 3-way SYN handshake. Assume that
the clock based ISN chosen by TCP A for the new connection has been
overrun by the sequence number consumption in the previous incarna-
tion of the connection. The sequence numbers occupied by the last
three segments transmitted by TCP A during the previous incarnation
may overlap the window offered by TCP B in the current incarnation of
the connection.
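The overlap condition can be made concrete with the segment
acceptability test of [RFC-793]. The sketch below is one way to write
that test with modulo 2**32 arithmetic. The function names and the
4096-byte window used in the example are assumptions for illustration.

```python
# Sketch of the RFC-793 segment acceptability test. An old segment is a
# threat only if this test passes against the new incarnation's window.

SEQ_MOD = 2 ** 32


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True if seq lies in [rcv_nxt, rcv_nxt + rcv_wnd), modulo 2**32."""
    return (seq - rcv_nxt) % SEQ_MOD < rcv_wnd


def segment_acceptable(seg_seq: int, seg_len: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """The four cases of the RFC-793 acceptability table."""
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq % SEQ_MOD == rcv_nxt % SEQ_MOD
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        return False                      # data never fits a zero window
    last = seg_seq + seg_len - 1          # sequence number of the last byte
    return in_window(seg_seq, rcv_nxt, rcv_wnd) or in_window(last, rcv_nxt, rcv_wnd)
```

For example, an old 80-byte segment with SEQ=560 is acceptable to a
receiver expecting sequence number 500 with an assumed 4096-byte window:
`segment_acceptable(560, 80, 500, 4096)` evaluates to True.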
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
2. ESTABL. <-- <SEQ=101><ACK=500><CTL=ACK> <-- ESTABL.
3. (old segment)...<SEQ=560><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
4. ESTABL. <-- <SEQ=101><ACK=500><CTL=ACK> <-- ESTABL.
5. ESTABL. --> <SEQ=500><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
6. ... <SEQ=101><ACK=640><CTL=ACK> <-- ESTABL.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
7a. ESTABL. --> <SEQ=600><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
8a. ESTABL. <-- <SEQ=101><ACK=640><CTL=ACK> ...
9a. ESTABL. --> <SEQ=700><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
10a ESTABL. <-- <SEQ=101><ACK=800><CTL=ACK> <-- ESTABL.
Figure 2: Accepting One Old Segment
Figure 2 shows the spurious acceptance of part of a segment from the
previous incarnation of the connection. Line 1 shows a normal data
segment from TCP A after the SYN handshake has been completed. Line
2 shows the ACK of this segment, and line 3 shows the arrival of an
old segment from the previous connection. It falls within TCP B's
current window and is queued in the TCP reassembly queue, as its
sequence number exceeds the next expected sequence number. Since
there is a missing segment, the next ACK in line 4 acknowledges the
previous bona fide segment, and TCP A does not detect acknowledgement
of unsent data. The next segment from the current connection arrives
at TCP B in line 5. At this point, part or all of the old segment is
delivered to the user of TCP B, depending upon the implementation of
the reassembly algorithm. This behaviour is described in [RFC-1337].
TCP B transmits the acknowledgement of the two previous segments in
line 6. TCP A transmits another segment on line 7a before the arrival
of the acknowledgement in line 8a, and assumes that it is a partial
acknowledgement of this segment. Segment transmission and
acknowledgement continue as usual on lines 9a and 10a. Neither TCP A
nor TCP B is aware of the spurious acceptance of old data by TCP B.
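The exchange in Figure 2 can be reproduced with a toy model of TCP B's
reassembly queue. This is a sketch only: it assumes a "first copy of a
byte wins" reassembly policy (the memo notes that the outcome depends
on the implementation), and it omits window checks and sequence number
wrap-around. The class and its labels are hypothetical.

```python
# Toy reassembly queue for TCP B, tracking where each delivered byte
# came from ("new" = current incarnation, "old" = previous incarnation).

class Receiver:
    def __init__(self, rcv_nxt: int) -> None:
        self.rcv_nxt = rcv_nxt
        self.pending = {}     # sequence number -> origin, bytes not yet delivered
        self.delivered = []   # (sequence number, origin) in delivery order

    def receive(self, seq: int, length: int, origin: str) -> int:
        """Queue a segment, deliver any in-order bytes, return the ACK number."""
        for s in range(seq, seq + length):
            if s >= self.rcv_nxt and s not in self.pending:
                self.pending[s] = origin          # first copy of a byte wins
        while self.rcv_nxt in self.pending:
            self.delivered.append((self.rcv_nxt, self.pending.pop(self.rcv_nxt)))
            self.rcv_nxt += 1
        return self.rcv_nxt


def replay_figure_2():
    """Replay lines 1, 3 and 5 of Figure 2 and collect TCP B's ACK numbers."""
    rx = Receiver(rcv_nxt=400)
    acks = [
        rx.receive(400, 100, "new"),   # line 1, acknowledged on line 2
        rx.receive(560, 80, "old"),    # line 3, acknowledged on line 4
        rx.receive(500, 100, "new"),   # line 5, acknowledged on line 6
    ]
    old_bytes = sum(1 for _, origin in rx.delivered if origin == "old")
    return acks, old_bytes
```

Under this policy the three ACK numbers are 500, 500 and 640, matching
lines 2, 4 and 6 of Figure 2, and all 80 bytes of the old segment reach
the user.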
To underscore the possibility of the erroneous acceptance of several
old segments, Figure 3 shows the acceptance of two such segments.
The exchange is identical to Figure 2 until 7a, when a second old
segment from TCP A arrives at TCP B. Since TCP B has queued the
first old segment from TCP A, it delivers the entire second old seg-
ment to the user. TCP B transmits the acknowledgement on line 7b.
Line 8a and subsequent lines show the arrival of the acknowledgements
of spurious segments and the transmission of further segments by TCP
A. The acknowledgements are accepted as valid, since TCP A has
already transmitted past the sequence number acknowledged in the last
ACK from TCP B.
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
2. ESTABL. <-- <SEQ=101><ACK=500><CTL=ACK> <-- ESTABL.
3. (old segment)...<SEQ=560><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
4. ESTABL. <-- <SEQ=101><ACK=500><CTL=ACK> <-- ESTABL.
5. ESTABL. --> <SEQ=500><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
6. ... <SEQ=101><ACK=640><CTL=ACK> <-- ESTABL.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
7a. (old segment)...<SEQ=640><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
7b. ... <SEQ=101><ACK=720><CTL=ACK> <-- ESTABL.
7c. ESTABL. --> <SEQ=600><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
7d. ... <SEQ=101><ACK=720><CTL=ACK> <-- ESTABL.
8a. ESTABL. <-- <SEQ=101><ACK=640><CTL=ACK> ...
9a. ESTABL. --> <SEQ=700><ACK=101><DATA=100><CTL=ACK> --> ESTABL.
9b. ESTABL. <-- <SEQ=101><ACK=720><CTL=ACK> ...
9c. ESTABL. <-- <SEQ=101><ACK=720><CTL=ACK> ...
10a ESTABL. <-- <SEQ=101><ACK=800><CTL=ACK> <-- ESTABL.
Figure 3: Accepting Two Old Segments
These examples may be generalised to illustrate the arrival and
acceptance of a window of old segments at TCP B.
It is also possible for old segments to persist in the case where a
user abort is issued on the socket acting as a data source. This
happens when the ensuing RST arrives before one or more of the data
segments previously transmitted. This is shown in Figure 4.
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
2. ESTABL. <-- <SEQ=101><ACK=480><CTL=ACK> <-- ESTABL.
3. ESTABL. --> <SEQ=480><ACK=101><DATA=80><CTL=ACK> ...
4. ESTABL. --> <SEQ=560><ACK=101><DATA=80><CTL=ACK> ...
5. ESTABL. --> <SEQ=640><ACK=101><DATA=80><CTL=ACK> ...
(User Abort)
6. CLOSED --> <SEQ=101><CTL=RST>
7. ... <SEQ=101><CTL=RST> --> CLOSED
8. <SEQ=480><ACK=101><DATA=80><CTL=ACK> -->
9. <SEQ=560><ACK=101><DATA=80><CTL=ACK> -->
10. <SEQ=640><ACK=101><DATA=80><CTL=ACK> -->
Figure 4. User Abort and RST Reordering
The acceptance of old segments in transit on lines 8, 9 and 10 occurs
in an identical fashion to the previous example, as shown in Figures
2 and 3.
2.2 RST-Terminated Connections during Closedown
RST-terminated connections also occur from states other than Esta-
blished, during the 3-way closing handshake. Two examples are User
Abort [RFC-793] and Half Duplex Close [RFC-1122].
User Abort during Closedown
A user abort issued in FIN-WAIT-1, FIN-WAIT-2, CLOSING or CLOSE-WAIT
states results in the transmission of a RST, and the socket enters
CLOSED state [RFC-793]. The consequences of user abort in FIN-WAIT-1,
FIN-WAIT-2 and CLOSE-WAIT are similar to the previous section; an
entire window may be in transit when the RST is transmitted, if there
is data in transfer in the opposite direction to that followed by
the FIN. In CLOSING state, the FIN, and all data segments, have been
received by the peer before it transmits the RST, and no non-
duplicate data segments are in the network. In this case the danger
reduces to that of old duplicate segments, as in a conventionally
closed TCP connection.
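The classification above can be summarised as a small lookup. This is
a sketch that restates the text rather than any implementation; the
names are hypothetical, and ESTABLISHED is included from section 2.1
for comparison.

```python
# Sketch: result of a user abort in each state, and the hazard that
# remains afterwards, as classified in sections 2.1 and 2.2.
from enum import Enum


class State(Enum):
    ESTABLISHED = "ESTABLISHED"
    FIN_WAIT_1 = "FIN-WAIT-1"
    FIN_WAIT_2 = "FIN-WAIT-2"
    CLOSING = "CLOSING"
    CLOSE_WAIT = "CLOSE-WAIT"
    CLOSED = "CLOSED"


WINDOW_IN_TRANSIT = "up to a window of non-duplicate data may be in transit"
DUPLICATES_ONLY = "only old duplicate segments may remain"

_HAZARD = {
    State.ESTABLISHED: WINDOW_IN_TRANSIT,
    State.FIN_WAIT_1: WINDOW_IN_TRANSIT,
    State.FIN_WAIT_2: WINDOW_IN_TRANSIT,
    State.CLOSE_WAIT: WINDOW_IN_TRANSIT,
    State.CLOSING: DUPLICATES_ONLY,   # FIN and all data already received
}


def user_abort(state: State):
    """Return (segment sent, next state, residual hazard) for a user abort."""
    if state not in _HAZARD:
        raise ValueError(f"user abort not modelled in state {state.value}")
    return "RST", State.CLOSED, _HAZARD[state]
```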
Data received after Half Duplex Close
A host may implement a half-duplex TCP close, where an application
that has called CLOSE cannot continue to read data from the connec-
tion [RFC-1122]. Subsequent arrival of data elicits a RST. RFC-1122
does not explicitly state whether the connection enters CLOSED state.
In this section the assumption is made that it does. Appendix A
shows the results if this assumption is invalid. The danger of
acceptance of old segments still exists in the latter case.
It is straightforward to demonstrate this scenario. Berkeley UNIX
implementations of FTP [RFC-959] abort transfers in this fashion when
the receiver cannot write out the file to disk, because the disk is
full or because the file is too large. Figure 5 shows this scenario.
TCP A is a 80386 running Interactive UNIX with SpiderTCP, and TCP B
is a Sparcstation running SunOS 4.1.3. An FTP client is started from
TCP A and the 'get' command used to download a file from TCP B. TCP
A aborts the connection because the file limit is reached. The FTP
control connection is closed first and then the data connection.
Further data arrives from TCP B. Since this arrives in FIN-WAIT-2,
and BSD TCP/IP implements half duplex close, it elicits a RST from
TCP A [RFC-1122], and TIME-WAIT state is bypassed. Note that figure
5 shows only the FTP data connection, not the control connection.
TCP A TCP B
1. ESTABL. <-- <SEQ=220><ACK=100><DATA=80><CTL=ACK> <-- ESTABL.
2. ESTABL. --> <SEQ=100><ACK=300><CTL=ACK> --> ESTABL.
(File Too Large: Close)
3. FIN-WAIT-1 --> <SEQ=100><ACK=300><CTL=FIN,ACK> --> CLOSE-WAIT
4. FIN-WAIT-2 <-- <SEQ=300><ACK=101><CTL=ACK> <-- CLOSE-WAIT
5. FIN-WAIT-2 <-- <SEQ=300><ACK=101><DATA=80> <-- CLOSE-WAIT
6. CLOSED --> <SEQ=101><CTL=RST> --> CLOSED
Figure 5. Data Received after Half Duplex Close
If the ACK in line 4 is delayed or lost, TCP A is still in FIN-WAIT-1
in line 5, when the data arrives. A RST is transmitted and there is
a state transition to CLOSED, as above. For both these scenarios,
the danger of acceptance by a subsequent incarnation of the
connection occurs in identical fashion to Figure 2.
2.3 Proof by Demonstration
The hazards described in this memo could be shown with the testbed
used to demonstrate the hazards of TIME-WAIT assassination in [RFC-
1337]. This might involve a client application acting as a data
source, and a server which, on receipt of the first data segment,
transmits a RST and closes the connection. Repetition of this over
a long period should cause the server to accept an old segment from a
previous incarnation as described in Figure 2 above. No duplication
of segments is required within the testbed, unlike demonstration of
TIME-WAIT Assassination.
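The server half of such a testbed might look like the sketch below. It
is not the testbed of [RFC-1337]; it only shows one way to make a
server transmit a RST on receipt of the first data, using an abortive
close (SO_LINGER with a zero linger time). It assumes a POSIX-style
stack in which the linger structure is two C ints. Run over loopback,
it demonstrates the RST, not the acceptance of old segments, which
needs a real network path and many repetitions.

```python
# Sketch: a server that aborts the connection with a RST as soon as the
# first data arrives, and a client that observes the reset.
import socket
import struct
import threading


def rst_server(listener: socket.socket) -> None:
    """Accept one connection, read the first data, then abort with a RST."""
    conn, _ = listener.accept()
    conn.settimeout(5.0)
    conn.recv(4096)   # receipt of the first data
    # l_onoff=1, l_linger=0 requests an abortive close: connection state
    # is discarded and a RST is sent instead of a FIN.
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    conn.close()


def client_sees_reset() -> bool:
    """Return True if the client's read fails with a connection reset."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5.0)
    listener.bind(("127.0.0.1", 0))   # any free loopback port
    listener.listen(1)
    server = threading.Thread(target=rst_server, args=(listener,))
    server.start()
    client = socket.create_connection(listener.getsockname(), timeout=5.0)
    try:
        client.sendall(b"x" * 80)     # the client acts as the data source
        try:
            client.recv(1)
        except ConnectionResetError:
            return True
        return False
    finally:
        client.close()
        server.join()
        listener.close()
```

A full test would repeat the connection many times between the same
pair of ports and check the data delivered to the server for bytes
from an earlier incarnation.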
2.4 Other Hazards
Two other hazards exist as a result of RST-terminated connections; a
de-synchronised connection as a result of an old ACK that is accept-
able but acknowledges something not yet sent, and connection failure,
also as a result of receiving an old ACK. The ACKs, like data, need
not be duplicate segments. [RFC-1337] shows how these two hazards,
referred to as H2 and H3, occur; this memo concentrates on examples
of the hazard, referred to as H1 in [RFC-1337], of erroneous accep-
tance of old segments containing data.
2.5 Relative Probabilities
Although RSTs are less common than FINs as a means of closing
connections, the likelihood of data arriving after closedown is higher.
Appendix B derives a ratio of probability based on observed traffic
statistics. Though an informal analysis, it implies that there is a
significant risk in using RSTs to close connections.
3. TIME-WAIT after RST Transmission
One solution to the dangers presented in the previous section
involves the extension of the TIME-WAIT state to RST-terminated con-
nections. This turns out to offer only partial protection against
data corruption.
TIME-WAIT state must be entered by the TCP endpoint that sends the
RST; if the receiver enters TIME-WAIT, loss of the RST means that
there is no TIME-WAIT state and the risk of data corruption still
exists.
A connection in any of SYN-RECVD, ESTABLISHED, FIN-WAIT-1, FIN-WAIT-
2, CLOSING and CLOSE-WAIT states enters TIME-WAIT state on transmis-
sion of a RST, rather than CLOSED. Reception of a RST causes a tran-
sition to CLOSED as in [RFC-793]. Minor modifications to the seman-
tics of TIME-WAIT are required: if entered after RST transmission,
reception of all further valid non-RST segments elicits a RST, rather
than an ACK, and the TIME-WAIT timer is restarted. Received RSTs are
ignored in TIME-WAIT, as proposed by fix F1 in [RFC-1337].
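The modified rules can be sketched as a small state machine. This is
an illustration under stated assumptions, not an implementation: the
MSL of 120 seconds is the value suggested in [RFC-793], segment
validity checks are omitted, TIME-WAIT is modelled only as entered
after RST transmission, and all names are hypothetical.

```python
# Sketch of TIME-WAIT after RST transmission, as proposed in this chapter.
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    SYN_RECEIVED = auto()
    ESTABLISHED = auto()
    FIN_WAIT_1 = auto()
    FIN_WAIT_2 = auto()
    CLOSING = auto()
    CLOSE_WAIT = auto()
    TIME_WAIT = auto()
    CLOSED = auto()


# States from which RST transmission leads to TIME-WAIT rather than CLOSED.
RST_ENTERS_TIME_WAIT = {
    State.SYN_RECEIVED, State.ESTABLISHED, State.FIN_WAIT_1,
    State.FIN_WAIT_2, State.CLOSING, State.CLOSE_WAIT,
}


@dataclass
class Endpoint:
    state: State
    msl: float = 120.0               # assumed MSL in seconds
    expiry: Optional[float] = None   # time at which TIME-WAIT ends

    def tick(self, now: float) -> None:
        """Expire TIME-WAIT once 2 MSL have passed since the last restart."""
        if self.state is State.TIME_WAIT and now >= self.expiry:
            self.state = State.CLOSED
            self.expiry = None

    def abort(self, now: float) -> Optional[str]:
        """User abort: send a RST and enter TIME-WAIT instead of CLOSED."""
        if self.state in RST_ENTERS_TIME_WAIT:
            self.state = State.TIME_WAIT
            self.expiry = now + 2 * self.msl
            return "RST"
        return None

    def on_segment(self, now: float, is_rst: bool) -> Optional[str]:
        """Process an arriving segment; return the reply sent, if any."""
        self.tick(now)
        if self.state is State.TIME_WAIT:
            if is_rst:
                return None                    # RSTs are ignored in TIME-WAIT
            self.expiry = now + 2 * self.msl   # restart the TIME-WAIT timer
            return "RST"                       # reply with a RST, not an ACK
        if self.state in RST_ENTERS_TIME_WAIT and is_rst:
            self.state = State.CLOSED          # reception of a RST: as RFC-793
        return None                            # other cases are not modelled
```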
3.1 User Abort with TIME-WAIT
This solution is shown in Figure 6 for the case of User Abort in
ESTABLISHED state. The hazards outlined in Figures 2 and 3 are less
likely to occur.
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> --> ESTABL.
2. ... <SEQ=101><ACK=480><CTL=ACK> <-- ESTABL.
(User Abort)
3. ... <SEQ=101><CTL=RST> <-- TIME-WAIT
4. ESTABL. --> <SEQ=480><ACK=101><DATA=80><CTL=ACK> ...
5. ESTABL. <-- <SEQ=101><ACK=480><CTL=ACK> ...
6. ESTABL. --> <SEQ=560><ACK=101><DATA=80><CTL=ACK> ...
7. ESTABL. --> <SEQ=640><ACK=101><DATA=80><CTL=ACK> ...
8. CLOSED <-- <SEQ=101><CTL=RST> ...
9. ... <SEQ=480><ACK=101><DATA=80><CTL=ACK> --> TIME-WAIT
10. CLOSED <-- <SEQ=101><CTL=RST> <-- TIME-WAIT
11. ... <SEQ=560><ACK=101><DATA=80><CTL=ACK> --> TIME-WAIT
12. CLOSED <-- <SEQ=101><CTL=RST> <-- TIME-WAIT
13. ... <SEQ=640><ACK=101><DATA=80><CTL=ACK> --> TIME-WAIT
14. CLOSED <-- <SEQ=101><CTL=RST> <-- TIME-WAIT
15. (2 MSL)
CLOSED
Figure 6. Connection Closed by User Abort
The solution outlined above offers partial protection against data
corruption hazards arising from RST-terminated connections. However,
delay or loss of a RST gives rise to a potential hazard.
For TIME-WAIT state to provide full protection, it must commence
after both ends of a connection have stopped transmitting data. This
is guaranteed for the peer that enters TIME-WAIT, since it has
transmitted a RST and no data can follow this. The transition to
TIME-WAIT must also take place after the other peer has ceased data
transmission. The 3-way closing handshake enforces this for conven-
tionally closed connections; TIME-WAIT state is always entered after
the CLOSE-WAIT to LAST-ACK transition at the last peer to transmit
data.
The lack of an equivalent mechanism for RST-terminated connections
leads to situations where the effective TIME-WAIT state is truncated
or vanishes completely.
3.2 RST Loss and Data Retransmission
Figure 7 shows a scenario where TCP A is retransmitting data seg-
ments, lost because of network congestion. Owing to exponential
backoff, as described in [RFC-1122], the interval between successive
retransmissions is now the 60 second limit common to many TCP imple-
mentations. TCP B gives up and aborts the connection, entering
TIME-WAIT state as mandated by the partial solution in chapter 3.
The ensuing RST is lost, as the network is still congested. TCP A
continues to retransmit. At some point network congestion eases, and
a retransmitted data segment reaches TCP B. A new incarnation of the
connection may be in existence, and the data segment may be errone-
ously accepted.
TCP A TCP B
1. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> ESTABL.
(lost)
(User Abort)
2. ... <SEQ=101><CTL=RST> <-- TIME-WAIT
(lost)
(RTX after 60 seconds)
3. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> TIME-WAIT
(lost)
(RTX after 60 seconds)
4. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> TIME-WAIT
(lost)
(RTX after 60 seconds)
5. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> TIME-WAIT
(lost)
(RTX after 60 seconds)
6. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> TIME-WAIT
(lost)
(2 MSL)
7. CLOSED
(RTX after 60 seconds)
8. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> ...
Figure 7. RST Loss and Data Retransmission
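The timing in Figure 7 can be checked with a short calculation. The
sketch below assumes an MSL of 120 seconds (the value suggested in
[RFC-793]) and takes as input only the times at which segments from
TCP A actually reach TCP B; lost segments are simply left out. The
function name and the example times are hypothetical.

```python
# Sketch: which arriving segments find TCP B out of TIME-WAIT?
from typing import Iterable, List


def late_arrivals(abort_time: float, msl: float, arrivals: Iterable[float]) -> List[float]:
    """Return the arrival times that fall after TIME-WAIT has expired.

    TCP B enters TIME-WAIT at abort_time. Each segment received while
    still in TIME-WAIT restarts the 2 MSL timer; a segment arriving
    after expiry may be accepted by a new incarnation.
    """
    expiry = abort_time + 2 * msl
    late = []
    for t in sorted(arrivals):
        if t < expiry:
            expiry = t + 2 * msl   # timer restarted by reception in TIME-WAIT
        else:
            late.append(t)         # TIME-WAIT is over: potential hazard
    return late
```

With the abort at t=1 and retransmissions every 60 seconds, a first
successful delivery at t=300 arrives after the 2 MSL period and is
returned as late; had the retransmission at t=180 also been delivered,
the restarted timer would have covered t=300. The same function covers
the idle case of section 3.3, where the only arrival comes after an
unbounded interval.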
3.3 RST Loss and Idle Connections
It is not necessary for data transmission to be in progress for the
above hazard to occur. Consider the case where the user aborts an
idle connection, as shown in Figure 8. TCP B issues the abort, and
enters TIME-WAIT. The RST is lost, so that TCP A remains in ESTA-
BLISHED state. No activity occurs until TCP A tries to transmit
data, an interval that is unbounded, and so may exceed twice the MSL.
The data segment may be erroneously accepted at TCP B by a subsequent
incarnation of the connection.
TCP A TCP B
1. ESTABL. ESTABL.
(User Abort)
2. ... <SEQ=101><CTL=RST> <-- TIME-WAIT
(lost)
(2 MSL)
3. CLOSED
(Interval > 2MSL)
4. ESTABL. --> <SEQ=400><ACK=101><DATA=80><CTL=ACK> ...
Figure 8. RST Loss and Idle Connections
Security Considerations
Security issues are not discussed in this memo.
References
[Congestion]
V. Jacobson, "Congestion Avoidance and Control," ACM SIGCOMM-88,
August 1988.
[RFC-792]
J. Postel, "Internet Control Message Protocol", RFC-792,
USC/Information Sciences Institute, September 1981.
[RFC-793]
Postel, J., "Transmission Control Protocol", RFC-793,
USC/Information Sciences Institute, September 1981.
[RFC-959]
J. Postel, J. Reynolds, "File Transfer Protocol", RFC-959, ISI,
October 1985.
[RFC-1122]
R. Braden, "Requirements for Internet Hosts - Communication
Layers", RFC-1122, October 1989.
[RFC-1185]
Jacobson, V., Braden, R., and Zhang, L., "TCP Extension for High-
Speed Paths", RFC-1185, Lawrence Berkeley Labs, USC/Information
Sciences Institute, and Xerox Palo Alto Research Center, October
1990.
[RFC-1191]
J. Mogul, S. Deering, "Path MTU Discovery", RFC-1191, November
1990.
[RFC-1323]
Jacobson, V., Braden, R. and D. Borman "TCP Extensions for High
Performance", RFC-1323, Lawrence Berkeley Labs, USC/Information
Sciences Institute, and Cray Research, May 1992.
[RFC-1337]
R. Braden, "TIME-WAIT Assassination Hazards in TCP", RFC-1337,
ISI, May 1992.
[TCP/IP-Illustrated]
Gary Wright & Richard Stevens, "TCP/IP Illustrated, Volume 2",
Addison-Wesley 1995.
Acknowledgements
Thanks to Alan Cox and Jon Crowcroft for their comments on previous
expanded versions of this memo, and to Bob Braden for [RFC-1337], which
stimulated ideas leading to it.
Author's Address:
Ian Heavens
Fore Systems Inc.
2475 The Crescent,
Solihull Parkway
Birmingham Business Park
B37 7YE
United Kingdom
Phone: +44 (0)121 717 4444
Fax: +44 (0)121 717 4455
Email: iheavens@fore.co.uk
4. Appendix A: A Different Interpretation of RFC-1122
There are problems with interpreting [RFC-1122] to respond to the
arrival of data after half duplex close with a RST and no state
change. The connection hangs if data arrives at TCP A in FIN-WAIT-2,
as Figure 9 shows.
TCP A TCP B
1. ESTABLISHED ESTABLISHED
(Close)
2. FIN-WAIT-1 --> <SEQ=100><ACK=300><CTL=FIN,ACK> --> CLOSE-WAIT
3. FIN-WAIT-2 <-- <SEQ=300><ACK=101><CTL=ACK> <-- CLOSE-WAIT
4. FIN-WAIT-2 <-- <SEQ=300><ACK=101><DATA=30> <-- CLOSE-WAIT
(user data after half duplex close)
5. FIN-WAIT-2 --> <SEQ=301><ACK=131><CTL=RST> --> CLOSED
Figure 9. Data Received in FIN-WAIT-2 after Half Duplex Close
If the ACK of the FIN is lost or delayed, and data arrives in FIN-
WAIT-1, the connection terminates without entering TIME-WAIT state.
This is shown in Figure 10.
TCP A TCP B
1. ESTABLISHED ESTABLISHED
(Close)
2. FIN-WAIT-1 --> <SEQ=100><ACK=300><CTL=FIN,ACK> --> CLOSE-WAIT
3. (lost) ... <SEQ=300><ACK=101><CTL=ACK> <-- CLOSE-WAIT
4. FIN-WAIT-1 <-- <SEQ=300><ACK=100><DATA=30> <-- CLOSE-WAIT
(user data after half duplex close)
5. FIN-WAIT-1 --> <SEQ=101><CTL=RST> --> CLOSED
6. FIN-WAIT-1 --> <SEQ=100><ACK=300><CTL=FIN,ACK> --> CLOSED
7. CLOSED <-- <SEQ=300><CTL=RST> <-- CLOSED
Figure 10. Data Received in FIN-WAIT-1 after Half Duplex Close
5. Appendix B: Relative Probabilities of Hazards
5.1 Introduction
This section contains a less than rigorous analysis of the relative
probabilities of the various data corruption hazards. Note that
these probabilities are zero for TCP connections operating below 250
kbytes/second; the initial sequence number selection protects against
data corruption hazards, regardless of the mechanism for closing the
connection.
5.2 FIN, RST, Timer and ICMP Related Hazards
It is useful to compare the relative probabilities of hazards arising
from FIN-, RST-, Timer- and ICMP-terminated TCP connections.
The probability of each hazard is proportional to the amount of data
received after transition to CLOSED. Complete protection requires
that this be guaranteed to be zero. Data received after connection
closure does not cause data corruption, unless it falls within the
current window of a new incarnation of the connection.
It is assumed that the connection peer displaying the hazard is act-
ing as a data sink, maximising the data received and the probability
of failure. If the proportion of TCP connections acting as data
sinks or data sources is the same regardless of how the connection
terminates, the relative probabilities remain the same.
To simplify the arithmetic, higher order effects are ignored; for
instance, those arising from the loss of more than one TCP segment in
the period considered.
The three hazards considered are data corruption arising from the
following:
o Hazard 1: A FIN-terminated TCP connection with TIME-WAIT state
omitted.
o Hazard 2: A TCP connection aborted from Established state, with
neither TIME-WAIT nor LAST-ACK states.
o Hazard 3: A TCP connection aborted from Established state, with
TIME-WAIT but without LAST-ACK state.
Other hazards, such as connections aborted during closedown, by
timeouts, or ICMP messages, are ignored. These are much less likely
than Hazard 2. The duration of closedown is typically much shorter
than that of Established state. Timeouts require multiple loss of
segments in the network and represent higher order effects, with
correspondingly lower probabilities. ICMP termination of synchron-
ised connections is very rare.
Nomenclature
P1 - Relative probability of Hazard 1
P2 - Relative probability of Hazard 2
PL - Probability of loss of a TCP segment in the network
PR - Probability that a TCP connection terminates by RST
PT - Probability that a TCP connection terminates by timeout
PI - Probability that a TCP connection terminates by ICMP message
MSS - Maximum Segment Size
W - Maximum offered TCP window
o Hazard 1
Duplicate segments received after FIN-terminated connections usu-
ally arise because of the loss of an ACK, triggering an unneces-
sary retransmission. Slow start [Congestion] implies that only
one segment will be retransmitted without acknowledgement. The
relative probability of H1 is the segment size multiplied by the
probability of segment loss and the probability of termination by
FIN handshake:
P1 = MSS * PL * (1 - PR - PT - PI) = MSS * PL
ignoring higher order effects.
o Hazard 2
For a data sink, transmission of a RST in Established state and
transition to CLOSED state is followed by reception of up to a
window of data, all of which may be received during a subsequent
incarnation of the connection.
The relative probability of H2 is the window size multiplied by
the probability of termination by RST:
P2 = W * PR
o Hazard 3
In this case, a RST is lost. Any data received in TIME-WAIT
causes the TIME-WAIT timer to restart, so the hazard only occurs
if the gap between reception of segments exceeds the duration of
TIME-WAIT state. This occurs if several retransmitted segments
are lost, which is a higher order effect with low probability, or
if an application spontaneously transmits data after this time,
which is also unlikely. This hazard can be ignored.
5.3 Relative Probabilities of FIN- and RST-related Hazards
The ratio of the probabilities of hazards H2 and H1 is
P2/P1 = W/MSS * PR/PL
Example Calculation
If Path MTU Discovery [RFC-1191] is supported, the segment size is
the Maximum Segment Size indicated by the lowest physical packet size
on the connection path, unless negotiated to be lower during connec-
tion establishment. Implementation of [RFC-1191] is not yet
widespread, so the default figure is assumed [RFC-1122, 3.3].
TCP segment size = 576 - size of TCP and IP headers = 536
Assume a window size of 32K (32768) bytes. Appendix C summarises
statistics about TCP connections, measured on six machines. Taking
the average percentage values of PR=1.1 and PL=1.2 derived from
Appendix C:
P2/P1 = W/MSS * PR/PL = 32768/536 * 1.1/1.2 = 56.
For TCP connections on the same physical network, or where Path MTU
Discovery is supported, the segment size is larger than this default
and the ratio P2/P1 is correspondingly smaller.
The lowest ratio consistent with the data in Appendix C can be calcu-
lated from the highest value of PL (2.9) and the lowest value of PR
(0.8):
P2/P1 = 17.
It can be concluded that erroneous acceptance of data from expired
connections is significantly more likely to occur as a result of
RST-terminated connections than the equivalent hazard after FIN-
terminated connections.
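The two ratios quoted above can be reproduced with a short calculation.
The following Python fragment is an illustrative sketch only; the
function name hazard_ratio and the variable names are hypothetical, and
the 40 bytes of headers assume an IP header and a TCP header without
options.

```python
def hazard_ratio(window, mss, pr, pl):
    """Return P2/P1 = (W / MSS) * (PR / PL)."""
    return (window / mss) * (pr / pl)


# Default datagram size of 576 bytes less 20 bytes of IP header and
# 20 bytes of TCP header (assumption: no IP or TCP options).
MSS = 576 - 40
W = 32 * 1024  # assumed maximum offered window of 32K bytes

# PR and PL are the percentage values quoted from Appendix C.
average = hazard_ratio(W, MSS, pr=1.1, pl=1.2)
lowest = hazard_ratio(W, MSS, pr=0.8, pl=2.9)

print(round(average), round(lowest))  # prints: 56 17
```

Because only the quotient PR/PL enters the formula, the percentages can
be used directly without converting them to fractions.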
6. Appendix C: Traffic Statistics for TCP Connections
Statistics were measured using the netstat program on six machines:
[1] A home workstation (VMS) used for telecommuting via a 56Kb Frame
Relay link to the Internet.
[2] A DNS and mail gateway (VMS) at the University of Tucson,
Arizona.
[3] A personal workstation (SunOS 4.1.3) on Spider Systems' (now
Shiva Corporation) corporate LAN.
[4] The BSD development system (BSD4.4-Lite) at the Computer Science
department, Berkeley, California (taken from [TCP/IP-Illustrated],
p.799).
[5] A file server (SunOS 4.1.3) on Spider Systems' corporate LAN.
[6] An application gateway (SunOS 4.1.3) between Spider Systems' cor-
porate LAN and the Internet.
The columns show statistics collected by the BSD netstat utility or
its VMS equivalent, with the exception of machine uptime. The
derivation of the statistics from the BSD TCP/IP "tcpstat" structure
is shown in parentheses.
o machine (M)
o time in days that the machine has been up (U)
o number of TCP connections established (tcpstat.tcps_connects).
o number of TCP connections aborted by RST transmission, expressed as
a sum of the total aborted excluding those aborted by reception of
data after half duplex close, and those aborted after half duplex
close ((tcpstat.tcps_drops - tcpstat.tcps_rcvafterclose) +
tcpstat.tcps_rcvafterclose).
o number of TCP connections timed out expressed as a sum of the
number timed out by retransmissions and keepalives
(tcpstat.tcps_timeoutdrop + tcpstat.tcps_keeptimeo).
o total number of TCP data segments transmitted, excluding
retransmissions (tcpstat.tcps_sndpack -
tcpstat.tcps_sndrexmitpack).
o total number of TCP data segments retransmitted
(tcpstat.tcps_sndrexmitpack).
M  U   Establ.  Dropped     Timed Out  TXed Segs  RTXed Segs
1  2       408  4+1         263+1         135168         250
2  5     46632  456+102     7338+551      317523        4756
3  ?    138682  13349+3686  79+2345     22761633      104440
4  30   126820  44+1017     86+3219      8920528      257295
5  20    13557  198+205     43+28        1559505        1675
6  14    48226  3943+1396   11+190      11505576       67401
Percentage values for aborted and timed out connections, and for seg-
ment loss, are as follows.
Machine  Dropped (%)  Timed Out (%)  Retransmissions (%)
1        1.2          64.7           0.18
2        1.2          16.9           1.50
3        12.3         1.75           0.46
4        0.8          2.60           2.88
5        3.0          0.52           0.11
6        11.1         0.42           0.59
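The percentage values can be recomputed from the raw counts in the
first table. The following Python fragment is an illustrative sketch;
the names COUNTS and percentages are hypothetical. It assumes that the
retransmission percentage is the number of retransmitted segments
divided by the number of segments transmitted excluding
retransmissions, which is the reading consistent with the tabulated
values.

```python
# Raw counts per machine:
# (established, dropped, timed out, TXed segments, RTXed segments)
COUNTS = {
    1: (408, 4 + 1, 263 + 1, 135168, 250),
    2: (46632, 456 + 102, 7338 + 551, 317523, 4756),
    3: (138682, 13349 + 3686, 79 + 2345, 22761633, 104440),
    4: (126820, 44 + 1017, 86 + 3219, 8920528, 257295),
    5: (13557, 198 + 205, 43 + 28, 1559505, 1675),
    6: (48226, 3943 + 1396, 11 + 190, 11505576, 67401),
}


def percentages(established, dropped, timed_out, txed, rtxed):
    """Return (dropped %, timed out %, retransmission %) for one machine."""
    return (
        100.0 * dropped / established,
        100.0 * timed_out / established,
        100.0 * rtxed / txed,
    )


pct = {m: percentages(*row) for m, row in COUNTS.items()}

# Machines retained for the calculations (3, 5 and 6 are excluded).
retained = (1, 2, 4)
lowest_drop = min(pct[m][0] for m in retained)
highest_rexmit = max(pct[m][2] for m in retained)

for m in sorted(pct):
    print(m, "%.1f %.2f %.2f" % pct[m])
```

The values lowest_drop and highest_rexmit correspond to the lowest PR
and the highest PL used for the lower bound on P2/P1.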
Machines 3 and 5 are internal to a LAN and mostly handle NFS traffic,
so they may be expected to show different patterns of connection
establishment and segment loss. Dropped connections for machine 6 are
such a high proportion that some pathological system or application
problem can be suspected. These three machines are excluded from the
calculations.
Aborted connections yield more consistent percentages than timeouts
and segment loss rates; this may be because the latter are more sus-
ceptible to the characteristics of nearby networks, whereas aborts
are a function of application or system behaviour. For instance, an
excessive proportion of machine 1's TCP connections expire because of
retransmission timeouts; this may be due to an unreliable link.
For machines 1, 2 and 4, the average percentage drop rate is 1.1%.
The average retransmission rate is 1.2%. The lowest percentage
drop rate is 0.8%, and the highest retransmission rate is 2.9%.