INTERNET-DRAFT Balaji Venkat
<draft-bvenkat-mtu-tcpmss-03.txt> HCL-Technologies India Pvt Limited,
Expires June 1999 (HCL-Cisco software development center),
Chennai, India
December 1998
MTU discovery using TCP MSS
and Discussion on MSS value in SYN reply
Status of this memo
This document is an Internet-draft. Internet-drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-drafts.
Internet-drafts are draft documents valid for a maximum of six months
and may be updated, replaced or obsoleted by other documents at any
time. It is inappropriate to use Internet-drafts as reference
material or cite them other than as "work in progress".
To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific rim), ftp.ietf.org (US East coast), or
ftp.isi.edu (US West Coast).
Distribution of this memo is unlimited.
Abstract
Path MTU discovery as it exists now finds the least MTU of a given
path. Traceroute through IP option [3] provides a method for finding
the MTU on each hop using an ICMP message as a reply from the target
host, with output link MTU in a portion of the message. The method
proposed in this document intends to find the outbound MTU on each
hop on an internet path, without using the ICMP message for
traceroute. This mechanism aims to achieve the same goal as the
traceroute through IP option, but by different means.
Discovery of the MTU of each router on an internet path would serve
as a valuable network debugging tool. As proposed, the mechanism has
the advantage of being automatically supported by all routers that
implement a TCP layer. It has two disadvantages: it generates quite a
few TCP packets, and it takes a substantial amount of time to
discover each MTU along the path.
This document specifies the MTU discovery mechanism with the
existing IP and TCP options and the ICMP message types that
Balaji Expires June 1999 [ Page 1 ]
MTU Discovery December 1998
exist on all routers in the internet that support a TCP layer. This
method is suggested as an alternative to the Traceroute through
IP option [3]. The intention is not to obsolete RFC 1393.
This document also suggests that, by default, a reply SYN packet
from a target host should include an MSS value derived from the MTU
of the connected network of the outbound interface.
Table of contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . .2
2. Path MTU discovery and MTU discovery today . . . . . . . .3
3. MTU discovery (an alternative) . . . . . . . . . . . . . .3
4. Leveraging from Traceroute . . . . . . . . . . . . . . . .3
5. TCP Maximum segment size . . . . . . . . . . . . . . . . .4
6. Basic Algorithm . . . . . . . . . . . . . . . . . . . . .6
7. Exceptions (where this won't work). . . . . . . . . . . . .7
8. References . . . . . . . . . . . . . . . . . . . . . . . .7
9. Author's address . . . . . . . . . . . . . . . . . . . . .7
Acknowledgements
This proposal is the product of the author's own idea.
The mechanism proposed here is a further enhancement of RFC 1191
by Mogul & Deering [1]. It uses the TCP connection setup and the
traceroute mechanism (as it existed prior to RFC 1393) to achieve
its purpose.
1. Introduction
When an IP host sends data to a destination, the data is
transmitted as a series of IP datagrams. It is recommended that these
datagrams be of the largest size that does not require fragmentation
anywhere along the path from the source to the destination. (For a
further analysis of this topic, see [1].) This datagram size is
referred to as the Path MTU (PMTU), and it is equal to the minimum of
the MTUs of each hop in the path.
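As a small illustration of this definition, the Python sketch below computes the PMTU from a list of per-hop MTUs. The hop values are hypothetical. Note that the result alone does not say which hop imposed the minimum, which is the gap this draft is concerned with.

```python
def path_mtu(hop_mtus):
    """Return the Path MTU: the smallest MTU of any hop on the path."""
    if not hop_mtus:
        raise ValueError("a path needs at least one hop")
    return min(hop_mtus)


# Hypothetical per-hop MTUs in bytes: Ethernet, a 296-byte link, Ethernet.
hops = [1500, 296, 1500]
print(path_mtu(hops))  # 296
```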
To discover the MTU of each hop on an internet path, there exists the
traceroute with IP option mechanism suggested by Malkin [3], which
uses an ICMP message to report the output link MTU.
The method suggested in this draft offers an alternative to the
traceroute with IP option: a combination of the mechanism employed by
traceroute prior to RFC 1393 [3] and the TCP connection setup.
2. Path MTU discovery and MTU discovery today
The technique as it exists today involves using the Don't Fragment
(DF) bit in the IP header to dynamically discover the PMTU of a path.
The basic idea is that a source host initially assumes that the PMTU
of a path is the (known) MTU of its first hop, and sends all
datagrams on that path with the DF bit set. If any of the datagrams
are too large to be forwarded without fragmentation by some router
along the path, that router discards them and returns an ICMP
"Datagram Too Big" message as per RFC 1191. This is the ICMP
Destination Unreachable message with the code meaning "fragmentation
needed and DF set" [2]; RFC 1191 adds the MTU of the next hop in a
previously unused field of its header.
The PMTU discovery process ends when the host's estimate of the
PMTU is low enough that its datagrams can be delivered without
fragmentation. Alternatively, the host may elect to end the discovery
process by ceasing to set the DF bit in the datagram headers; it may
do so, for example, because it is willing to have datagrams
fragmented in some circumstances. Normally, the host continues to set
DF in all datagrams, so that if the route changes and the new PMTU is
lower, it will be discovered.
As per RFC 1191, if a datagram is larger than the MTU of the next hop
at an intermediate router, and would therefore require fragmentation,
the router sends an ICMP "Datagram Too Big" message whose Next-Hop MTU
field reports the MTU of the constricting hop.
This method is designed to provide the Path MTU and nothing more; it
does not report the MTU of each intervening hop in the path.
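The following Python sketch simulates the RFC 1191 process described above, under simplifying assumptions: the path is a hypothetical list of link MTUs, no packets are lost, and every constricting router reports its next-hop MTU. It shows that the sender only ever hears from constricting hops, never from the others.

```python
class DatagramTooBig(Exception):
    """Models the ICMP "Datagram Too Big" report carrying the next-hop MTU."""

    def __init__(self, next_hop_mtu):
        super().__init__(next_hop_mtu)
        self.next_hop_mtu = next_hop_mtu


def forward_with_df(size, link_mtus):
    """Forward a DF-set datagram of `size` bytes over links with these MTUs.

    The first link whose MTU is smaller than the datagram drops it and
    reports that MTU, as a router implementing RFC 1191 would.
    """
    for mtu in link_mtus:
        if size > mtu:
            raise DatagramTooBig(mtu)


def discover_pmtu(link_mtus):
    """Start from the first-hop MTU and lower the estimate on each report."""
    estimate = link_mtus[0]
    reports = []
    while True:
        try:
            forward_with_df(estimate, link_mtus)
            return estimate, reports
        except DatagramTooBig as icmp:
            # Each report strictly lowers the estimate, so the loop ends.
            reports.append(icmp.next_hop_mtu)
            estimate = icmp.next_hop_mtu


pmtu, reports = discover_pmtu([4352, 1500, 1006, 1500])
print(pmtu, reports)  # 1006 [1500, 1006]
```

In this run the final 1500-byte hop never appears in any report, which is the limitation the text points out.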
MTU discovery today involves using the ICMP "Traceroute" message to
discover the MTU of each intermediate hop in an internet path.
Setting the appropriate IP option (section 2.2 of Malkin [3]) on a
datagram sent towards the target achieves this: it prompts each
router that forwards the datagram to send an ICMP "Traceroute"
message reporting its output link MTU.
3. MTU discovery (An alternative)
The mechanism proposed in this draft intends to find the MTU of each
intervening hop in a given path, using a technique that combines
traceroute as it existed prior to RFC 1393 with the TCP connection
setup.
The MTU discovery mechanism would gather information regarding each
hop's MTU on an internet path and provide it to the user of this
mechanism.
4. Leveraging from traceroute
This utility would leverage off traceroute as it existed prior to
RFC 1393, in finding the intermediate hops to a destination on a
given internet path.
Traceroute's algorithm would be required for exactly that purpose.
This would be done using the TTL field in the IP header together with
the ICMP Time Exceeded message specified in RFC 792 [2]. This method
does not intend to use the traceroute using IP option mechanism
suggested by Malkin [3]; in fact, it intends to provide an alternative
mechanism for discovering the MTU on each hop on an internet path.
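The TTL-based hop discovery can be sketched as a simulation rather than with real packets (which would need raw sockets). The sketch assumes the classic variant that sends UDP probes to an unused port, so the destination itself answers with ICMP Port Unreachable while intermediate routers answer with Time Exceeded. The path and addresses are hypothetical.

```python
from typing import List, Tuple

TIME_EXCEEDED = "time-exceeded"        # ICMP type 11, code 0 (RFC 792)
PORT_UNREACHABLE = "port-unreachable"  # ICMP type 3, code 3 (RFC 792)


def send_probe(path: List[str], ttl: int) -> Tuple[str, str]:
    """Simulate one probe sent with the given TTL along `path`.

    Every router before the destination decrements the TTL; the router
    that brings it to zero discards the probe and answers with Time
    Exceeded. A probe that reaches the destination draws Port
    Unreachable, because it targets an unused port.
    """
    for router in path[:-1]:
        ttl -= 1
        if ttl == 0:
            return router, TIME_EXCEEDED
    return path[-1], PORT_UNREACHABLE


def traceroute(path: List[str], max_ttl: int = 30) -> List[str]:
    """List the hops towards path[-1] by sending probes with TTL 1, 2, ..."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, reply = send_probe(path, ttl)
        hops.append(responder)
        if reply == PORT_UNREACHABLE:
            break
    return hops


# Hypothetical path: two routers, then the destination host.
print(traceroute(["10.0.0.1", "10.0.1.1", "192.0.2.7"]))
# ['10.0.0.1', '10.0.1.1', '192.0.2.7']
```

The resulting list is what the MTU discovery step then iterates over.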
5. TCP Maximum Segment Size.
The other mechanism in this alternative method, which follows up
what is done by traceroute, is the initial packet exchange during the
TCP connection setup. The maximum segment size (MSS) is the largest
chunk of data that TCP will send to the other end. The resulting IP
datagram is normally 40 bytes larger: 20 bytes for the TCP header and
20 bytes for the IP header.
When a connection is established, each end has the option of
announcing the MSS it expects to receive. The SYN segment sent in the
TCP connection setup contains the MSS option. If one end does not
receive an MSS from the other end, a default of 536 bytes is assumed.
When TCP sends a SYN segment, either because a local application
wants to initiate a connection or because a connection request has
been received from another host, it can send an MSS value up to the
outgoing interface's MTU minus the size of the fixed TCP and IP
headers. For an Ethernet this implies an MSS of up to 1460 bytes.
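The arithmetic and the on-the-wire form of the option can be sketched in Python as follows. The sketch assumes fixed 20-byte IPv4 and TCP headers with no options, and uses the standard MSS option layout (kind 2, length 4, 16-bit value in network byte order).

```python
import struct

IP_HEADER_LEN = 20   # fixed IPv4 header, no options
TCP_HEADER_LEN = 20  # fixed TCP header, no options
TCP_OPT_MSS = 2      # MSS option kind


def mss_from_mtu(mtu: int) -> int:
    """Largest MSS whose segment fits in one datagram on a link of this MTU."""
    return mtu - IP_HEADER_LEN - TCP_HEADER_LEN


def encode_mss_option(mss: int) -> bytes:
    """Build the 4-byte MSS option: kind 2, length 4, 16-bit MSS."""
    if not 0 <= mss <= 0xFFFF:
        raise ValueError("MSS must fit in 16 bits")
    return struct.pack("!BBH", TCP_OPT_MSS, 4, mss)


print(mss_from_mtu(1500))                           # 1460 (Ethernet)
print(encode_mss_option(mss_from_mtu(1500)).hex())  # 020405b4
```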
The destination of the connection request MAY then announce its own
MSS value in the reply SYN. This is a method discussed by Mogul &
Deering [1]. Some implementations set the MSS value in the reply SYN
segment to the minimum of the outbound interface MTU minus 40 bytes
and the default MSS of 536, which is derived from the conservative
maximum datagram size of 576. Taking this minimum would unnecessarily
limit segments to 536 bytes whenever the least MTU along the entire
path is greater than 576. Why limit the segment size to a value lower
than what can be transmitted without fragmentation? The suggestion
here, therefore, is to always return the MSS derived from the
outbound interface MTU to the host seeking the connection. The
suggestion is based on section 3 of RFC 1191 [1], but differs
slightly from RFC 1191 [1] in how the MSS is calculated.
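A minimal sketch of the two reply-MSS rules just contrasted, assuming a non-local peer and fixed 40 bytes of IPv4 and TCP headers. The function names are illustrative only.

```python
HEADER_OVERHEAD = 40  # fixed IPv4 (20) + TCP (20) headers
DEFAULT_MSS = 536     # conservative 576-byte datagram minus 40


def reply_mss_conservative(outbound_mtu: int) -> int:
    """Rule used by some implementations for non-local peers."""
    return min(outbound_mtu - HEADER_OVERHEAD, DEFAULT_MSS)


def reply_mss_suggested(outbound_mtu: int) -> int:
    """Rule suggested in this draft: outbound interface MTU minus 40."""
    return outbound_mtu - HEADER_OVERHEAD


for mtu in (296, 576, 1500, 4352):
    print(mtu, reply_mss_conservative(mtu), reply_mss_suggested(mtu))
# 296 256 256
# 576 536 536
# 1500 536 1460
# 4352 536 4312
```

The two rules agree only when the outbound MTU is 576 or less; above that, the conservative rule caps segments at 536 bytes even if the whole path could carry larger ones.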
Section 3 of Mogul & Deering states "Actually, many
implementations always send an MSS option, but set the value to
536 if the destination is non-local. This behaviour was correct
when the internet was full of hosts that did not follow the rule
that datagrams larger than 576 octets should not be sent to
non-local destinations. Now that most hosts do follow this rule,
it is unnecessary to limit the value in the TCP MSS option to
536 for non-local peers. Moreover, doing this prevents PMTU
discovery from discovering PMTUs larger than 576, so hosts
SHOULD no longer lower the value they send in the MSS option.
The MSS option should be 40 octets less than the size of the
largest datagram the host is able to reassemble (MMS_R, as
defined in RFC 1122); in many cases, this will be the architectural
limit of 65495 (65535 - 40) octets. (A host MAY send an MSS
value derived from the MTU of its connected network (the
maximum MTU over its connected networks, for a multi-homed
host); this should not cause problems for PMTU discovery, and
may dissuade a broken peer from sending enormous datagrams.)"
RFC 1191 thus allows a multi-homed host to return an MSS derived from
the maximum of the MTUs over its connected networks. The relevance of
doing so is questionable when the requested TCP connection may not
use the path over which that maximum MTU is configured. Suppose that
the largest MTU among the connected networks of a host receiving a
TCP connection request belongs to an FDDI interface. If this FDDI
interface is not the outbound interface for the packets sent on the
requested connection, then returning an MSS derived from it
misrepresents the maximum segment size that the host can send on the
true outbound interface. In that sense, sending an MSS derived from
the maximum MTU of the connected networks seems to be flawed.
Implementations therefore span a spectrum. At one end, some always
set the MSS for a non-local peer seeking a connection to the
conservative value of 536. At the other end, some derive the MSS from
the maximum of the MTUs of all connected networks. In between lies a
median set of implementations that derive the MSS from the MTU of the
outbound interface. There are arguments for each of these, but sadly
a lack of uniformity. Proponents of the maximum MTU approach argue
that, owing to the asymmetric nature of paths and rapidly changing
routes, setting any hard limit on datagram size other than the
maximum MTU is a concern.
A median approach, then, would be to set the value of the MSS option
in the SYN segment to the minimum of the MSS derived from the MTU of
the outbound interface and a default MSS derived from the largest
datagram the host can reassemble. But there is a problem here: as
Mogul & Deering note [1], advertising 65495 could quite possibly
tickle sign-bit bugs in some IP implementations.
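The spectrum of policies can be compared on the FDDI scenario above with a short Python sketch. The multi-homed host, its interface names and MTUs are hypothetical, the connection is assumed to arrive via the Ethernet interface, and MMS_R defaults (as an assumption) to the 65535-byte architectural limit.

```python
from typing import Dict

HEADER_OVERHEAD = 40  # fixed IPv4 (20) + TCP (20) headers
DEFAULT_MSS = 536     # conservative value for non-local peers


def mss_conservative(outbound: str, mtus: Dict[str, int]) -> int:
    """One end of the spectrum: never advertise more than 536."""
    return min(mtus[outbound] - HEADER_OVERHEAD, DEFAULT_MSS)


def mss_max_connected(outbound: str, mtus: Dict[str, int]) -> int:
    """Other end: largest MTU of any connected network (outbound is ignored)."""
    return max(mtus.values()) - HEADER_OVERHEAD


def mss_outbound(outbound: str, mtus: Dict[str, int], mms_r: int = 65535) -> int:
    """Median approach: outbound MTU, capped by the largest datagram the
    host can reassemble (MMS_R)."""
    return min(mtus[outbound], mms_r) - HEADER_OVERHEAD


# Hypothetical multi-homed host; the connection request arrived via eth0.
interfaces = {"fddi0": 4352, "eth0": 1500, "serial0": 296}
for policy in (mss_conservative, mss_max_connected, mss_outbound):
    print(policy.__name__, policy("eth0", interfaces))
# mss_conservative 536
# mss_max_connected 4312
# mss_outbound 1460
```

Only the outbound-interface policy reports a value that matches the interface actually carrying the connection.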
Consider three hosts A, B and C connected as shown in Fig 1.0, with
the MTUs of the various networks as shown. Suppose that host C wants
to initiate a TCP connection with host A.
The SYN from host C is sent with an MSS value of 1460, which is the
MTU of 1500 minus 40 bytes. In reply, the host A stack responds with
an MSS of 256, which is the MTU of the outgoing interface on host A minus
40 bytes for the TCP and IP headers.
+----------+ +-----------+ MTU=1500 +-----------+
| host A |-----------------| host B |--------------| host C |
+----------+MTU=296 MTU=296 +-----------+ MTU= 1500 +-----------+
SYN <mss 1460>
<------------------------------------------------------
SYN <mss 256>
------------------------------------------------------>
Fig 1.0 SYN with MSS
The MSS in a reply SYN thus reveals the MTU of the responding host's
outbound interface. By combining this with traceroute's mechanism of
identifying the intermediate hosts, it would be possible to discover
the MTU of each hop in an internet path.
6. Basic Algorithm.
The basic algorithm for identifying the MTU on each hop is first to
traceroute the intermediate hops to a given destination and store
their addresses, and then to iterate over these hosts, initiating a
connection to the finger or http port of each one with an MSS value
of 65535 in the outgoing SYN segment.
When the connection is initiated, the SYN segment carries the
source's MSS value, and the hop whose MTU is to be discovered replies
with its own MSS value. Adding 40 to the value in the MSS option of
that reply gives the MTU of the hop's outbound interface.
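The step of reading the MSS from the reply and adding 40 can be sketched as a small TCP option parser. How the SYN-ACK is captured (for example with a raw socket or a packet capture) is outside this sketch and is assumed; the function takes only the raw option bytes that follow the fixed TCP header.

```python
import struct
from typing import Optional

TCPOPT_EOL, TCPOPT_NOP, TCPOPT_MSS = 0, 1, 2
HEADER_OVERHEAD = 40  # fixed IPv4 (20) + TCP (20) headers


def find_mss(options: bytes) -> Optional[int]:
    """Walk the TCP options of a SYN or SYN-ACK and return the MSS, if any."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed option; stop parsing
        if kind == TCPOPT_MSS and length == 4:
            return struct.unpack("!H", options[i + 2:i + 4])[0]
        i += length
    return None


def hop_mtu_from_synack(options: bytes) -> Optional[int]:
    """MTU of the responder's outbound interface: advertised MSS + 40."""
    mss = find_mss(options)
    return None if mss is None else mss + HEADER_OVERHEAD


# NOP, NOP, SACK-permitted (kind 4, length 2), then MSS 256 as in Fig 1.0.
synack_opts = bytes([1, 1, 4, 2, 2, 4, 0x01, 0x00])
print(hop_mtu_from_synack(synack_opts))  # 296
```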
Iterating through the list of hops in this way yields the MTU of
each hop. Once a hop's MTU is computed, a FIN segment is sent to the
finger or http port of that hop and the connection is closed with the
usual exchange of segments for connection closure.
For hops that do not implement a TCP layer as part of their stack,
the source either times out (if the hop does not return an ICMP
Unreachable error) or receives an ICMP Unreachable error from the
hop's IP layer; in either case a default MTU of 576 bytes is assumed
for that hop. A reported value of 576 bytes therefore denotes that
MTU discovery did not work on that hop. Some implementations set the
MSS value to 536 for non-local peers when the MTU is more than 576;
in that case, too, the MTU would effectively be reported as 576.
If the finger or http port on a target hop is not available, or
finger or http is not supported on that hop, the discovery could
instead try alternate ports of the kind that are open by default on
most routers.
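Putting the steps of this section together, the loop below is a sketch of the basic algorithm under stated assumptions. `probe` is a hypothetical callback standing in for the real SYN/SYN-ACK exchange and teardown; it returns the MSS from the reply, or None on a timeout, ICMP Unreachable or refused connection. Port 23 (telnet) is only a hypothetical example of an "alternate port". The simulated network is made up for illustration.

```python
from typing import Callable, Dict, Optional, Sequence

DEFAULT_MTU = 576      # assumed MTU when a hop yields no MSS
HEADER_OVERHEAD = 40   # fixed IPv4 (20) + TCP (20) headers
SYN_MSS = 65535        # MSS advertised in the source's SYN
# finger (79) and http (80) first; telnet (23) is a hypothetical alternate.
DEFAULT_PORTS = (79, 80, 23)

# Hypothetical probe(hop, port, syn_mss) -> MSS from the SYN-ACK or None.
# It is assumed to close any connection it opens before returning.
Probe = Callable[[str, int, int], Optional[int]]


def discover_hop_mtus(hops: Sequence[str], probe: Probe,
                      ports: Sequence[int] = DEFAULT_PORTS) -> Dict[str, int]:
    """Map each traceroute hop to its outbound MTU (576 = discovery failed)."""
    mtus = {}
    for hop in hops:
        mtu = DEFAULT_MTU
        for port in ports:
            mss = probe(hop, port, SYN_MSS)
            if mss is not None:
                mtu = mss + HEADER_OVERHEAD
                break
        mtus[hop] = mtu
    return mtus


# Simulated responders: the MSS each hop would put in its SYN-ACK, per port.
fake_net = {
    "10.0.0.1": {80: 1460},
    "10.0.1.1": {23: 256},   # finger and http closed, telnet open
    "192.0.2.7": {},         # no TCP listener reachable
}


def fake_probe(hop: str, port: int, syn_mss: int) -> Optional[int]:
    return fake_net[hop].get(port)


print(discover_hop_mtus(["10.0.0.1", "10.0.1.1", "192.0.2.7"], fake_probe))
# {'10.0.0.1': 1500, '10.0.1.1': 296, '192.0.2.7': 576}
```

As the text notes, a hop that answers with the conservative MSS of 536 also comes out as 576 here, so that value cannot be distinguished from a failed probe.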
A certain amount of overhead is expected, because TCP segments must
be exchanged every time a connection is set up and torn down to find
the MTU of a hop.
7. Exceptions (Where this won't work)
This algorithm will not work in certain cases. If an implementation
returns an MSS of 536 to a non-local connection seeker whenever the
MTU of the interface through which that host is reached is greater
than 576, then the MTU would have to be assumed to be 576 for that hop.
If an implementation returns an MSS derived from the highest MTU of
all its connected networks, then that highest MTU is what would be
shown as the MTU of the outbound interface. This would indeed be
erroneous if that MTU did not belong to the actual outbound interface.
However, most implementations calculate the MSS from the MTU of the
outbound interface, so in most cases this mechanism would give an
accurate picture of the MTU.
8. References
[1] J. Mogul, S. Deering, "Path MTU Discovery", RFC 1191, DECWRL and
Stanford University, November 1990.
[2] J. Postel, "Internet Control Message Protocol", RFC 792, SRI
Network Information Center, September 1981.
[3] G. Malkin, "Traceroute Using an IP Option", RFC 1393, Xylogics,
Inc., January 1993.
9. Author's address
V.Balaji Venkat
HCL-Technologies India Pvt Limited,
(HCL-Cisco software development center),
49/50 Nelson Manickam road,
Chennai - 600 029
Tamil Nadu,
India.
Phone : 091-44-481 9939
Fax : 091-44-481 9938
Email : bvenkat@cisco.com