Internet DRAFT - draft-bvenkat-mtu-tcpmss

INTERNET-DRAFT                                            Balaji Venkat
<draft-bvenkat-mtu-tcpmss-03.txt>   HCL-Technologies India Pvt Limited,
Expires June 1999              (HCL-Cisco software development center),
                                                         Chennai, India
                                                          December 1998


			MTU discovery using TCP MSS
		and Discussion on MSS value in SYN reply  

Status of this memo

   This document is an Internet-draft. Internet-drafts are working 
   documents of the Internet Engineering Task Force (IETF), its areas, 
   and its working groups. Note that other groups may also distribute 
   working documents as Internet-drafts.

   Internet-drafts are draft documents valid for a maximum of six months
   and may be updated, replaced or obsoleted by other documents at any
   time. It is inappropriate to use Internet-drafts as reference
   material or cite them other than as "work in progress".

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this memo is unlimited.

Abstract

   Path MTU discovery as it exists now finds the least MTU of a given
   path. Traceroute using an IP option [3] provides a method for finding
   the MTU of each hop, using an ICMP message returned as a reply that
   carries the output link MTU in a portion of the message. The method
   proposed in this document finds the outbound MTU of each hop on an
   internet path without using the ICMP Traceroute message. It aims to
   achieve the same goal as traceroute using an IP option, but through
   a different mechanism.

   Discovery of the MTU of each router on an internet path would serve
   as a valuable network debugging tool. As proposed, the mechanism has
   the advantage of being automatically supported by all routers that
   support the TCP layer. It has two disadvantages: it generates quite
   a few TCP packets, and it takes a substantial amount of time to
   discover each MTU along the path.

   This document specifies the MTU discovery mechanism with the 
   existing IP and TCP options and the ICMP message types that 

Balaji                   Expires June 1999                   [ Page 1 ]

MTU Discovery                                            December 1998
 
   exist on all routers in the internet that support the TCP layer.
   This method is suggested as an alternative to traceroute using an
   IP option [3]. The intention is not to obsolete RFC 1393.

   This document also suggests that by default a reply SYN packet
   from a target host should include an MSS value that is derived from
   the MTU of the connected network of the outbound interface.



Table of contents

   1. Introduction
   2. Path MTU discovery and MTU discovery today
   3. MTU discovery (an alternative)
   4. Leveraging from traceroute
   5. TCP Maximum Segment Size
   6. Basic Algorithm
   7. Exceptions (where this won't work)
   8. References
   9. Author's address


Acknowledgements

   This proposal is a product of the author's idea.

   The mechanism proposed here builds on RFC 1191 by Mogul & Deering
   [1]. It uses the TCP connection setup and the traceroute mechanism
   (as it existed prior to RFC 1393) to achieve its purpose.


1. Introduction

   When an IP host has a large amount of data to send to a destination,
   the data is transmitted as a series of IP datagrams. It is
   recommended that these datagrams be of the largest size that does
   not require fragmentation anywhere along the path from the source to
   the destination. (For a further analysis of this topic, see [1]).
   This datagram size is referred to as the Path MTU (PMTU), and it is
   equal to the minimum of the MTUs of each hop in the path.
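
   As a minimal sketch of this definition, the PMTU is simply the
   smallest hop MTU. The function name path_mtu is an assumption made
   for illustration; the sample values are the two MTUs that appear in
   Fig 1.0 of this draft.

```python
from typing import Sequence


def path_mtu(hop_mtus: Sequence[int]) -> int:
    """Return the Path MTU: the smallest MTU among the hops of a path."""
    if not hop_mtus:
        raise ValueError("a path needs at least one hop")
    return min(hop_mtus)


# Hop MTUs taken from the example topology in Fig 1.0.
print(path_mtu([1500, 296]))
```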

   To discover the MTU of each hop on an internet path, there exists
   the traceroute with IP option mechanism suggested by Malkin [3],
   which makes use of an ICMP message to get the output link MTU.

   This draft proposes an alternative to the traceroute with IP
   option: a combination of the mechanism employed by traceroute prior
   to RFC 1393 [3] and the TCP connection setup.





2. Path MTU discovery and MTU discovery today

   The technique as it exists today uses the Don't Fragment (DF) bit
   in the IP header to dynamically discover the PMTU of a path. The
   basic idea is that a source host initially assumes that the PMTU of
   a path is the (known) MTU of its first hop, and sends all datagrams
   on that path with the DF bit set. If any of the datagrams are too
   large to be forwarded without fragmentation by some router along
   the path, that router will discard them and return an ICMP
   Destination Unreachable message with a code meaning "Fragmentation
   needed and DF set" [2]. RFC 1191 refers to this as a "Datagram Too
   Big" message [1].

   The PMTU discovery process ends when the host's estimate of the
   PMTU is low enough that its datagrams can be delivered without
   fragmentation. Alternatively, the host may elect to end the
   discovery process by ceasing to set the DF bit in the datagram
   headers; it may do so, for example, because it is willing to have
   datagrams fragmented in some circumstances. Normally, the host
   continues to set DF in all datagrams, so that if the route changes
   and the new PMTU is lower, it will be discovered.

   As per RFC 1191, if an intermediate router has an MTU lower than
   the size of the datagram, so that fragmentation would be required,
   the Datagram Too Big message it sends reports the MTU of the
   constricting hop in the Next-Hop MTU field of the ICMP header.
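
   The following sketch shows how a host might read that value. It
   assumes the RFC 1191 layout, in which the constricting MTU occupies
   the low-order 16 bits of the formerly unused word of the ICMP
   Destination Unreachable header. The function name next_hop_mtu is an
   assumption for illustration, and no checksum validation is done.

```python
import struct
from typing import Optional

ICMP_DEST_UNREACHABLE = 3   # ICMP type
ICMP_FRAG_NEEDED = 4        # code: fragmentation needed and DF set


def next_hop_mtu(icmp_message: bytes) -> Optional[int]:
    """Extract the reported MTU from a Datagram Too Big message.

    Returns None if the message is of another kind, is truncated, or
    carries a zero MTU (as sent by routers that predate RFC 1191).
    """
    if len(icmp_message) < 8:
        return None
    # type, code, checksum, unused, next-hop MTU (network byte order)
    icmp_type, code, _checksum, _unused, mtu = struct.unpack(
        "!BBHHH", icmp_message[:8]
    )
    if icmp_type != ICMP_DEST_UNREACHABLE or code != ICMP_FRAG_NEEDED:
        return None
    return mtu or None


# A hand-built header reporting a constricting MTU of 296.
sample = struct.pack("!BBHHH", 3, 4, 0, 0, 296)
print(next_hop_mtu(sample))
```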

   This method provides the Path MTU and nothing more: it does not
   report the MTU of each intervening hop in the path.

   MTU discovery today involves using the ICMP "Traceroute" message
   to discover the MTU of each intermediate hop in an internet path.
   Setting the appropriate IP option (section 2.2 of Malkin [3]) and
   sending the datagram to the target hop achieves this, prompting
   the target hop to send the ICMP "Traceroute" message with the
   output link MTU.

3. MTU discovery (An alternative)

   The mechanism proposed in this draft finds the MTU of each
   intervening hop in a given path. This information would be
   obtained using a technique that combines traceroute as it existed
   prior to RFC 1393 with the TCP connection setup.

   The MTU discovery mechanism would gather the MTU of each hop on an
   internet path and report it to the user of this mechanism.

4. Leveraging from traceroute

   This utility would leverage off traceroute as it existed prior to
   RFC 1393, in finding the intermediate hops to a destination on a 
   given internet path. 


   Traceroute's algorithm is used for exactly that purpose, as
   specified by RFC 792, using the TTL field in the IP header [2].
   This method does not use the traceroute using IP option mechanism
   suggested by Malkin [3]. Rather, it provides an alternative
   mechanism for discovering the MTU of each hop on an internet path.
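
   The TTL-based hop discovery can be sketched as a simple loop. This
   is an illustration only: send_probe is a hypothetical callable that
   stands in for the raw-socket work of sending a datagram with the
   given TTL and returning the address of whichever node answered (or
   None on timeout). The name discover_hops and the max_ttl default of
   30 are likewise assumptions.

```python
from typing import Callable, List, Optional


def discover_hops(
    send_probe: Callable[[int], Optional[str]],
    destination: str,
    max_ttl: int = 30,
) -> List[Optional[str]]:
    """List the hops towards destination by raising the TTL step by step.

    Entry i holds the address that answered the probe sent with
    TTL i + 1, or None if that probe went unanswered.
    """
    hops: List[Optional[str]] = []
    for ttl in range(1, max_ttl + 1):
        responder = send_probe(ttl)
        hops.append(responder)
        if responder == destination:
            break  # the destination itself replied, so the path is complete
    return hops


# Simulated three-hop path in which the second router stays silent.
simulated_path = ["10.0.0.1", None, "10.0.0.3"]


def fake_probe(ttl: int) -> Optional[str]:
    return simulated_path[ttl - 1] if ttl <= len(simulated_path) else None


print(discover_hops(fake_probe, "10.0.0.3"))
```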

5. TCP Maximum Segment Size

   The other mechanism in this alternative method, which follows up
   on what is done by traceroute, is the initial packet exchange
   during the TCP connection setup. The maximum segment size (MSS)
   is the largest chunk of data that TCP will send to the other end.
   When a connection is established, each end can announce its MSS.

   The resulting IP datagram is normally 40 bytes larger; 20 bytes for 
   the TCP header and 20 bytes for the IP header.

   When a connection is established, each end has the option of 
   announcing the MSS it expects to receive. The SYN segment sent in the 
   TCP connection setup contains the MSS option. If one end does not 
   receive an MSS from the other end, a default of 536 bytes is assumed. 
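
   As an illustration, the announced MSS can be read from the options
   area of a SYN segment as sketched below. The sketch assumes the
   standard TCP option encoding (kind 0 is end-of-list, kind 1 is
   no-operation, kind 2 with length 4 is the MSS option) and falls back
   to the default of 536 when no MSS option is present. The function
   name mss_from_options is an assumption.

```python
DEFAULT_MSS = 536  # assumed when the peer announces no MSS

OPT_END, OPT_NOP, OPT_MSS = 0, 1, 2


def mss_from_options(options: bytes) -> int:
    """Return the MSS announced in a TCP options area, or the default."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == OPT_END:
            break
        if kind == OPT_NOP:        # single-byte padding option
            i += 1
            continue
        if i + 1 >= len(options):  # truncated: no length byte
            break
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                  # malformed option, stop parsing
        if kind == OPT_MSS and length == 4:
            return int.from_bytes(options[i + 2:i + 4], "big")
        i += length
    return DEFAULT_MSS


# An MSS option carrying 1460, as an Ethernet-attached host might send.
print(mss_from_options(bytes([2, 4, 0x05, 0xB4])))
```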

   When TCP sends a SYN segment, either because a local application
   wants to initiate a connection, or because a connection request has
   been received from another host, it can send an MSS value up to the
   outgoing interface's MTU minus the size of the fixed TCP and IP
   headers. For an Ethernet this implies an MSS of up to 1460 bytes.
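
   The arithmetic is the same in both directions, as the following
   sketch shows. It assumes fixed 20-byte IP and TCP headers with no
   options; the function names are chosen here for illustration.

```python
IP_HEADER = 20   # bytes, fixed IP header without options
TCP_HEADER = 20  # bytes, fixed TCP header without options


def mss_for_mtu(mtu: int) -> int:
    """Largest MSS that fits in one datagram on a link with this MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def mtu_for_mss(mss: int) -> int:
    """MTU implied by an MSS that was derived from an interface MTU."""
    return mss + IP_HEADER + TCP_HEADER


print(mss_for_mtu(1500))  # Ethernet
print(mtu_for_mss(256))
```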

   The destination of the connection MAY then announce its own MSS
   value in its reply to the SYN, as discussed by Mogul & Deering [1].
   Some implementations set the MSS value in the reply SYN segment to
   the minimum of two values: the outbound interface MTU minus 40
   bytes, and the default MSS of 536, which is derived from the
   conservative maximum datagram size of 576.

   Limiting the MSS in this way unnecessarily restricts segments to
   536 bytes whenever the least MTU along the entire path is greater
   than 576. There is no reason to limit the segment size to a value
   lower than what can be transmitted without fragmentation. The
   suggestion here is therefore to always return the MSS value derived
   from the outbound MTU to the host seeking the connection. This
   suggestion is based on section 3 of RFC 1191 [1], but differs
   slightly from it in how the MSS is calculated.

   Section 3 of Mogul & Deering [1] states: "Actually, many
   implementations always send an MSS option, but set the value to
   536 if the destination is non-local. This behaviour was correct
   when the internet was full of hosts that did not follow the rule
   that datagrams larger than 576 octets should not be sent to
   non-local destinations. Now that most hosts do follow this rule,
   it is unnecessary to limit the value in the TCP MSS option to
   536 for non-local peers. Moreover, doing this prevents PMTU
   discovery from discovering PMTUs larger than 576, so hosts
   SHOULD no longer lower the value they send in the MSS option.
   The MSS option should be 40 octets less than the size of the
   largest datagram the host is able to reassemble (MMS_R, as
   defined in [1]); in many cases, this will be the architectural
   limit of 65495 (65535 - 40) octets. A host MAY send an MSS
   value derived from the MTU of its connected network (the
   maximum MTU over its connected networks, for a multi-homed
   host); this should not cause problems for PMTU discovery, and
   may dissuade a broken peer from sending enormous datagrams."

   RFC 1191 thus allows the MSS to be derived from the maximum of the
   MTUs over the connected networks. But it is questionable whether
   that maximum is relevant when the requested TCP connection runs
   over a path that does not use the interface on which the maximum
   MTU is configured. Suppose that the maximum MTU among the connected
   networks of a host receiving a TCP connection request belongs to an
   FDDI interface. If this FDDI interface is not the outbound
   interface for the packets sent on the requested connection, then
   returning an MSS derived from it would misrepresent the maximum
   segment size that the host can send on the true outbound interface.
   In that sense, sending an MSS derived from the maximum MTU of the
   connected networks seems to be flawed.

   There are thus three sets of implementations. At one end of the
   spectrum are those that always set the MSS for a non-local peer
   seeking a connection to the conservative value of 536. At the other
   end are those that derive the MSS from the maximum of the MTUs of
   all of the connected networks. In between are those that derive the
   MSS from the MTU of the outbound interface. There are arguments for
   each of these, but sadly a lack of uniformity. The argument for the
   maximum MTU approach is that, owing to the asymmetric nature of
   paths and rapidly changing routes, setting a hard limit on the
   datagram size at anything other than the maximum MTU is a concern.
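
   The three behaviours can be contrasted in a short sketch. This is
   an illustration under assumptions, not a description of any
   particular stack: the interface names, the dictionary of MTUs and
   the function names are all hypothetical, and 4352 is used as the
   FDDI MTU.

```python
from typing import Dict

TCP_IP_HEADERS = 40  # fixed IP header plus fixed TCP header
DEFAULT_MSS = 536    # 576 - 40


def mss_conservative(mtus: Dict[str, int], outbound: str,
                     peer_is_local: bool) -> int:
    """Cap the MSS at 536 for non-local peers."""
    derived = mtus[outbound] - TCP_IP_HEADERS
    return derived if peer_is_local else min(derived, DEFAULT_MSS)


def mss_from_max_mtu(mtus: Dict[str, int]) -> int:
    """Derive the MSS from the largest MTU over all connected networks."""
    return max(mtus.values()) - TCP_IP_HEADERS


def mss_from_outbound(mtus: Dict[str, int], outbound: str) -> int:
    """Derive the MSS from the MTU of the outbound interface only."""
    return mtus[outbound] - TCP_IP_HEADERS


# Hypothetical multi-homed host: the reply leaves through eth0.
interfaces = {"eth0": 1500, "fddi0": 4352}
print(mss_conservative(interfaces, "eth0", peer_is_local=False))
print(mss_from_max_mtu(interfaces))
print(mss_from_outbound(interfaces, "eth0"))
```

   Only the third value reflects what the host can actually send on
   the interface the connection uses, which is the property the
   discovery method in this draft relies on.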

   Thus a more median approach would be to calculate the value placed
   in the MSS option of the SYN segment as the minimum of the MSS
   derived from the MTU of the outbound interface and the default MSS,
   where the default MSS is derived from the largest datagram the host
   can reassemble. But there is a problem here: setting 65495 would
   quite possibly tickle (Mogul & Deering [1]) some IP implementations
   that have sign-bit bugs.

   Consider three hosts A, B and C connected in the manner shown in
   Fig 1.0. Let us say host C wants to initiate a TCP connection with
   host A. The MTUs of the various networks are as shown.

   The SYN from host C is sent with an MSS value of 1460, which is the
   MTU of 1500 minus 40 bytes. In reply, the stack on host A responds
   with an MSS of 256, which is the MTU of the outgoing interface on
   host A (296) minus 40 bytes for the TCP and IP headers.


   +----------+                 +-----------+     MTU=1500 +-----------+ 
   |  host A  |-----------------|  host B   |--------------|   host C  |
   +----------+MTU=296  MTU=296 +-----------+ MTU= 1500    +-----------+

                                SYN <mss 1460>
          <------------------------------------------------------
                                SYN <mss 256>
          ------------------------------------------------------>

				Fig 1.0 SYN with MSS

   This mechanism offers a way to obtain the MTU of the interface on
   each of the hops in an internet path. Combining this with
   traceroute's mechanism for identifying the intermediate hosts, it
   would be possible to discover the MTU of each hop in an internet
   path.


6. Basic Algorithm

   The basic algorithm for identifying the MTU of each hop is to
   traceroute the intermediate hops to a given destination and store
   their addresses. A connection is then initiated, iteratively, to
   the finger/http port of each hop, with an MSS value of 65535 in the
   outgoing SYN segment.

   When the connection is initiated, the SYN packet carries the
   source's MSS value, and the hop whose MTU is to be discovered
   replies with its own MSS value. The MTU of the outbound interface
   of that hop is then obtained by adding 40 to the value returned in
   the MSS option of the TCP header.

   Iterating through the list of hops, the MTU of each hop would be
   found. Once the MTU is computed, a FIN packet would be sent to the
   finger/http port of the target hop and the connection closed with
   the appropriate packets exchanged for connection closure.

   For hops that do not support the TCP layer as part of their stack
   implementation, the source would either time out (if the hop does
   not return an ICMP Unreachable error) or receive an ICMP
   Unreachable from the hop's IP layer. In either case a default value
   of 576 would be assumed as the MTU for that hop. Thus a reported
   value of 576 bytes would denote that the MTU discovery on that hop
   did not work. Some implementations set the MSS value to 536 for
   non-local peers if the MTU is more than 576. In that case too, the
   MTU would effectively be assumed to be 576.

   If the finger/http port on a target hop is not available, or if
   neither service is supported on that hop, the discovery could try
   alternate ports of the kind that are open by default on most
   routers.
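
   Putting the steps of this section together gives roughly the
   following sketch. The network access is deliberately left out:
   probe is a hypothetical callable that sends a SYN carrying the
   offered MSS to a hop and port and returns the MSS found in the
   reply, or None on a timeout or an ICMP Unreachable. The function
   names and the port tuple are likewise assumptions; 79 and 80 are
   the finger and http ports.

```python
from typing import Callable, Dict, Optional, Sequence

TCP_IP_HEADERS = 40  # added to a returned MSS to obtain the MTU
FALLBACK_MTU = 576   # assumed when a hop gives no usable answer
OFFERED_MSS = 65535  # MSS placed in the outgoing SYN
FINGER_PORT, HTTP_PORT = 79, 80

# probe(hop, port, offered_mss) -> MSS in the reply, or None
Probe = Callable[[str, int, int], Optional[int]]


def hop_mtu(hop: str, probe: Probe,
            ports: Sequence[int] = (FINGER_PORT, HTTP_PORT)) -> int:
    """Try each port in turn; fall back to 576 if none answers."""
    for port in ports:
        mss = probe(hop, port, OFFERED_MSS)
        if mss is not None:
            return mss + TCP_IP_HEADERS
    return FALLBACK_MTU


def discover_mtus(hops: Sequence[str], probe: Probe,
                  ports: Sequence[int] = (FINGER_PORT, HTTP_PORT)
                  ) -> Dict[str, int]:
    """Map every hop found by traceroute to its discovered MTU."""
    return {hop: hop_mtu(hop, probe, ports) for hop in hops}


# Simulated replies: the third hop has no TCP service reachable.
replies = {("10.0.0.1", 79): 1460, ("10.0.0.2", 80): 256}


def fake_probe(hop: str, port: int, offered_mss: int) -> Optional[int]:
    return replies.get((hop, port))


print(discover_mtus(["10.0.0.1", "10.0.0.2", "10.0.0.3"], fake_probe))
```

   Passing a longer ports sequence is one way to express the fallback
   to alternate ports described above.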


   A certain amount of overhead is expected in terms of TCP packet
   exchanges, since a connection has to be set up and torn down every
   time an MTU is to be found.

7. Exceptions (Where this won't work)

   This algorithm won't work in certain cases. If an implementation
   returns 536 as the MSS whenever the MTU of the interface through
   which the non-local connection seeker is reachable is greater than
   576, then the MTU would have to be assumed to be 576 for that hop.
 
   If an implementation returns an MSS derived from the highest MTU of
   all its connected networks, then that highest MTU would be the one
   shown as the MTU of the outbound interface. This would be erroneous
   if that MTU did not belong to the actual outbound interface.

   But most implementations calculate the MSS from the MTU of the outbound
   interface. Thus in most cases this mechanism would succeed in giving
   the proper picture regarding the MTU. 
   
8. References

   [1] J. Mogul, S. Deering, Path MTU Discovery, RFC 1191, DECWRL and
       Stanford University, November 1990.

   [2] J. Postel, Internet Control Message Protocol, RFC 792, SRI
       Network Information Center, September 1981.

   [3] G. Malkin, Traceroute Using an IP Option, RFC 1393, Xylogics,
       Inc., January 1993.

9. Author's address

   V.Balaji Venkat
   HCL-Technologies India Pvt Limited,
   (HCL-Cisco software development center),
   49/50 Nelson Manickam road,
   Chennai - 600 029
   Tamil Nadu,
   India.

   Phone : 091-44-481 9939
   Fax   : 091-44-481 9938
   Email : bvenkat@cisco.com









