
INTERNET DRAFT                                     Uri Elzur 
draft-elzur-iwarp-mpa-tcp-analysis-00.txt            Broadcom 
Expires: July, 2003                                Bob Teisberg 
                                                   Dwight Barron 
                                                   Paul Culley 
                                                     Hewlett-Packard 
                                                   Jim Pinkerton 
                                                     Microsoft 
                                                   John Carrier 
                                                     Adaptec 
                                                   February 2003 
    
                    Analysis of MPA over TCP Operations 

1  Status of this Memo 

   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026. 

   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that 
   other groups may also distribute working documents as Internet-
   Drafts. 

   Internet-Drafts are draft documents valid for a maximum of six 
   months and may be updated, replaced, or obsoleted by other documents 
   at any time.  It is inappropriate to use Internet-Drafts as 
   reference material or to cite them other than as "work in progress." 

   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt 
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html. 

2  Abstract 

   Further explanation and analysis of architectural recommendations 
   contained in the recent Internet-Draft, "Marker PDU Aligned Framing 
   for TCP Specification" [MPA], is provided.  The impact of the 
   following three attributes of MPA over TCP is examined:  

   *  packing of multiple DDP ULPDUs into one TCP segment;  

   *  simplifying the receiver due to transmitter alignment of DDP 
      headers with TCP headers; and  

   *  mistakenly attempting to interoperate an MPA-enabled endpoint 
      with a non-MPA-enabled endpoint. 



Elzur, et al.            Expires - August 2003                 [Page 1] 
                       Analysis of MPA over TCP           February 2003 


   Table of Contents 

   1    Status of this Memo.........................................1 
   2    Abstract....................................................1 
   3    Introduction................................................3 
   4    Definitions.................................................4 
   5    Assumptions.................................................6 
   5.1  MPA is layered beneath DDP [DDP]............................6 
   5.2  MPA preserves DDP message framing...........................6 
   5.3  The size of the ULPDU passed to MPA is less than EMSS under 
        normal conditions...........................................6 
   5.4  Out-of-order placement but NO out-of-order delivery.........6 
   6    No Packing..................................................7 
   7    The Value of Header Alignment..............................11 
   7.1  Impact of lack of Header Alignment on the receiver 
        computational load and complexity..........................12 
   7.2  Header Alignment effects on TCP wire protocol..............16 
   8    Interoperating between MPA applications and non-MPA 
        applications...............................................19 
   8.1  Negotiation of MPA-enabled mode............................20 
   8.2  Analysis of existing TCP services..........................23 
   8.2.1  The "Little" TCP Services................................23 
   8.2.2  ULPs Using Only Text Messages............................24 
   8.2.3  ULPs with Fixed Initial Message..........................25 
   8.2.4  Protocols with framed command headers....................25 
   9    Security Considerations....................................30 
   10   IANA Considerations........................................31 
   11   References.................................................32 
   11.1   Primary References.......................................32 
   11.2   "Little" TCP Services....................................32 
   11.3   ULPs using only Text Messages............................33 
   11.4   ULPs with Fixed Initial Message..........................34 
   11.5   ULPs with Framed Command Headers.........................34 
   12   Author's Addresses.........................................36 
   13   Acknowledgments............................................37 
   14   Full Copyright Statement...................................38 
    

   Table of Figures 

   Figure 1: Non-aligned FPDU freely placed in TCP octet stream....13 
   Figure 2: Aligned FPDU placed immediately after TCP header......15 
   Figure 3: MPA Enablement Error..................................19 
   Figure 4: MPA Transition via Well Known Port or Service Location 
             Protocol..............................................21 
   Figure 5: MPA Transition via Octet Stream Negotiation...........22 
   Figure 6: Effect of Improperly MPA-Enabled DNS Resolver.........27 
    





3  Introduction  

   This paper analyzes the impact of MPA (Marker PDU Aligned Framing 
   for TCP [MPA]) on the TCP sender, receiver, and wire protocol.  

   One of MPA's high level goals is to provide enough information, when 
   combined with the Direct Data Placement Protocol [DDP], to enable 
   out-of-order placement of DDP payload into the final Upper Layer 
   Protocol (ULP) buffer. Note that DDP separates the act of placing 
   data into a ULP buffer from that of notifying the ULP that the ULP 
   buffer is available for use. In DDP terminology, the former is 
   defined as "Placement", and the latter is defined as "Delivery". MPA 
   supports in-order delivery of the data to the ULP, including support 
   for direct data placement in the final ULP buffer location when TCP 
   segments arrive out-of-order. Effectively, the goal is to use the 
   pre-posted ULP buffers as the TCP receive buffer, where the 
   reassembly of the ULP Protocol Data Unit (PDU) by TCP (with MPA and 
   DDP) is done in place, in the ULP buffer, with no data copies. 

   The paper walks through the advantages and disadvantages of the two 
   main TCP sender modifications proposed by MPA: 

   1) that MPA require the TCP sender to do "Header Alignment", where a 
   TCP segment is required to begin with an MPA Framing Protocol Data 
   Unit (FPDU) (if there is payload present) and that there be an 
   integral number of FPDUs in a TCP segment (under conditions where 
   the Path MTU is not changing). 

   2) that MPA require "no packing" of FPDUs -- i.e. exactly zero or 
   one FPDUs are present in a single TCP segment. 

   The paper concludes that the worst-case cost of "no packing" 
   outweighs its advantages, and that the rule should be removed from 
   the MPA specification.  

   The paper also concludes that the scaling advantages of Header 
   Alignment are strong, based primarily on fairly drastic TCP receive 
   buffer reduction requirements and simplified receive handling. The 
   analysis also shows that there is little effect on TCP wire 
   behavior. 

   Finally, the paper examines interoperability issues between an 
   unmodified TCP stack and a modified TCP stack, for a wide variety of 
   applications and a wide variety of combinations. 









4  Definitions 

   DDP - Direct Data Placement Protocol [DDP] 

   Data Delivery (Delivery, Delivered, Delivers) - Delivery is defined 
       as the process of informing the ULP or consumer that a 
       particular Message is available for use.  This is specifically 
       different from "Placement", which may generally occur in any 
       order, while the order of "Delivery" is strictly defined. See 
       "Data Placement".   

   Data Placement (Placement, Placed, Places) - For DDP, this term is 
       specifically used to indicate the process of writing to a data 
       buffer by a DDP implementation.  DDP Segments carry Placement 
       information, which may be used by the receiving DDP 
       implementation to perform Data Placement of the DDP Segment ULP 
       Payload. See "Data Delivery". 

   EMSS - Effective Maximum Segment Size.  EMSS is the smaller of the 
       TCP maximum segment size (MSS) [RFC0793], and the current Path 
       Maximum Transmission Unit (PMTU) [RFC1191]. 

   FPDU - Framing Protocol Data Unit.  The unit of data created by a  
       ULP utilizing the MPA framing protocol. A complete MPA FPDU 
       includes the MPA length, MPA payload, MPA CRC and potentially 
       Markers as appropriate. 

   Header Alignment  - the property that a TCP segment begins with an 
       FPDU and the TCP segment includes an integer number of FPDUs. 

   MPA - the protocol defined by the "Marker PDU Aligned Framing for 
       TCP Specification" [MPA]. 

   MPA-aware TCP - a TCP implementation that is aware of the receiver 
       efficiencies of MPA Header Alignment and is capable of sending 
       TCP segments that begin with an FPDU. 

   MPA-enabled -  MPA is enabled if the MPA protocol is visible on the 
       wire.  When the sender is MPA-enabled, it is inserting framing 
       and markers.  When the receiver is MPA-enabled, it is 
       interpreting framing and markers. 

   MULPDU - Maximum ULPDU. The current maximum size of the record that 
       is acceptable for DDP to pass to MPA for transmission. 

   PDU - protocol data unit 

   ULP - Upper Layer Protocol. The protocol layer above the protocol 
       layer currently being referenced. The ULP for MPA is DDP [DDP].  
       ULPs may be classified as Passive if they are awaiting a 
       connection request or Active if they are initiating a connection 
       request.  

   ULPDU - Upper Layer Protocol Data Unit.  The data record defined by 
       the layer above MPA (DDP).  ULPDU corresponds to DDP's "DDP 
       Segment". 


5  Assumptions 

5.1  MPA is layered beneath DDP [DDP] 

   MPA is an adaptation layer between DDP and TCP.  DDP requires 
   preservation of DDP segment boundaries and a CRC32C digest covering 
   the DDP header and data.   MPA adds these features to the TCP stream 
   so that DDP over TCP has the same basic properties as DDP over SCTP. 

5.2  MPA preserves DDP message framing 

   MPA was designed as a framing layer specifically for DDP and was not 
   intended as a general-purpose framing layer for any other ULP using 
   TCP.   

   A framing layer allows ULPs using it to receive indications from the 
   transport layer only when complete ULPDUs are present.  As a framing 
   layer, MPA is not aware of the content of the DDP PDU, only that it 
   has received and, if necessary, reassembled a complete PDU for 
   delivery to the DDP.   

5.3  The size of the ULPDU passed to MPA is less than EMSS under normal 
     conditions  

   To make reception of a complete DDP PDU on every received segment 
   possible, DDP passes to MPA a PDU that is no larger than the EMSS of 
   the underlying fabric. Each FPDU that MPA creates contains 
   sufficient information for the receiver to directly place the ULP 
   payload in the correct location in the correct receive buffer.  

   Edge cases in which this condition does not hold are dealt with, but 
   they do not need to be on the fast path. 

5.4  Out-of-order placement but NO out-of-order delivery 

   DDP receives complete DDP PDUs from MPA.  Each DDP PDU contains the 
   information necessary to place its ULP payload directly in the 
   correct location in host memory. 

   Because each DDP segment is self-describing, it is possible for DDP 
   segments received out of order to have their ULP payload placed 
   immediately in the ULP receive buffer.  

   Data delivery to the ULP is guaranteed to be in the order the data 
   was sent.  DDP only indicates data delivery to the ULP after TCP has 
   acknowledged the complete byte stream.   
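
   The separation of Placement from Delivery can be illustrated with a
   short Python sketch.  It is not taken from [DDP]: the class
   UlpBuffer and its method names are hypothetical, each segment is
   assumed to carry its own buffer offset, segments are assumed never
   to overlap or repeat, and "all octets present" stands in for TCP
   having acknowledged the complete byte stream.

```python
class UlpBuffer:
    """Hypothetical ULP receive buffer: place in any order, deliver once."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.missing = size      # octets not yet placed
        self.delivered = False   # set when the ULP may use the buffer

    def place(self, offset: int, payload: bytes) -> None:
        # Placement: write the payload at the offset the segment itself
        # carries, regardless of arrival order.
        self.data[offset:offset + len(payload)] = payload
        # Assumption: segments never overlap or repeat.
        self.missing -= len(payload)
        # Delivery: notify the ULP only when the buffer is complete.
        if self.missing == 0:
            self.delivered = True


buf = UlpBuffer(11)
buf.place(6, b"world")    # arrives first, placed immediately
early = buf.delivered     # still False: a gap remains
buf.place(0, b"hello ")   # fills the gap, Delivery is now allowed
```

   The second segment is placed before the first without any
   intermediate copy, yet the ULP is notified only once the gap has
   been filled.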

    





6  No Packing 

   MPA as originally proposed requires an MPA-aware TCP sender to 
   segment the datastream in such a way that each TCP segment contains 
   a single FPDU.  This requirement is referred to in this document as 
   the "no packing" rule.  Let us examine the costs and benefits of the 
   "no packing" rule. 

   The Header Alignment rule means that Placement information is 
   guaranteed to immediately follow the TCP header in the typical case 
   (See Section 7, The Value of Header Alignment, page 11, for the 
   analysis of the value of Header Alignment). If the "no packing" rule 
   is in effect, it further guarantees (in the typical case) that 
   because no additional FPDUs will be in the TCP payload, the receiver 
   does not have to look for additional placement information within 
   the TCP payload. This allows the receiver logic to be simplified. 
   For instance, the receiver logic needs to support only one context 
   lookup and one data movement operation per frame. Only in the 
   instance of a PMTU change is it necessary to examine the remainder 
   of the TCP payload for DDP headers.  Because PMTU changes are 
   presumed to be rare, the latter case can be delegated to a "slow 
   path" processing mode, while the "fast path" for the common case can 
   be extremely simple.  

   The original argument for "no packing" also examined typical ULP 
   behavior for applications expected to see strong advantages from 
   Direct Data Placement -- specifically transaction based applications 
   or throughput oriented applications. Request/response protocols 
   typically send one FPDU per TCP segment. A response may be short or 
   quite long, but in any case would fill all TCP segments up to the 
   last one, providing TCP segmentation behavior similar to an 
   unmodified TCP stack. A similar argument applies to ULPs optimized 
   for throughput, which send long, uninterrupted sequences of PMTU-
   sized FPDUs.  

   Thus for many applications the rule has no effect on TCP 
   segmentation, and it enables simplified receive logic because the 
   receiver does not have to peek into the TCP segment at some 
   arbitrary offset to find the MPA/DDP headers.  

   On the other hand, a ULP which sends long sequences of small FPDUs 
   is strongly affected by the "no packing" rule. 

   Several specific consequences of the "no packing" rule deserve 
   detailed discussion. 

   The "no packing" rule tends to increase the total number of TCP 
   segments transmitted.  In the best case, as noted earlier, the 
   number of segments is unchanged.  In the worst case, where all 
   ULPDUs are 1 octet long, the number of segments increases by dozens 
   of times.  The penalty for the "no packing" rule is calculated below 
   for networks using full Ethernet frames and for networks using the 
   smaller default IP packet size.   

   For both calculations, the Minimum FPDU Size (MinFPDUSize) for the 
   worst case ULPDU is the sum of the MPA Header Size (MPAHdrSize), the 
   DDP Header Size (DDPHdrSize), the worst case DDP Payload Size 
   (DDPPayld), the number of MPA Pad octets (MPAPad) needed to make the 
   FPDU size a multiple of 4, and the MPA CRC size (MPACRC): 

           MPAHdrSize   = 2 octets 

           DDPHdrSize   = 14 octets 

           DDPPayld     = 1 octet 

           MPAPad       = 3 octets 

           MPACRC       = 4 octets 

           MinFPDUSize  = MPAHdrSize + DDPHdrSize + DDPPayld + MPAPad 
                         + MPACRC  

                        = 2 + 14 + 1 + 3 + 4 

                        = 24 

   1.  Ethernet frames 

       a.  The expected Number of Markers in an Ethernet frame 
           (EthNMarkers) is calculated by dividing the MPA Marker 
           Interval (MPAMrkIntvl) into the Ethernet Frame Size 
           (EthFrmSize): 

           MPAMrkIntvl  = 512 octets 

           EthFrmSize   = 1460 octets of Ethernet payload 

           EthNMarkers  =~ EthFrmSize / MPAMrkIntvl 

                        =~ 2.9 

            









       b.  The expansion of FPDUs into an Ethernet frame (EthExpansion) 
           is the number of times MinFPDUSize octets can be put into an 
           Ethernet Frame (EthFrmSize) after removing the number of 
           octets consumed by markers (MPAMrkSize * EthNMarkers) in the 
           frame:  

           MPAMrkSize   = 4 octets 

           EthExpansion = (EthFrmSize - MPAMrkSize * EthNMarkers) /    
                           MinFPDUSize 

                        =~ (1460 - 4 * 2.9) / 24 

                        =~ 60 

   2.  Default IP packets  

       a.  The expected Number of Markers in the Default IP packet 
           (DefNMarkers) is calculated by dividing the MPA Marker 
           Interval (MPAMrkIntvl) into the Default IP Packet Size 
           (DefPktSize): 

           DefPktSize   = 536 octets 

           DefNMarkers  =~ DefPktSize / MPAMrkIntvl 

                        =~ 1.05 

       b.  The expansion of FPDUs into the Default IP Packet 
           (DefExpansion) is the number of times MinFPDUSize octets can 
           be put into an IP Packet (DefPktSize) after removing the 
           number of octets consumed by markers (MPAMrkSize * 
           DefNMarkers) in the packet: 

           DefExpansion = (DefPktSize - MPAMrkSize * DefNMarkers) / 
                           MinFPDUSize 

                        =~ (536 - 4 * 1.05) / 24 

                        =~ 22 

   In the worst case where all ULPDUs are one octet long, the "no 
   packing" rule forces the transmission of roughly 60 times as many 
   packets on Ethernet as MPA with packing allowed.  Even with smaller 
   IP packets, the "no packing" rule would force using more than 20 
   times as many packets. 
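
   The arithmetic above can be checked with a short Python sketch.  The
   constants are the values given in this section; the function names
   are illustrative only, and the marker count is treated as an
   expected (non-integral) value, as in the calculation above.

```python
MPA_HDR_SIZE = 2           # octets (MPAHdrSize)
DDP_HDR_SIZE = 14          # octets (DDPHdrSize)
MPA_CRC_SIZE = 4           # octets (MPACRC)
MPA_MARKER_SIZE = 4        # octets (MPAMrkSize)
MPA_MARKER_INTERVAL = 512  # octets (MPAMrkIntvl)


def min_fpdu_size(ddp_payload: int = 1) -> int:
    """Size of an FPDU carrying ddp_payload octets, padded to 4 octets."""
    unpadded = MPA_HDR_SIZE + DDP_HDR_SIZE + ddp_payload
    pad = -unpadded % 4  # MPAPad: 0 to 3 octets
    return unpadded + pad + MPA_CRC_SIZE


def expansion(segment_size: int, fpdu_size: int) -> float:
    """FPDUs of fpdu_size that fit in one segment once markers are removed."""
    markers = segment_size / MPA_MARKER_INTERVAL  # expected, not integral
    return (segment_size - MPA_MARKER_SIZE * markers) / fpdu_size


fpdu = min_fpdu_size()
eth = expansion(1460, fpdu)   # EthExpansion
dflt = expansion(536, fpdu)   # DefExpansion
print(fpdu, round(eth), round(dflt))
```

   Run as written, this prints 24 60 22, matching MinFPDUSize,
   EthExpansion and DefExpansion above.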







   Clearly the effect of the "no packing" rule on the number of extra 
   packets depends on the nature of the ULP and workload.  The worst 
   case involves protocols that send long sequences of small ULPDUs.  
   The existence of protocols such as telnet [RFC0854] shows that at 
   least in some cases real-world applications may tend to approach the 
   worst case. 

   As a direct consequence of the increased number of data segments 
   transmitted, the number of TCP ACKs increases proportionally.  A 
   series of minimum-sized ULPDUs which could have been packed into two 
   TCP segments on an Ethernet network, prompting a single ACK in 
   response, would consume 120 segments with the "no packing" rule in 
   effect, resulting in as many as 60 ACKs. 
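
   The segment and ACK counts in this example can be reproduced with
   the following sketch.  It assumes, as the example implies, that the
   receiver returns one ACK for every two data segments and that about
   60 minimum-sized FPDUs fit in one Ethernet segment; the function
   name is illustrative.

```python
import math


def segments_and_acks(ulpdus: int, ulpdus_per_segment: int,
                      segments_per_ack: int = 2) -> tuple:
    """Return (data segments sent, ACKs returned) for a run of ULPDUs."""
    segments = math.ceil(ulpdus / ulpdus_per_segment)
    acks = math.ceil(segments / segments_per_ack)
    return segments, acks


packed = segments_and_acks(120, 60)     # packing allowed
unpacked = segments_and_acks(120, 1)    # "no packing" rule in effect
print(packed, unpacked)
```

   With packing the 120 ULPDUs need 2 segments and 1 ACK; without
   packing they need 120 segments and 60 ACKs.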

   Dividing a datastream into a large number of small segments impairs 
   the efficiency of the slow start algorithm.  While the number of 
   packets necessary to reach an efficient line utilization is the same 
   as in a conventional TCP implementation, the total payload 
   transmitted during slow start is reduced substantially.  In other 
   words, a sender obeying the "no packing" rule could pay a 
   substantial performance penalty during the slow start phase if long 
   sequences of small ULPDUs are used. 
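
   The slow start penalty can be made concrete with an idealized model.
   The sketch below assumes a congestion window counted in segments
   that doubles every round trip, an initial window of two segments,
   and no loss; these are simplifying assumptions made for
   illustration, not a description of any particular TCP stack.

```python
def slow_start_payload(round_trips: int, payload_per_segment: int,
                       initial_window: int = 2) -> int:
    """ULP payload octets sent while the window doubles each round trip."""
    total = 0
    window = initial_window  # congestion window, counted in segments
    for _ in range(round_trips):
        total += window * payload_per_segment
        window *= 2
    return total


no_packing = slow_start_payload(5, 1)     # one 1-octet ULPDU per segment
with_packing = slow_start_payload(5, 60)  # about 60 such ULPDUs per segment
print(no_packing, with_packing)
```

   Under these assumptions both senders emit 62 segments in five round
   trips, but the "no packing" sender moves 62 octets of ULP payload
   where the packing sender moves 3720.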

   It is clear from this analysis that the drawbacks of the "no 
   packing" rule are substantial, but the benefits are small.  The "no 
   packing" requirement should be changed to state that MPA-aware TCP 
   MAY support packing at the transmitter and MUST support packing at 
   the receiver. 
   However, as noted above, certain applications (e.g., transaction-
   based applications) will prioritize minimal latency over maximum 
   wire efficiency. In such scenarios it is anticipated there will be 
   minimal opportunity for packing at the transmitter, and receivers 
   may choose to optimize their performance for this anticipated 
   behavior.  


7  The Value of Header Alignment 

   Significant receiver optimizations can be achieved when Header 
   Alignment and complete FPDUs are the common case. The optimizations 
   allow utilizing significantly fewer buffers on the receiver and less 
   computation per FPDU. The net effect is the ability to build a 
   "Flow-Through" receiver that enables TCP-based solutions to scale to 
   10G and beyond in an economical way. The optimizations are 
   especially relevant to hardware implementations of receivers that 
   process multiple protocol layers - Data Link Layer (e.g., Ethernet), 
   Network and Transport Layer (e.g., TCP/IP), and even some ULP on top 
   of TCP (e.g., MPA/DDP). As network speed increases, there is an 
   increasing desire to use a hardware based receiver in order to 
   achieve an efficient high performance solution.  

   A TCP receiver, under worst case conditions, has to allocate buffers 
   (BufferSizeTCP) whose capacities are a function of the bandwidth-
   delay product. Thus: 

        BufferSizeTCP = K * bandwidth [octets/S] * Delay [S].  

   Where bandwidth is the end-to-end bandwidth of the connection, delay 
   is the round trip delay of the connection, and K is an 
   implementation dependent constant. 

   Thus BufferSizeTCP scales with the end-to-end bandwidth (10x more 
   buffers for a 10x increase in end-to-end bandwidth). As this 
   buffering approach may scale poorly for hardware or software 
   implementations alike, several approaches allow reduction in the 
   amount of buffering required for high-speed TCP communication.  
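
   As a rough illustration of this scaling, the sketch below evaluates
   BufferSizeTCP for two link speeds.  The round trip delay of 100 ms
   and K = 1 are assumed values chosen only for the example; they do
   not come from this analysis.

```python
def buffer_size_tcp(bandwidth_bps: float, rtt_s: float,
                    k: float = 1.0) -> float:
    """BufferSizeTCP in octets for a link speed in bits/s and an RTT in s."""
    return k * (bandwidth_bps / 8) * rtt_s


for gbps in (1, 10):
    size = buffer_size_tcp(gbps * 1e9, 0.1)  # assumed 100 ms round trip
    print(f"{gbps} Gbps: {size / 1e6:.1f} MB")
```

   Under these assumptions a single connection needs roughly 12.5 MB of
   buffering at 1 Gbps and ten times that at 10 Gbps.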

   The MPA/DDP approach is to enable the ULP's buffer to be used as the 
   TCP receive buffer. If the application pre-posts a sufficient amount 
   of buffering, and each TCP segment has sufficient information to 
   place the payload into the right application buffer, when an out-of-
   order TCP segment arrives it could potentially be placed directly in 
   the ULP buffer. However, placement can only be done when a complete 
   FPDU with the placement information is available to the receiver, 
   and the FPDU contents contain enough information to place the data 
   into the correct ULP buffer (e.g., there is a DDP header available).  

   For the case when the FPDU is not aligned with the TCP segment, it 
   may take, on average, 2 TCP segments to assemble one FPDU. 
   Therefore, the receiver has to allocate BufferSizeNAF (Buffer Size, 
   Non-Aligned FPDU) octets: 

       BufferSizeNAF = K1 * EMSS * number_of_connections + K2 * EMSS 






   Where K1 and K2 are implementation dependent constants and EMSS is 
   the effective maximum segment size.  

   For example, a 1 Gbps link with 10,000 connections and an EMSS of 
   1500 octets would require about 15 MB of memory (taking K1 = 1 and 
   neglecting the K2 term). Often the number of connections used 
   scales with the network speed, aggravating the situation for higher 
   speeds.  

   A Header Aligned FPDU would allow the receiver to allocate 
   BufferSizeAF (Buffer Size, Aligned FPDU) octets:  

       BufferSizeAF = K2 * EMSS  

   for the same conditions. A Header Aligned receiver may require 
   memory in the range of hundreds of KB, which is feasible for on-chip 
   memory and enables a "Flow-Through" design, in which the data 
   flows through the NIC and is placed directly in the destination 
   buffer. Assuming most of the connections support Header Alignment, 
   the receiver buffers no longer scale with number of connections.  
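
   The two formulas can be compared numerically.  In the sketch below
   K1 = 1 and K2 = 100 are assumed values (the constants are
   implementation dependent and are not specified here); K2 = 100 is
   chosen only so that BufferSizeAF lands in the "hundreds of KB"
   range mentioned above.

```python
def buffer_size_naf(emss: int, connections: int,
                    k1: int = 1, k2: int = 100) -> int:
    """BufferSizeNAF: grows with the number of connections."""
    return k1 * emss * connections + k2 * emss


def buffer_size_af(emss: int, k2: int = 100) -> int:
    """BufferSizeAF: independent of the number of connections."""
    return k2 * emss


naf = buffer_size_naf(1500, 10_000)
af = buffer_size_af(1500)
print(naf, af)
```

   With these assumed constants the non-aligned receiver needs about
   15.15 MB, while the Header Aligned receiver needs 150 KB regardless
   of how many connections are open.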

   Additional optimizations can be achieved in a balanced I/O sub-
   system -- where the system interface of the network controller 
   provides ample bandwidth as compared with the network bandwidth. For 
   almost twenty years this has been the case and the trend is expected 
   to continue - while Ethernet speeds have scaled by 1000 (from 10 
   megabit/sec to 10 gigabit/sec), I/O bus bandwidth of volume CPU 
   architectures has scaled from ~2 MB/sec to ~2 GB/sec (PC-XT bus to 
   PCI-X DDR). Under these conditions, the Header Aligned FPDU approach 
   allows BufferSizeAF to be indifferent to network speed. It is 
   primarily a function of the local processing time for a given frame. 
   Thus when the Header Aligned FPDU approach is used, receive 
   buffering is expected to scale gracefully (i.e. less than linear 
   scaling) as network speed is increased. 

    

7.1  Impact of lack of Header Alignment on the receiver computational 
     load and complexity 

   The receiver must perform IP and TCP processing, and then perform 
   FPDU CRC checks, before it can trust the FPDU header placement 
   information. For simplicity of the description, the assumption is 
   that an FPDU is carried in no more than 2 TCP segments. In reality, 
   with no Header Alignment, an FPDU can be carried by more than 2 TCP 
   segments (e.g., if the PMTU was reduced). 

    






   ----++-----------------------------++-----------------------++----- 
   +---||---------------+    +--------||--------+   +----------||----+ 
   |   TCP Seg X-1      |    |     TCP Seg X    |   |  TCP Seg X+1   | 
   +---||---------------+    +--------||--------+   +----------||----+ 
   ----++-----------------------------++-----------------------++----- 
                   FPDU #N-1                  FPDU #N 

       Figure 1: Non-aligned FPDU freely placed in TCP octet stream 

   The receiver algorithm for processing TCP segments (e.g., TCP 
   segment #X in Figure 1) carrying non-aligned FPDUs (in-order or 
   out-of-order) includes: 

    

   1.  Data Link Layer processing (whole frame) - typically including a 
       CRC calculation. 

   2.  Network Layer processing (assuming not an IP fragment, the whole 
       Data Link Layer frame contains one IP datagram. IP fragments 
       should be reassembled in a local buffer. This is not a 
       performance optimization goal) 

   3.  Transport Layer processing -- TCP protocol processing, header 
       and checksum checks.  

       a.  Classify incoming TCP segment using the 5-tuple (IP SRC, IP 
           DST, TCP SRC Port, TCP DST Port, protocol) 

   4.  Find FPDU message boundaries.  

       a.  Get MPA state information for the connection 

           i.  If the TCP segment is in-order, use the receiver managed 
               MPA state information to calculate where the previous 
               FPDU message (#N-1) ends in the current TCP segment X. 
               (previously, when the MPA receiver processed the first 
               part of FPDU #N-1, it calculated the number of bytes 
               remaining to complete FPDU #N-1 by using the MPA Length 
               field).  

               .1. Get the stored partial CRC for FPDU #N-1  

               .2. Complete CRC calculation for FPDU #N-1 data (first 
                   portion of TCP segment #X) 

               .3. Check CRC calculation for FPDU #N-1  

               .4. If no FPDU CRC errors, placement is allowed 

               .5. Locate the local buffer for the first portion of 
                   FPDU #N-1, CopyData(local buffer of first portion of 
                   FPDU #N-1, host buffer address, length) 

               .6. Compute host buffer address for second portion of 
                   FPDU #N-1 

               .7. CopyData(local buffer of second portion of FPDU 
                   #N-1, host buffer address for second portion, length)  

               .8. Calculate the octet offset into the TCP segment for 
                   the next FPDU #N. 

               .9. Start Calculation of CRC for available data for FPDU 
                   #N 

               .10. Store partial CRC results for FPDU #N 

               .11. Store local buffer address of first portion of FPDU 
                   #N 

               .12. No further action is possible on FPDU #N, before it 
                   is completely received 

           ii. If the TCP segment is out-of-order, the receiver must 
               buffer the data until at least one complete FPDU is 
               received. Typically buffering for more than one TCP 
               segment per connection is required. Use the MPA Markers 
               to calculate where FPDU boundaries are.  

               .1. When a complete FPDU is available, a similar 
                   procedure to the in-order algorithm above is used. 
                   There is additional complexity, though, because when 
                   the missing segment arrives, this TCP segment must 
                   be run through the CRC engine after the CRC is 
                   calculated for the missing segment.  
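
   The in-order branch of the algorithm above can be sketched in
   Python.  This is an illustration under simplifying assumptions, not
   the MPA wire format: the FPDU is modeled as a 2-octet length, the
   ULPDU, padding to a multiple of 4 octets and a 4-octet CRC32C in
   network byte order; Markers and out-of-order arrival are omitted;
   and the names NonAlignedReceiver and build_fpdu are hypothetical.
   What it shows is the per-connection state (local buffer, partial
   CRC) that the lack of Header Alignment forces on the receiver.

```python
import struct


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC32C; pass a previous result as crc to continue it."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def build_fpdu(ulpdu: bytes) -> bytes:
    """Simplified FPDU: length, ULPDU, pad to 4 octets, CRC (no Markers)."""
    body = struct.pack("!H", len(ulpdu)) + ulpdu
    body += b"\x00" * (-len(body) % 4)
    return body + struct.pack("!I", crc32c(body))


class NonAlignedReceiver:
    """Per-connection state needed when FPDUs straddle TCP segments."""

    def __init__(self) -> None:
        self.buf = b""     # local buffer holding the incomplete FPDU
        self.crc = 0       # stored partial CRC for that FPDU
        self.crc_fed = 0   # octets of the FPDU already run through the CRC
        self.placed = []   # stands in for the host (ULP) buffer

    def on_segment(self, payload: bytes) -> None:
        self.buf += payload
        while len(self.buf) >= 2:
            (ulpdu_len,) = struct.unpack("!H", self.buf[:2])
            body_len = 2 + ulpdu_len + (-(2 + ulpdu_len) % 4)
            # Continue the CRC over whatever part of the body has arrived.
            avail = min(len(self.buf), body_len)
            self.crc = crc32c(self.buf[self.crc_fed:avail], self.crc)
            self.crc_fed = avail
            if len(self.buf) < body_len + 4:
                return  # FPDU incomplete: keep state for the next segment
            (wire_crc,) = struct.unpack("!I", self.buf[body_len:body_len + 4])
            if wire_crc != self.crc:
                raise ValueError("FPDU CRC error")
            self.placed.append(self.buf[2:2 + ulpdu_len])  # placement allowed
            self.buf = self.buf[body_len + 4:]
            self.crc, self.crc_fed = 0, 0


rx = NonAlignedReceiver()
stream = build_fpdu(b"hello") + build_fpdu(b"world!!")
for seg in (stream[:7], stream[7:18], stream[18:]):  # arbitrary boundaries
    rx.on_segment(seg)
print(rx.placed)
```

   Neither FPDU can be placed until its last octet arrives, so the
   first portion of each one sits in the local buffer together with
   its partial CRC between segments.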

   If we assume Header Alignment, the following diagram and the 
   algorithm below apply. Note that when using MPA, the receiver is 
   assumed to actively detect presence or loss of Header Alignment for 
   every TCP segment received. 

    








      +--------------------------+      +--------------------------+ 
   +--|--------------------------+   +--|--------------------------+ 
   |  |       TCP Seg X          |   |  |         TCP Seg X+1      | 
   +--|--------------------------+   +--|--------------------------+ 
      +--------------------------+      +--------------------------+ 
                FPDU #N                          FPDU #N+1 

        Figure 2: Aligned FPDU placed immediately after TCP header 

   The receiver algorithm for Header Aligned frames (in-order or out-
   of-order) includes: 

    

   1.  Data Link Layer processing (whole frame) - typically including a 
       CRC calculation. 

   2.  Network Layer processing (assuming not an IP fragment, the whole 
       Data Link Layer frame contains one IP datagram. IP fragments 
       should be reassembled in a local buffer. This is not a 
       performance optimization goal) 

   3.  Transport Layer processing -- TCP protocol processing, header 
       and checksum checks.  

       a.  Classify incoming TCP segment using the 5-tuple (IP SRC, IP 
           DST, TCP SRC Port, TCP DST Port, protocol) 

   4.  Check for Header Alignment (described in detail in [MPA]
       section 7.4). The rest of the algorithm below assumes Header
       Alignment.

       a.  If the header is not aligned, see the algorithm defined in 
           the prior section. 

   5.  Whether the TCP segment arrives in-order or out-of-order, the
       MPA header is at the beginning of the current TCP payload. Get
       the FPDU length from the FPDU header.

   6.  Calculate CRC over FPDU 

   7.  Check CRC calculation for FPDU #N 

   8.  If no FPDU CRC errors, placement is allowed 

   9.  CopyData(TCP segment #X, host buffer address, length) 

   10. Loop to #5 until all the FPDUs in the TCP segment are consumed 
       in order to handle FPDU packing (see section 6). 
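
   Steps 5 through 10 above can be sketched as follows. This is a
   minimal illustration, not the normative procedure. It assumes an
   FPDU laid out as a 2-octet big-endian length, the ULPDU, zero
   padding to a 4-octet boundary, and a 4-octet CRC32c stored in
   network byte order. Markers are omitted, and "placement" is
   modeled as appending to a bytearray. The names build_fpdu and
   place_aligned_segment are hypothetical.

```python
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def build_fpdu(ulpdu: bytes) -> bytes:
    """Assumed layout: length(2) | ULPDU | pad to 4 octets | CRC32c(4)."""
    body = struct.pack("!H", len(ulpdu)) + ulpdu
    body += b"\x00" * (-len(body) % 4)
    return body + struct.pack("!I", crc32c(body))


def place_aligned_segment(payload: bytes, host_buffer: bytearray) -> int:
    """Walk every FPDU in a header-aligned TCP payload (steps 5-10).

    Returns the number of FPDUs placed. Raises ValueError if the payload
    does not hold whole FPDUs or if a CRC check fails.
    """
    offset = 0
    placed = 0
    while offset < len(payload):
        if len(payload) - offset < 2:
            raise ValueError("truncated FPDU header")
        # Step 5: the FPDU header sits at the current payload offset.
        (ulpdu_len,) = struct.unpack_from("!H", payload, offset)
        body_len = 2 + ulpdu_len + (-(2 + ulpdu_len) % 4)
        end = offset + body_len + 4
        if end > len(payload):
            raise ValueError("segment does not hold a complete FPDU")
        # Steps 6-7: compute the CRC over the FPDU and compare.
        body = payload[offset:offset + body_len]
        (wire_crc,) = struct.unpack_from("!I", payload, offset + body_len)
        if crc32c(body) != wire_crc:
            raise ValueError("FPDU CRC error")
        # Steps 8-9: placement is allowed; copy the ULPDU to the host buffer.
        host_buffer += payload[offset + 2:offset + 2 + ulpdu_len]
        placed += 1
        # Step 10: loop to consume any further packed FPDUs.
        offset = end
    return placed
```

   Note that no per-connection state (partial CRC, partial FPDU) is
   carried between calls in this sketch, which is the simplification
   that Header Alignment is intended to permit.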


   Implementation note: In both cases the receiver has to classify the 
   incoming TCP segment and associate it with one of the flows it 
   maintains. In the case of no Header Alignment, the receiver is 
   forced to classify incoming traffic before it can calculate the FPDU 
   CRC. In the case of Header Alignment the order of operations is left
   to the implementor.

   The Header Aligned receiver algorithm is significantly simpler. 
   There is no need to locally buffer portions of FPDUs. Accessing 
   state information is also substantially simplified: the normal
   case does not require retrieving information to find out where an
   FPDU starts and ends or retrieval of a partial CRC before the CRC
   calculation can commence. This avoids adding internal latencies, 
   having multiple data passes through the CRC machine, or scheduling 
   multiple commands for moving the data to the host buffer.  

   The aligned FPDU approach is useful for in-order and out-of-order 
   reception. The receiver can use the same mechanisms for data storage
   in both cases, and only needs to track when all the TCP segments
   have arrived to enable delivery. Header Alignment, along with the
   high probability that at least one complete FPDU is found with every
   TCP segment, allows the receiver to perform data placement for
   out-of-order TCP segments with no need for intermediate buffering.
   Essentially the TCP receive buffer has been eliminated and TCP 
   reassembly is done in place within the ULP buffer. 

   If Header Alignment is not found, the receiver should follow the
   algorithm for non-aligned FPDU reception, which may be slower and
   less efficient.

7.2  Header Alignment effects on TCP wire protocol 

   An MPA-aware TCP exposes its EMSS to MPA.  MPA uses the EMSS to 
   calculate its MULPDU, which it then exposes to DDP, its ULP.  DDP 
   uses the MULPDU to segment its payload so that each FPDU sent by MPA 
   fits completely into one TCP segment. This has no impact on wire 
   protocol and exposing this information is already supported on many 
   TCP implementations, including all modern flavors of BSD networking, 
   through the TCP_MAXSEG socket option. 
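
   The relationship between EMSS and MULPDU can be sketched as
   follows. The overhead model used here (a 2-octet length, a 4-octet
   CRC, one 4-octet marker per 512 octets, and 4-octet alignment) is
   an assumption made for illustration; [MPA] gives the normative
   calculation. The helper names are hypothetical, and the value
   returned by TCP_MAXSEG may not account for TCP options on every
   platform.

```python
import math
import socket


def mulpdu_from_emss(emss: int, marker_interval: int = 512) -> int:
    """Largest ULPDU that still fits one FPDU into one TCP segment.

    Assumed per-segment overhead: 2-octet length + 4-octet CRC,
    4 octets per marker that may fall inside the segment, and the
    octets lost to 4-octet alignment.
    """
    overhead = 6 + 4 * math.ceil(emss / marker_interval) + emss % 4
    return emss - overhead


def read_emss(sock: socket.socket) -> int:
    """Query the segment size the local TCP reports for this socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
```

   For example, under these assumptions an EMSS of 1460 octets yields
   a MULPDU of 1442 octets.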

   In the common case, the ULP (i.e. DDP over MPA) messages provided to 
   the TCP layer are segmented to MULPDU size. It is assumed that the 
   ULP message size is bounded by MULPDU, such that a single ULP 
   message can be encapsulated in a single TCP segment. Therefore, in 
   the common case, there is no increase in the number of TCP segments 
   emitted. For smaller ULP messages, the sender can also apply 
   packing, i.e. the sender packs as many complete FPDUs as possible 
   into one TCP segment (See Section 6, No Packing, on page 7). The 
   requirement to always have a complete FPDU may increase the number
   of TCP segments emitted. Typically, a ULP message size varies from a
   few bytes to multiple EMSS (e.g., 64 Kbytes). In some cases the ULP
   may post more than one message at a time for transmission, giving
   the sender an opportunity for packing. When more than one FPDU is
   available for transmission and the TCP segment being assembled has
   no room for the next complete FPDU, that FPDU is sent in another TCP
   segment. In this corner case some of the TCP segments are not full
   size. In the worst case, the ULP chooses an FPDU size of EMSS/2 + 1
   and has multiple messages available for transmission. For this poor
   choice of FPDU size, the average TCP segment size is about 1/2 of
   the EMSS and the number of TCP segments emitted approaches 2x of
   what is possible without the requirement to encapsulate an integer
   number of complete FPDUs in every TCP segment. This is a dynamic
   situation that lasts only while the sender ULP has multiple
   non-optimal messages queued for transmission, so its impact on wire
   utilization is minor.
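
   The worst-case arithmetic can be checked with a short calculation.
   The sketch below is illustrative only: it treats each FPDU as an
   opaque block of fixed size, ignores TCP/IP header overhead, and
   assumes the sender always has FPDUs queued. The function names are
   hypothetical.

```python
def segments_with_whole_fpdus(fpdu_size: int, count: int, emss: int) -> int:
    """Segments needed when each segment carries only complete FPDUs."""
    per_segment = emss // fpdu_size
    if per_segment == 0:
        raise ValueError("FPDU larger than EMSS")
    return -(-count // per_segment)  # ceiling division


def segments_as_byte_stream(fpdu_size: int, count: int, emss: int) -> int:
    """Segments needed when FPDUs may straddle segment boundaries."""
    return -(-(fpdu_size * count) // emss)  # ceiling division


emss = 1460
worst = emss // 2 + 1  # 731 octets: only one FPDU fits per segment
aligned = segments_with_whole_fpdus(worst, 1000, emss)
stream = segments_as_byte_stream(worst, 1000, emss)
print(aligned, stream, aligned / stream)
```

   With an FPDU size of exactly EMSS/2 (730 octets here), two FPDUs
   fit per segment and both functions give the same count, which
   shows how narrow the worst case is.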

   However, it is not expected that requiring Header Alignment will 
   have a measurable impact on wire behavior of most applications. 
   Throughput applications with large I/Os are expected to take full 
   advantage of the EMSS.  Another class of applications with many 
   small outstanding buffers (as compared to EMSS) is expected to use 
   packing when applicable. Transaction-oriented applications are also
   expected to see no measurable impact.

   TCP retransmission is another area that can affect sender behavior. 
   TCP supports retransmission of the exact, originally transmitted 
   segment (see [RFC0793] section 2.6, [RFC0793] section 3.7 "managing
   the window" and [RFC1122] section 4.2.2.15). In the unlikely event
   that part of the original segment has been received and acknowledged
   by the remote peer (e.g., because of a resegmenting middlebox, as
   documented in [MPA]), better bandwidth utilization may be possible
   by retransmitting only the missing octets. If an MPA-aware TCP
   retransmits complete FPDUs, there may be some marginal bandwidth 
   loss. 

   Another area where a change in the number of TCP segments may have
   an impact is that of Slow Start and Congestion Avoidance. Slow-start
   exponential increase is measured in segments per second, as the 
   algorithm focuses on the overhead per segment at the source for 
   congestion that eventually results in dropped segments. Slow-start 
   exponential bandwidth growth for MPA-aware TCP is similar to any TCP 
   implementation. Congestion Avoidance allows for a linear growth in 
   available bandwidth when recovering after a packet drop. Similar to 
   the analysis for slow-start, MPA-aware TCP doesn't change the 
   behavior of the algorithm. Therefore the average size of the segment 
   versus EMSS is not a major factor in the assessment of the bandwidth
   growth for a sender. Both Slow Start and Congestion Avoidance for an 
   MPA-aware TCP will behave similarly to any TCP sender and allow an 
   MPA-aware TCP to enjoy the theoretical performance limits of the 
   algorithms. 

   In summary, the ULP messages generated at the sender (e.g., the
   number of messages grouped for every transmission request) and the
   message size distribution have the most significant impact on the
   number of TCP segments emitted. The worst case effect for certain
   ULPs (with average message size of EMSS/2+1 to EMSS) is bounded by
   an increase of up to 2x in the number of TCP segments and
   acknowledgments.  In reality the effect is expected to be marginal.

   See the MPA specification for additional documentation on corner 
   cases which are expected to lose Header Alignment and cause the 
   previously documented algorithm to be executed.

8  Interoperating between MPA applications and non-MPA applications  

   ULPs that use MPA are required to enable MPA at an agreed-upon point 
   in the TCP datastream.  If they fail to do so, the condition 
   illustrated in Figure 3: MPA Enablement Error arises.  This 
   condition is referred to as an MPA enablement error.  With the 
   understanding that MPA enablement errors should not occur, some 
   concerns have been raised about their effects if they do.  The 
   remainder of this section addresses those concerns for a variety of 
   ULPs. 

   MPA is enabled if the MPA protocol is visible on the wire.  When the 
   sender is MPA-enabled, it is inserting framing and markers.  When 
   the receiver is MPA-enabled, it is interpreting framing and markers.  
   MPA enablement is orthogonal to MPA awareness.  MPA can be enabled 
   on a strictly layered MPA implementation running over a non-MPA-
   aware TCP.  It can be disabled on an MPA-aware TCP implementation. 

   When first enabled, MPA always sends a marker preceding the first 
   FPDU.  Because the marker is located on the boundary between FPDUs, 
   its initial value is always 0.  Consequently, any ULP which never 
   starts its datastream with four zero octets is easily proved safe 
   with respect to MPA enablement errors.  

    

                    Node A                     Node B 
                 (MPA active)             (MPA not active) 
              +---------------+          +---------------+ 
              |      ULP      |          |      ULP      | 
              +---------------+          |               | 
              |      MPA      |          |               | 
              +---------------+          +---------------+ 
              |      TCP      |          |      TCP      | 
              +---------------+          +---------------+ 
              |      IP       |          |      IP       | 
              +---------------+          +---------------+ 
                      |                          | 
                      +--------------------------+ 

                      Figure 3: MPA Enablement Error

8.1  Negotiation of MPA-enabled mode 

   Transition to MPA can be accomplished by three possible methods 
   examined herein. The first two methods involve the use of specific 
   TCP ports for ULPs using MPA, either by use of a well known IANA 
   port or use of a service locator protocol. In these usage models it 
   is anticipated that both the Active and Passive ULPs will enable MPA 
   mode operation for their respective transmitters and receivers prior 
   to any data exchange.  The third model is by negotiation of MPA 
   transition by the ULP using octet stream messages to accomplish a 
   ULP-specific <MPA Hello>, <MPA Hello ACK>, <MPA ACK> three-way
   exchange. Transition to MPA framing will cause the transmitter to 
   always insert a 4 octet marker modulo the marker interval, and the 
   receiver to check the MPA CRC. The first MPA marker is defined to 
   have a value of 0, and it will always follow the last expected octet 
   to be transferred in octet stream mode. This leads to two distinct 
   error detection scenarios. Receivers that are expecting MPA framing 
   will quickly detect a CRC error in addition to any ULP header errors 
   (e.g., DDP) if given octet stream data from a non-MPA-enabled
   transmitter. This will cause MPA to drop the connection. Receivers 
   that are not expecting MPA framing will see four octets of zeros 
   immediately in the octet stream at the point the transition to MPA 
   was expected to occur.  

   Anticipated error cases are examined in the following figures. 
   Figure 4: MPA Transition via Well Known Port or Service Location 
   Protocol deals with the cases where the transition to MPA is 
   expected prior to data exchange and examines the four possible 
   combinations of mismatching MPA-enabled endpoints with non-MPA-
   enabled endpoints. These error scenarios imply a configuration error 
   between active and passive nodes and are likely to simultaneously 
   represent a configuration error of the ULPs as well. Figure 5: MPA 
   Transition via Octet Stream Negotiation examines cases where 
   transition to MPA mode is by exchange of ULP-specific MPA Hello/ACK
   messages.

+-------------+---------------------------+---------------------------+ 
|             | MPA-Enabled               | Non-MPA-Enabled           | 
|             | Passive                   | Passive                   | 
|             |                           |                           | 
+-------------+---------------------------+---------------------------+ 
| MPA-Enabled | First octets transmitted  | Passive receiver does not | 
| Active      | are MPA mode.             | understand MPA header,    | 
|             |                           | which starts with 0.      | 
|             | Receivers check for MPA   | See section 8.2 for       | 
|             | framing.                  | anticipated behavior from | 
|             |                           | existing protocols that   | 
|             | Successful Transition to  | run over TCP.             | 
|             | MPA mode.                 |                           | 
|             |                           | Active receiver expects   | 
|             |                           | MPA framing and CRC,      | 
|             |                           | which will not be         | 
|             |                           | present.  Expect quick    | 
|             |                           | detection and closing     | 
|             |                           | the connection in error.  | 
|             |                           |                           | 
+-------------+---------------------------+---------------------------+ 
| Non-        | First octets transmitted  | Normal octet stream mode  | 
| MPA-Enabled | in octet streaming mode.  | operation could occur,    | 
| Active      |                           | but this indicates a con- | 
|             | Passive receiver expects  | figuration error that     | 
|             | MPA framing and CRC,      | should be detected by the | 
|             | which will not be present.| ULP at either node.       | 
|             | Expect quick detection    |                           | 
|             | and closing the connec-   |                           | 
|             | tion in error.            |                           | 
|             |                           |                           | 
+-------------+---------------------------+---------------------------+ 

     Figure 4: MPA Transition via Well Known Port or Service Location 
                                 Protocol

+-------------+----------------------------+--------------------------+ 
|             | MPA-Enabled                | Non-MPA-Enabled          | 
|             | Passive                    | Passive                  | 
|             |                            |                          | 
+-------------+----------------------------+--------------------------+ 
| MPA-Enabled | Completes Hello, Hello     | Passive does not under-  | 
| Active      | Ack exchange.              | stand MPA Hello sequence | 
|             |                            | in octet-stream mode,    | 
|             | Successful Transition      | either                   | 
|             | to MPA mode.               |                          | 
|             |                            | * Passive side ULP dis-  | 
|             |                            |   connects due to ULP    | 
|             |                            |   protocol error         | 
|             |                            |                          | 
|             |                            | * Active side ULP times  | 
|             |                            |   out waiting for MPA    | 
|             |                            |   Hello ACK.             | 
|             |                            |                          | 
+-------------+----------------------------+--------------------------+ 
| Non-        | Active side sends an octet | Normal octet-stream mode | 
| MPA-Enabled | stream sequence that looks | operation                | 
| Active      | like MPA Hello without     |                          | 
|             | being aware of MPA.        |                          | 
|             | Passive side ULP sees this |                          | 
|             | message and mistakenly     |                          | 
|             | tries to transition to MPA |                          | 
|             | mode. This situation would |                          | 
|             | most likely be the result  |                          | 
|             | of a ULP protocol error.   |                          | 
|             | Passive side will trans-   |                          | 
|             | ition to MPA mode on the   |                          | 
|             | receiver and transmit MPA  |                          | 
|             | Hello Ack, resulting in    |                          | 
|             | one of two error cases:    |                          | 
|             |                            |                          | 
|             | * MPA Hello Ack should     |                          | 
|             |   cause a ULP protocol     |                          | 
|             |   error at the Active      |                          | 
|             |   side receiver            |                          | 
|             |                            |                          | 
|             | * Passive side receiver    |                          | 
|             |   will expect MPA framing  |                          | 
|             |   and CRC, which will      |                          | 
|             |   fail very quickly        |                          | 
|             |                            |                          | 
+-------------+----------------------------+--------------------------+ 

           Figure 5: MPA Transition via Octet Stream Negotiation 



8.2  Analysis of existing TCP services  

   The following sections examine the initial octet stream exchanges 
   for many of the ULPs used over a TCP transport to determine behavior 
   if they were inadvertently subjected to MPA framed messages.  It 
   should be kept in mind that if the procedures described in the 
   preceding section are followed, none of the conditions analyzed here 
   are possible. If a ULP is determined to be unsafe with respect to 
   MPA enablement errors, it means only that the ULP does not protect 
   itself against connection attempts by clients using some other ULP.  
   Any such protection is the responsibility of the ULP, not MPA.  

8.2.1  The "Little" TCP Services 

   There is a group of TCP services, collectively referred to as the 
   "little" TCP services, which are occasionally useful for debugging 
   networks.  Normally the "client" for these services is telnet, which 
   simply sends all typed characters and displays all received
   characters verbatim.  Most of the "little" services are easily 
   proved safe. 

   The echo [RFC0862] server copies all received data back to the 
   client.  Its behavior is identical even if the echo client or server 
   is accidentally MPA-enabled. 

   The discard [RFC0863] server never sends anything and discards 
   everything it receives.  Neither the client nor the server can 
   detect the presence or absence of MPA on either side. 

   The chargen [RFC0864] server ignores all data sent to it and sends a 
   continuous stream of data.  The format of the data is unspecified, 
   although printable, ASCII text is recommended.  In any case, a non-
   MPA-enabled chargen client receiving MPA frames will simply treat 
   them as data.  An MPA-enabled chargen client receiving non-MPA 
   frames will detect a bad initial marker or a CRC error in the first 
   "frame". 

   The quote [RFC0865] server ignores all data sent to it and sends a 
   short octet string before closing the connection.  The format of the 
   data is unspecified, although printable, ASCII text is recommended.  
   In any case, a non-MPA-enabled quote client receiving MPA frames 
   will simply treat them as data.  An MPA-enabled quote client 
   receiving non-MPA frames will almost certainly detect a CRC error in 
   the first "frame".  In the unlikely event that the quote of the day 
   is a properly formed MPA frame, an MPA-enabled client will treat the 
   "payload" of the bogus frame as data. 

   The daytime [RFC0867] server ignores all data sent to it and sends a 
   text timestamp.  A non-MPA-enabled daytime client receiving MPA
   frames will simply treat them as data.  An MPA-enabled daytime 
   client receiving non-MPA frames will detect a CRC error in the first 
   "frame". 

   The time [RFC0868] server ignores all data sent to it and sends a
   32-bit binary timestamp representing seconds since 00:00 GMT 01
   January, 1900.  The timestamp will wrap sometime in 2036.  An
   MPA-enabled time client receiving non-MPA frames will attempt to
   interpret the time as an initial marker.  If the timestamp is 0, it
   will be considered valid, but the server will close the connection
   before sending an MPA header.  If the timestamp is not 0, it will be
   rejected.  A non-MPA-enabled time client receiving MPA frames will
   interpret the initial marker as a 0 timestamp and display the time
   as 00:00 GMT 01 January, 1900.
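
   Both directions of the time protocol case can be illustrated with
   a few lines of code. This is a sketch only; the function names are
   hypothetical, and GMT is represented as UTC.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH_1900 = datetime(1900, 1, 1, tzinfo=timezone.utc)
INITIAL_MARKER = b"\x00\x00\x00\x00"


def decode_rfc868_time(four_octets: bytes) -> datetime:
    """Interpret four octets as seconds since 00:00 GMT 1 January 1900."""
    (seconds,) = struct.unpack("!I", four_octets)
    return EPOCH_1900 + timedelta(seconds=seconds)


def accepted_as_initial_marker(four_octets: bytes) -> bool:
    """An MPA-enabled receiver expects the first marker to be all zeros."""
    return four_octets == INITIAL_MARKER


# Non-MPA-enabled time client receiving an MPA initial marker:
print(decode_rfc868_time(INITIAL_MARKER).isoformat())

# MPA-enabled client receiving a non-zero timestamp from a time server:
print(accepted_as_initial_marker(struct.pack("!I", 3253000000)))
```

   The largest 32-bit timestamp decodes to a date in 2036, which is
   the wrap point mentioned above.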

8.2.2  ULPs Using Only Text Messages 

   Many ULPs format all TCP payload as lines of printable, ASCII text.  
   Such ULPs may all be analyzed together. 

   If a non-MPA-enabled endpoint A somehow becomes connected to an MPA-
   enabled endpoint B, the precise sequence of events depends on 
   whether A or B is the first to send.  If A sends first, B expects 
   the first four octets to be an initial marker, which must be all 
   zeros.  Since A sends printable, ASCII text, and 0h is not printable 
   ASCII, then B detects a protocol violation and should terminate the 
   connection.  If B sends first, A expects printable, ASCII text, but 
   receives four octets of zeros, so A detects a protocol violation and 
   should terminate the connection. 
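
   The two cases can be sketched as a pair of acceptance checks, one
   for each endpoint. The function names are hypothetical, and the
   definition of "printable" used here (octets 0x20 through 0x7E plus
   TAB, CR, and LF) is an assumption; individual ULPs may be
   stricter.

```python
INITIAL_MARKER = b"\x00\x00\x00\x00"
LINE_CONTROLS = {0x09, 0x0A, 0x0D}  # TAB, LF, CR


def is_printable_ascii_text(data: bytes) -> bool:
    """True if every octet is printable ASCII or a line control."""
    return all(0x20 <= b <= 0x7E or b in LINE_CONTROLS for b in data)


def mpa_endpoint_accepts(first_octets: bytes) -> bool:
    """Endpoint B: the stream must open with an all-zero marker."""
    return first_octets[:4] == INITIAL_MARKER


def text_endpoint_accepts(first_octets: bytes) -> bool:
    """Endpoint A: the stream must open with printable ASCII text."""
    return len(first_octets) > 0 and is_printable_ascii_text(first_octets)


# A sends first: B sees text where it expects a marker.
print(mpa_endpoint_accepts(b"USER alice\r\n"))

# B sends first: A sees four zero octets where it expects text.
print(text_endpoint_accepts(INITIAL_MARKER + b"\x00\x10"))
```

   Both checks fail, so whichever endpoint receives first detects the
   mismatch on the first four octets.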

   ULPs of this type are: 

      NNTP       [RFC0977] 

      finger     [RFC1288] 

      gopher     [RFC1436] the first two octets are always CR LF 

      POP3       [RFC1939] 

      IMAP4      [RFC2060] 

      IRC client [RFC2812] 

      IRC server [RFC2813] 

      BEEP       [RFC3081] 

      SIP        [RFC3261]

      whois      [RFC0954] 

      TACACS     [RFC1492] 

      ident      [RFC1413] 

      rwhois     [RFC2167] 

      ACAP       [RFC2244] 

      TIP        [RFC2371] 

8.2.3  ULPs with Fixed Initial Message 

   Several ULPs begin every connection with a fixed octet-string.  
   These ULPs are safe with respect to MPA enablement errors if the 
   first four octets of the initial string cannot be mistaken for a
   valid initial marker (four zero octets).  These ULPs are:

      HTTP/1.0 [RFC1945] Initial string begins with "HTTP" 

      RTSP     [RFC2326] Initial string begins with "RTSP" 

      HTTP/1.1 [RFC2616] Initial string begins with "HTTP" 

      SMTP     [RFC2821] Initial string begins with "HELO" or "EHLO" 

      CIFS     [CIFS]    Initial string begins with "\xffSMB" 

      gnutella [GNUT]    Initial string begins with "GNUTELLA CONNECT" 
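
   The safety criterion for this class of ULPs reduces to a prefix
   comparison, sketched below. The table repeats the initial strings
   listed above; the dictionary and function names are hypothetical.

```python
INITIAL_MARKER = b"\x00\x00\x00\x00"

# Initial strings as listed in this section.
FIXED_PREFIXES = {
    "HTTP/1.0": b"HTTP",
    "RTSP": b"RTSP",
    "HTTP/1.1": b"HTTP",
    "SMTP (HELO)": b"HELO",
    "SMTP (EHLO)": b"EHLO",
    "CIFS": b"\xffSMB",
    "gnutella": b"GNUTELLA CONNECT",
}


def mistakable_for_initial_marker(prefix: bytes) -> bool:
    """True if the first four octets equal an all-zero MPA marker."""
    return prefix[:4] == INITIAL_MARKER


unsafe = [name for name, prefix in FIXED_PREFIXES.items()
          if mistakable_for_initial_marker(prefix)]
print(unsafe)
```

   The resulting list is empty: none of the listed initial strings
   begins with four zero octets.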

8.2.4  Protocols with framed command headers 

   Several protocols always exchange a command header.  This analysis
   looks at the impact of the first four octets of an MPA datastream 
   being interpreted as a command header and vice versa. 

8.2.4.1  iSCSI 

   The first messages on any iSCSI [iSCSI] connection are a login 
   exchange.  Since the first octet of a login request or response is a 
   login command code (value 3), iSCSI is safe with respect to MPA 
   enablement errors. 

8.2.4.2  iSNS 

   The first two octets of every iSNS [iSNS] transmission are the 
   protocol version, which is currently 1.  Consequently iSNS is safe 
   with respect to MPA enablement errors. 


8.2.4.3  DNS 

   In many cases, DNS [RFC1035] uses UDP, rendering MPA irrelevant.  
   This section analyzes the implications of MPA enablement errors in 
   the case where DNS runs over TCP. 

   Every DNS message sent over TCP is preceded by a two-octet length 
   field.  Consequently, the initial marker of an MPA datastream would 
   be interpreted by a non-MPA-enabled receiver as a length field of 0.  
   The precise consequences of the error depend on whether the receiver 
   is the server or the resolver (i.e. client). 

   If a non-MPA-enabled resolver connects to an (improperly) MPA-
   enabled DNS server, the server will interpret the length field of 
   the first request as an invalid initial marker and terminate the 
   connection.  This configuration is safe with respect to MPA 
   enablement errors.  

   If an (improperly) MPA-enabled client connects to a non-MPA-enabled 
   DNS server, the server will interpret the upper 16 bits of the 
   initial marker as a length field of 0, indicating a null message.  
   The server might be expected to respond to a null message with a 
   format error, which is a safe outcome.  However, [RFC1035] does not 
   mandate this behavior.  The receiver could instead simply discard 
   the "null message" and continue.  This latter case requires more 
   analysis.

          intended by resolver              interpreted by server 

    0                   1              0                   1 
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 
   +-------------------------------+  +-------------------------------+ 
   |marker[0..15] = 0x0000         |  |length = 0x0000 (null message) | 
   +-------------------------------+  +-------------------------------+ 
   |marker[16..31] = 0x0000        |  |length = 0x0000 (null message) | 
   +-------------------------------+  +-------------------------------+ 
   |MULPDU length                  |  |message length                 | 
   +-------------------------------+  +-------------------------------+ 
   |ID                             |  |ID                             | 
   +-+-------+-+-+-+-+-----+-------+  +-+-------+-+-+-+-+-----+-------+ 
   |Q| Opcode|A|T|R|R| 0   | RCODE |  |Q| Opcode|A|T|R|R| 0   | RCODE | 
   |R|       |A|C|D|A|     |       |  |R|       |A|C|D|A|     |       | 
   +-+-------+-+-+-+-+-----+-------+  +-+-------+-+-+-+-+-----+-------+ 
   |QDCOUNT                        |  |QDCOUNT                        | 
   +-------------------------------+  +-------------------------------+ 
   |ANCOUNT                        |  |ANCOUNT                        | 
   +-------------------------------+  +-------------------------------+ 
   |NSCOUNT                        |  |NSCOUNT                        | 
   +-------------------------------+  +-------------------------------+ 
   |ARCOUNT                        |  |ARCOUNT                        | 
   +-------------------------------+  +-------------------------------+ 
   |data                           |  |data                           | 
   |    ...                        |  |    ...                        | 
   +-------------------------------+  +-------------------------------+ 
   |pad //                         |  |second message                 | 
   +-------------------------------+  |                               | 
   |CRC[0-15]                      |  |                               | 
   +-------------------------------+  |                               | 
   |CRC[16-31]                     |  |                               | 
   +-------------------------------+  +-------------------------------+ 

          Figure 6: Effect of Improperly MPA-Enabled DNS Resolver 

                                      

   Figure 6: Effect of Improperly MPA-Enabled DNS Resolver illustrates 
   the correspondence between protocol fields as intended by the sender 
   and interpreted by the receiver.  The initial marker would be 
   interpreted as a pair of null messages.  The server would then 
   interpret the MULPDU length field as a DNS request length.  As it 
   happens, the length field is correct, and the subsequent payload 
   fields line up properly, so the server will correctly interpret the 
   query and respond to it.  The response will have a non-zero length 
   field, which MPA at the resolver will interpret as the upper half of 
   an erroneous initial marker, terminating the connection.

   Meanwhile the server will continue processing the incoming 
   datastream, interpreting the first 16 bits of the concatenated pad 
   and CRC as the length field of a second query.  The second "query" 
   will be null, incomplete or too short to be valid.  If the "query" 
   is null, the server will interpret the next 16 bits as a length, and 
   so forth.  Eventually it will either safely run out of data or 
   detect a "query" that is incomplete or too short.  If the "query" is 
   incomplete, the server will wait for additional payload, doing 
   nothing until the resolver disconnects; if it is too short, the 
   server will respond with a format error.  In either case, no harm 
   results. 
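
   The server-side behavior described above can be reproduced with a
   small length-prefixed parser. This is a sketch under stated
   assumptions: the server silently skips null messages, the query is
   an arbitrary example, and the four CRC octets are a fixed
   placeholder rather than a computed CRC32c, because only their
   position in the stream matters here. All names are hypothetical.

```python
import struct


def parse_dns_tcp_stream(stream: bytes):
    """Split a stream into messages using DNS-over-TCP 2-octet lengths.

    Returns (messages, status), where status is "done" if the stream
    ended on a message boundary and "incomplete" otherwise.
    """
    messages = []
    offset = 0
    while offset + 2 <= len(stream):
        (length,) = struct.unpack_from("!H", stream, offset)
        start = offset + 2
        if start + length > len(stream):
            return messages, "incomplete"  # server waits for more data
        messages.append(stream[start:start + length])
        offset = start + length
    status = "done" if offset == len(stream) else "incomplete"
    return messages, status


# Example query: 12-octet header plus one question (29 octets total).
query = (
    struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    + b"\x07example\x03com\x00"
    + struct.pack("!HH", 1, 1)
)

# What the improperly MPA-enabled resolver puts on the wire.
fpdu_body = struct.pack("!H", len(query)) + query
pad = b"\x00" * (-len(fpdu_body) % 4)
placeholder_crc = b"\xde\xad\xbe\xef"  # placeholder, not a real CRC32c
stream = b"\x00\x00\x00\x00" + fpdu_body + pad + placeholder_crc

messages, status = parse_dns_tcp_stream(stream)
print([len(m) for m in messages], status)
```

   The parser sees two null messages (the initial marker), then the
   intact query, and finally treats the pad and leading CRC octet as
   the length of a second "query" that never completes.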

   The interaction between MPA and DNS is complex, but as the analysis 
   shows, even in the worst case DNS is safe with respect to MPA 
   enablement errors. 

8.2.4.4  LPR 

   The first octet of every request made to the LPR [RFC1179] daemon is 
   a printable ASCII character or a command code from the set 
   {1,2,3,4,5}.  Consequently LPR is safe with respect to MPA 
   enablement errors. 

8.2.4.5  Kerberos 

   The first octet of every Kerberos [RFC1510] request is the version 
   number (currently 5).  Consequently Kerberos is safe with respect to 
   MPA enablement errors. 

8.2.4.6  BGP-4 

   The first PDU in the BGP-4 protocol [RFC1771] is an OPEN with a 
   marker (covering the first four octets) of all one bits.  
   Consequently BGP-4 is safe with respect to MPA enablement errors. 

8.2.4.7  LDAP v2 and LDAP v3 

   All LDAP [RFC1777, RFC2251] messages are encapsulated in an 
   LDAPMessage, which is defined as a SEQUENCE under ASN.1 Basic 
   Encoding Rules.  Since the leading octet of a SEQUENCE is an ASN.1 
   BER type code of 0x30, no LDAP datastream can begin with 4 zero 
   octets.  Consequently LDAP is safe with respect to MPA enablement 
   errors. 
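
   The point can be checked by encoding a small LDAPMessage by hand.  
   The sketch below is a hand-rolled BER encoder written only for this 
   example (ber_length and ber_sequence are hypothetical helpers, not 
   part of any LDAP library); it wraps a messageID and an UnbindRequest 
   in a SEQUENCE and yields a message whose first octet is 0x30. 

```python
SEQUENCE_TAG = 0x30  # BER identifier octet for a constructed SEQUENCE


def ber_length(n: int) -> bytes:
    """Encode a definite BER length in short or long form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def ber_sequence(content: bytes) -> bytes:
    """Wrap already-encoded content in a BER SEQUENCE."""
    return bytes([SEQUENCE_TAG]) + ber_length(len(content)) + content


# messageID INTEGER 1, followed by UnbindRequest ([APPLICATION 2] NULL).
message_id = bytes([0x02, 0x01, 0x01])
unbind_request = bytes([0x42, 0x00])
ldap_message = ber_sequence(message_id + unbind_request)
```

   Whatever operation the LDAPMessage carries, the enclosing SEQUENCE 
   tag is emitted first, so the leading octet is never zero. 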

8.2.4.8  RTP 

   The most significant two bits of the first octet of an RTP [RFC1889] 
   payload contain the protocol version number (currently 2).  
   Consequently RTP is safe with respect to MPA enablement errors. 
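
   Because the version occupies the two most significant bits, every 
   first octet of a version 2 stream lies between 0x80 and 0xBF.  The 
   following sketch enumerates all of them; rtp_first_octet is a 
   hypothetical helper, and the P, X and CC fields are those of the 
   fixed RTP header in [RFC1889]. 

```python
RTP_VERSION = 2


def rtp_first_octet(padding: bool, extension: bool, csrc_count: int) -> int:
    """Pack V (2 bits), P, X and CC (4 bits) into the first RTP octet."""
    if not 0 <= csrc_count <= 15:
        raise ValueError("CSRC count must fit in 4 bits")
    return (
        (RTP_VERSION << 6)
        | (int(padding) << 5)
        | (int(extension) << 4)
        | csrc_count
    )


# Every combination of the remaining six bits, with the version fixed at 2.
all_first_octets = {
    rtp_first_octet(p, x, cc)
    for p in (False, True)
    for x in (False, True)
    for cc in range(16)
}
```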


8.2.4.9  Socks 

   The first octet of a SOCKS [RFC1928] datastream contains the 
   protocol version number (currently 5).  Consequently SOCKS is safe 
   with respect to MPA enablement errors. 

8.2.4.10 TLS 

   The first octet of a TLS [RFC2246] datastream is a command code from 
   the set {20,21,22,23,255}.  Consequently TLS is safe with respect to 
   MPA enablement errors.  This further implies that any ULP using TLS 
   is safe with respect to MPA enablement errors. 
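
   A short sketch of the TLS record header makes the first-octet 
   argument concrete.  The function tls_record_header is hypothetical; 
   it assumes the five-octet record header of TLS 1.0 (content type, 
   protocol version 3.1, 16-bit length) and takes the set of content 
   types directly from the text above. 

```python
import struct

# Content type values taken from the set given in the text.
TLS_CONTENT_TYPES = {20, 21, 22, 23, 255}


def tls_record_header(content_type: int, length: int) -> bytes:
    """Build a 5-octet TLS 1.0 record header: type, version 3.1, length."""
    if content_type not in TLS_CONTENT_TYPES:
        raise ValueError("content type not in the listed set")
    return struct.pack("!BBBH", content_type, 3, 1, length)


# A handshake record (type 22) carrying 64 octets; the length is arbitrary.
handshake_header = tls_record_header(22, 64)
```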

8.2.4.11 SLP V2 

   The first octet of an SLPv2 [RFC2608] datastream is the protocol 
   version number (currently 2).  Consequently SLPv2 is safe with 
   respect to MPA enablement errors. 

9  Security Considerations 

   This document does not define protocols; hence it does not create 
   any new security considerations. 

10 IANA Considerations 

   This Internet-Draft does not define any new protocols, so there 
   are no IANA considerations. 

11 References 

11.1 Primary References 

   [RFC0793] J. Postel, "Transmission Control Protocol", RFC 793, 
       September 1981. 

   [RFC0854] J. Postel & J.K. Reynolds, "Telnet Protocol 
       Specification", RFC 854, May 1983. 

   [RFC1122] R. Braden, Ed., "Requirements for Internet Hosts - 
       Communication Layers", RFC 1122, October 1989. 

   [RFC1191] J.C. Mogul & S.E. Deering, "Path MTU discovery", RFC 1191, 
       November 1990. 

   [RFC2026] S. Bradner, "The Internet Standards Process -- Revision    
       3", BCP 9, RFC 2026, October 1996. 

   [RFC2119] S. Bradner, "Key words for use in RFCs to Indicate    
       Requirement Levels", BCP 14, RFC 2119, March 1997. 

   [MPA] P. Culley, et al., "Marker PDU Aligned Framing for TCP 
       Specification", draft-cully-iwarp-mpa-01.txt (work in progress), 
       October 2002. 

   [RDMAP] R. Recio, et al., "RDMA Protocol Specification", 
       draft-recio-iwarp-01.txt (work in progress), October 2002. 

   [DDP] H. Shah, et al., "Direct Data Placement over Reliable 
       Transports", draft-shah-iwarp-ddp-01.txt (work in progress), 
       October 2002. 

11.2 "Little" TCP Services 

   [RFC0862] J. Postel, "Echo Protocol", RFC 862, May 1983. 

   [RFC0863] J. Postel, "Discard Protocol", RFC 863, May 1983. 

   [RFC0864] J. Postel, "Character Generator Protocol", RFC 864, May 
       1983. 

   [RFC0865] J. Postel, "Quote of the Day Protocol", RFC 865, May 1983. 

   [RFC0867] J. Postel, "Daytime Protocol", RFC 867, May 1983. 

   [RFC0868] J. Postel & K. Harrenstien, "Time Protocol", RFC 868, May 
       1983. 

11.3 ULPs using only Text Messages 

   [RFC0977] B. Kantor & P. Lapsley, "Network News Transfer Protocol", 
       RFC 977, February 1986. 

   [RFC1288] D. Zimmerman, "The Finger User Information Protocol", RFC 
       1288, December 1991. 

   [RFC1436] F. Anklesaria, et al., "The Internet Gopher Protocol (a 
       distributed document search and retrieval protocol)", RFC 1436, 
       March 1993. 

   [RFC1939] J. Myers & M. Rose, "Post Office Protocol - Version 3", 
       RFC 1939, May 1996. 

   [RFC2060] M. Crispin, "Internet Message Access Protocol - Version 
       4rev1", RFC 2060, December 1996. 

   [RFC2812] C. Kalt, "Internet Relay Chat: Client Protocol", RFC 2812, 
       April 2000. 

   [RFC2813] C. Kalt, "Internet Relay Chat: Server Protocol", RFC 2813, 
       April 2000. 

   [RFC3081] M. Rose, "Mapping the BEEP Core onto TCP", RFC 3081, March 
       2001. 

   [RFC3261] J. Rosenberg, et al., "SIP: Session Initiation Protocol", 
       RFC 3261, June 2002. 

   [RFC0954] K. Harrenstien, et al., "NICNAME/WHOIS", RFC 954, October 
       1985. 

   [RFC1492] C. Finseth, "An Access Control Protocol, Sometimes Called 
       TACACS", RFC 1492, July 1993. 

   [RFC1413] M. St. Johns, "Identification Protocol", RFC 1413, 
       February 1993. 

   [RFC2167] S. Williamson, et al., "Referral Whois (RWhois) Protocol 
       V1.5", RFC 2167, June 1997. 

   [RFC2244] C. Newman & J. G. Myers, "ACAP -- Application 
       Configuration Access Protocol", RFC 2244, November 1997. 

   [RFC2371] J. Lyon, et al., "Transaction Internet Protocol Version 
       3.0", RFC 2371, July 1998. 

11.4 ULPs with Fixed Initial Message 

   [RFC1945] T. Berners-Lee, et al., "Hypertext Transfer Protocol -- 
       HTTP/1.0", RFC 1945, May 1996. 

   [RFC2326] H. Schulzrinne, et al., "Real Time Streaming Protocol 
       (RTSP)", RFC 2326, April 1998. 

   [RFC2616] R. Fielding, et al., "Hypertext Transfer Protocol -- 
       HTTP/1.1", RFC 2616, June 1999. 

   [RFC2821] J. Klensin, ed., "Simple Mail Transfer Protocol", RFC 
       2821, April 2001. 

   [CIFS] Storage Networking Industry Association, "Common Internet 
       File System (CIFS) Technical Reference",  
       http://www.snia.org/tech_activities/CIFS/CIFS-TR-1p00_FINAL.pdf, 
       March 2002. 

   [GNUT] Anonymous, "The Gnutella Protocol Specification v0.4", 
       http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf. 

11.5 ULPs with Framed Command Headers 

   [iSCSI] J. Satran, et al., "iSCSI", draft-ietf-iscsi-19.txt (work 
       in progress), January 2003. 

   [iSNS] J. Tseng, et al., "Internet Storage Name Service (iSNS)", 
       draft-ietf-ips-isns-16.txt (work in progress), January 2003. 

   [RFC1035] P.V. Mockapetris, "Domain names - implementation and 
       specification", RFC 1035, November 1987. 

   [RFC1179] L. McLaughlin, "Line printer daemon protocol", RFC 1179, 
       August 1990. 

   [RFC1510] J. Kohl & C. Neuman, "The Kerberos Network Authentication 
       Service (V5)", RFC 1510, September 1993. 

   [RFC1771] Y. Rekhter & T. Li, "A Border Gateway Protocol 4 (BGP-4)", 
       RFC 1771, March 1995. 

   [RFC1777] W. Yeong, et al., "Lightweight Directory Access Protocol", 
       RFC 1777, March 1995. 

   [RFC2251] M. Wahl, et al., "Lightweight Directory Access Protocol 
       (v3)", RFC 2251, December 1997. 

   [RFC1889] H. Schulzrinne, et al., "RTP: A Transport Protocol for 
       Real-Time Applications", RFC 1889, January 1996. 

   [RFC1928] M. Leech, et al., "SOCKS Protocol Version 5", RFC 1928, 
       March 1996. 

   [RFC2246] T. Dierks & C. Allen, "The TLS Protocol Version 1.0", RFC 
       2246, January 1999. 

   [RFC2608] E. Guttman, et al., "Service Location Protocol, Version 
       2", RFC 2608, June 1999. 

12 Authors' Addresses 

   Uri Elzur  
   Broadcom Corporation 
   16215 Alton Parkway 
   Irvine, CA 92619-7013 USA 
   Phone: +1 (949) 585-6432 
   Email: Uri@Broadcom.com  

   James Pinkerton 
   Microsoft Corporation 
   One Microsoft Way  
   Redmond, WA 98052 USA 
   Phone: +1 (425) 705-5442 
   Email: jpink@microsoft.com 

   Robert Teisberg 
   Hewlett-Packard Company 
   14231 Tandem Blvd. 
   Austin, TX 78728 
   Phone: +1 (512) 432-8119 
   Email: Robert.Teisberg@hp.com 

   Dwight Barron  
   Hewlett-Packard Company 
   20555 SH 249  
   Houston, TX 77070-2698  USA 
   Phone: +1 (281) 514-2769 
   Email: Dwight.Barron@Hp.com  

   John Carrier 
   Adaptec, Inc. 
   691 S. Milpitas Blvd. 
   Milpitas, CA 95035 USA 
   Phone: +1 (360) 378-8526 
   Email: john_carrier@adaptec.com 

   Paul R. Culley 
   Hewlett-Packard Company 
   20555 SH 249 
   Houston, TX 77070-2698  USA 
   Phone: +1 (281) 514-5543 
   Email: paul.culley@hp.com 

13 Acknowledgments 

   Vadim Makhervaks 
   IBM Corp., Haifa Development Lab 
   Haifa, Israel 
   Phone: +972-4-829-6537 
   Email: VADIK@il.ibm.com 

   Renato Recio 
   IBM Corp. 
   11501 Burnett Road 
   Austin, Tx. USA 78758 
   Phone: 512-838-3685 
   Email: recio@us.ibm.com 

   Tom Talpey 
   Network Appliance 
   375 Totten Pond Road 
   Waltham, MA 02451 USA 
   Phone: +1 (781) 768-5329 
   Email: thomas.talpey@netapp.com 

   Patricia Thaler 
   Agilent Technologies, Inc. 
   1101 Creekside Ridge Drive, #100  
   M/S-RG10 
   Roseville, CA 95678 
   Phone: +1-916-788-5662 
   Email: pat_thaler@agilent.com 

14 Full Copyright Statement 

   Copyright (C) The Internet Society (2003).  All Rights Reserved. 

   This document and translations of it may be copied and furnished to 
   others, and derivative works that comment on or otherwise explain it 
   or assist in its implementation may be prepared, copied, published 
   and distributed, in whole or in part, without restriction of any 
   kind, provided that the above copyright notice and this paragraph 
   are included on all such copies and derivative works.  However, this 
   document itself may not be modified in any way, such as by removing 
   the copyright notice or references to the Internet Society or other 
   Internet organizations, except as needed for the purpose of 
   developing Internet standards in which case the procedures for 
   copyrights defined in the Internet Standards process must be 
   followed, or as required to translate it into languages other than 
   English. 

   The limited permissions granted above are perpetual and will not be 
   revoked by the Internet Society or its successors or assigns. 

   This document and the information contained herein is provided on an 
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 

   Funding for the RFC Editor function is currently provided by the 
   Internet Society. 