Internet DRAFT - draft-cho-rohc-tcp-interflow-behaviour







Network Working Group                                     Chia Yuan Cho
Internet-document                                   Sukanta Kumar Hazra
Expires: August 2004

                                                       February 9, 2004

                 Statistical Inter-flow Field Behaviour 
                  for Context Replication in ROHC-TCP
             <draft-cho-rohc-tcp-interflow-behaviour-00.txt>

Status of This Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is  inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2004).  All Rights Reserved.

Abstract

   Context replication increases header compression gains by reducing 
   the redundancy between flows via efficient replicate (IR-CR) packets. 
   The optimum design of IR-CR packet formats requires a detailed
   understanding of the inter-flow redundancy. As context replication is
   best suited to TCP, this document presents a statistical
   analysis of TCP/IP inter-flow field behaviour. Based on the analysis,
   recommendations on ROHC-TCP packet format specifications for context
   replication are made. It is also shown that inter-flow field
   behaviour is inherently and significantly asymmetrical, and various
   ways of handling it are considered. Finally, based on the inter-flow
   behaviour of the TCP Window field, it is noted that current encoding
   methods do not compress it efficiently.




Cho & Hazra                                                     [Page 1]

Internet-document  Statistical Inter-flow Field Behaviour  February 2004
                    for Context Replication in ROHC-TCP


Table of contents

   1.  Introduction....................................................2

   2.  Terminology.....................................................3

   3.  Header Compression Model........................................4

   4.  Methodology.....................................................6

   5.  Results.........................................................9

       5.1.  IPv4 Identification......................................11
       5.2.  IP Don't Fragment and Time To Live.......................13
       5.3.  IP Destination Address...................................14
       5.4.  TCP Source Port..........................................15
       5.5.  TCP Destination Port.....................................16
       5.6.  TCP Sequence Number and Acknowledgement Number...........17
       5.7.  TCP Flags and Urgent Pointer.............................18
       5.8.  TCP Window...............................................18
       5.9.  TCP Checksum.............................................21
       5.10. TCP Options..............................................21
       5.11. Mean Sizes of Compressed Fields..........................21

   6.  Handling Asymmetrical Inter-flow Behaviour.....................22

   7.  Security Considerations........................................23

   8.  References.....................................................23

   9.  Authors' Addresses.............................................24

   Appendix A.  State Transition Threshold............................26


1.  Introduction

   Context replication offers an alternative to the conventional context
   initialization procedure by performing context initialization via 
   more efficient IR-CR packets. In contrast to IR packets, which 
   contain mostly uncompressed fields, IR-CR packets carry compressed 
   header fields, obtained by reducing the redundancy between packets of 
   different flows. As such, header compression can possibly start right 
   from the first packet of a flow and compression efficiency is 
   improved.

   The motivations for context replication, as well as elaborations on
   the context replication mechanism, are already given in [ROHC-CR].
   Although context replication is a general ROHC mechanism, this
   document focuses on the application of context replication to the
   ROHC-TCP profile in particular. This is because the motivation for
   context replication originated from the ROHC-TCP profile and,
   because TCP flows are typically 'short-lived', context replication
   improves header compression gains most significantly for the
   ROHC-TCP profile.

   Context replication is possible due to significant redundancy between 
   multiple simultaneous, or near-simultaneous flows passing through the 
   same compressor-decompressor pair. For any header compression scheme 
   to work, the first step has to be towards understanding the field 
   behaviour to recognize areas of redundancy. The nature of context 
   replication focuses on relatively unexplored inter-flow field 
   behaviour, rather than well-understood intra-flow field behaviour. In 
   that aspect, [TCP-BEH] provides an elaborate qualitative analysis on 
   TCP/IP field behaviour. However, it has focused more on the intra-
   flow aspect rather than the inter-flow aspect, for which this 
   document is meant in part as an extension. The difficulty in 
   understanding and describing inter-flow field behaviour is compounded 
   by the fact that it depends on human usage patterns, in addition to 
   the underlying protocol characteristics. This gives inter-flow field 
   behaviour a much larger variance and higher degree of uncertainty.

   In this document, a method of extracting the inter-flow field 
   behaviour relevant for context replication is presented, as well as 
   the quantitative results of statistical analysis on the TCP/IP inter-
   flow behaviour, based on four TCPdump traces containing 1.9 million 
   TCP/IP packet samples. From the results, a number of
   recommendations are made. Firstly, a possibly optimum combination
   of encoding methods to be used for each field during context
   replication is recommended, together with parameters and estimated
   probabilities of success for each encoding method. Secondly, it is 
   shown that inter-flow field behaviour is significantly asymmetrical, 
   and ways of handling this behaviour are explored. Finally, it is 
   noted that current encoding methods can be improved upon to compress 
   the Window field more efficiently.

   For verification of the replicate packet format specifications    
   prescribed in this document, the EPIC-LITE implementation [EPIC-IMPL] 
   from the University of Split was modified to support context 
   replication.


2.  Terminology

   This document reuses some of the terminology found in [RFC-3095],    
   [ROHC-TCP], [ROHC-CR], [TCP-BEH], [EPIC-LITE] and [ROHC-FN]. In  
   addition, this document defines the following terms:

   'Incoming' and 'Outgoing' Packets
     'Incoming' packets are packets traveling towards client hosts 
     through the channel of interest over which ROHC is employed.

     'Outgoing' packets are packets traveling away from client hosts 
     through the channel of interest over which ROHC is employed.

   Asymmetrical Header Compression
     Header Compression is performed asymmetrically when 'incoming' and
     'outgoing' packets are compressed differently. This requires the
     packet format specifications for compressor-decompressor pairs to
     be configured differently depending on the direction of packet
     flow they deal with.

   Replication Match Rate
     The replication match rate for a trace is defined as the percentage 
     of uni-directional flows within the trace which can be context 
     replicated. A new flow is replicable when there is at least one 
     suitable base context present in the compressor upon arrival of the 
     first packet of the flow. This is used as a form of measure to 
     estimate the probability of using context replication for context 
     initialization.

   State Transition Threshold
     The State Transition Threshold for a uni-directional flow is the 
     number of initial TCP/IP packets (near the start of a flow) 
     converted into IR or IR-CR packets.
     

3.  Header Compression Model

   With the objective of extracting the TCP/IP inter-flow field  
   behaviour, we focus on the deployment of ROHC over the final hop. The 
   ROHC compressor-decompressor pair is deployed at the two endpoints of 
   the (possibly wireless) low-bandwidth channel and cooperates to 
   transmit packets efficiently in the direction towards the 
   decompressor. Since TCP requires a full-duplex channel, another 
   compressor-decompressor pair may be present to compress packets in 
   the reverse direction. Considering the direction of flow of packets 
   with respect to clients using the low-bandwidth channel, packets can 
   thus be classified as 'incoming' and 'outgoing'. 'Incoming' and 
   'outgoing' packets use different compressor-decompressor pairs. This 
   is shown in Fig. 1.

   Although ROHC was originally targeted at cellular links, the 
   convergence of the telecommunication and computer communication 
   industries means that it may be employed over wireless links in 
   general. As such, the header compression model in Fig. 1 does not 
   define the target 'low-bandwidth' channel explicitly. Mobile Terminal 
   clients are connected to the Internet via a last-hop router node as 
   seen in Fig. 1, and we focus on the 'header compression entity' 
   situated on the data link layer of that node. This entity can have 
   different manifestations depending on the nature of the wireless 
   link. For example, in Universal Mobile Telecommunications System 
   (UMTS), the ROHC entity is part of the Packet Data Convergence 
   Protocol (PDCP) sub-layer on a Base Station; if ROHC is employed over 
   Wireless Ethernet (IEEE 802.11), it can be part of the data link 
   layer on a wireless router; in Mobile Ad Hoc networks, the ROHC 
   entity can reside on a 'forwarding node'.


    +---+  'outgoing'
    | C |--- 
    +---+   ---      +-------+                                +------+
    | D |<--   ---   | +---+ |                             -->|Server|
    +---+   ---   -->| | D | |    - - - - - - - - -      --   +------+
               ---   | +---+ |   /                 \   --
      'incoming'  ---| | C | |   |                 |<--
                     | +---+ |  |                   |         +------+
    Clients          |       |<->|    Internet     |<-------->|Server|
                     | +---+ |  |                   |         +------+
      'outgoing'  -->| | D | |   |                 |<--
               ---   | +---+ |   \                 /   --
    +---+   ---   ---| | C | |    - - - - - - - - -      --   +-------+
    | C |---   ---   | +---+ |                             -->| Other |
    +---+   ---      +-------+                                |Clients|
    | D |<--          Last-hop                                +-------+
    +---+ 'incoming'   Router
         |__________|         |______________________|________|
      
         Low-bandwidth                  Wired        Wired or Wireless
           Channel

     C - Compressor
     D - Decompressor

   Fig. 1: Header compression model showing 'incoming' and 'outgoing'
   flows

   Due to the nature of the protocol suite under study, we expect 
   client-server computing to dominate over peer-to-peer, as is the case 
   currently. As such, 'incoming' and 'outgoing' flows are inherently 
   asymmetrical. As noted in [ROHC-TCP], some asymmetry is already 
   present in TCP/IP intra-flow field behaviour. An example is the 
   relationship between TCP Sequence Number and Acknowledgement Number, 
   for which 'outgoing' flows are likely to exhibit large deltas between 
   consecutive packets in Acknowledgement Number and small deltas in 
   Sequence Number, but the converse is likely for 'incoming' flows. 
   With respect to context replication, [ROHC-TCP] also acknowledges 
   some inter-flow asymmetry in the TCP source/destination port.

   As will be shown in Section 5, asymmetry becomes even more pronounced 
   between flows. The above figure partly serves to illustrate that 
   asymmetrical header compression, if desired, can be achieved by 
   configuring compressor-decompressor pairs differently based on their 
   'incoming' or 'outgoing' role.

   Finally, it should be noted that the focus on ROHC over the final hop
   in Fig. 1 does not reduce the scope of applicability of the results
   obtained on inter-flow behaviour. In general, header compression may
   be deployed over any hop, e.g. over core network links using
   Multiprotocol Label Switching (MPLS), or over intermediate hops in
   Mobile Ad Hoc networks. Regardless of the location of ROHC
   deployment, the TCP/IP endpoints remain the same. The advantage of
   focusing on the last hop, then, is that it allows any asymmetrical
   behaviour to be distilled. Bi-directional asymmetry over
   intermediate hops causes inherent asymmetrical behaviour to be lost.
   However, over intermediate hops, inter-flow results continue to be
   applicable using the symmetric treatment prescribed in Section 6.


4.  Methodology

   Given the wide range of inter-flow field behaviour, a suitable 
   methodology for obtaining the inter-flow field behaviour relevant to 
   context replication is proposed.

   Inter-flow field behaviour can be obtained by emulating a context-
   replication enabled compressor. To observe any asymmetrical 
   behaviour, Tcpdump traces are fed into the 'compressor emulator' 
   separately, according to the direction they flow, i.e. 'incoming' or 
   'outgoing'. Thus, the emulator simulates the compressors found on 
   client terminals and routers in the 'outgoing' and 'incoming' 
   directions respectively. In the same way as a compressor, the 
   emulator creates, maintains and updates a list of contexts 
   dynamically for each arriving packet.

   The emulator keeps an extensible list of contexts, one for each 
   unique TCP connection, arranged in a Most Recently Used (MRU) stack. 
   Each TCP/IP packet updates the context unique to its flow. A 
   context retrieved for updating or referencing is placed at the top 
   of the stack, followed by its base context if a base context has 
   just been used as a reference. Whenever possible, each new 
   flow is context replicated. Context replication is possible when a 
   base context exists, with the implementation-dependent selection 
   criteria requiring the IP source to be shared, and with preference 
   but no necessity for the same IP destination. For simplicity, all 
   contexts are assumed to be acknowledged by default. Furthermore, if 
   the first packet of a flow can be context replicated, then it is
   assumed that the subsequent two packets of the flow would also be 
   replicated. This means that up to the first 3 packets of each flow 
   are converted into IR-CR packets. This number is the upper bound of 
   the State Transition Threshold range, and is based on the estimate of 
   the upper bound of TCP/IP packets possibly converted to IR-CR 
   packets. This is elaborated in Appendix A.

   Even though we show results at the upper bound of the State 
   Transition Threshold, it was also found that the inter-flow field 
   behaviour remains invariant at smaller State Transition Threshold 
   values.
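
   The emulator behaviour described above can be sketched as follows.
   This is a minimal illustration in Python and not the emulator used
   for this study: the names FlowKey and ContextEmulator, the use of an
   ordered dictionary as the MRU stack, and the example field values
   are assumptions introduced here. Context acknowledgement and the
   State Transition Threshold are not modelled.

```python
from collections import OrderedDict
from typing import NamedTuple, Optional


class FlowKey(NamedTuple):
    """Hypothetical identifier of a uni-directional TCP flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


class ContextEmulator:
    """MRU list of per-flow contexts with base-context selection."""

    def __init__(self) -> None:
        # The most recently used context is kept at the END of the dict.
        self.contexts: "OrderedDict[FlowKey, dict]" = OrderedDict()
        self.new_flows = 0
        self.replicated_flows = 0

    def _select_base(self, key: FlowKey) -> Optional[FlowKey]:
        # Scan from most to least recently used. The IP source must be
        # shared; a context with the same IP destination is preferred.
        fallback = None
        for cand in reversed(self.contexts):
            if cand.src_ip != key.src_ip:
                continue
            if cand.dst_ip == key.dst_ip:
                return cand
            if fallback is None:
                fallback = cand
        return fallback

    def process(self, key: FlowKey, fields: dict) -> Optional[dict]:
        """Update the flow's context. For the first packet of a
        replicable flow, return a copy of the base context fields."""
        base_fields = None
        if key not in self.contexts:
            self.new_flows += 1
            base = self._select_base(key)
            if base is not None:
                self.replicated_flows += 1
                base_fields = dict(self.contexts[base])
                # The base ends up directly below the new top of stack.
                self.contexts.move_to_end(base)
        self.contexts[key] = dict(fields)
        self.contexts.move_to_end(key)
        return base_fields

    def match_rate(self) -> float:
        """Replication match rate in percent over the flows seen."""
        if self.new_flows == 0:
            return 0.0
        return 100.0 * self.replicated_flows / self.new_flows
```

   The per-field deltas studied in this document would then be the
   differences between the fields of a new flow's first packets and
   the base context fields returned by process().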
   
   For the purpose of this study, four Tcpdump traces totaling 1.9 
   million packets were captured from within the Local Area Network of 
   the Institute for Infocomm Research. The LAN configuration is shown 
   in Fig. 2. Macro statistics of each trace are shown in Table 1.


   +--------+
   | Client |
   |Terminal|<-
   +--------+  -
                -  +--------+
                 ->|Last-Hop|
                 ->| Router |<-
                -  +--------+  -
   +--------+  -                -  +--------+
   | Client |<-                  ->|  NAT   |
   |Terminal|                    ->| Router |<-
   +--------+                   -  +--------+  -
                   +--------+  -                -  +--------+
                   |Last-Hop|<-                  ->| Border |<->Internet
                 ->| Router |                    ->|Gateway |
                -  +--------+      +--------+   -  +--------+
               -                   |  NAT   |<--
         ... <-                  ->| Router |
                                -  +--------+
                               -              
                        ...  <-

   Fig. 2: Configuration of Local Area Network

   Three out of four traces were captured at the Border Gateway, so that 
   traffic from a large number of client terminals can be gathered in 
   each single trace. However, as in most LANs, Network Address 
   Translation (NAT) is in use. NAT transparently changes the 'outgoing' 
   Source IP Address and Port, as well as the 'incoming' Destination IP 
   Address and Port. Thus, packets captured at the Border Gateway 
   reflect the changed values rather than original values. To deal with 
   this, the fourth trace, TCP180903, captured at a client terminal was 
   used to investigate these fields as well as to verify results from 
   traces captured at the Border Gateway.

   +---------------+-----------+------------+------------+-----------+
   |    Trace      | TCP180803 | TCP080903a | TCP080903b | TCP180903 |
   |Identification |           |            |            |           |
   +---------------+-----------+------------+------------+-----------+
   |  Duration     |  30 min   |  30 min    |   30 min   |  27.4 hrs |
   +---------------+-----------+------------+------------+-----------+
   |   Location    |  Gateway  |  Gateway   |  Gateway   |  Client   |
   |               |  Router   |  Router    |  Router    | Terminal  |
   +---------------+-----------+------------+------------+-----------+
   |No. of packets |  516172   |   509281   |   507293   |   383594  |
   +---------------+-----------+------------+------------+-----------+
   |  Replication  |   97.5    |    94.4    |    94.3    |    93.4   |
   | Match Rate(%) |           |            |            |           |
   +---------------+-----------+------------+------------+-----------+

   Table 1: Macro statistics of TCPdump traces

   By using packets captured from our LAN, it is assumed that TCP/IP 
   inter-flow field behaviour does not vary significantly between the 
   wired Ethernet-based channel and the target low bandwidth, possibly 
   less reliable channel where header compression takes place. Provided 
   the header compression layer is sufficiently robust to be 
   transparent, this is reasonable because the upper (network, 
   transport and application) layer protocol characteristics and human 
   usage behaviour remain the same.

   It is desired that the inter-flow behaviour of TCP/IP fields be 
   mapped using a system of classification such that fields within a 
   category share the same characteristic. [TCP-BEH] already provides a 
   good system of classification for intra-flow field behaviour: 
   INFERRED, STATIC, STATIC-DEF, STATIC-KNOWN, CHANGING, where each 
   category follows some general trend(s) hinting how fields in that 
   category may be compressed. For inter-flow behaviour, [TCP-BEH] uses 
   a different system of classification: 'N/A', 'No', 'Yes', which 
   unfortunately does not achieve the same level of effectiveness,
   because one can only discern whether a field is compressible for 
   context replication, but does not know how to suitably compress it. 
   Therefore, in this document, the inter-flow field behaviour is 
   classified based on the same categories as used for intra-flow 
   behaviour: INFERRED, STATIC, STATIC-KNOWN, CHANGING. However, it 
   should be noted that the context here lies in inter-flow field 
   behaviour. Furthermore, here STATIC-DEF is merged into STATIC because 
   it is meaningless to define a STATIC category for fields defining a 
   packet stream where inter-flow field behaviour is concerned.

   Classification can be done with the help of observing the range of 
   deltas. Here, delta is defined as the difference in field value
   between that in the current packet and the stored field value in the 
   base context. The delta analysis is useful for the following reasons. 
   For any field not known to be INFERRED or STATIC-KNOWN, if delta = 0 
   in all samples, then this field is a STATIC field. If not, the field 
   is categorized as CHANGING. For CHANGING fields, by further analyzing 
   the range of deltas obtained, it can be found whether the field can 
   still be encoded using the STATIC encoding method with significant 
   probability. Since deltas tend to be small, the number of least 
   significant bits used (in LSB encoding) to encode that field with a 
   significant probability of success can be determined. Fields which 
   tend to have uniformly distributed deltas may only be suitably 
   encoded as IRREGULAR. Finally, where certain unique trends are 
   observed for a field, raw and/or network-byte-order converted  
   versions of field values are also studied.
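
   The delta-based classification rule described above can be
   illustrated with a short sketch. The function names and the list
   representation of deltas are assumptions made for this example; the
   rule applies only to fields not already known to be INFERRED or
   STATIC-KNOWN.

```python
def classify(deltas):
    """Classify a field from its inter-flow deltas.

    deltas: differences between the field value in the current packet
    and the stored value in the base context, one per sample.
    """
    if not deltas:
        raise ValueError("at least one delta sample is required")
    return "STATIC" if all(d == 0 for d in deltas) else "CHANGING"


def static_hit_rate(deltas):
    """Fraction of samples for which STATIC encoding would succeed,
    i.e. samples with delta == 0. Useful for CHANGING fields."""
    if not deltas:
        raise ValueError("at least one delta sample is required")
    return sum(1 for d in deltas if d == 0) / len(deltas)
```

   A CHANGING field with a high static_hit_rate() remains a candidate
   for STATIC encoding with an IRREGULAR fallback.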


5.  Results

   Our initial categorization is shown in Table 2. Differences between
   intra-flow classification (in [TCP-BEH]) and inter-flow
   classification here are marked with '(2)'. At this stage, there is no 
   asymmetry observed in categorization between 'incoming' and 'outgoing' 
   flows.

          +-----------------------------------+------------+
          | Field                             |Category    |
          +-----------------------------------+------------+
          |IPv4 Version                       |STATIC      |
          |IPv4 Header Length                 |STATIC-KNOWN|
          |IPv4 Type Of Service               |STATIC(1)   |
          |IPv4 ECN Capable Transport         |STATIC(1)   |
          |IPv4 Congestion Experienced        |STATIC(1)   |
          |IPv4 Packet Length                 |INFERRED    |
          |IPv4 Identification                |CHANGING    |
          |IPv4 Reserved Flag                 |STATIC(1)   |
          |IPv4 Don't Fragment Flag           |CHANGING    |
          |IPv4 More Fragments Flag           |STATIC-KNOWN|
          |IPv4 Fragment Offset               |STATIC-KNOWN|
          |IPv4 Time To Live                  |CHANGING    |
          |IPv4 Protocol                      |STATIC      |
          |IPv4 Header Checksum               |INFERRED    |
          |IPv4 Source Address                |STATIC      |
          |IPv4 Destination Address           |CHANGING(2) |
          |TCP Source Port                    |CHANGING(2) |
          |TCP Destination Port               |CHANGING(2) |
          |TCP Sequence Number                |CHANGING    |
          |TCP Acknowledgement Number         |CHANGING    |
          |TCP Data Offset                    |INFERRED    |
          |TCP Reserved                       |STATIC(1)   |
          |TCP Congestion Window Reduced      |STATIC(1)   |
          |TCP Echo Congestion Experienced    |STATIC(1)   |
          |TCP URG flag                       |CHANGING    |
          |TCP ACK flag                       |CHANGING    |
          |TCP PSH flag                       |CHANGING    |
          |TCP RST flag                       |CHANGING    |
          |TCP SYN flag                       |CHANGING    |
          |TCP FIN flag                       |CHANGING    |
          |TCP Window                         |CHANGING    |
          |TCP Checksum                       |CHANGING    |
          |TCP Urgent Pointer                 |CHANGING    |
          |TCP Options                        |CHANGING    |
          +-----------------------------------+------------+
   (1)These fields were found to be STATIC from samples, but context  
      replication should follow the classification in [TCP-BEH] for 
      future-proofing.
   (2)Differs from intra-flow classification [TCP-BEH] due to context 
      replication.
           Table 2: TCP/IP Fields and Classifications


   Some changes in categorization are made in this study because of the 
   current slow adoption of IP and TCP congestion notification fields. 
   However, these fields are expected to be used in the future and 
   should be CHANGING instead of STATIC.

   The encoding methods to be used for STATIC, STATIC-KNOWN and INFERRED 
   fields are straightforward, but CHANGING fields need to be further 
   analyzed; this is done in the subsequent sub-sections. CHANGING 
   fields can sometimes be encoded with STATIC, LSB, or other encoding 
   methods with significant probability. For LSB encoding, it is desired 
   to determine the suitable number of least significant bits to be used 
   to encode that field. Therefore, our frequency bins are defined in 
   increasing ceil(log2(|delta|+1)) (the reason for this expression 
   will be elaborated later in this section), which is effectively the 
   minimum number of bits possibly used to encode delta values within 
   that bin. Negative delta values are mapped to -ceil(log2(|delta|+1)), 
   and are useful for defining the offset value used in LSB encoding. 
   From our frequency tables, we can also derive the correct combination 
   of encoding methods to use, as well as the estimated probability of 
   each encoding method being used.
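
   As an illustration of how the offset interacts with the number of
   bits, the sketch below checks whether a delta falls inside the LSB
   interpretation interval of [RFC-3095], which for k bits and offset p
   is [v_ref - p, v_ref + 2^k - 1 - p]. That the k and p parameters of
   the LSB encoding method used in this document follow exactly this
   interval is an assumption made here, as are the function names.
   Field wrap-around is ignored.

```python
def lsb_covers(delta, k, p):
    """True if a value at distance `delta` from the reference lies in
    the interval [v_ref - p, v_ref + 2**k - 1 - p] (RFC 3095 style).

    Assumption: delta is the plain difference (current - reference),
    with no modular wrap-around applied.
    """
    return -p <= delta <= (2 ** k - 1) - p


def lsb_success_rate(deltas, k, p):
    """Fraction of delta samples that LSB encoding with (k, p) covers."""
    if not deltas:
        raise ValueError("at least one delta sample is required")
    return sum(1 for d in deltas if lsb_covers(d, k, p)) / len(deltas)
```

   Under this assumption, k = 3 with p = -1 covers deltas 1 to 8, while
   k = 3 with p = 0 covers deltas 0 to 7.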

   The inter-flow behaviour of CHANGING fields can be summarized 
   directly in the form of packet format specifications for IR-CR 
   packets. This is shown in Fig. 3, in EPIC-LITE terminology [EPIC-
   LITE], which is derived from the BNF input language [RFC-2234]. To 
   illustrate asymmetrical inter-flow behaviour, packet format 
   specifications with any differences between 'incoming' and 'outgoing' 
   flows are defined separately for each field with the postfix "_in" or 
   "_out". Note however that if the same set of encoding methods is 
   used in both directions for the same field, and only the 
   probabilities are different, then it may mean that significant 
   asymmetrical behaviour has not been observed.

   Identification_in ::= NBO(16) ;network byte order
                         LSB(3,-1,50%) | LSB(8,-1,17%) | IRREGULAR(33%)
   Identification_out ::= NBO(16)
                          LSB(3,-1,65%) | LSB(8,-1,14%) | IRREGULAR(21%)

   Don't_Fragment_in ::= STATIC(73%) | IRREGULAR(1,27%)
   Don't_Fragment_out ::= STATIC(99%) | IRREGULAR(1,1%)

   Time_To_Live_in ::= STATIC(98%) | IRREGULAR(8,2%)
   Time_To_Live_out ::= STATIC(97%) | IRREGULAR(8,3%)

   Destination_Address_in ::= STATIC(100%)
   Destination_Address_out ::= STATIC(86%) | IRREGULAR(32,14%)

   Source_Port_in  ::= STATIC(70%) | IRREGULAR(16,30%)
   Source_Port_out ::= LSB(3,0,73%) | LSB(8,0,14%) | IRREGULAR(16,13%)

   Destination_Port_in ::= LSB(3,0,73%) | LSB(8,0,14%) | 
                           IRREGULAR(16,13%)
   Destination_Port_out  ::= STATIC(70%)| IRREGULAR(16,30%)

   Sequence_Number ::= IRREGULAR(32,100%)

   Acknowledgement_Number_in ::= IRREGULAR(32,100%)
   Acknowledgement_Number_out ::= VALUE(32,0,33%) | IRREGULAR(32,67%)

   URG_flag ::= IRREGULAR(1,100%)

   ACK_flag ::= IRREGULAR(1,100%)

   PSH_flag ::= IRREGULAR(1,100%)

   RST_SYN_FIN_flag ::= VALUE(3,2,30%) | VALUE(3,0,65%) |  
                        IRREGULAR(3,5%)

   Urgent_Pointer ::= STATIC(99%) | IRREGULAR(16,1%)

   Window_in ::= STATIC(30%)| IRREGULAR(16,70%)
   Window_out ::= STATIC(43%) | IRREGULAR(16,57%)

   Fig. 3.  Packet format specifications for CHANGING fields. 


   In Fig. 3, specifications are expressed in the notation used by
   EPIC-LITE instead of the Formal Notation [ROHC-FN] for a number of
   reasons. Firstly, basic encoding methods used in both remain the 
   same, and so EPIC-LITE expressions can be easily converted into 
   Formal Notation. Moreover, ROHC-FN's 'multiple_packet_formats'
   encoding method, used to specify multiple encoding methods for a
   field, has an equivalent that can be represented in a more compact
   form using the OR operator, '|' in EPIC-LITE. Also, because EPIC-LITE 
   involves Huffman coding, it allows the expression of the probability 
   of each encoding method being successful as a parameter, which is 
   also useful for expressing the frequency of use of an encoding 
   method. Finally, it allows the packet format specifications to be 
   readily verified via context replication implementation in EPIC-LITE. 
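
   To illustrate how the probability parameters can be used, the
   sketch below computes the mean number of field bits implied by a
   set of alternative encodings, together with the entropy of the
   choice between them as a lower bound on the indicator cost. It is
   an illustrative calculation only: the function names are
   assumptions, STATIC is taken to send 0 field bits, and the result
   is not the compressed size reported by EPIC-LITE.

```python
import math


def mean_field_bits(alternatives):
    """Mean number of field bits sent.

    alternatives: list of (bits_sent, probability) pairs, one per
    encoding method; the probabilities are expected to sum to 1.
    """
    total = sum(prob for _, prob in alternatives)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(bits * prob for bits, prob in alternatives)


def indicator_entropy_bits(probabilities):
    """Shannon entropy of the choice between encoding methods, a lower
    bound on the mean indicator bits any prefix code can achieve."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

   For example, an entry of the form STATIC(86%) | IRREGULAR(32,14%)
   corresponds to mean_field_bits([(0, 0.86), (32, 0.14)]), i.e.
   0.14 * 32 field bits on average, before indicator bits are added.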

   Details of the inter-flow behaviour of each CHANGING field are 
   elaborated in the following sub-sections.


5.1.  IPv4 Identification

   Table 3 shows the distribution of delta values in logarithmic scale.
   Note that for delta > 0, the number of bits used to encode the delta 
   may be expressed as n = ceil(log2(|delta|+1)), as we are trying to 
   find the smallest n for which delta <= 2^n - 1. For delta < 0, the 
   equivalent mapping is n = -ceil(log2(|delta|+1)).
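
   The mapping from delta to the signed bin index n can be sketched as
   below. The function names are assumptions; integer bit_length() is
   used in place of ceil(log2(|delta|+1)) because the two agree for
   all non-negative integers and the former avoids floating-point
   rounding.

```python
from collections import Counter


def signed_bits(delta):
    """Return n = ceil(log2(|delta|+1)) for delta >= 0 and
    n = -ceil(log2(|delta|+1)) for delta < 0."""
    n = abs(delta).bit_length()  # equals ceil(log2(|delta| + 1))
    return n if delta >= 0 else -n


def bin_frequencies(deltas):
    """Percentage of samples falling into each bin n."""
    if not deltas:
        raise ValueError("at least one delta sample is required")
    counts = Counter(signed_bits(d) for d in deltas)
    return {n: 100.0 * c / len(deltas) for n, c in sorted(counts.items())}
```

   For instance, deltas 4 to 7 all map to n = 3, and deltas -32768 to
   -65535 all map to n = -16, matching the ranges listed in Table 3.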


      +--------+---------------+-----------+-----------+
      |Encoded |  Delta Range  | Incoming  | Outgoing  |
      |Bits,n  |               | Frequency | Frequency |
      +--------+---------------+-----------+-----------+
      |-16     |[-65535:-32768]|   6.0%    |   2.3%    |
      |-15     |[-32767:-16384]|   4.5%    |   2.1%    |
      |-14     |[-16383:-8192] |   2.4%    |   2.1%    |
      |-13     |[-8191:-4096]  |   1.5%    |   0.8%    |
      |-12     |[-4095:-2048]  |   0.7%    |   0.6%    |
      |-11     |[-2047:-1024]  |   0.3%    |   0.3%    |
      |-10     |[-1023:-512]   |   0.2%    |   0.1%    |
      |-9      |[-511:-256]    |   0.1%    |   0.1%    |
      |-8      |[-255:-128]    |   0.1%    |   0.1%    |
      |-7      |[-127:-64]     |   0.0%    |   0.0%    |
      |-6      |[-63:-32]      |   0.0%    |   0.0%    |
      |-5      |[-31:-16]      |   0.0%    |   0.0%    |
      |-4      |[-15:-8]       |   0.0%    |   0.0%    |
      |-3      |[-7:-4]        |   0.1%    |   0.0%    |
      |-2      |[-3:-2]        |   0.2%    |   0.2%    |
      |-1      |[-1]           |   0.6%    |   0.4%    |
      |0       |[0]            |   0.3%    |   0.0%    |
      |1       |[1]            |   23.4%   |   33.7%   |
      |2       |[2:3]          |   20.6%   |   20.8%   |
      |3       |[4:7]          |   6.6%    |   10.5%   |
      |4       |[8:15]         |   3.9%    |   4.3%    |
      |5       |[16:31]        |   3.6%    |   3.3%    |
      |6       |[32:63]        |   3.6%    |   2.4%    |
      |7       |[64:127]       |   3.4%    |   2.0%    |
      |8       |[128:255]      |   2.3%    |   1.6%    |
      |9       |[256:511]      |   2.3%    |   1.2%    |
      |10      |[512:1023]     |   1.7%    |   1.2%    |
      |11      |[1024:2047]    |   1.4%    |   1.1%    |
      |12      |[2048:4095]    |   0.9%    |   1.0%    |
      |13      |[4096:8191]    |   1.3%    |   1.1%    |
      |14      |[8192:16383]   |   2.5%    |   2.3%    |
      |15      |[16384:32767]  |   3.0%    |   2.4%    |
      |16      |[32768:65535]  |   2.4%    |   1.9%    |
      +--------+---------------+-----------+-----------+

   Table 3: Frequency distribution of Identification delta

   Slightly asymmetrical behaviour can be observed in Table 3.
   'Incoming' replicated packets are less likely than 'outgoing'
   replicated packets to be encodable within 3 bits. Moreover,
   'incoming' delta values are more widely distributed, with a higher
   occurrence of negative deltas as well as of deltas encodable in 6 to
   10 bits. This is reasonable because 'incoming' replicated packets
   face larger deltas due to busy servers handling multiple connections
   simultaneously or near-simultaneously.

   Inter-flow Identification deltas for 'outgoing' replicated packets
   tend to be smaller than for 'incoming' ones, as clients do not
   usually maintain a large number of simultaneous or near-simultaneous
   TCP connections.

   It should be noted that Table 3 depicts network-byte-order corrected
   Identification deltas. Typical implementation policies for
   incrementing the IPv4 Identification are: sequential (increments by
   1), sequential-jump (typically increments by 256) and random.
   Linux-based implementations usually implement the sequential policy,
   and older versions of Microsoft Windows usually implement the
   sequential-jump policy with a jump size of 256. The latter is
   equivalent to incrementing the more significant byte of the two-byte
   Identification field by 1. From a compression viewpoint,
   sequential-jump Identification values can be network-byte-order
   corrected at the compressor end and reverted to their original form
   at the decompressor end. This approach has the advantage of
   compressing Identification fields generated by both policies
   efficiently using the same encoding method. A network byte order
   (NBO) flag is communicated to differentiate between the two
   policies. Randomly assigned Identification values cannot be
   efficiently compressed and are sent as-is.
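
   The effect of network-byte-order correction on a sequential-jump
   sender can be illustrated with a short Python sketch. The function
   and parameter names are hypothetical, and the delta is taken as a
   plain signed difference, which is an assumption made for this
   example only.

```python
def swap_bytes16(ip_id):
    """Swap the two bytes of a 16-bit IPv4 Identification value."""
    return ((ip_id & 0xFF) << 8) | ((ip_id >> 8) & 0xFF)


def id_delta(base_id, new_id, nbo_corrected):
    """Signed delta between two Identification values.

    When nbo_corrected is true, both values are byte-swapped first, so
    that a step of 256 in the original values appears as a step of 1.
    """
    if nbo_corrected:
        base_id, new_id = swap_bytes16(base_id), swap_bytes16(new_id)
    return new_id - base_id


# A sequential-jump sender steps the Identification by 256 per packet.
print(id_delta(0x1200, 0x1300, nbo_corrected=False))  # 256
print(id_delta(0x1200, 0x1300, nbo_corrected=True))   # 1

# The decompressor reverts the correction with the same byte swap.
print(swap_bytes16(swap_bytes16(0x1300)) == 0x1300)   # True
```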

   Current proposals for context replication compress the IPv4
   Identification field into 0 or 16 bits, using the VALUE and
   IRREGULAR encoding methods respectively. The VALUE encoding method is
   suitable for protocols like DHCP, and does not appear in Fig. 3
   because we are focusing on TCP/IP. However, the inter-flow behaviour
   described above shows that this field can also be compressed more
   efficiently using LSB encoding, with the recommended parameters
   shown in Fig. 3.


5.2.  IP Don't Fragment Flag and Time to Live

   The DF Flag is a single bit which may be set or unset. Although it
   may be impractical to allow multiple encoding methods for a
   single-bit field, the STATIC and IRREGULAR encoding methods are used
   here for the sake of characterizing its behaviour. The IPv4 TTL (or
   equivalently, IPv6 Hop Limit) is an 8-bit field which remains
   constant while the route between the two endpoints is unchanged; when
   the route does change due to congestion, it is better to simply send
   the field uncompressed. Therefore, DF can be analyzed in the same
   category as TTL: each is either encoded as STATIC or sent
   uncompressed as IRREGULAR. The probabilities associated with each
   encoding method, based on the samples, are shown in Table 4.


      +----------------+--------+-----------+
      |Encoding Method | STATIC | IRREGULAR |
      +----------------+--------+-----------+
      |          'Incoming' flows           |
      +----------------+--------+-----------+
      |Don't Fragment  |  72.8% |   27.2%   |
      |Time To Live    |  98.1% |    1.9%   |
      +----------------+--------+-----------+
      |          'Outgoing' flows           |
      +----------------+--------+-----------+
      |Don't Fragment  |  98.5% |    1.5%   |
      |Time To Live    |  96.9% |    3.1%   |
      +----------------+--------+-----------+

   Table 4: Percentage frequency of STATIC and IRREGULAR for DF and TTL


5.3.  IP Destination Address

   We have allowed for an implementation to use context replication 
   for scenarios where packets share at least the same Source IP 
   Address, but the Destination IP Address may be different. Therefore, 
   the Destination IP Address may be STATIC or IRREGULAR for these two 
   scenarios.

   The proportion of IR-CR packets that are replicable with the same or
   with a different Destination IP Address is of interest, as it
   determines how effective it is to extend context replication to
   cover different Destination IP Addresses. This proportion is
   tabulated in Table 5.


      +------------+----------+-----------+
      |            |  STATIC  | IRREGULAR |
      +------------+----------+-----------+
      | 'Incoming' |  100.0%  |    0.0%   |
      | 'Outgoing' |   85.8%  |   14.2%   |
      +------------+----------+-----------+

   Table 5: Percentage frequency of STATIC and IRREGULAR for IP 
   Destination Address.

   As can be noted from Table 5, the results are skewed towards STATIC
   (same Destination IP Address). This is because our emulator selects
   the base context with a preference for one sharing both the same
   Source and Destination IP Address, although it is much easier to find
   contexts sharing only the same Source IP Address. For some
   intervals, the proportion of 'outgoing' IRREGULAR cases was as high
   as 48%.
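
   The selection preference described above can be sketched in Python
   as follows. This is not the emulator's code: the function name, the
   representation of a context as a (source, destination) pair and the
   tie-break in favour of the most recent context are all assumptions
   made for illustration.

```python
def select_base_context(contexts, src, dst):
    """Pick a base context, preferring a match on both addresses.

    contexts is a list of (source_address, destination_address) pairs,
    oldest first. Returns (index, encoding of Destination Address), or
    None when no context shares the Source IP Address.
    """
    src_only = None
    for index in range(len(contexts) - 1, -1, -1):   # newest first
        ctx_src, ctx_dst = contexts[index]
        if ctx_src != src:
            continue
        if ctx_dst == dst:
            return index, "STATIC"        # Destination Address replicated
        if src_only is None:
            src_only = index              # newest source-only fallback
    if src_only is None:
        return None
    return src_only, "IRREGULAR"          # Destination Address sent in full


contexts = [
    ("10.0.0.1", "192.0.2.1"),
    ("10.0.0.1", "192.0.2.2"),
    ("10.0.0.9", "192.0.2.3"),
]
print(select_base_context(contexts, "10.0.0.1", "192.0.2.1"))
print(select_base_context(contexts, "10.0.0.1", "198.51.100.7"))
print(select_base_context(contexts, "10.0.0.5", "192.0.2.1"))
```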

   Asymmetry is again observed to be inherent between 'incoming' and
   'outgoing' flows. 'Incoming' flows originating from Internet servers
   are not likely to engage multiple clients on a common subnet within a
   short period of time. However, the converse is true for 'outgoing'
   flows, corresponding to prevalent usage patterns.

   Our results also justify an implementation which considers context
   replication even when the Destination IP Address is different. This
   maximizes context replication efficiency gains for 'outgoing' flows.


5.4.  TCP Source Port

   As can be seen from Table 6, clearly asymmetrical inter-flow 
   behaviour is observed for the TCP Source Port field. This behaviour 
   is seen mainly because ports at servers are well-known ports which 
   remain unchanged.

      +--------+---------------+----------+---------+
      |Encoded |  Delta Range  | Incoming |Outgoing |
      |Bits,n  |               | Frequency|Frequency|
      +--------+---------------+----------+---------+
      |-16     |[-65535:-32768]|   0.0%   |   0.0%  |
      |-15     |[-32767:-16384]|   0.0%   |   0.0%  |
      |-14     |[-16383:-8192] |   0.0%   |   0.0%  |
      |-13     |[-8191:-4096]  |   0.0%   |   0.3%  |
      |-12     |[-4095:-2048]  |   5.8%   |   0.2%  |
      |-11     |[-2047:-1024]  |   1.8%   |   0.6%  |
      |-10     |[-1023:-512]   |   0.1%   |   1.7%  |
      |-9      |[-511:-256]    |   1.0%   |   0.0%  |
      |-8      |[-255:-128]    |   0.5%   |   0.0%  |
      |-7      |[-127:-64]     |   0.7%   |   0.0%  |
      |-6      |[-63:-32]      |   0.7%   |   0.0%  |
      |-5      |[-31:-16]      |   0.0%   |   0.0%  |
      |-4      |[-15:-8]       |   0.0%   |   0.0%  |
      |-3      |[-7:-4]        |   0.3%   |   0.1%  |
      |-2      |[-3:-2]        |   0.0%   |   0.1%  |
      |-1      |[-1]           |   0.0%   |   2.3%  |
      |0       |[0]            |  72.0%   |  15.8%  |
      |1       |[1]            |   0.0%   |  31.9%  |
      |2       |[2:3]          |   0.0%   |  17.4%  |
      |3       |[4:7]          |   0.0%   |   7.8%  |
      |4       |[8:15]         |   0.1%   |   4.7%  |
      |5       |[16:31]        |   0.1%   |   3.3%  |
      |6       |[32:63]        |   0.3%   |   2.0%  |
      |7       |[64:127]       |   0.3%   |   3.0%  |
      |8       |[128:255]      |   0.7%   |   1.1%  |
      |9       |[256:511]      |   0.8%   |   2.7%  |
      |10      |[512:1023]     |   3.0%   |   3.2%  |
      |11      |[1024:2047]    |  10.5%   |   1.5%  |
      |12      |[2048:4095]    |   1.2%   |   0.1%  |
      |13      |[4096:8191]    |   0.0%   |   0.3%  |
      |14      |[8192:16383]   |   0.0%   |   0.0%  |
      |15      |[16384:32767]  |   0.0%   |   0.0%  |
      |16      |[32768:65535]  |   0.1%   |   0.0%  |
      +--------+---------------+----------+---------+

   Table 6: Frequency distribution of Source Port delta


5.5.  TCP Destination Port

   The inter-flow behaviour of the TCP Destination Port field is shown
   in Table 7. It can be observed that the trend is the opposite of
   that of the TCP Source Port presented previously. This is because
   the Destination Ports of 'outgoing' packets are the Source Ports of
   the replying 'incoming' packets.

      +--------+---------------+-----------+-----------+
      |Encoded |  Delta Range  | Incoming  | Outgoing  |
      |Bits,n  |               | Frequency | Frequency |
      +--------+---------------+-----------+-----------+
      |-16     |[-65535:-32768]|   0.0%    |   0.0%    |
      |-15     |[-32767:-16384]|   0.0%    |   0.0%    |
      |-14     |[-16383:-8192] |   0.0%    |   0.0%    |
      |-13     |[-8191:-4096]  |   0.3%    |   0.0%    |
      |-12     |[-4095:-2048]  |   0.0%    |   0.4%    |
      |-11     |[-2047:-1024]  |   0.0%    |   4.1%    |
      |-10     |[-1023:-512]   |   0.0%    |   2.0%    |
      |-9      |[-511:-256]    |   0.0%    |   0.1%    |
      |-8      |[-255:-128]    |   0.0%    |   0.9%    |
      |-7      |[-127:-64]     |   0.0%    |   0.4%    |
      |-6      |[-63:-32]      |   0.0%    |   0.5%    |
      |-5      |[-31:-16]      |   0.0%    |   1.9%    |
      |-4      |[-15:-8]       |   0.0%    |   0.0%    |
      |-3      |[-7:-4]        |   0.3%    |   0.0%    |
      |-2      |[-3:-2]        |   0.2%    |   0.2%    |
      |-1      |[-1]           |   6.8%    |   0.0%    |
      |0       |[0]            |  23.3%    |  74.3%    |
      |1       |[1]            |  33.4%    |   0.0%    |
      |2       |[2:3]          |   8.4%    |   0.1%    |
      |3       |[4:7]          |   6.9%    |   0.0%    |
      |4       |[8:15]         |   3.8%    |   0.1%    |
      |5       |[16:31]        |   2.8%    |   0.1%    |
      |6       |[32:63]        |   2.3%    |   0.8%    |
      |7       |[64:127]       |   3.4%    |   0.2%    |
      |8       |[128:255]      |   1.2%    |   0.4%    |
      |9       |[256:511]      |   2.7%    |   0.8%    |
      |10      |[512:1023]     |   2.4%    |   2.1%    |
      |11      |[1024:2047]    |   1.4%    |   8.2%    |
      |12      |[2048:4095]    |   0.0%    |   1.8%    |
      |13      |[4096:8191]    |   0.4%    |   0.4%    |
      |14      |[8192:16383]   |   0.0%    |   0.1%    |
      |15      |[16384:32767]  |   0.0%    |   0.0%    |
      |16      |[32768:65535]  |   0.0%    |   0.0%    |
      +--------+---------------+-----------+-----------+

   Table 7: Frequency distribution of Destination Port delta


5.6.  TCP Sequence Number and Acknowledgement Number

   The TCP Sequence Number (SEQNUM) cannot be replicated, as the
   inter-flow delta is random with a uniform probability density
   function, regardless of the direction of flow. The TCP
   Acknowledgement Number (ACKNUM) generally follows the randomness of
   SEQNUM, but one particular behaviour can be exploited to compress the
   first packet of most 'outgoing' flows. All handshaking packets with
   SYN set but ACK clear (the first packet of TCP connections) carry an
   ACKNUM of zero. This behaviour is unique to 'outgoing' flows because
   service-requesting clients typically send the first packet of a TCP
   connection. The first 'incoming' packet typically carries both SYN
   and ACK set, and its ACKNUM is non-zero. Because up to the third
   packet of each flow may be replicated, such packets represent from
   at least 30% up to 100% of all 'outgoing' replicated packets. Thus,
   ACKNUM can at worst be compressed as shown in Fig. 3.

   Alternatively, instead of basing the specifications on asymmetry, all 
   compressor-decompressor pairs can treat the SYN-set ACK-not-set case 
   as a flag to infer that the value of ACKNUM is 0. These fields are 
   already appropriately handled as prescribed in [ROHC-TCP].
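
   The flag-based inference described above can be sketched in Python
   as follows. The function names are hypothetical and the sketch is
   not the [ROHC-TCP] packet format; in particular, it simply rejects a
   SYN-only packet with a non-zero ACKNUM, for which a real compressor
   would need a fallback.

```python
SYN = 0x02   # TCP header flag bit for SYN
ACK = 0x10   # TCP header flag bit for ACK


def acknum_is_implied(flags):
    """True for a SYN-set, ACK-clear packet, where ACKNUM is taken as 0."""
    return bool(flags & SYN) and not flags & ACK


def compress_acknum(flags, acknum):
    """Return the ACKNUM bytes to send (none when the value is implied)."""
    if acknum_is_implied(flags):
        if acknum != 0:
            raise ValueError("non-zero ACKNUM on a SYN-only packet; "
                             "this sketch has no fallback for that case")
        return b""
    return acknum.to_bytes(4, "big")


def decompress_acknum(flags, payload):
    """Recover ACKNUM from the flags and the transmitted bytes."""
    if acknum_is_implied(flags):
        return 0
    return int.from_bytes(payload[:4], "big")


first_packet = compress_acknum(SYN, 0)                # client SYN
reply_packet = compress_acknum(SYN | ACK, 0x01020304) # server SYN+ACK
print(len(first_packet), len(reply_packet))           # 0 4
```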




5.7.  TCP Flags and Urgent Pointer

   "TCP Flags" refers to the TCP group of six flags: URG (Urgent), ACK
   (Acknowledgement), PSH (Push), RST (Reset), SYN (Synchronize) and FIN
   (Finish).

   The URG flag was almost never set in our sample, i.e. it is much
   more likely to be 0 than 1. In some applications, however, the URG
   flag may be used extensively. Thus, it can be encoded as
   IRREGULAR(1,100%). The URG flag is also useful for indicating the
   presence of the Urgent Pointer field. The compressor-decompressor
   pair can treat the Urgent Pointer as IRREGULAR when URG is set and
   as zero when URG is not set.

   ACK is clear only in the first handshaking packet of each
   connection (similar to ACKNUM), as well as in a minority of packets
   with RST set. Since the proportion of IR-CR packets carrying an unset
   ACK can range from 33% to 100%, it should be sent as
   IRREGULAR(1,100%).

   PSH was found to vary unpredictably between 0 and 1, and is thus
   best left as IRREGULAR(1,100%).

   There is high correlation between RST, SYN and FIN behaviour,
   allowing them to be encoded together. RST and FIN are not set in
   almost 100% of replicated packets. These three flags can therefore
   be encoded as: VALUE(3,2,30%) | VALUE(3,0,65%) | IRREGULAR(3,5%).
   Equivalently, these three flags can also be encoded as prescribed in
   [ROHC-TCP] using the "index" encoding method, with FIN or RST
   exclusively set as the two other common values.


5.8.  TCP Window

   Table 8 shows the delta distribution.  For flows in both directions,
   the main peak is at delta = 0, with amplitude 43% for 'outgoing'
   replicated packets and 30% for 'incoming' packets. We can encode
   these cases with STATIC encoding.

      +--------+---------------+-----------+-----------+
      |Encoded |  Delta Range  | Incoming  | Outgoing  |
      |Bits,n  |               | Frequency | Frequency |
      +--------+---------------+-----------+-----------+
      |-16     |[-65535:-32768]|   0.0%    |   0.0%    |
      |-15     |[-32767:-16384]|   3.4%    |   2.8%    |
      |-14     |[-16383:-8192] |   0.2%    |   0.4%    |
      |-13     |[-8191:-4096]  |  14.0%    |   2.1%    |
      |-12     |[-4095:-2048]  |  20.7%    |   0.9%    |
      |-11     |[-2047:-1024]  |   1.3%    |   0.1%    |
      |-10     |[-1023:-512]   |   6.6%    |   1.7%    |
      |-9      |[-511:-256]    |   4.4%    |   2.3%    |
      |-8      |[-255:-128]    |   4.1%    |   0.8%    |
      |-7      |[-127:-64]     |   0.6%    |   2.6%    |
      |-6      |[-63:-32]      |   0.4%    |   1.2%    |
      |-5      |[-31:-16]      |   0.2%    |   0.7%    |
      |-4      |[-15:-8]       |   0.1%    |   0.5%    |
      |-3      |[-7:-4]        |   0.1%    |   0.1%    |
      |-2      |[-3:-2]        |   0.2%    |   0.0%    |
      |-1      |[-1]           |   0.2%    |   0.0%    |
      |0       |[0]            |  30.4%    |  43.2%    |
      |1       |[1]            |   0.1%    |   0.0%    |
      |2       |[2:3]          |   0.1%    |   0.1%    |
      |3       |[4:7]          |   0.1%    |   0.1%    |
      |4       |[8:15]         |   0.1%    |   0.2%    |
      |5       |[16:31]        |   0.2%    |   0.2%    |
      |6       |[32:63]        |   0.1%    |   0.8%    |
      |7       |[64:127]       |   0.4%    |   1.7%    |
      |8       |[128:255]      |   0.2%    |   3.4%    |
      |9       |[256:511]      |   1.1%    |   4.0%    |
      |10      |[512:1023]     |   1.1%    |   6.8%    |
      |11      |[1024:2047]    |   2.0%    |   3.0%    |
      |12      |[2048:4095]    |   0.5%    |   0.1%    |
      |13      |[4096:8191]    |   2.3%    |   0.3%    |
      |14      |[8192:16383]   |   2.5%    |   3.2%    |
      |15      |[16384:32767]  |   0.1%    |   3.5%    |
      |16      |[32768:65535]  |   2.2%    |  13.1%    |
      +--------+---------------+-----------+-----------+

   Table 8: Frequency distribution of Window delta


   Unlike other fields, Window delta values tend not to cluster 
   near the main peak. This is an expected behaviour. Naturally, LSB 
   would not be a suitable encoding method for the Window field. A 
   number of secondary peaks can be observed in Table 8, which suggests 
   that Windows tend to vary among a few discontinuous but commonly 
   used values.

   We determine the most common Window values for 'incoming' and
   'outgoing' flows separately and obtain a distribution of these
   common Window values, shown in Table 9. Asymmetry is again inherent
   between 'incoming' and 'outgoing' flows. In this case, the asymmetry
   is due to 'incoming' and 'outgoing' flows using different ranges of
   popular Window values. 'Incoming' advertised Window fields typically
   come from HTTP servers that send more data than they receive.
   Servers typically advertise their receiver window conservatively and
   are slow to grow their windows, to prevent data overloads from
   handling multiple clients concurrently, and because of the
   congestion window slow start algorithm [RFC-2581]. On the other
   hand, sources of 'outgoing' traffic are normally clients downloading
   data from servers. To utilize bandwidth efficiently, their
   advertised window is usually large, often right from the first
   packet. This is consistent with recent proposals for increasing the
   TCP initial Window size [RFC-3390].


      +----------------------+----------------------+
      |       Incoming       |       Outgoing       |
      +--------+-------------+--------+-------------+
      | Value  | Probability | Value  | Probability |
      |        |     (%)     |        |     (%)     |
      +--------+-------------+--------+-------------+
      |  1380  |     1.1     |  1460  |     1.6     |
      |  1460  |    23.5     |  2920  |     1.6     |
      |  2760  |     1.3     |  8192  |     3.1     |
      |  2920  |    22.2     |  8280  |     6.6     |
      |  5840  |     2.2     | 16384  |    10.3     |
      |  8280  |    11.7     | 16560  |     8.0     |
      | 11680  |     4.9     | 64240  |    26.3     |
      | 16384  |     6.9     | 64860  |     8.8     |
      | 16560  |     2.1     | 65520  |     2.6     |
      | 65535  |     4.6     | 65535  |    18.3     |
      +--------+-------------+--------+-------------+
      | Total  |    80.4     |   -    |    87.2     |
      +--------+-------------+--------+-------------+

   Table 9: Common Window field values

   The common values of the Window field, including all the values
   found in Table 9, can typically be expressed as either (i) a
   multiple of the Maximum Segment Size of the end-to-end channel, or
   (ii) a power of 2, possibly with an offset of 1.

   The Maximum Segment Size (MSS) is negotiated between both TCP
   endpoints, through the TCP Options in TCP handshaking packets. The
   negotiated MSS is in turn derived from the IP Maximum Transmission
   Unit (MTU) of the underlying network [RFC-1122]. The MTU over
   Ethernet is 1500 bytes, or 1492 if used with Sub-network Attachment
   Point (SNAP), or 1300 if used with PPP over Ethernet (for ADSL
   links). Subtracting 40 bytes for the TCP/IPv4 protocol stack, or 60
   bytes for the TCP/IPv6 protocol stack, or 120 bytes for the maximum
   TCP/IP header size, the typically advertised MSS values are 1460,
   1380, 1260, 1440 or 1452 bytes, in decreasing popularity. Of this
   set of MSS values, 1460 and 1380 are used almost exclusively.
   Consequently, almost all the Window values found in Table 9 can be
   expressed as multiples of either 1460 or 1380. The exceptions are
   8192, 16384 and 65535, which are powers of 2 with a possible offset
   of 1, and 65520, which is a multiple of 1260.
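
   The arithmetic behind the listed MSS values can be reproduced with
   the short Python sketch below. The pairing of each MTU with a header
   overhead is an assumption: it is simply the combination that yields
   the five MSS values given in the text, in the same order.

```python
# (description, MTU in bytes, header overhead in bytes). The pairing is
# assumed; it is chosen to reproduce the MSS values listed in the text.
CASES = [
    ("Ethernet, TCP/IPv4", 1500, 40),
    ("Ethernet, maximum TCP/IP header", 1500, 120),
    ("PPP over Ethernet figure from the text, TCP/IPv4", 1300, 40),
    ("Ethernet, TCP/IPv6", 1500, 60),
    ("Ethernet with SNAP, TCP/IPv4", 1492, 40),
]


def advertised_mss(mtu, header_bytes):
    """MSS left over once the header overhead is subtracted from the MTU."""
    return mtu - header_bytes


MSS_VALUES = [advertised_mss(mtu, hdr) for _, mtu, hdr in CASES]

for (description, mtu, hdr), mss in zip(CASES, MSS_VALUES):
    print("%-50s %4d - %3d = %4d" % (description, mtu, hdr, mss))
```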



   Thus, commonly used Window values not expressible as multiples of
   the MSS values are powers of 2, possibly with an offset of 1. From
   Table 9, 8192, 16384 and 65535 are 2^13, 2^14 and 2^16 - 1
   respectively.
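
   The classification of the Table 9 values can be checked with the
   following Python sketch. The function names are hypothetical, and
   the set of MSS values tried (1460, 1380 and 1260) is taken from the
   discussion above.

```python
COMMON_MSS = (1460, 1380, 1260)

TABLE_9_INCOMING = [1380, 1460, 2760, 2920, 5840, 8280, 11680, 16384,
                    16560, 65535]
TABLE_9_OUTGOING = [1460, 2920, 8192, 8280, 16384, 16560, 64240, 64860,
                    65520, 65535]


def is_power_of_two(value):
    return value > 0 and value & (value - 1) == 0


def classify_window(window):
    """Describe a Window value as k x MSS, or 2^n with an offset of 1."""
    for mss in COMMON_MSS:
        if window > 0 and window % mss == 0:
            return "%d x MSS %d" % (window // mss, mss)
    # window + 1 being a power of two means window == 2^n - 1, and so on.
    for offset, label in ((0, ""), (1, " - 1"), (-1, " + 1")):
        candidate = window + offset
        if is_power_of_two(candidate):
            return "2^%d%s" % (candidate.bit_length() - 1, label)
    return "other"


for window in sorted(set(TABLE_9_INCOMING + TABLE_9_OUTGOING)):
    print("%5d  %s" % (window, classify_window(window)))
```

   None of the Table 9 values falls into the "other" category under
   these two rules.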

   Also, the TCP Window is always 0 when RST (Reset flag) is set. 
   Therefore, the decompressor can infer the Window value whenever 
   RST is set and there is no need to send it.

   The TCP Window field is used in both congestion and flow 
   control. The use of congestion control can account partly for the 
   commonly used values discussed above, as congestion control changes 
   are in multiples of the MSS. However, values due to flow control do 
   not follow the pattern discussed above but are typically small 
   offsets from the above commonly used values.

   Currently, the Window field is encoded as either STATIC or IRREGULAR
   for context replication [ROHC-TCP]. The above observations illustrate
   that the encoding methods currently in use do not sufficiently
   exploit the unique behaviour of the Window field. They also provide
   the motivation for devising a more efficient way of encoding the
   Window field. This encoding method is elaborated upon in [TCP-WIN].


5.9.  TCP Checksum

   The TCP Checksum field covers the pseudo-header, payload and TCP 
   header, and varies between packets. Although ROHC packets may contain 
   a CRC field, the CRC does not cover the payload. Since it is 
   important to preserve data integrity, the Checksum field is sent 
   uncompressed as IRREGULAR(16,100%).


5.10.  TCP Options

   TCP options contain a wide variety of optional fields, but commonly 
   used options include the MSS, Window Scale and SACK-Permitted found 
   in handshaking packets. These fields do not change between replicated 
   packets and can thus be compressed efficiently as STATIC for context 
   replication.


5.11.  Mean Sizes of Compressed Fields

   Table 10 lists the TCP/IP fields found in 'incoming' IR-CR packets
   and calculates the mean sizes of their encoded forms. Compressed
   TCP/IP fields take up a mean size of 107.3 bits for 'incoming' flows.
   By repeating the calculation based on the 'outgoing' packet format
   specifications, it can be shown that the mean 'outgoing' IR-CR size
   is 97.5 bits.



      +---------------------+------+--------------------------+-------+
      |                     | Size |  Encoded size (bits) &   |  Mean |
      |       Field         |      |        probability       |Encoded|
      |                     |(bits)|                          |  Size |
      |                     |      |                          | (bits)|
      +---------------------+------+--------------------------+-------+
      |IPv4 Identification  |  16  | 3(50%) | 8(17%) | 16(33%)|  8.14 |
      |IPv4 Don't Fragment  |   1  |      0(73%) | 1(27%)     |  0.27 |
      |IPv4 Time To Live    |   8  |      0(98%) | 8(2%)      |  0.16 |
      |IPv4 Dest. Address   |  32  |      0(98%) | 32(2%)     |  0.64 |
      |TCP Source Port      |  16  |     0(70%) | 16(30%)     |  4.80 |
      |TCP Dest. Port       |  16  | 3(73%) | 8(14%) | 16(13%)|  5.39 |
      |TCP Sequence Number  |  32  |         32(100%)         |   32  |
      |TCP Ack. Num         |  32  |         32(100%)         |   32  |
      |TCP flags            |   8  |      2(95%) | 5(5%)      |  2.15 |
      |TCP Window           |  16  | 0(30%) | 6(47%) | 4(8%)  |  5.54 |
      |                     |      |        | 16(15%)         |       |
      |TCP Checksum         |  16  |         16(100%)         |   16  |
      |TCP Urgent Pointer   |  16  |      0(99%) | 16(1%)     |  0.16 |
      +---------------------+------+--------------------------+-------+
      |TOTAL                | 209  |             -            | 107.3 |
      +---------------------+------+--------------------------+-------+

   Table 10: Mean Encoded Sizes of 'incoming' TCP/IP Fields
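
   The mean encoded sizes in Table 10 are probability-weighted sums of
   the encoded sizes. The following Python sketch recomputes them from
   the (size, probability) pairs in the table; the variable and
   function names are hypothetical.

```python
# (encoded size in bits, probability) pairs copied from Table 10.
TABLE_10 = {
    "IPv4 Identification": [(3, 0.50), (8, 0.17), (16, 0.33)],
    "IPv4 Don't Fragment": [(0, 0.73), (1, 0.27)],
    "IPv4 Time To Live": [(0, 0.98), (8, 0.02)],
    "IPv4 Dest. Address": [(0, 0.98), (32, 0.02)],
    "TCP Source Port": [(0, 0.70), (16, 0.30)],
    "TCP Dest. Port": [(3, 0.73), (8, 0.14), (16, 0.13)],
    "TCP Sequence Number": [(32, 1.00)],
    "TCP Ack. Num": [(32, 1.00)],
    "TCP flags": [(2, 0.95), (5, 0.05)],
    "TCP Window": [(0, 0.30), (6, 0.47), (4, 0.08), (16, 0.15)],
    "TCP Checksum": [(16, 1.00)],
    "TCP Urgent Pointer": [(0, 0.99), (16, 0.01)],
}


def mean_bits(alternatives):
    """Probability-weighted mean of the encoded sizes for one field."""
    return sum(bits * p for bits, p in alternatives)


MEANS = {field: mean_bits(alts) for field, alts in TABLE_10.items()}
TOTAL = sum(MEANS.values())

for field, mean in MEANS.items():
    print("%-22s %6.2f" % (field, mean))
print("%-22s %6.2f" % ("TOTAL", TOTAL))
```

   The unrounded total is 107.25 bits, which Table 10 reports as 107.3.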


6.  Handling Asymmetrical Inter-flow Behaviour

   From the previous section, and as summarized in Fig. 3, some TCP/IP 
   fields exhibit inherently asymmetrical behaviour. The issue, then, is 
   to explore various ways of handling such asymmetrical behaviour such 
   that the gain versus complexity tradeoff can be optimized.

   As can be seen from the header compression model in Fig. 1 and the
   asymmetrical packet format specifications in Fig. 3, asymmetrical
   inter-flow behaviour can be handled by asymmetrical header
   compression. This can be done by configuring each
   compressor-decompressor pair with a different set of packet format
   specifications, based on its 'incoming' or 'outgoing' role. While
   this treatment has the highest compression efficiency, its main
   disadvantage is that it may be more complicated than symmetrical
   header compression.

   Alternatively, asymmetrical behaviour can also be handled using
   symmetrical packet format specifications, by expanding the use of
   the 'multiple_packet_formats' encoding method [ROHC-FN] to cover
   asymmetrical behaviour, at the cost of using a few more
   'discriminator bits'. This is the methodology being adopted in
   current ROHC drafts.

   From Fig. 3, the fields exhibiting significant asymmetrical behaviour
   are the IP Destination Address, TCP Source Port, Destination Port and
   Acknowledgement Number. (The behaviour of the TCP Window is in fact
   also asymmetrical, but this asymmetry cannot be expressed using
   current encoding methods.) To handle these fields symmetrically, the
   following packet format specifications can be used instead:

   Destination_Address ::= STATIC(.) | IRREGULAR(32,.) %1 discriminator    
                                                       % bit

   Source_Port ::= STATIC(.) | LSB(3,0,.) | LSB(8,0,.) |    
                   IRREGULAR(16,.) %2 discriminator bits

   Destination_Port ::= STATIC(.) | LSB(3,0,.) | LSB(8,0,.) |    
                        IRREGULAR(16,.) %2 discriminator bits

   Acknowledgement_Number ::= VALUE(32,0,.) | IRREGULAR(32,.)    
                                                    %1 discriminator bit

   Fig. 4: Symmetrical packet format specifications for fields with    
   asymmetrical behaviour
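
   The discriminator bit counts annotated in Fig. 4 follow from the
   number of alternative encoding methods per field. The Python sketch
   below assumes a fixed-length discriminator of ceil(log2(k)) bits for
   k alternatives; a Huffman-coded indicator, as used by EPIC-LITE,
   could be shorter on average. The names are hypothetical.

```python
def discriminator_bits(alternatives):
    """Fixed-length selector size for a number of encoding methods."""
    # For k >= 1, (k - 1).bit_length() equals ceil(log2(k)).
    return (alternatives - 1).bit_length()


# Number of '|'-separated encoding methods per field in Fig. 4.
FIG_4_ALTERNATIVES = {
    "Destination_Address": 2,
    "Source_Port": 4,
    "Destination_Port": 4,
    "Acknowledgement_Number": 2,
}

FIG_4_BITS = {field: discriminator_bits(count)
              for field, count in FIG_4_ALTERNATIVES.items()}

for field, bits in FIG_4_BITS.items():
    print("%-24s %d discriminator bit(s)" % (field, bits))
print("total:", sum(FIG_4_BITS.values()))
```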

   The asymmetrical behaviour of the Window field may be handled
   efficiently using a proposed encoding method elaborated in
   [TCP-WIN]. This encoding method can be either symmetrical or
   asymmetrical.


7.  Security Considerations

   This document does not introduce any new security considerations.


8.  References

   [RFC-3390]  Allman, M., Floyd, S., Partridge, C.,. ôIncreasing TCPÆs  
               Initial Windowö, RFC 3390, October 2002.

   [RFC-3095]  Bormann, C., Burmeister, C., Degermark, M., Fukushima,
               H., Hannu, H., Jonsson, L-E., Hakenberg, R., Koren, T.,
               Le, K., Liu, Z., Martensson, A., Miyazaki, A., Svanbro,
               K., Wiebke, T., Yoshimura, T. and H. Zheng, "RObust
               Header Compression (ROHC): Framework and four profiles:
               RTP, UDP, ESP, and uncompressed", RFC 3095, July 2001.

   [RFC-2581]  Allman, M., Paxon, V., Stevens, W., ôTCP Congestion 
               Controlö, RFC 2581, April 1999.
 
   [RFC-2234]  Crocker D, et al, "Augmented BNF for Syntax 
               Specifications: ABNF", RFC 2234, 1997.



Cho & Hazra                                                    [Page 23]

Internet-document  Statistical Inter-flow Field Behaviour  February 2004
                    for Context Replication in ROHC-TCP

        
   [RFC-1122]  R. Braden, Editor, ôRequirements for Internet Hosts û 
               Communication Layersö, RFC 1122, 1989.

   [ROHC-TCP]  Pelletier, G., Zhang, Q., Jonsson, L-E., Liao, H., West,
               M., "RObust Header Compression (ROHC): TCP/IP Profile
               (ROHC-TCP)", Internet Draft (work in progress), <draft-
               ietf-rohc-tcp-04.txt>, May 2003.

   [TCP-BEH]   West, M. and S. McCann, "TCP/IP Field behavior", Internet
               Draft (work in progress), <draft-ietf-rohc-tcp-field-
               behavior-02.txt>, March 2003.

   [ROHC-CR]   Pelletier, G., "RObust Header Compression (ROHC): Context
               Replication for ROHC Profiles", Internet Draft (work in 
               progress),  <draft-ietf-rohc-context-replication-01.txt>,
               October 2003.
    
   [ROHC-FN]   Price, R., et al., "Formal Notation for Robust Header
               Compression (ROHC-FN)", Internet Draft (work in
               progress), <draft-ietf-rohc-formal-notation-01.txt>,
               March 2003.

   [EPIC-LITE] Price, R., Hancock, R., McCann, S., Surtees, A., Ollis, 
               P., West, M., "Framework for EPIC-LITE", Internet Draft
               (work in progress), <draft-ietf-rohc-epic-lite-01.txt>, 
               2002.

   [EPIC-IMPL] L. Vidjak, M. Stula, J. Ozegovic, "Program Structures 
               for EPIC-LITE Experimental Implementation", SoftCOM 2002.

   [TCP-WIN]   Cho, C.Y., Hazra, S.K., "Encoding Method for TCP Window
               in Context Replication", Internet Draft, to be submitted.


9.  Authors' Addresses

   Chia Yuan Cho
   Institute for Infocomm Research (I2R)
   21 Heng Mui Keng Terrace
   Singapore 119613

   Phone: +65 6874 6643
   Email: stucyc2@i2r.a-star.edu.sg

   Sukanta Kumar Hazra
   Institute for Infocomm Research (I2R)
   21 Heng Mui Keng Terrace
   Singapore 119613

   Phone: +65 6874 1953
   Email: sukanta@i2r.a-star.edu.sg


Appendix A.  State Transition Threshold

   This appendix determines a reasonable range for the State Transition
   Threshold, defined as the number of initial TCP/IP packets that may
   be converted into IR or IR-CR packets.

   The compressor state machine controls the type of packet transmitted
   to the decompressor. As elaborated in [ROHC-TCP], the transition from
   the CR state to the CO state at the compressor is initiated either
   optimistically or explicitly, through reception of an ROHC ACK from
   the decompressor. Because at least one IR/IR-CR packet must be sent
   before the state transition, the State Transition Threshold H
   satisfies H >= 1. H is not simply the number of context-initializing
   IR/IR-CR packets sent: in unidirectional mode or optimistic
   bidirectional mode, a single TCP/IP packet may be sent as several
   duplicate IR/IR-CR packets, so that the compressor gains the
   optimism necessary for the upward transition.

   A range of suitable values for H is derived from the nature of the
   protocol stack and the channel characteristics. For the TCP/IP
   protocol stack, we begin by looking at the first few packets
   exchanged for a TCP connection.

   Fig. 4 shows a TCP connection using TCP/IP header compression over a    
   low-bandwidth channel. Packets in the forward direction are numbered. 
   The first TCP packet is always converted into an IR/IR-CR packet. In 
   the following analysis, we focus on the compressor at the client and 
   the decompressor at the router.

   Suppose the channel is full-duplex, and an ROHC ACK is sent upon the
   successful decompression of the first packet. ROHC ACKs may be
   piggybacked. The earliest possible ROHC ACK is indicated in Fig. 4
   as a dotted arrow. When the compressor receives the ROHC ACK, it
   transitions from the IR/CR state to the CO state and subsequently
   sends CO packets. If the channel is reliable, the compressor
   receives the ROHC ACK before it sends the second TCP/IP packet, so
   only a single TCP/IP packet becomes an IR/IR-CR packet, i.e. H = 1.
   This is also likely if the router-server RTT >> client-router RTT,
   in which case, even if the first ROHC ACK is lost, the compressor
   has ample opportunity to receive a retransmitted ROHC ACK before it
   sends packet #2. Conversely, if the channel is unreliable and/or the
   client-router RTT >> router-server RTT (as is likely for cellular
   links), the ROHC ACK is unlikely to be received immediately, and
   subsequent TCP/IP packets are still sent as IR-CR packets. However,
   as seen from Fig. 4, the time lapse between TCP/IP packet #1 and
   packet #4 is long compared to the spacing of all subsequent packets
   (once the TCP sliding window mechanism takes effect), and it is
   reasonable to assume that the ROHC ACK is received before packet #4
   is sent. Thus, a reasonable range is 1 <= H <= 3.
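
   The reasoning above can be sketched as a small calculation. The
   Python function below counts how many forward TCP/IP packets are
   sent before the first ROHC ACK reaches the compressor. The function
   name, the example timestamps, and the assumption that the compressor
   leaves the IR/CR state only on an ROHC ACK (no optimistic
   transition) are illustrative and are not taken from [ROHC-TCP].

```python
def state_transition_threshold(send_times, ack_arrival):
    """Count forward packets sent as IR/IR-CR before the ROHC ACK arrives.

    send_times: send times of forward TCP/IP packets #1, #2, ...
    ack_arrival: time at which the first ROHC ACK reaches the compressor.
    The first packet is always an IR/IR-CR packet, so the result is >= 1.
    """
    if not send_times:
        raise ValueError("at least one forward packet is needed")
    sent_before_ack = sum(1 for t in send_times if t < ack_arrival)
    return max(1, sent_before_ack)

# Hypothetical timeline in ms: SYN, ACK, request, then packet #4 after
# the large time lapse.
sends = [0, 200, 201, 900]
best_case = state_transition_threshold(sends, ack_arrival=60)
worst_case = state_transition_threshold(sends, ack_arrival=850)
```

   With these made-up timings, an ROHC ACK that arrives before packet
   #2 gives the lower bound of the range, and one that arrives just
   before packet #4 gives the upper bound.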





                  Client    Router   Server 
                    |         |         |
                SYN |--- #1   |         |
                    |   ---   |         |
                    |      -->|---      |
                    |      ...|   ---   |
                    |   ...   |      -->|
     +--  ROHC ACK  |<..      |      ---| SYN,ACK
     |  (best case) |         |   ---   |
     |              |      ---|<--      |
     |              |   ---   |         |
     |              |<--      |         |
     |          ACK |--- #2   |         |
     |              |   ---   |         |
     |      request |--- #3-->|---      |
     |              |   ---   |   ---   |
     |              |      -->|---   -->|
     | large        |         |   ---   |
     | time         |         |      -->|
     | lapse        |         |      ---| reply
     |              |         |   ---   |
     |              |      ---|<--      |
     |              |   ---   |         |
     +--(worst case)|<--      |         |
                    |--- #4   |         |
                    |   ---   |         |
                    |      -->|---      |
                    |         |   ---   |
                    |         |      -->|
               Compressor  Decompressor

                    |_________|_________|
                        Low      Wired
                     Bandwidth     or
                      Channel   Wireless

   Fig. 4: TCP handshaking and ROHC ACKs

   Finally, because TCP/IP traffic is bidirectional, header compression
   may occur in both directions, in which case the overall state
   transition threshold is Ho = 2H. For unidirectional protocol stacks
   such as RTP/UDP/IP, the overall state transition threshold Ho
   remains at H.
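
   A minimal sketch of this relation, using a hypothetical helper name
   and assuming the same H applies in each direction, is:

```python
def overall_threshold(h, bidirectional):
    """Overall state transition threshold Ho for a per-direction H."""
    if h < 1:
        raise ValueError("H must be at least 1")
    return 2 * h if bidirectional else h

# Applying the range 1 <= H <= 3 to both kinds of protocol stack.
tcp_range = [overall_threshold(h, bidirectional=True) for h in (1, 2, 3)]
rtp_range = [overall_threshold(h, bidirectional=False) for h in (1, 2, 3)]
```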









