Integrated Services over Specific Link Layers                E. Horlait
Internet Draft                                                M. Bouyer
Document: draft-horlait-clep-00.txt                  Paris 6 University
                                                              July 1999
 
 
   CLEP (Controlled Load Ethernet Protocol): Bandwidth Management and 
                 Reservation Protocol for Shared Media 
 
 
Status of this Memo 
 
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026 except that the right to 
   produce derivative works is not granted.  
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups. Note that 
   other groups may also distribute working documents as Internet-
   Drafts. Internet-Drafts are draft documents valid for a maximum of 
   six months and may be updated, replaced, or obsoleted by other 
   documents at any time. It is inappropriate to use Internet-Drafts as 
   reference material or to cite them other than as "work in progress."  
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt  
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html. 
    
   This memo is filed as <draft-horlait-clep-00.txt>, and expires Feb 
   1, 2000. Please send comments to the authors. 
    
   The protocol described in this memo is patented. 
    
1. Abstract 
    
   Quality of Service management has various aspects. In this draft, we
   address the problem of bandwidth allocation and reservation over
   shared media (e.g. an Ethernet network). To do so, we define a
   protocol (CLEP: Controlled Load Ethernet Protocol) in charge of the
   management, allocation and fair sharing of the available bandwidth
   among users of the network.
    
   The load control is done via token bucket filters on outgoing 
   interfaces of network elements. Our protocol efficiently manages the 
   parameters of the token buckets in order to perform admission 
   control. This service can be used alone or together with the
   Resource ReSerVation Protocol (RSVP) [1].
    
   The distributed algorithm is described in section 4 and an 
   implementation framework of this proposal is given in section 3.  
    

  
Horlait, Bouyer          Expires January 2000                        1 
                      Draft-horlait-clep-00.txt             July, 1999  
 
2. Conventions used in this document 
    
   This document is based on the service defined in [2] and the service 
   specification templates given in [3]. A summary of the most 
   important definitions is given hereafter. 
    
   Quality of Service (QoS)  
        This refers to the nature of the achieved packet delivery. A 
        network offering dynamically controllable QoS will allow 
        individual applications to request packet delivery 
        characteristics that fit their needs.  
    
   Network Element  
        A Network Element (or Element) is any component of an
        internetwork which directly handles data packets, and thus may
        exercise QoS control over the data flow. Examples include (but
        are not limited to) routers, subnetworks, and end-node
        operating systems.
    
   Flow  
        A Flow is a set of packets all covered by the same request for
        QoS control. This may be the packets from a single application
        session, or the aggregated traffic of several application
        sessions.
    
   TSpec and RSpec  
        A TSpec (for Traffic Specification), is a description of the 
        traffic pattern for which a QoS control service is requested. A 
        Service Request Specification (or RSpec), specifies a Quality 
        of Service a flow wishes to request from a network element.  
    
   QoS control Service  
        QoS control Service (or, when there is no ambiguity, Service) 
        is a named set of QoS control capabilities provided by a single 
        network element.  
    
   Token Bucket  
        A Token Bucket is a particular form of TSpec, consisting of a
        "token rate" r and a "bucket size" b. Essentially, the r
        parameter specifies the continually sustained data rate, and b
        the extent to which the data rate can exceed the sustained
        level for short periods of time.
    
   Best effort traffic (or best effort flow) 
        Best effort traffic (or best effort flow) is a flow generated
        by an application that doesn't request any special QoS control
        service. A privileged traffic (or privileged flow) is a flow
        which has a special QoS control requirement (e.g. in terms of
        bandwidth).

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in 
   this document are to be interpreted as described in [4]. 
    
    
3. Controlling the load of an Ethernet shared network 
    
   In an Ethernet bus architecture, all the transmitters share the same
   resources. This means that a transmitter has no guarantee about the
   bandwidth available for its own use, unless the other transmitters
   on the bus restrict their throughput. A simple priority queuing
   algorithm will not meet the requirements of the Controlled-Load
   service: if one or several transmitters start overflowing the link,
   the other transmitters will see their throughput fall to a value
   close to 0. So all the transmitters on the bus must restrict their
   maximum throughput to a per-transmitter value, which will be called
   R.
    
   In any case, restricting the throughput of the transmitters will not
   avoid collisions or packet loss. This implies that the offered
   service is still a best-effort service, but if the sum of the
   throughputs of all stations is less than the bandwidth of the link,
   each transmitter will statistically see a throughput close to its
   limit value R.
    
   There are several ways to limit the rate of a data flow. The most
   suitable method here is to use a leaky bucket or a token bucket
   style filter. As we have to manage several flows, with different
   levels of QoS, there should be one filter per flow, and the
   throughput R will be the sum of the rates of the different filters.
    
   For each transmitter, there is at least one filter for the standard
   best-effort class of traffic, plus one filter per privileged flow.
   As the TSpec provided for the flows which require a special QoS
   control is characterized by a token bucket filter, we propose to
   implement the filter for these flows with a token bucket. The filter
   for the best effort traffic will also be a token bucket. Compared
   with a single leaky bucket, this has the advantage of allowing
   bursts of traffic, which minimizes the effect of the bandwidth
   limitation on the usual traffic (NFS, TCP connections, ...).
    
   For implementation reasons, we define here a token bucket with two
   parameters (n,t), where n is the number of tokens and t is the time
   needed for a token to return to the free token pool. The relation
   between this definition and the definition given in section 2 is:
   b = n and r = n/t.
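
   As an illustration only, the (n,t) definition can be sketched in
   Python as follows. The class name TokenBucket, the choice of one
   token per byte, and the caller-supplied clock value "now" (which
   must not go backwards) are assumptions of this sketch, not part of
   the specification.

```python
from collections import deque


class TokenBucket:
    """Bucket of n tokens; a used token returns to the pool after t seconds."""

    def __init__(self, n: int, t: float) -> None:
        self.n = n             # bucket size b (assumed: one token per byte)
        self.t = t             # seconds before a used token is free again
        self.free = n          # tokens currently in the free token pool
        self.in_use = deque()  # (release_time, token_count), oldest first

    def _release(self, now: float) -> None:
        # Return to the free pool every token whose time t has elapsed.
        while self.in_use and self.in_use[0][0] <= now:
            _, count = self.in_use.popleft()
            self.free += count

    def try_send(self, size: int, now: float) -> bool:
        """Take `size` tokens if available; True means the packet conforms."""
        self._release(now)
        if size > self.free:
            return False
        self.free -= size
        self.in_use.append((now + self.t, size))
        return True

    @property
    def rate(self) -> float:
        # Sustained rate r = n/t, as in the relation given in the text.
        return self.n / self.t
```

   With n = 3000 and t = 1 second, two 1500-byte packets sent within
   the same second empty the pool, and a third packet conforms only
   once the first tokens have returned.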
    
   Packets generated by the applications are classified with respect to 
   their QoS requirements before being submitted to the filters. These
   filters must also ensure some flow conformance control. The handling
   of best effort flows and that of privileged flows is, of course, 
   quite different. Packets from the best-effort flow are stored in a
   queue before being submitted to the filter. If the queue overflows, 
   the packets are simply discarded.  
    
   Packets from a privileged flow have to be handled in a different
   way, as the specification of the controlled load service requires
   that packets which don't conform to the TSpec be handled as
   best-effort packets. However, they can't be added at the end of the
   best-effort queue, because there may be a lot of packets waiting in
   that queue, so a non-conforming packet would be delayed
   significantly, and would probably be discarded by the receiver. So
   these packets have to be forwarded as soon as possible, but
   shouldn't disrupt the best effort flow. To achieve this, the
   following algorithm is used: if a packet from a privileged flow does
   not conform to the token bucket filter, it is forwarded as a
   best-effort packet if this doesn't create a resource shortage for
   the best effort flow, that is to say, if there are more tokens in
   the best-effort free token pool than bytes of packets waiting in the
   best-effort queue plus the size of the packet to be forwarded.
   Otherwise the packet is discarded.
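
   The rule above can be sketched as a small decision function. The
   function name, its parameters and the returned strings are
   hypothetical, and the sketch assumes that tokens are counted in
   bytes. It only returns the decision; whether the forwarded packet
   then consumes best-effort tokens is not modelled here.

```python
def handle_privileged_packet(size: int, conforms: bool,
                             be_free_tokens: int, be_queued_bytes: int) -> str:
    """Decide the fate of one packet from a privileged flow.

    size            -- packet size in bytes
    conforms        -- True if the flow's own token bucket accepted the packet
    be_free_tokens  -- tokens left in the best-effort free token pool
    be_queued_bytes -- bytes of packets waiting in the best-effort queue
    """
    if conforms:
        return "forward_privileged"
    # Non-conforming: forward as best effort only if the best-effort flow
    # keeps enough tokens for everything it already has queued.
    if be_free_tokens > be_queued_bytes + size:
        return "forward_best_effort"
    return "discard"
```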
    
   Some applications generate a very low rate data flow, with traffic
   bursts, but need much better reliability than that provided by the
   best-effort queue when the traffic exceeds the capacity of the token
   bucket filter. Examples of such applications are routing protocols,
   NTP or RSVP. Such protocols won't work at all with a high packet
   loss rate. The generated flow does not require dedicated QoS
   handling with its own token bucket (such a flow is, moreover,
   somewhat difficult to characterize with a token bucket, because of
   its low rate); it just requires a special priority. For this
   purpose, two queues with different priorities are needed before the
   best effort token bucket. Figure 1 shows the overall architecture
   of a network element implementing our Controlled Load Service.
 
   Best effort    --------+     Token Bucket 
   Flow (low  -->         |---+ Filter Nbe, Tbe 
   priority)      --------+   |  +---+           --------+ 
                              +->|   |---------->        | ---> Medium 
   Best effort    --------+   |  +---+   ^ ^     --------+ 
   Flow (high -->         |---+          | | 
   Priority)      --------+              | | 
                                N1, T1   | | 
   Privileged     --------+      +---+   | | 
   Flow #1    -->         |----->|   |---+ | 
                  --------+      +---+     | 
                      .                    | 
                      .                    | 
                      .         Nn, Tn     | 
   Privileged     --------+      +---+     | 
   Flow #n    -->         |----->|   |-----+ 
                  --------+      +---+ 
    
               Figure 1: Architecture of the Network Element
    
   It is to be noted that the maximum datagram size of the best effort 
   flow is the MTU of the link, so the Nbe parameter of the best effort 
   token bucket filter must be greater than this MTU. 
    
4. The CLEP Protocol 
    
   The Network Elements implementing the architecture described in the 
   previous section need to exchange information, in order to adjust 
   their token bucket parameters. Doing so, they are able to use the 
   maximum available bandwidth of the underlying link without exceeding 
   it. This section describes the rules used to compute the parameters 
   of the token buckets, as well as the network protocol used by the 
   network elements to keep their states consistent.  
    
   From the point of view of resource sharing among network elements,
   there are only two parameters to take into account: the amount of
   resources allocated to best-effort flows, and the amount of
   resources allocated to the privileged flows. These resources are
   evaluated as allocated bandwidth, so the values exchanged by the
   network elements are the rates of the token buckets, defined by
   R=N/T.
    
   To compute the reserved and available bandwidth, every network 
   element needs to know the amount of bandwidth reserved for the best-
   effort and privileged flows by all the other network elements. We 
   call Rbe and Rpriv the rate of the best effort token bucket and the 
   sum of the rates of the privileged token buckets respectively. These 
   two parameters are to be exchanged between network elements using
   the CLEP protocol.
    
   Each network element periodically broadcasts on the link its Rbe and
   Rpriv parameters, as well as a flag WM (Wants More) indicating a
   need for more resources, and Rmin, the minimum value for Rbe (this
   value is set by the administrator of the network element). Each
   network element keeps all the received (Rbe, Rpriv, Rmin, WM)
   parameters in a table which is used to compute Rfree, the bandwidth
   available to this machine. This parameter is computed as
    
                        Rmax - sum(Rbe + Rpriv)
          Rfree = -----------------------------------
                   Number of elements with WM active
    
   In this formula, Rmax is the total available bandwidth of the link. 
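
   As a worked illustration, the computation of Rfree can be sketched
   as follows. The names TableEntry and compute_rfree are hypothetical.
   The sketch assumes that the table also holds the local element's own
   entry, and that when no element has WM active the whole surplus is
   returned undivided; the text does not define that case.

```python
from dataclasses import dataclass


@dataclass
class TableEntry:
    rbe: int     # best-effort rate, bytes per second
    rpriv: int   # sum of privileged rates, bytes per second
    rmin: int    # administrator-set minimum for rbe
    wm: bool     # Wants More flag


def compute_rfree(table, rmax):
    """Rfree = (Rmax - sum(Rbe + Rpriv)) / number of elements with WM active."""
    used = sum(e.rbe + e.rpriv for e in table)
    wanting = sum(1 for e in table if e.wm)
    if wanting == 0:
        # Assumption: nobody asks for more, so the surplus is not divided.
        return float(rmax - used)
    return (rmax - used) / wanting


# Example: a 10 Mbit/s link (1 250 000 bytes per second) and three elements.
table = [
    TableEntry(rbe=400_000, rpriv=100_000, rmin=50_000, wm=True),
    TableEntry(rbe=300_000, rpriv=0, rmin=50_000, wm=False),
    TableEntry(rbe=200_000, rpriv=50_000, rmin=50_000, wm=True),
]
rfree = compute_rfree(table, rmax=1_250_000)
```

   Here 1 050 000 bytes per second are allocated, so the remaining
   200 000 are split between the two elements with WM active, giving
   an Rfree of 100 000 bytes per second.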
    
   Another parameter, RfreeBE, is also evaluated. RfreeBE is equal to
   Rfree if Rbe is less than the average per network element best
   effort bandwidth available, and to Rfree-Rmax/100 otherwise. In this
   way, elements that use more bandwidth than the average per network
   element bandwidth will decrease their resource consumption, whereas
   others can still increase it.
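
   A possible reading of this rule is sketched below. The function
   name is hypothetical, and the "average per network element best
   effort bandwidth" is interpreted here as the mean of the Rbe values
   found in the table; this interpretation is an assumption, since the
   text does not give the exact formula.

```python
def compute_rfree_be(rfree, own_rbe, all_rbe, rmax):
    """Return RfreeBE for the local element.

    rfree   -- Rfree as computed from the table
    own_rbe -- the local element's current Rbe
    all_rbe -- Rbe of every element in the table (local one included)
    rmax    -- total bandwidth of the link
    """
    # Assumption: the average is the plain mean of the known Rbe values.
    average = sum(all_rbe) / len(all_rbe)
    if own_rbe < average:
        return rfree
    # Elements at or above the average are penalised by 1% of Rmax.
    return rfree - rmax / 100
```

   With Rmax = 1 250 000 and Rfree = 100 000, an element whose Rbe is
   above the average sees RfreeBE = 87 500, while an element below the
   average keeps RfreeBE = 100 000.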
    
  
   When a change occurs in the table, new values are computed
   immediately. If RfreeBE becomes negative, the network element
   decreases its Rbe by

             Rbe - Rmin
          ----------------- * (-RfreeBE + 0.5)
           sum(Rbe - Rmin)

   if it is not already at its minimal value (this formula has been
   designed in order to provide a fair decreasing process). After
   computing these values, a broadcast message is sent over the
   network. As all network elements perform the same calculations,
   Rfree becomes positive again unless all available resources are
   still allocated.
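
   The decrease can be illustrated as follows. The function name and
   parameters are hypothetical. The sketch applies the formula
   literally in floating point, and returning 0 when the sum of the
   excesses is not positive is an assumption; how the result is
   rounded before being applied to Rbe is left to the implementation.

```python
def rbe_decrease(rbe, rmin, sum_excess, rfree_be):
    """Amount by which the local element lowers its Rbe.

    rbe        -- local best-effort rate
    rmin       -- local minimum for rbe
    sum_excess -- sum of (Rbe - Rmin) over all elements in the table
    rfree_be   -- current RfreeBE (a decrease happens only if negative)
    """
    if rfree_be >= 0 or rbe <= rmin or sum_excess <= 0:
        return 0.0
    # Each element gives back in proportion to what it holds above Rmin.
    share = (rbe - rmin) / sum_excess
    return share * (-rfree_be + 0.5)


# Example: the local element holds half of the total excess over Rmin.
decrease = rbe_decrease(rbe=400_000, rmin=50_000,
                        sum_excess=700_000, rfree_be=-30_000)
```

   In this example the element holds 350 000 of the 700 000 bytes per
   second above the minima, so it gives back half of the shortfall,
   i.e. 15 000.25 bytes per second before rounding.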
    
   Given this information, the admission control algorithm for a new 
   reservation (Dr) is:  
   - if Dr is less than or equal to Rfree, the reservation is accepted,
     Rpriv is increased, Rfree is decreased;
   - if Dr is greater than sum(Rpriv + Rmin), the reservation is 
     rejected;  
   - in any other case, replace Rpriv by (Rpriv + Dr), broadcast a 
     message with these parameters; the new Rfree should be negative 
     and a decreasing process of Rbe is started; after a certain time, 
     if Rfree is still negative, the reservation is rejected; if Rfree 
     became positive, the reservation is accepted. 
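
   The three cases above can be sketched as follows. All names are
   hypothetical. The rejection threshold is taken literally from the
   text as sum(Rpriv + Rmin) and is supplied by the caller, and
   treating an Rfree of exactly zero after the waiting period as a
   success is an assumption of this sketch.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    TENTATIVE = "tentative"   # announce Rpriv + Dr, then wait and re-check


def admit(dr, rfree, sum_rpriv_rmin):
    """First step of admission control for a new reservation Dr."""
    if dr <= rfree:
        return Decision.ACCEPT
    if dr > sum_rpriv_rmin:
        return Decision.REJECT
    return Decision.TENTATIVE


def resolve_tentative(rfree_after_wait):
    """Second step, once the other elements had time to lower their Rbe."""
    if rfree_after_wait < 0:
        return Decision.REJECT
    return Decision.ACCEPT   # assumption: zero counts as success
```

   A caller that obtains TENTATIVE is expected to broadcast its new
   Rpriv, wait for the decreasing process, recompute Rfree and then
   call resolve_tentative with the new value.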
    
   A race condition can appear here: if two elements request a new
   reservation at the same time, both reservations may fail even though
   one of the two would have succeeded on its own. In this case, it is
   possible to retry the reservation after a short random delay.
    
   A network element may decide to raise its Rbe if its best effort
   queue overflows too often. In this case, it may raise it up to
   Rfree, depending on its own needs and those of other network
   elements. A network element may also decrease its Rbe if the local
   element does not use all the allocated bandwidth, or in order to
   redistribute the best effort bandwidth among other network elements
   requesting more resources. This allows the network elements to use
   the available best-effort bandwidth dynamically, and to adapt their
   Rbe to their needs.
    
5. Architecture of a CLEP Network Element 
    
   In order to use this control method, a network element must
   implement some dedicated functions. The main components of a node
   are the token bucket filters, the packet classifier, the CLEP daemon
   and the signaling protocol. Figure 2 gives an overview of the
   relationship between these components.

   The CLEP daemon is responsible for state data management and is in
   charge of computing the token bucket parameters that it sets in
   the system. It receives and produces CLEP messages.
    
   Applications can send to the CLEP daemon their QoS requests via a 
   local interface. This same interface can also be used by signaling 
   protocols like RSVP that can also issue QoS requests. 
    
   The CLEP daemon sets parameters in the packet classifier in order to
   route packets from applications to the token bucket filter and queue
   corresponding to their traffic class.
    
   The token buckets module receives parameters from the CLEP daemon 
   and gives back to it statistics on queue length, bucket size, drop 
   statistics, and so on. 
    
    
   +---------------+ 
   | Applications  | 
   +---------------+ 
     |          |             +---------------+  Parameters 
     |          +------------>| CLEP daemon   |-----------+ 
     |  +---------------+  +->+---------------+           V 
     |  |  Signaling    |--+      |       ^         +---------------+ 
     |  +---------------+         |       +---------| Token Buckets | 
     |                            V      Statistics +---------------+ 
     |                        +---------------+              ^ 
     |                        |    Packet     |              | 
     +----------------------->|  Classifier   |--------------+ 
                              +---------------+ 
    
           Figure 2: Functional structure of a CLEP capable node 
    
   As far as implementation is concerned, the token buckets and the
   packet classifier are to be implemented where the networking
   protocols are, that is, probably in the kernel. The CLEP daemon,
   signaling and applications are in user space.
    
6. CLEP protocol Elements 
    
   The CLEP protocol uses UDP port 580. The message structure is shown
   in Figure 3. All values are in network byte order.

   Vers
        Version of the protocol. The current version is 1.
   W 
        Wants More flag. 
   X 
        Exit flag. 
   Current value of Rbe 
        This value is an unsigned integer in bytes per second. 
   Value of Rmin
        This value is an unsigned integer in bytes per second. This 
        parameter is set by the node administrator. 
   Current value of Rpriv 
        This value is an unsigned integer in bytes per second. This 
        field is used to convey the current Rpriv value or the expected 
        one in case of reservation request. 
   Value of Rmax 
        This value is an unsigned integer in bytes per second. This 
        parameter is set by the administrator of the node and must be 
        the same for all nodes. This field is used for consistency 
        check. 
    
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   +  Vers |                    Unused                         |W|X| 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   +                    Current Value of Rbe                       | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   +                        Value of Rmin                          | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   +                    Current value of Rpriv                     | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
   +                         Value of Rmax                         | 
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
    
                     Figure 3: CLEP message structure 
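
   As an illustration, the 20-byte message of Figure 3 can be encoded
   and decoded with Python's standard struct module. The function
   names are hypothetical. The bit positions are read from the figure
   under the usual convention that the leftmost bit is the most
   significant one: Vers occupies the top 4 bits of the first word, W
   is the second lowest bit and X the lowest bit.

```python
import struct

CLEP_PORT = 580        # UDP port given in the text
CLEP_VERSION = 1
_FORMAT = "!5I"        # five 32-bit unsigned words, network byte order


def pack_clep(rbe, rmin, rpriv, rmax,
              wants_more=False, exit_flag=False, version=CLEP_VERSION):
    """Build one CLEP message as bytes."""
    word0 = ((version & 0xF) << 28) | (int(wants_more) << 1) | int(exit_flag)
    return struct.pack(_FORMAT, word0, rbe, rmin, rpriv, rmax)


def unpack_clep(data):
    """Parse one CLEP message; raises struct.error if the size is wrong."""
    word0, rbe, rmin, rpriv, rmax = struct.unpack(_FORMAT, data)
    return {
        "version": word0 >> 28,
        "wants_more": bool(word0 & 0x2),
        "exit": bool(word0 & 0x1),
        "rbe": rbe,
        "rmin": rmin,
        "rpriv": rpriv,
        "rmax": rmax,
    }


# Example: version 1, W flag set, rates in bytes per second.
msg = pack_clep(400_000, 50_000, 100_000, 1_250_000, wants_more=True)
fields = unpack_clep(msg)
```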
    
   Two timers are used for protocol control purposes: Tbroadcast and
   Tcheck. After some experiments, we set Tbroadcast to 30 seconds and
   Tcheck to 1 second.
    
   At startup, a network element sets its Rpriv to 0 and its Rbe to
   Rmin, sends a CLEP message, and starts listening on the UDP port.
   Under normal circumstances, without any modification of local
   parameters, a CLEP message is sent every (Tbroadcast - Delta), where
   Delta is a random value in the range 0-1 second. This random value
   is there to avoid synchronization between the network elements.
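
   The periodic broadcast interval can be sketched as follows, reading
   the operator between Tbroadcast and Delta as a subtraction. The
   function name is hypothetical, and drawing Delta from a uniform
   distribution is an assumption; the text only says that the value is
   random.

```python
import random

T_BROADCAST = 30.0   # seconds, value given in the text


def next_broadcast_delay(rng=None):
    """Seconds to wait before the next periodic CLEP message."""
    if rng is None:
        rng = random.Random()
    delta = rng.uniform(0.0, 1.0)   # assumption: uniform jitter in [0, 1]
    return T_BROADCAST - delta
```

   Each element therefore waits between 29 and 30 seconds, and two
   elements started together are unlikely to stay in lockstep.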
    
   Every Tcheck, the network element checks its interfaces and
   increases or decreases its Rbe if needed and/or allowed. It then
   sends the new Rbe value in a CLEP message with, in case of increase,
   the W flag set. This flag is also set if the network element
   requires more best effort resources than currently available.
    
   Upon CLEP message arrival, the protocol version is checked. If the 
   version number doesn't match one of the versions supported by the 
   network element, the message is dropped and an error is logged. If 
   the Rmax parameter doesn't match that of the receiving element, an 
   error message should also be logged. If the sending network element 
   is a new one (it has never sent CLEP messages before), it is added 
   to the local table, with the content of the message, otherwise the 
   content of the table is updated with the information of the incoming 
   message. The new RfreeBE is computed. If it is negative, Rbe should
   be decreased according to the rules given in section 4. If Rbe has
   changed, a CLEP message with the new parameters must be sent as soon
   as possible. The parameters may not be changed again before a delay
   of Tbroadcast/2 has elapsed, unless RfreeBE becomes positive again.
   If RfreeBE stays negative for more than 3*Tbroadcast/2, an error
   message should be logged.
    
   When a network element is to be shut down, it should send a CLEP
   message with the X flag set and all its parameters set to 0. If the
   information about a network element in the local table has not been
   updated (no CLEP message received from this network element) in the
   last 2*Tbroadcast seconds, the element should be removed from the
   table, and Rfree computed again.
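
   The table maintenance described in this section can be sketched as
   follows. The class NeighborTable and the dictionary layout of a
   decoded message are hypothetical. Two behaviours are assumptions of
   this sketch because the text does not state them: a message whose
   Rmax differs is logged but still processed, and a message with the
   X flag set removes the sender from the table immediately.

```python
T_BROADCAST = 30.0   # seconds, value given in the text


class NeighborTable:
    """Per-sender state learned from CLEP messages (illustrative only)."""

    def __init__(self, rmax, supported_versions=(1,)):
        self.rmax = rmax
        self.supported_versions = set(supported_versions)
        self.entries = {}   # sender -> dict(rbe, rpriv, rmin, wm, last_seen)
        self.errors = []    # stands in for the error log

    def on_message(self, sender, msg, now):
        """Apply one decoded message; return False if it was dropped."""
        if msg["version"] not in self.supported_versions:
            self.errors.append(f"{sender}: unsupported version {msg['version']}")
            return False
        if msg["rmax"] != self.rmax:
            # Logged only; still processing the message is an assumption.
            self.errors.append(f"{sender}: Rmax mismatch ({msg['rmax']})")
        if msg["exit"]:
            # Assumption: an X-flagged message removes the sender at once.
            self.entries.pop(sender, None)
            return True
        self.entries[sender] = {
            "rbe": msg["rbe"], "rpriv": msg["rpriv"], "rmin": msg["rmin"],
            "wm": msg["wants_more"], "last_seen": now,
        }
        return True

    def expire(self, now):
        """Drop senders silent for more than 2*Tbroadcast; return them."""
        stale = [s for s, e in self.entries.items()
                 if now - e["last_seen"] > 2 * T_BROADCAST]
        for sender in stale:
            del self.entries[sender]
        return stale
```

   After every call that changes the table, the caller is expected to
   recompute Rfree and RfreeBE as described in section 4.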
    
 
7. Experiments and results 
    
    
   An implementation of this controlled load service using CLEP is
   available. The development has been carried out using the NetBSD [5]
   operating system, versions 1.3 and 1.4. The interface between CLEP
   and the ISI implementation of RSVP [6] is also operational.
    
   In parallel with the actual implementation, we have also developed a 
   simulator of this system, using NS [7] network simulator. 
    
   All this code (simulator, as well as NetBSD code) is available upon 
   request. Please contact the authors.  
    
    
8. References 
    
   1.   Braden, R., et al., Resource ReSerVation Protocol (RSVP) --
        Version 1 Functional Specification, 1997, Internet Engineering
        Task Force, RFC 2205.
   2.   Wroclawski, J., Specification of the Controlled-Load Network
        Element Service, 1997, Internet Engineering Task Force, RFC
        2211.
   3.   Shenker, S. and J. Wroclawski, General Characterization
        Parameters for Integrated Service Network Elements, 1997,
        Internet Engineering Task Force, RFC 2215.
   4.   Bradner, S., Key words for use in RFCs to Indicate Requirement
        Levels, 1997, Internet Engineering Task Force, RFC 2119.
   5.   http://www.netbsd.org, NetBSD Operating System, NetBSD Project.
   6.   http://www.isi.edu/div7/rsvp/, RSVP, Reservation Setup
        Protocol, USC Information Sciences Institute.
   7.   http://www-mash.cs.berkeley.edu/ns, Network Simulator (version 
        2), UCB/LBNL/VINT project.

9. Acknowledgements 
    
   This protocol has been specified, developed and implemented under a 
   grant from ALCATEL CRC, France. 
    
   Thanks to Pascal Anelli from Universite Pierre et Marie Curie,
   Laboratoire LIP6, who developed the simulation model of CLEP.
    
    
    
10. Authors' addresses 
    
   Eric Horlait 
   Universite Pierre et Marie Curie 
   Laboratoire LIP6 
   8, rue du Capitaine Scott 
   75015 PARIS 
   France 
   Email: Eric.Horlait@lip6.fr 
    
   Manuel Bouyer 
   Universite Pierre et Marie Curie 
   Laboratoire LIP6 
   8, rue du Capitaine Scott 
   75015 PARIS 
   France 
   Email: Manuel.Bouyer@lip6.fr