Internet DRAFT - draft-horlait-clep
Integrated Services over Specific Link Layers                 E. Horlait
Internet Draft                                                 M. Bouyer
Document: draft-horlait-clep-00.txt                   Paris 6 University
                                                               July 1999
CLEP (Controlled Load Ethernet Protocol): Bandwidth Management and
Reservation Protocol for Shared Media
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026 except that the right to
produce derivative works is not granted.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This memo is filed as <draft-horlait-clep-00.txt>, and expires Feb
1, 2000. Please send comments to the authors.
The protocol described in this memo is patented.
1. Abstract
There are various aspects in Quality of Service management. In this
draft, we address the problem of bandwidth allocation and
reservation over shared media (e.g. an Ethernet network). In order
to do so, we define a protocol (CLEP: Controlled Load Ethernet
Protocol) in charge of the management, allocation and fair sharing
of the available bandwidth among users of the network.
The load control is done via token bucket filters on outgoing
interfaces of network elements. Our protocol efficiently manages the
parameters of the token buckets in order to perform admission
control. This service can be used alone, or with the Resource
ReSerVation Protocol (RSVP) [1].
The distributed algorithm is described in section 4 and an
implementation framework of this proposal is given in section 3.
Horlait, Bouyer Expires January 2000 1
Draft-horlait-clep-00.txt July, 1999
2. Conventions used in this document
This document is based on the service defined in [2] and the service
specification templates given in [3]. A summary of the most
important definitions is given hereafter.
Quality of Service (QoS)
This refers to the nature of the achieved packet delivery. A
network offering dynamically controllable QoS will allow
individual applications to request packet delivery
characteristics that fit their needs.
Network Element
A Network Element (or Element) is any component of an
internetwork which directly handles data packets, and thus may
exercise QoS control over the data flow. These are, for example
(but are not limited to) routers, subnetworks, or end-node
operating systems.
Flow
A Flow is a set of packets all covered by the same request for
QoS control. This may be the packets from a single application
session, or the aggregated traffic of several
application sessions.
TSpec and RSpec
A TSpec (for Traffic Specification), is a description of the
traffic pattern for which a QoS control service is requested. A
Service Request Specification (or RSpec), specifies a Quality
of Service a flow wishes to request from a network element.
QoS control Service
QoS control Service (or, when there is no ambiguity, Service)
is a named set of QoS control capabilities provided by a single
network element.
Token Bucket
A Token Bucket is a particular form of TSpec, consisting of a
"token rate" r and a "bucket size" b. Essentially, the r
parameter specifies the continually sustained data rate, and b
the extent to which the data rate can exceed the sustained
level for short periods of time.
Best effort traffic (or best effort flow)
Best effort traffic (or best effort flow) is a flow generated
by an application that doesn't request any special QoS control
service. A privileged traffic (or privileged flow) is a flow
which has a special QoS control requirement (e.g. in terms of
bandwidth).
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in [4].
3. Controlling the load of an Ethernet shared network
In an Ethernet bus architecture, all the transmitters share the same
resources. This means that a transmitter has no guarantee about the
bandwidth available for its own use, unless the other transmitters
on the bus restrict their throughput. A simple priority queuing
algorithm will not meet the requirements of the Controlled-Load
service: if one or several transmitters start overflowing the link,
the other transmitters will see their throughput fall to a value
close to 0. So all the transmitters on the bus must restrict their
maximum throughput to a per-transmitter value, which will be called
R.
In any case, restricting the throughput of the transmitters will not
avoid collisions or packet loss. This implies that the offered
service is still a best-effort service, but if the sum of the
throughputs of all stations is less than the bandwidth of the link,
each transmitter will statistically see a throughput close to its
limit value R.
There are several ways to limit the rate of a data flow. The most
suitable method here is to use a leaky bucket or a token bucket
style filter. As we have to manage several flows, with different
levels of QoS, there should be one filter per flow, and the
throughput R will be the sum of the rates of the different filters.
For each transmitter, there is at least one filter for the standard
best-effort class of traffic, plus one filter per privileged flow. As
the TSpec provided for the flows which require a special QoS control
is characterized by a token bucket filter, we propose to implement
the filter for these flows with a token bucket. The filter for the
best effort traffic will also be a token bucket. Compared with a
single leaky bucket, this has the advantage of allowing bursts of
traffic, which minimizes the effect of the bandwidth limitation on
the usual traffic (NFS, TCP connections, ...).
For implementation reasons, we define here a token bucket with two
parameters (n,t) where n is the number of tokens and t is the time
needed for a token to return to the free token pool. The relation
between this definition and the definition given in section 2 is:
b = n and r = n/t.
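As an illustration, the (n,t) formulation could be sketched as
follows. This is a minimal sketch under stated assumptions, not the
implementation mentioned in section 7: the class and method names are
hypothetical, one token is assumed to cover one byte, and the caller
is assumed to pass a monotonically increasing clock value "now".

```python
from collections import deque


class TokenBucket:
    """Bucket of n tokens; a used token is free again t seconds later."""

    def __init__(self, n, t):
        self.n = n             # bucket size, b = n (assumed: 1 token = 1 byte)
        self.t = t             # seconds before a used token returns to the pool
        self.free = n          # tokens currently in the free token pool
        self.in_use = deque()  # (release_time, count) of consumed tokens

    @property
    def rate(self):
        """Sustained rate r = n/t."""
        return self.n / self.t

    def _reclaim(self, now):
        # Return to the free pool every token whose delay t has elapsed.
        while self.in_use and self.in_use[0][0] <= now:
            self.free += self.in_use.popleft()[1]

    def try_send(self, size, now):
        """Consume `size` tokens if available; True means the packet conforms."""
        self._reclaim(now)
        if size > self.free:
            return False
        self.free -= size
        self.in_use.append((now + self.t, size))
        return True
```

With n = 3000 and t = 1 second, for example, two 1500-byte packets
can be sent back to back, after which the bucket stays empty until
the first tokens return to the free pool.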
Packets generated by the applications are classified with respect to
their QoS requirements before being submitted to the filters. These
filters must also ensure some flow conformance control. The handling
of best effort flows and that of privileged flows is, of course,
quite different. Packets from the best-effort flow are stored in a
queue before being submitted to the filter. If the queue overflows,
the packets are simply discarded.
Packets from a privileged flow have to be handled in a different
way, as the specification of the controlled load service requires
that packets which don't conform to the TSpec be handled as
best-effort packets. However, they can't be added at the end of the
best-effort queue: there may be a lot of packets waiting in that
queue, so a non-conforming packet would be delayed significantly,
and would probably be discarded by the receiver. So these packets
have to be forwarded as soon as possible, but shouldn't disrupt the
best effort flow. To achieve this, the following algorithm is used:
if a packet from a privileged flow does not conform to the token
bucket filter, it is forwarded as a best-effort packet if this
doesn't create a resource shortage for the best effort flow, that is
to say, if there are more tokens in the best-effort free token pool
than bytes of packets waiting in the best-effort queue plus the size
of the packet to be forwarded. Otherwise the packet is discarded.
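The decision described above could be sketched as follows. This is an
illustrative sketch only: the function name, its arguments and the
returned labels are hypothetical, and token counts are assumed to be
expressed in bytes.

```python
def classify_privileged_packet(size, flow_free_tokens,
                               be_free_tokens, be_queued_bytes):
    """Decide how to handle a packet of `size` bytes from a privileged flow.

    flow_free_tokens: free tokens in the flow's own token bucket
    be_free_tokens:   free tokens in the best-effort token bucket
    be_queued_bytes:  bytes of packets waiting in the best-effort queue
    """
    if size <= flow_free_tokens:
        # The packet conforms to the flow's token bucket filter.
        return "privileged"
    if be_free_tokens > be_queued_bytes + size:
        # Forwarding it does not starve the waiting best-effort packets.
        return "best-effort"
    return "drop"
```

Note that a non-conforming packet bypasses the best-effort queue
rather than joining it, which is what keeps its delay low.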
Some applications generate a very low rate data flow, with traffic
bursts, but need much better reliability than that provided by the
best-effort queue when the traffic exceeds the capacity of the token
bucket filter. Examples of such applications are routing protocols,
NTP or RSVP. Such protocols won't work at all with a high packet
loss rate. The generated flow does not require dedicated QoS
handling with its own token bucket (such a flow is, in any case,
somewhat difficult to characterize with a token bucket, because of
its low rate); it just requires a special priority. For this
purpose, two queues with different priorities are needed before the
best effort token bucket. Figure 1 shows the overall architecture
of a network element implementing our Controlled Load Service.
 Best effort  --------+     Token Bucket
 Flow (low  -->       |---+ Filter Nbe, Tbe
 priority)    --------+   |   +---+            --------+
                          +-->|   |---------->         | ---> Medium
 Best effort  --------+   |   +---+    ^   ^   --------+
 Flow (high -->       |---+            |   |
 priority)    --------+                |   |
                             N1, T1    |   |
 Privileged   --------+       +---+    |   |
 Flow #1    -->       |------>|   |----+   |
              --------+       +---+        |
      .                                    |
      .                                    |
      .                      Nn, Tn        |
 Privileged   --------+       +---+        |
 Flow #n    -->       |------>|   |--------+
              --------+       +---+
Figure 1: Architecture of the Network Element
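The two best-effort queues of Figure 1 could be sketched as follows.
This is only an illustration: the class name and the per-queue packet
limit are hypothetical, and strict priority between the two queues is
an assumption, since the text only states that the queues have
different priorities.

```python
from collections import deque


class BestEffortQueues:
    """Two bounded FIFO queues feeding the best-effort token bucket."""

    def __init__(self, limit):
        self.limit = limit   # hypothetical per-queue capacity, in packets
        self.high = deque()  # e.g. routing protocols, NTP, RSVP
        self.low = deque()   # ordinary best-effort traffic

    def enqueue(self, packet, high_priority=False):
        """Queue a packet; return False if it is discarded on overflow."""
        queue = self.high if high_priority else self.low
        if len(queue) >= self.limit:
            return False
        queue.append(packet)
        return True

    def dequeue(self):
        """Next packet to submit to the filter (assumed: high queue first)."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None
```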
It is to be noted that the maximum datagram size of the best effort
flow is the MTU of the link, so the Nbe parameter of the best effort
token bucket filter must be greater than this MTU.
4. The CLEP Protocol
The Network Elements implementing the architecture described in the
previous section need to exchange information, in order to adjust
their token bucket parameters. Doing so, they are able to use the
maximum available bandwidth of the underlying link without exceeding
it. This section describes the rules used to compute the parameters
of the token buckets, as well as the network protocol used by the
network elements to keep their states consistent.
From the point of view of resource sharing among network elements,
there are only two parameters to take into account: the amount of
resources allocated to best-effort flows, and the amount of
resources allocated to the privileged flows. These resources are
evaluated as allocated bandwidth, so the values exchanged by the
network elements are the rates of the token buckets, defined by
R=N/T.
To compute the reserved and available bandwidth, every network
element needs to know the amount of bandwidth reserved for the best-
effort and privileged flows by all the other network elements. We
call Rbe and Rpriv the rate of the best effort token bucket and the
sum of the rates of the privileged token buckets respectively. These
two parameters are to be exchanged between network elements using
the CLEP protocol.
Each network element periodically broadcasts on the link its Rbe and
Rpriv parameters, as well as a flag WM (Wants More) indicating a
need for more resources, and Rmin, the minimum value for Rbe (this
value is set by the administrator of the network element). Each
network element keeps all the received (Rbe, Rpriv, Rmin, WM)
parameters in a table which is used to compute Rfree, the available
bandwidth for this machine. This parameter is computed as
                 Rmax - sum(Rbe + Rpriv)
   Rfree = -----------------------------------
            Number of elements with WM active
In this formula, Rmax is the total available bandwidth of the link.
Another parameter, RfreeBE, is also evaluated. RfreeBE is equal to
Rfree if Rbe is less than the average per network element best
effort bandwidth available, and to Rfree-Rmax/100 otherwise. In this
way, elements that use more bandwidth than the average per network
element bandwidth will decrease their resource consumption, while
others can still increase theirs.
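The computation of Rfree and RfreeBE could be sketched as follows.
This is a sketch under several assumptions that the text does not
settle: the table is assumed to include the local element's own
entry, the divisor is taken as 1 when no element has its WM flag
set, and the "average per network element best effort bandwidth
available" is read as (Rmax - sum(Rpriv)) divided by the number of
elements. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    rbe: float    # rate of the best-effort token bucket
    rpriv: float  # sum of the rates of the privileged token buckets
    rmin: float   # administrative minimum for rbe
    wm: bool      # Wants More flag


def rfree(table, rmax):
    """Rfree = (Rmax - sum(Rbe + Rpriv)) / number of elements with WM set."""
    used = sum(e.rbe + e.rpriv for e in table)
    wanting = sum(1 for e in table if e.wm)
    return (rmax - used) / max(wanting, 1)  # assumption: avoid dividing by 0


def rfree_be(own, table, rmax):
    """RfreeBE for the local element `own`, which is assumed to be in `table`."""
    # Assumed reading of the average best-effort bandwidth per element.
    average_be = (rmax - sum(e.rpriv for e in table)) / len(table)
    free = rfree(table, rmax)
    return free if own.rbe < average_be else free - rmax / 100
```

For instance, with Rmax = 1000, one element at (Rbe=100, Rpriv=200,
WM set) and another at (Rbe=500, Rpriv=0, WM clear), Rfree is 200;
under the assumed average of 400, the first element sees RfreeBE =
200 and the second RfreeBE = 190.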
When a change occurs in the table, new values are computed
immediately. If RfreeBE becomes negative, the network element
decreases its Rbe by
       Rbe - Rmin
   ----------------- * (-RfreeBE + 0.5)
    sum(Rbe - Rmin)
if it is not already at its minimal value (this formula has been
chosen in order to provide a fair decreasing process). After
computing these values, a broadcast message is sent over the
network. As all network elements perform the same calculations,
Rfree becomes positive again unless all available resources are
still allocated.
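The decrease step could be sketched as follows. This is an
illustrative sketch: the function name and argument layout are
hypothetical, the table is assumed to contain the local element, and
clamping the decrease so that Rbe never falls below Rmin is an
assumption derived from the "minimal value" remark above.

```python
def rbe_decrease(rbe, rmin, table, rfree_be):
    """Amount by which the local element lowers its Rbe.

    table: (rbe, rmin) pairs for all known elements, local one included.
    """
    if rfree_be >= 0 or rbe <= rmin:
        return 0.0  # nothing to give back, or already at the minimum
    total_slack = sum(e_rbe - e_rmin for e_rbe, e_rmin in table)
    share = (rbe - rmin) / total_slack
    decrease = share * (-rfree_be + 0.5)
    return min(decrease, rbe - rmin)  # assumption: never go below Rmin
```

Each element gives back bandwidth in proportion to how far its Rbe
is above its own Rmin, so elements close to their minimum are asked
for less.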
Given this information, the admission control algorithm for a new
reservation (Dr) is:
- if Dr is less than or equal to Rfree, the reservation is accepted,
Rpriv is increased, Rfree is decreased;
- if Dr is greater than sum(Rpriv + Rmin), the reservation is
rejected;
- in any other case, replace Rpriv by (Rpriv + Dr), broadcast a
message with these parameters; the new Rfree should be negative
and a decreasing process of Rbe is started; after a certain time,
if Rfree is still negative, the reservation is rejected; if Rfree
became positive, the reservation is accepted.
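The three cases above could be sketched as follows. This is a sketch
only: the function names and returned labels are hypothetical, the
rejection bound is passed in exactly as the second rule states it
(sum(Rpriv + Rmin)) without further interpretation, and treating
Rfree equal to zero after the wait as acceptable is an assumption.

```python
def admit(dr, rfree, sum_rpriv_rmin):
    """First-pass admission decision for a new reservation of rate dr."""
    if dr <= rfree:
        return "accept"   # caller then increases Rpriv and decreases Rfree
    if dr > sum_rpriv_rmin:
        return "reject"   # bound taken literally from the second rule
    # Otherwise: advertise Rpriv + dr and wait for the others to lower Rbe.
    return "pending"


def resolve_pending(rfree_after_wait):
    """Outcome of a pending reservation once the waiting time has elapsed."""
    # Assumption: Rfree == 0 counts as no longer negative.
    return "reject" if rfree_after_wait < 0 else "accept"
```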
A race condition can appear here: if two elements request a new
reservation at the same time, both reservations may fail even though
one of the two would have succeeded on its own. In this case, it is
possible to retry the reservation after a short random delay.
A network element may decide to raise its Rbe if its best effort
queue overflows too often. In this case, it may raise it up to
Rfree, depending on its own needs and those of other network
elements. A network element may also decrease its Rbe if the local
element does not use all the allocated bandwidth, or in order to
redistribute the best effort bandwidth among other network elements
requesting more resources. This allows the network elements to
dynamically use the available best-effort bandwidth, and to adapt
their Rbe to their needs.
5. Architecture of a CLEP Network Element
In order to use this control method, a network element must
implement some dedicated functions. The base components of a node
are mainly the token bucket filters, the packet classifier, the CLEP
daemon and the signaling protocol. Figure 2 gives an overview of the
relationship between these components.
The CLEP daemon is responsible for state data management and is in
charge of computing the token bucket parameters that it sets in
the system. It receives and produces CLEP messages.
Applications can send their QoS requests to the CLEP daemon via a
local interface. This same interface can also be used by signaling
protocols like RSVP, which can also issue QoS requests.
The CLEP daemon sets parameters in the packet classifier in order to
route packets from applications to the token bucket filter and
queue corresponding to their traffic class.
The token buckets module receives parameters from the CLEP daemon
and returns to it statistics on queue length, bucket size, drops,
and so on.
 +---------------+
 | Applications  |
 +---------------+
   |   |                  +---------------+  Parameters
   |   +----------------->|  CLEP daemon  |-------------+
   |  +---------------+ +>+---------------+             V
   |  |   Signaling   |-+     |       ^         +---------------+
   |  +---------------+       |       +---------| Token Buckets |
   |                          V   Statistics    +---------------+
   |                  +---------------+                 ^
   |                  |    Packet     |                 |
   +----------------->|  Classifier   |-----------------+
                      +---------------+
Figure 2: Functional structure of a CLEP capable node
As far as implementation is concerned, the token buckets and the
packet classifier are to be implemented where the networking
protocols are, that is, probably in the kernel. The CLEP daemon,
signaling and applications are in user space.
6. CLEP protocol Elements
The CLEP protocol uses UDP port 580. The message structure is shown
in Figure 3. All values are in network byte order.
Vers
Version of the protocol. The current version is 1.
W
Wants More flag.
X
Exit flag.
Current value of Rbe
This value is an unsigned integer in bytes per second.
Value of Rmin
This value is an unsigned integer in bytes per second. This
parameter is set by the node administrator.
Current value of Rpriv
This value is an unsigned integer in bytes per second. This
field is used to convey the current Rpriv value or the expected
one in case of reservation request.
Value of Rmax
This value is an unsigned integer in bytes per second. This
parameter is set by the administrator of the node and must be
the same for all nodes. This field is used for consistency
check.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ Vers | Unused |W|X|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ Current Value of Rbe |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ Value of Rmin |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ Current value of Rpriv |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ Value of Rmax |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3: CLEP message structure
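Encoding and decoding of the message of Figure 3 could be sketched as
follows. This is a sketch under assumptions: the figure as reproduced
here does not show the width of the Vers field, so an 8-bit Vers in
the most significant byte is assumed, with W and X taken as the two
least significant bits of the first 32-bit word. The function names
are hypothetical.

```python
import struct

CLEP_PORT = 580          # UDP port given in the text
_CLEP_FORMAT = "!5I"     # five 32-bit unsigned words, network byte order


def pack_clep(rbe, rmin, rpriv, rmax,
              wants_more=False, exit_flag=False, version=1):
    """Build a 20-byte CLEP message (rates in bytes per second)."""
    # Assumed layout: Vers in the top 8 bits, W in bit 1, X in bit 0.
    word0 = (version << 24) | (int(wants_more) << 1) | int(exit_flag)
    return struct.pack(_CLEP_FORMAT, word0, rbe, rmin, rpriv, rmax)


def unpack_clep(data):
    """Parse a CLEP message into a dict of its fields."""
    word0, rbe, rmin, rpriv, rmax = struct.unpack(_CLEP_FORMAT, data)
    return {
        "version": word0 >> 24,
        "wants_more": bool(word0 & 0x2),
        "exit": bool(word0 & 0x1),
        "rbe": rbe,
        "rmin": rmin,
        "rpriv": rpriv,
        "rmax": rmax,
    }
```

The "!" prefix in the format string selects network byte order, as
required for all values in the message.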
Two timers are used for protocol control purposes: Tbroadcast and
Tcheck. After some experiments, we set Tbroadcast to 30 seconds and
Tcheck to 1 second.
At startup, a network element sets its Rpriv to 0, its Rbe to Rmin,
sends a CLEP message, and starts listening to the UDP port. Under
normal circumstances, without any modifications of local parameters,
a CLEP message is sent every (Tbroadcast - Delta), where Delta is a
random value in the range 0-1 second. This random value is there to
avoid synchronization between the network elements.
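The randomized broadcast period could be sketched as follows,
reading the interval as Tbroadcast minus Delta. The constant and
function names are hypothetical, and a uniform distribution for
Delta is an assumption; the text only gives its range.

```python
import random

T_BROADCAST = 30.0  # seconds, value reported in the text
T_CHECK = 1.0       # seconds, value reported in the text


def next_broadcast_delay(rng=random):
    """Delay before the next periodic CLEP message, in seconds."""
    delta = rng.uniform(0.0, 1.0)  # assumed uniform over the 0-1 s range
    return T_BROADCAST - delta
```

Passing a seeded random.Random instance as rng makes the schedule
reproducible, which can help when testing.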
Every Tcheck, the network element checks its interfaces and
increases or decreases its Rbe if needed and/or allowed. It then
sends the new Rbe value in a CLEP message with, in case of an
increase, the W flag set. This flag is also set if the network
element requires more best effort resources than currently
available.
Upon CLEP message arrival, the protocol version is checked. If the
version number doesn't match one of the versions supported by the
network element, the message is dropped and an error is logged. If
the Rmax parameter doesn't match that of the receiving element, an
error message should also be logged. If the sending network element
is a new one (it has never sent CLEP messages before), it is added
to the local table, with the content of the message, otherwise the
content of the table is updated with the information of the incoming
message. The new RfreeBE is computed. If it is negative, Rbe should
be decreased according to the rules given in section 4. If Rbe has
changed, a CLEP message with the new parameters must be sent as soon
as possible. The parameters may not be changed again until a delay
of Tbroadcast/2 has elapsed or RfreeBE has become positive again. If
RfreeBE stays negative for more than 3*Tbroadcast/2, an error
message should be logged.
When a network element is to be shut down, it should send a CLEP
message with the X flag set, and all its parameters set to 0. If the
information for a network element in the local table has not been
updated (no CLEP message received from this network element) in the
last 2*Tbroadcast seconds, the element should be removed from the
table, and Rfree computed again.
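The table maintenance described in this section could be sketched as
follows. This is an illustrative sketch: the function names, the
dict-based message and table layout are hypothetical, removing an
element as soon as its X flag is seen is an assumption, and a message
with a mismatching Rmax is still used after logging because the text
does not say that it is dropped.

```python
import logging

T_BROADCAST = 30.0          # seconds, value reported in the text
SUPPORTED_VERSIONS = {1}
log = logging.getLogger("clep")


def on_message(table, sender, msg, local_rmax, now):
    """Update table (sender -> (msg, last_seen)); True if msg was used."""
    if msg["version"] not in SUPPORTED_VERSIONS:
        log.error("unsupported CLEP version %r from %s",
                  msg["version"], sender)
        return False                      # message dropped
    if msg["rmax"] != local_rmax:
        log.error("Rmax mismatch from %s: %r != %r",
                  sender, msg["rmax"], local_rmax)
    if msg.get("exit"):
        table.pop(sender, None)           # assumption: X flag removes entry
    else:
        table[sender] = (msg, now)        # new element or refreshed entry
    return True


def expire(table, now):
    """Remove elements silent for more than 2*Tbroadcast; return them."""
    stale = [s for s, (_, seen) in table.items()
             if now - seen > 2 * T_BROADCAST]
    for sender in stale:
        del table[sender]
    return stale
```

After either function changes the table, the caller would recompute
Rfree and RfreeBE.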
7. Experiments and results
An implementation of this controlled load service using CLEP is
available. The development has been carried out using the NetBSD [5]
operating system, versions 1.3 and 1.4. The interface between CLEP
and the ISI implementation of RSVP [6] is also running.
In parallel with the actual implementation, we have also developed a
simulator of this system, using the NS [7] network simulator.
All this code (simulator, as well as NetBSD code) is available upon
request. Please contact the authors.
8. References
1. Braden, R., et al., Resource ReSerVation Protocol (RSVP) --
Version 1 Functional Specification, 1997, Internet Engineering
Task Force, RFC 2205.
2. Wroclawski, J., Specification of the Controlled-Load Network
Element Service, 1997, Internet Engineering Task Force, RFC
2211.
3. Shenker, S. and J. Wroclawski, General Characterization
Parameters for Integrated Service Network Elements, 1997,
Internet Engineering Task Force, RFC 2215.
4. Bradner, S., Key words for use in RFCs to Indicate Requirement
Levels, 1997, Internet Engineering Task Force, RFC 2119.
5. http://www.netbsd.org, NetBSD Operating System, NetBSD Project.
6. http://www.isi.edu/div7/rsvp/, RSVP, Reservation Setup
Protocol, USC Information Sciences Institute.
7. http://www-mash.cs.berkeley.edu/ns, Network Simulator (version
2), UCB/LBNL/VINT project.
9. Acknowledgements
This protocol has been specified, developed and implemented under a
grant from ALCATEL CRC, France.
Thanks to Pascal Anelli from Universite Pierre et Marie Curie,
Laboratoire LIP6, who developed the simulation model of CLEP.
10. Authors' addresses
Eric Horlait
Universite Pierre et Marie Curie
Laboratoire LIP6
8, rue du Capitaine Scott
75015 PARIS
France
Email: Eric.Horlait@lip6.fr
Manuel Bouyer
Universite Pierre et Marie Curie
Laboratoire LIP6
8, rue du Capitaine Scott
75015 PARIS
France
Email: Manuel.Bouyer@lip6.fr